Re: [Gluster-users] Problem with glusterd locks on gluster 3.6.1

2016-06-15 Thread Atin Mukherjee


On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
> 
> 
> On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee wrote:
> 
> 
> 
> On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> > Hi,
> >
> > We're using gluster 3.6.1 and we periodically find that gluster commands
> > fail, saying they could not get the lock on one of the brick machines.
> > The logs on that machine then say something like:
> >
> > [2016-06-15 08:17:03.076119] E
> > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> > acquire lock for vol2
> 
> This is a possible case if concurrent volume operations are run. Do you
> have any script which checks volume status on an interval from all
> the nodes? If so, this is expected behavior.
> 
> 
> Yes, I do have a couple of scripts that check on volume and quota
> status. Given this, I do get an "Another transaction is in progress.."
> message, which is OK. The problem is that sometimes I get the volume
> lock held message, which never goes away. This sometimes results in
> glusterd consuming a lot of memory and CPU, and the problem can only be
> fixed with a reboot. The log files are huge, so I'm not sure if it's OK
> to attach them to an email.

OK, so this is known. We have fixed lots of stale lock issues in the 3.7
branch, and some of them, if not all, were also backported to the 3.6
branch. The issue is that you are using 3.6.1, which is quite old. If you
can upgrade to the latest 3.7 release, or at worst the latest 3.6 release,
I am confident that this will go away.
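
In the meantime, a minimal sketch (the script, lock file and log paths are
hypothetical; it assumes a cron-driven check and the standard util-linux
flock) of how the periodic checks on each node could be serialised so that
they never overlap:

#!/bin/bash
# gluster-monitor.sh - run the status/quota checks under a lock so that two
# invocations never issue concurrent gluster transactions from this node.
LOCKFILE=/var/run/gluster-monitor.lock

exec 200>"$LOCKFILE"
# If a previous check is still running, skip this run instead of queuing up.
flock -n 200 || exit 0

gluster volume status all      > /var/log/gluster-monitor.log 2>&1
gluster volume quota vol2 list >> /var/log/gluster-monitor.log 2>&1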

~Atin
> 
> >
> > After some time, glusterd then seems to give up and die.
> 
> Do you mean glusterd shuts down or segfaults? If so, I am more interested
> in analyzing this part. Could you provide us the glusterd log and
> cmd_history log file, along with the core (in case of SEGV), from all the
> nodes for further analysis?
> 
> 
> There is no segfault. glusterd just shuts down. As I said above,
> sometimes this happens and sometimes it just continues to hog a lot of
> memory and CPU..
> 
> 
> >
> > Interestingly, I also find the following line at the beginning of
> > etc-glusterfs-glusterd.vol.log and I don't know if this has any
> > significance to the issue:
> >
> > [2016-06-14 06:48:57.282290] I
> > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> > Detected new install. Setting op-version to maximum : 30600
> >
> 
> 
> What does this line signify?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Problem with glusterd locks on gluster 3.6.1

2016-06-15 Thread B.K.Raghuram
On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee  wrote:

>
>
> On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> > Hi,
> >
> > We're using gluster 3.6.1 and we periodically find that gluster commands
> > fail, saying they could not get the lock on one of the brick machines.
> > The logs on that machine then say something like:
> >
> > [2016-06-15 08:17:03.076119] E
> > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> > acquire lock for vol2
>
> This is a possible case if concurrent volume operations are run. Do you
> have any script which checks volume status on an interval from all
> the nodes? If so, this is expected behavior.
>

Yes, I do have a couple of scripts that check on volume and quota status.
Given this, I do get an "Another transaction is in progress.." message,
which is OK. The problem is that sometimes I get the volume lock held
message, which never goes away. This sometimes results in glusterd
consuming a lot of memory and CPU, and the problem can only be fixed with a
reboot. The log files are huge, so I'm not sure if it's OK to attach them
to an email.

>
> > After some time, glusterd then seems to give up and die.
>
> Do you mean glusterd shuts down or segfaults? If so, I am more interested
> in analyzing this part. Could you provide us the glusterd log and
> cmd_history log file, along with the core (in case of SEGV), from all the
> nodes for further analysis?
>

There is no segfault. glusterd just shuts down. As I said above, sometimes
this happens and sometimes it just continues to hog a lot of memory and
CPU..

>
> >
> > Interestingly, I also find the following line at the beginning of
> > etc-glusterfs-glusterd.vol.log and I don't know if this has any
> > significance to the issue:
> >
> > [2016-06-14 06:48:57.282290] I
> > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> > Detected new install. Setting op-version to maximum : 30600
> >
>

What does this line signify?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Bit rot disabled as default

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 21:29 GMT+02:00 Дмитрий Глушенок :
> Sharding almost solves the problem (for inactive blocks), but it was
> only declared stable today :)

I've predicted the future :)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Bit rot disabled as default

2016-06-15 Thread Дмитрий Глушенок
Sharding almost solves the problem (for inactive blocks), but it was only
declared stable today :)

http://blog.gluster.org/2016/06/glusterfs-3-8-released/ 

- Sharding is now stable for VM image storage.
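
For reference, roughly how both features are switched on with the CLI, as
far as I understand it (the volume name "myvol" is a placeholder, and
sharding only applies to files created after it is enabled):

gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB   # size of each shard piece
gluster volume bitrot myvol enable                        # starts BitD and the scrubber
gluster volume bitrot myvol scrub-frequency daily         # optional: scrub more often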

--
Dmitry Glushenok
Jet Infosystems

> On 15 June 2016, at 19:42, Gandalf Corvotempesta wrote:
> 
> 2016-06-15 18:12 GMT+02:00 Дмитрий Глушенок :
>> Hello.
>> 
>> Maybe because of the current implementation of rotten-bit detection - one hash
>> for the whole file. Imagine a 40 GB VM image - a few parts of the image are
>> modified continuously (VM log files and application data are constantly
>> changing). Those writes make the checksum invalid and BitD has to recalculate
>> it endlessly. As a result, the checksum of the VM image can never be verified.
> 
> I think you are right.
> But what about sharding? In that case, the hash would be created for
> each shard and not for the whole file.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Rsync - should I rsync from mount point or vol directory

2016-06-15 Thread John Lewis
Hi :)
I have a /glusterdata dir that is mounted to /var/www/mydir.

rsync seems slow reading from /var/www/mydir, so I think I will use
/glusterdata as the rsync source dir.

My questions:

1. Is that ok? Is it safe? :)
2. I noticed I can exclude the .glusterfs and .trashcan dirs. Is that
correct? Are there any other dirs I can safely exclude?
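
This is roughly the command I have in mind (the destination host and path
are placeholders). My understanding is that reading straight from the brick
only sees the files stored on this node, so it is only equivalent to the
mount for a pure replica volume, and that .glusterfs and .trashcan hold
GlusterFS-internal data and should be excluded:

rsync -aHAX --numeric-ids \
      --exclude='.glusterfs' \
      --exclude='.trashcan' \
      /glusterdata/ backuphost:/backup/mydir/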

Thanks a lot!

John.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Bit rot disabled as default

2016-06-15 Thread Gandalf Corvotempesta
2016-06-15 18:12 GMT+02:00 Дмитрий Глушенок :
> Hello.
>
> Maybe because of the current implementation of rotten-bit detection - one hash
> for the whole file. Imagine a 40 GB VM image - a few parts of the image are
> modified continuously (VM log files and application data are constantly
> changing). Those writes make the checksum invalid and BitD has to recalculate
> it endlessly. As a result, the checksum of the VM image can never be verified.

I think you are right.
But what about sharding? In that case, the hash would be created for
each shard and not for the whole file.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Bit rot disabled as default

2016-06-15 Thread Дмитрий Глушенок
Hello.

Maybe because of the current implementation of rotten-bit detection - one hash
for the whole file. Imagine a 40 GB VM image - a few parts of the image are
modified continuously (VM log files and application data are constantly
changing). Those writes make the checksum invalid and BitD has to recalculate it
endlessly. As a result, the checksum of the VM image can never be verified.

> On 15 June 2016, at 9:37, Gandalf Corvotempesta wrote:
> 
> I was looking at the docs.
> Why is bit rot protection disabled by default?
> With huge files like a qcow image, a bit rot could lead to the whole image
> being corrupted and replicated to the whole cluster.
> 
> Are there any drawbacks with bit rot detection that explain the default of off?
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users

--
Dmitry Glushenok
Jet Infosystems

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster volume listening on multiple IP address/networks

2016-06-15 Thread ML mail
Hello
To avoid losing performance/latency, I would like to have my Gluster volumes
available through one IP address on each of my networks/VLANs, so that the
gluster client and server are on the same network. My clients mount the volume
using the native gluster protocol.
So my question is: how can I have a gluster volume listening on more than one
network or IP address? Is this possible?
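
One approach I am considering (hostnames and addresses below are just
placeholder examples) is to create the volume with brick hostnames rather
than IPs and let each network resolve those names to its own addresses,
since the brick processes seem to listen on all interfaces anyway:

# volume created with brick hostnames rather than IPs
gluster volume create myvol replica 2 gfs1:/bricks/b1 gfs2:/bricks/b1

# /etc/hosts (or split-horizon DNS) on a client in VLAN 10
10.0.10.11  gfs1
10.0.10.12  gfs2

# /etc/hosts on a client in VLAN 20
10.0.20.11  gfs1
10.0.20.12  gfs2

# each client then mounts over whichever network resolves gfs1/gfs2 locally
mount -t glusterfs gfs1:/myvol /mnt/myvol
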
Regards,
ML

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster weekly community meeting 15-Jun-2016

2016-06-15 Thread Kaushal M
Thanks again, all the attendees of today's meeting. The meeting logs
can be found at the following links,
Minutes: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-06-15/weekly_community_meeting_15-jun-2016.2016-06-15-11.59.html
Minutes (text):
https://meetbot.fedoraproject.org/gluster-meeting/2016-06-15/weekly_community_meeting_15-jun-2016.2016-06-15-11.59.txt
Log: 
https://meetbot.fedoraproject.org/gluster-meeting/2016-06-15/weekly_community_meeting_15-jun-2016.2016-06-15-11.59.log.html

Next week's meeting will be held at the same time and in the same
place. See you all next week.

Thanks,
Kaushal

Meeting summary
---
* Rollcall  (kshlm, 11:59:48)

* GlusterFS-3.9  (kshlm, 12:06:05)
  * ACTION: ndevos will call for 3.9 release-maintainers on the
maintainers list  (kshlm, 12:15:18)

* GlusterFS-3.8  (kshlm, 12:15:40)

* GlusterFS-3.7  (kshlm, 12:32:15)
  * ACTION: kshlm to start separate thread for maintainer feedback on
3.7.12rc  (kshlm, 12:37:08)

* GlusterFS-3.6  (kshlm, 12:37:34)
  * LINK:
http://download.gluster.org/pub/gluster/glusterfs/download-stats.html
(kkeithley, 12:47:25)
  * ACTION: Start a mailing list discussion on EOLing 3.6  (kshlm,
12:51:33)

* GlusterFS-3.5  (kshlm, 12:53:57)
  * LINK: https://en.wikipedia.org/wiki/File:Taps_on_bugle.ogg
(jdarcy, 12:55:24)

* NFS-Ganesha  (kshlm, 12:56:23)

* Samba  (kshlm, 12:59:13)

* Last weeks AIs  (kshlm, 13:00:04)

* rastar to look at 3.6 builds failures on BSD  (kshlm, 13:00:49)

* Open floor  (kshlm, 13:05:03)
  * Bug self triage. When you open a bug for yourself, assign it (to
yourself) and add the keyword "Triaged"  (kshlm, 13:07:48)
  * If it's not for yourself, but you know who it does belong to, assign
it to them and add the keyword "Triaged"  (kshlm, 13:07:48)
  * If you submit a patch for a bug, set the bug state to POST.  (kshlm,
13:07:48)
  * If your patch gets committed/merged, and the committer forgets, set
the bug state to MODIFIED  (kshlm, 13:07:48)
  * LINK:
http://gluster.readthedocs.io/en/latest/Contributors-Guide/Bug-Triage/
(ndevos, 13:09:02)
  * LINK:

http://gluster.readthedocs.io/en/latest/Contributors-Guide/Bug-report-Life-Cycle/
(ndevos, 13:09:25)
  * ACTION: kkeithley Saravanakmr with nigelb will set up Coverity,
clang, etc on public  facing machine and run it regularly  (kshlm,
13:10:37)

Meeting ended at 13:12:49 UTC.




Action Items

* ndevos will call for 3.9 release-maintainers on the maintainers list
* kshlm to start separate thread for maintainer feedback on 3.7.12rc
* Start a mailing list discussion on EOLing 3.6
* kkeithley Saravanakmr with nigelb will set up Coverity, clang, etc on
  public  facing machine and run it regularly




Action Items, by person
---
* kkeithley
  * kkeithley Saravanakmr with nigelb will set up Coverity, clang, etc
on public  facing machine and run it regularly
* kshlm
  * kshlm to start separate thread for maintainer feedback on 3.7.12rc
* ndevos
  * ndevos will call for 3.9 release-maintainers on the maintainers list
* nigelb
  * kkeithley Saravanakmr with nigelb will set up Coverity, clang, etc
on public  facing machine and run it regularly
* **UNASSIGNED**
  * Start a mailing list discussion on EOLing 3.6




People Present (lines said)
---
* kshlm (139)
* ndevos (59)
* kkeithley (27)
* post-factum (22)
* jdarcy (13)
* rastar_ (8)
* ira_ (8)
* glusterbot (6)
* atinm (5)
* zodbot (3)
* jiffin (2)
* nigelb (2)
* skoduri (1)
* kotreshhr (1)
* aravindavk (1)
* samikshan (1)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] [Gluster-devel] Glusterfs 3.7.11 with LibGFApi in Qemu on Ubuntu Xenial does not work

2016-06-15 Thread André Bauer
Hi Prasanna,

On 15.06.2016 at 12:09, Prasanna Kalever wrote:

>
> I think you have missed enabling insecure binding, which is needed for
> libgfapi access. Please try again after following the steps below:
>
> => edit /etc/glusterfs/glusterd.vol by adding "option
> rpc-auth-allow-insecure on" #(on all nodes)
> => gluster vol set $volume server.allow-insecure on
> => systemctl restart glusterd #(on all nodes)
>

No, that's not the case. All services are up and running correctly,
allow-insecure is set, and the volume works fine with libgfapi access
from my Ubuntu 14.04 KVM/Qemu servers.

Only the server which was updated to Ubuntu 16.04 can't access the
volume via libgfapi anymore (a FUSE mount still works).

The GlusterFS logs are empty when I try to access the GlusterFS nodes,
so I think the requests are blocked on the client side.

Maybe AppArmor again?
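
For reference, this is roughly how I checked AppArmor the last time
(profile paths are from my 14.04 setup and may differ on 16.04):

aa-status                               # list enforced profiles
dmesg | grep -i 'apparmor.*denied'      # recent denials in the kernel log

# put libvirtd and the per-VM qemu profiles into complain mode for a test
aa-complain /etc/apparmor.d/usr.sbin.libvirtd
aa-complain /etc/apparmor.d/libvirt/libvirt-*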

Regards
André

>
> --
> Prasanna
>
>>
>> I don't see anything in the apparmor logs when setting everything to
>> complain or audit.
>>
>> It also seems GlusterFS servers don't get any request because brick logs
>> are not complaining anything.
>>
>> Any hints?
>>
>>
>> --
>> Regards
>> André Bauer
>>
>> ___
>> Gluster-devel mailing list
>> gluster-de...@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>


-- 
Kind regards
André Bauer

MAGIX Software GmbH
André Bauer
Administrator
August-Bebel-Straße 48
01219 Dresden
GERMANY

tel.: 0351 41884875
e-mail: aba...@magix.net
www.magix.com

Geschäftsführer | Managing Directors: Dr. Arnd Schröder, Klaus Schmidt
Amtsgericht | Commercial Register: Berlin Charlottenburg, HRB 127205

--
The information in this email is intended only for the addressee named
above. Access to this email by anyone else is unauthorized. If you are
not the intended recipient of this message any disclosure, copying,
distribution or any action taken in reliance on it is prohibited and
may be unlawful. MAGIX does not warrant that any attachments are free
from viruses or other defects and accepts no liability for any losses
resulting from infected email transmissions. Please note that any
views expressed in this email may be those of the originator and do
not necessarily represent the agenda of the company.
--

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS over S3FS

2016-06-15 Thread Niklaas Baudet von Gersdorff
Vincent Miszczak [2016-06-15 10:27 +] :

> I would like to combine Glusterfs with S3FS.
[...]
> I also have the idea to test this with Swift object storage. Advises are 
> welcome.

Never tried this before. Maybe S3QL [1] works since it "is
a standard conforming, full featured UNIX file system that is
conceptually indistinguishable from any local file system".

1: https://bitbucket.org/nikratio/s3ql/

The entire approach sounds a bit hackish to me though. :-)

Niklaas


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Problem with glusterd locks on gluster 3.6.1

2016-06-15 Thread Atin Mukherjee


On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
> Hi,
> 
> We're using gluster 3.6.1 and we periodically find that gluster commands
> fail, saying they could not get the lock on one of the brick machines.
> The logs on that machine then say something like:
> 
> [2016-06-15 08:17:03.076119] E
> [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
> acquire lock for vol2

This is a possible case if concurrent volume operations are run. Do you
have any script which checks volume status on an interval from all
the nodes? If so, this is expected behavior.
> 
> After some time, glusterd then seems to give up and die.

Do you mean glusterd shuts down or segfaults? If so, I am more interested
in analyzing this part. Could you provide us the glusterd log and
cmd_history log file, along with the core (in case of SEGV), from all the
nodes for further analysis?

> 
> Interestingly, I also find the following line at the beginning of
> etc-glusterfs-glusterd.vol.log and I don't know if this has any
> significance to the issue:
> 
> [2016-06-14 06:48:57.282290] I
> [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
> Detected new install. Setting op-version to maximum : 30600
> 
> Any idea what the problem may be?
> 
> Thanks,
> -Ram
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] GlusterFS over S3FS

2016-06-15 Thread Vincent Miszczak
Hello,


I would like to combine Glusterfs with S3FS.

The idea is to create a tiered volume composed of local network storage and 
remote object storage (something similar to vendor tiering solutions, but with 
some tiers being remote).


I have successfully set up the local GlusterFS part on one side, and an S3FS
mount on the other side:

-local volume is OK (create, start)

-creating a Gluster volume on S3FS is OK

-starting the volume on S3FS fails


(/gluster/s3fs is symlinked to /s3fs)

root@dfs:/gluster/s3fs# gluster volume create dfs dfs:/s3fs/test
volume create: dfs: success: please start the volume to access data
root@dfs:/gluster/s3fs# gluster volume start dfs
volume start: dfs: failed: Commit failed on localhost. Please check log file 
for details.


From the log:

[2016-06-15 09:36:13.289768] E [MSGID: 113082] [posix.c:6638:init] 0-dfs-posix: 
/s3fs/test: failed to set gfid [Le fichier existe]
[2016-06-15 09:36:13.289831] E [MSGID: 101019] [xlator.c:433:xlator_init] 
0-dfs-posix: Initialization of volume 'dfs-posix' failed, review your volfile 
again
[2016-06-15 09:36:13.289840] E [graph.c:322:glusterfs_graph_init] 0-dfs-posix: 
initializing translator failed
[2016-06-15 09:36:13.289847] E [graph.c:661:glusterfs_graph_activate] 0-graph: 
init failed

It looks like S3FS is not compliant with Gluster's expectations.
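
If I read the error correctly, Gluster's posix translator cannot store the
trusted.gfid extended attribute on the S3FS mount. A quick check of xattr
support on the backend would be something like this (the attribute name is
just a test value; the trusted namespace needs root):

setfattr -n trusted.glusterfs.test -v working /s3fs/test \
  && getfattr -n trusted.glusterfs.test /s3fs/test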


Questions:

Has anyone tried to do something similar?

Is there any chance this kind of scenario can work?


I am also thinking of testing this with Swift object storage. Advice is
welcome.


Vincent


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Error to move files

2016-06-15 Thread Pepe Charli
Hi,

$ gluster vol info cfe-gv1

Volume Name: cfe-gv1
Type: Distributed-Replicate
Volume ID: 70632183-4f26-4f03-9a48-e95f564a9e8c
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: srv-vln-gfsc1n1:/expgfs/cfe/brick1/brick
Brick2: srv-vln-gfsc1n2:/expgfs/cfe/brick1/brick
Brick3: srv-vln-gfsc1n3:/expgfs/cfe/brick1/brick
Brick4: srv-vln-gfsc1n4:/expgfs/cfe/brick1/brick
Options Reconfigured:
performance.readdir-ahead: on
nfs.disable: on
user.cifs: disable
user.smb: disable
user.cifs.disable: on
user.smb.disable: on
client.event-threads: 4
server.event-threads: 4
cluster.lookup-optimize: on
cluster.server-quorum-type: server
cluster.server-quorum-ratio: 51%

I did not see any errors in logs.

I could move the file through an intermediate directory, /tmp (not GlusterFS):
$ mv /u01/2016/03/fichero.xml /tmp
$ mv /tmp/ /u01/procesados/2016/03/

I did not think to restart the volume.
What do you think could be the problem?
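
I will also check on each brick whether a stale DHT link file exists at the
destination, roughly like this (brick roots taken from the vol info above;
the volume-relative path of the file is my guess):

FILE=procesados/2016/03/fichero.xml     # path relative to the volume root
for b in srv-vln-gfsc1n1 srv-vln-gfsc1n2 srv-vln-gfsc1n3 srv-vln-gfsc1n4; do
  echo "== $b =="
  ssh "$b" "ls -li /expgfs/cfe/brick1/brick/$FILE 2>/dev/null"
  ssh "$b" "getfattr -d -e hex -m . /expgfs/cfe/brick1/brick/$FILE 2>/dev/null"
done
# A zero-byte entry carrying a trusted.glusterfs.dht.linkto attribute would
# point to a stale DHT link file left behind by an earlier rename.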


Thanks,

2016-06-13 14:44 GMT+02:00 Vijay Bellur :
> On Fri, Jun 10, 2016 at 4:48 AM, Pepe Charli  wrote:
>> Hi,
>>
>> I have encountered a strange situation where mv seems to think that two
>> files on the gluster mount are the same file and one of them does not exist.
>>
>> From client (mount with FUSE)
>> 
>> $ ls -li /u01/2016/03/fichero.xml
>> 12240677508402255910 -rw-rw 1 nginx nginx 1797 Mar 23 17:55
>> /u01/2016/03/fichero.xml
>>
>> $ ls -li /u01/procesados/2016/03/fichero.xml
>> ls: cannot access procesados /u01/2016/03/fichero.xml: No such file or 
>> directory
>>
>> $ mv: ‘/u01/2016/03/fichero.xml’ and ‘procesados
>> /u01//2016/03/fichero.xml’ are the same file
>>
>>
>> From server (Gluster 3.7.6)
>> --
>> $ ls -li /brick/u01/02/ticket/procesados/2016/03/fichero.xml
>> ls: cannot access
>> /brick/u01/02/ticket/procesados/2016/03/fichero.xml: No such file
>> or directory
>>
>> $ ls -li /brick/u01/02/ticket/2016/03/fichero.xml
>> 2684355134 -rw-rw 2 nginx nginx 1797 Mar 23 17:55
>> 02/ticket/2016/03/fichero.xml
>>
>> $ getfattr -d -e hex -m.  /brick/u01/02/ticket/2016/03/fichero.xml
>> # file: 02/ticket/2016/03/fichero.xml
>> trusted.afr.dirty=0x
>> trusted.bit-rot.version=0x020056c2c3090001217f
>> trusted.gfid=0xaee7e1be15cd4ef7a9df9f570a6e1826
>>
>> $ stat /brick/u01/02/ticket/2016/03/fichero.xml
>>   File: ‘02/ticket/2016/03/fichero.xml’
>>   Size: 1797Blocks: 8  IO Block: 4096   regular file
>> Device: fc04h/64516dInode: 2684355134  Links: 2
>> Access: (0660/-rw-rw)  Uid: (  997/   nginx)   Gid: (  994/   nginx)
>> Access: 2016-03-23 17:55:41.885494971 +0100
>> Modify: 2016-03-23 17:55:42.104494830 +0100
>> Change: 2016-06-10 08:53:16.849313580 +0200
>>  Birth: -
>>
>> Could anyone help me?
>>
>
>
> Can you provide more details about your volume configuration and your
> client log file?
>
> Do you encounter a similar problem if you restart the volume?
>
> Regards,
> Vijay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Problem with glusterd locks on gluster 3.6.1

2016-06-15 Thread B.K.Raghuram
Hi,

We're using gluster 3.6.1 and we periodically find that gluster commands
fail, saying they could not get the lock on one of the brick machines. The
logs on that machine then say something like:

[2016-06-15 08:17:03.076119] E [glusterd-op-sm.c:3058:glusterd_op_ac_lock]
0-management: Unable to acquire lock for vol2

After some time, glusterd then seems to give up and die.

Interestingly, I also find the following line at the beginning of
etc-glusterfs-glusterd.vol.log and I don't know if this has any significance
to the issue:

[2016-06-14 06:48:57.282290] I
[glusterd-store.c:2063:glusterd_restore_op_version] 0-management: Detected
new install. Setting op-version to maximum : 30600

Any idea what the problem may be?

Thanks,
-Ram
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [RESOLVED] issues recovering machine in gluster

2016-06-15 Thread Arif Ali
On 15 June 2016 at 08:55, Arif Ali  wrote:

>
> On 15 June 2016 at 08:09, Atin Mukherjee  wrote:
>
>>
>>
>> On 06/15/2016 12:14 PM, Arif Ali wrote:
>> >
>> > On 15 June 2016 at 06:48, Atin Mukherjee wrote:
>> >
>> >
>> >
>> > On 06/15/2016 11:06 AM, Gandalf Corvotempesta wrote:
>> > > On 15 Jun 2016 07:09, "Atin Mukherjee" wrote:
>> > >> To get rid of this situation you'd need to stop all the running
>> glusterd
>> > >> instances and go into /var/lib/glusterd/peers folder on all the
>> nodes
>> > >> and manually correct the UUID file names and their content if
>> required.
>> > >
>> > > If I understood properly, the only way to fix this is by bringing the
>> > > whole cluster down? "you'd need to stop all the running glusterd instances"
>> > >
>> > > I hope you are referring to all instances on the failed node...
>> >
>> > No, since the configuration is synced across all the nodes, any
>> > incorrect data gets replicated throughout. So in this case, to be on the
>> > safer side and validate correctness, all glusterd instances on *all*
>> > the nodes should be brought down. Having said that, this doesn't impact
>> > I/O, as the management path is different from the I/O path.
>> >
>> >
>> > As a sanity check, one of the things I did last night was to reboot the
>> > whole gluster system when I had downtime arranged. I thought this was
>> > something that would be asked, as I had seen similar requests on the
>> > mailing list previously.
>> >
>> > Unfortunately though, it didn't fix the problem.
>>
>> A reboot alone is not going to solve the problem. You'd need to correct the
>> configuration as I explained earlier in this thread. If that doesn't work,
>> please send me the content of the /var/lib/glusterd/peers/ directory and the
>> /var/lib/glusterd/glusterd.info file from all the nodes where glusterd
>> instances are running. I'll take a look, correct them, and send them
>> back to you.
>>
>
> Thanks Atin,
>
> Apologies, I missed your mail, as I was travelling
>
> I have checked the relevant files you have mentioned, and they look
> correct to me, but I have attached them for sanity; maybe you can spot
> something that I have not seen.
>

I have been discussing the issue with Atin on IRC, and we have resolved the
problem. Thanks Atin, it was much appreciated.

For the purpose of this list: I had a UUID file in /var/lib/glusterd/peers
for the host itself, which is not required. Once I removed the UUID file
corresponding to the node where glusterd was running, the node was able to
function correctly.
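
For anyone hitting the same thing, the fix boiled down to roughly the
following on the affected node (the UUID shown is just an example value):

systemctl stop glusterd

# the local node's own UUID lives in glusterd.info ...
grep UUID= /var/lib/glusterd/glusterd.info
# UUID=6a2d7f3c-1c4e-4a7b-9d2e-0123456789ab

# ... and must not also appear as a file under peers/, which should only
# list the *other* nodes:
ls /var/lib/glusterd/peers/
rm /var/lib/glusterd/peers/6a2d7f3c-1c4e-4a7b-9d2e-0123456789ab

systemctl start glusterd
gluster peer status   # all peers should show "Peer in Cluster (Connected)"
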
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Glusterfs 3.7.11 with LibGFApi in Qemu on Ubuntu Xenial does not work

2016-06-15 Thread Prasanna Kalever
On Wed, Jun 15, 2016 at 2:41 PM, André Bauer  wrote:
>
> Hi Lists,
>
> I just updated one of my Ubuntu KVM servers from 14.04 (Trusty) to 16.04
> (Xenial).
>
> I use the GlusterFS packages from the official Ubuntu PPA and my own
> Qemu packages (
> https://launchpad.net/~monotek/+archive/ubuntu/qemu-glusterfs-3.7 )
> which have libgfapi enabled.
>
> On Ubuntu 14.04 everything is working fine. I only had to add the
> following lines to the AppArmor config in
> /etc/apparmor.d/abstractions/libvirt-qemu to get it to work:
>
> # for glusterfs
> /proc/sys/net/ipv4/ip_local_reserved_ports r,
> /usr/lib/@{multiarch}/glusterfs/**.so mr,
> /tmp/** rw,
>
> On Ubuntu 16.04 I'm not able to start my VMs via libvirt or to
> create new images via qemu-img using libgfapi.
>
> Mounting the volume via fuse does work without problems.
>
> Examples:
>
> qemu-img create gluster://storage.mydomain/vmimages/kvm2test.img 1G
> Formatting 'gluster://storage.intdmz.h1.mdd/vmimages/kvm2test.img',
> fmt=raw size=1073741824
> [2016-06-15 08:15:26.710665] E [MSGID: 108006]
> [afr-common.c:4046:afr_notify] 0-vmimages-replicate-0: All subvolumes
> are down. Going offline until atleast one of them comes back up.
> [2016-06-15 08:15:26.710736] E [MSGID: 108006]
> [afr-common.c:4046:afr_notify] 0-vmimages-replicate-1: All subvolumes
> are down. Going offline until atleast one of them comes back up.
>
> Libvirtd log:
>
> [2016-06-13 16:53:57.055113] E [MSGID: 104007]
> [glfs-mgmt.c:637:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch
> volume file (key:vmimages) [Invalid argument]
> [2016-06-13 16:53:57.055196] E [MSGID: 104024]
> [glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with
> remote-host: storage.intdmz.h1.mdd (Permission denied) [Permission denied]
> 2016-06-13T16:53:58.049945Z qemu-system-x86_64: -drive
> file=gluster://storage.intdmz.h1.mdd/vmimages/checkbox.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback:
> Gluster connection failed for server=storage.intdmz.h1.mdd port=0
> volume=vmimages image=checkbox.qcow2 transport=tcp: Permission denied

I think you have missed enabling insecure binding, which is needed for
libgfapi access. Please try again after following the steps below:

=> edit /etc/glusterfs/glusterd.vol by adding "option
rpc-auth-allow-insecure on" #(on all nodes)
=> gluster vol set $volume server.allow-insecure on
=> systemctl restart glusterd #(on all nodes)

In case this does not work, please provide us with the output of the
commands below, along with the log files:
# gluster vol info
# gluster vol status
# gluster peer status

--
Prasanna

>
> I don't see anything in the apparmor logs when setting everything to
> complain or audit.
>
> It also seems GlusterFS servers don't get any request because brick logs
> are not complaining anything.
>
> Any hints?
>
>
> --
> Regards
> André Bauer
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Glusterfs 3.7.11 with LibGFApi in Qemu on Ubuntu Xenial does not work

2016-06-15 Thread André Bauer
Hi Lists,

I just updated one of my Ubuntu KVM servers from 14.04 (Trusty) to 16.04
(Xenial).

I use the GlusterFS packages from the official Ubuntu PPA and my own
Qemu packages (
https://launchpad.net/~monotek/+archive/ubuntu/qemu-glusterfs-3.7 )
which have libgfapi enabled.

On Ubuntu 14.04 everything is working fine. I only had to add the
following lines to the AppArmor config in
/etc/apparmor.d/abstractions/libvirt-qemu to get it to work:

# for glusterfs
/proc/sys/net/ipv4/ip_local_reserved_ports r,
/usr/lib/@{multiarch}/glusterfs/**.so mr,
/tmp/** rw,

On Ubuntu 16.04 I'm not able to start my VMs via libvirt or to
create new images via qemu-img using libgfapi.

Mounting the volume via fuse does work without problems.

Examples:

qemu-img create gluster://storage.mydomain/vmimages/kvm2test.img 1G
Formatting 'gluster://storage.intdmz.h1.mdd/vmimages/kvm2test.img',
fmt=raw size=1073741824
[2016-06-15 08:15:26.710665] E [MSGID: 108006]
[afr-common.c:4046:afr_notify] 0-vmimages-replicate-0: All subvolumes
are down. Going offline until atleast one of them comes back up.
[2016-06-15 08:15:26.710736] E [MSGID: 108006]
[afr-common.c:4046:afr_notify] 0-vmimages-replicate-1: All subvolumes
are down. Going offline until atleast one of them comes back up.

Libvirtd log:

[2016-06-13 16:53:57.055113] E [MSGID: 104007]
[glfs-mgmt.c:637:glfs_mgmt_getspec_cbk] 0-glfs-mgmt: failed to fetch
volume file (key:vmimages) [Invalid argument]
[2016-06-13 16:53:57.055196] E [MSGID: 104024]
[glfs-mgmt.c:738:mgmt_rpc_notify] 0-glfs-mgmt: failed to connect with
remote-host: storage.intdmz.h1.mdd (Permission denied) [Permission denied]
2016-06-13T16:53:58.049945Z qemu-system-x86_64: -drive
file=gluster://storage.intdmz.h1.mdd/vmimages/checkbox.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=writeback:
Gluster connection failed for server=storage.intdmz.h1.mdd port=0
volume=vmimages image=checkbox.qcow2 transport=tcp: Permission denied

I don't see anything in the apparmor logs when setting everything to
complain or audit.

It also seems GlusterFS servers don't get any request because brick logs
are not complaining anything.

Any hints?


-- 
Regards
André Bauer

___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] issues recovering machine in gluster

2016-06-15 Thread Arif Ali
On 15 June 2016 at 08:09, Atin Mukherjee  wrote:

>
>
> On 06/15/2016 12:14 PM, Arif Ali wrote:
> >
> > On 15 June 2016 at 06:48, Atin Mukherjee wrote:
> >
> >
> >
> > On 06/15/2016 11:06 AM, Gandalf Corvotempesta wrote:
> > > On 15 Jun 2016 07:09, "Atin Mukherjee" wrote:
> > >> To get rid of this situation you'd need to stop all the running
> glusterd
> > >> instances and go into /var/lib/glusterd/peers folder on all the
> nodes
> > >> and manually correct the UUID file names and their content if
> required.
> > >
> > > If I understood properly, the only way to fix this is by bringing the
> > > whole cluster down? "you'd need to stop all the running glusterd instances"
> > >
> > > I hope you are referring to all instances on the failed node...
> >
> > No, since the configuration is synced across all the nodes, any
> > incorrect data gets replicated throughout. So in this case, to be on the
> > safer side and validate correctness, all glusterd instances on *all*
> > the nodes should be brought down. Having said that, this doesn't impact
> > I/O, as the management path is different from the I/O path.
> >
> >
> > As a sanity check, one of the things I did last night was to reboot the
> > whole gluster system when I had downtime arranged. I thought this was
> > something that would be asked, as I had seen similar requests on the
> > mailing list previously.
> >
> > Unfortunately though, it didn't fix the problem.
>
> A reboot alone is not going to solve the problem. You'd need to correct the
> configuration as I explained earlier in this thread. If that doesn't work,
> please send me the content of the /var/lib/glusterd/peers/ directory and the
> /var/lib/glusterd/glusterd.info file from all the nodes where glusterd
> instances are running. I'll take a look, correct them, and send them
> back to you.
>

Thanks Atin,

Apologies, I missed your mail, as I was travelling

I have checked the relevant files you have mentioned, and they look
correct to me, but I have attached them for sanity; maybe you can spot
something that I have not seen.


gluster_debug.tgz
Description: GNU Zip compressed data
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] issues recovering machine in gluster

2016-06-15 Thread Atin Mukherjee


On 06/15/2016 12:14 PM, Arif Ali wrote:
> 
> On 15 June 2016 at 06:48, Atin Mukherjee wrote:
> 
> 
> 
> On 06/15/2016 11:06 AM, Gandalf Corvotempesta wrote:
> > On 15 Jun 2016 07:09, "Atin Mukherjee" wrote:
> >> To get rid of this situation you'd need to stop all the running 
> glusterd
> >> instances and go into /var/lib/glusterd/peers folder on all the nodes
> >> and manually correct the UUID file names and their content if required.
> >
> > If I understood properly, the only way to fix this is by bringing the
> > whole cluster down? "you'd need to stop all the running glusterd instances"
> >
> > I hope you are referring to all instances on the failed node...
> 
No, since the configuration is synced across all the nodes, any
incorrect data gets replicated throughout. So in this case, to be on the
safer side and validate correctness, all glusterd instances on *all*
the nodes should be brought down. Having said that, this doesn't impact
I/O, as the management path is different from the I/O path.
> 
> 
> As a sanity check, one of the things I did last night was to reboot the
> whole gluster system when I had downtime arranged. I thought this was
> something that would be asked, as I had seen similar requests on the
> mailing list previously.
> 
> Unfortunately though, it didn't fix the problem.

A reboot alone is not going to solve the problem. You'd need to correct the
configuration as I explained earlier in this thread. If that doesn't work,
please send me the content of the /var/lib/glusterd/peers/ directory and the
/var/lib/glusterd/glusterd.info file from all the nodes where glusterd
instances are running. I'll take a look, correct them, and send them
back to you.

> 
> Any other suggestions are welcome
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] issues recovering machine in gluster

2016-06-15 Thread Arif Ali
On 15 June 2016 at 06:48, Atin Mukherjee  wrote:

>
>
> On 06/15/2016 11:06 AM, Gandalf Corvotempesta wrote:
> > On 15 Jun 2016 07:09, "Atin Mukherjee" wrote:
> >> To get rid of this situation you'd need to stop all the running glusterd
> >> instances and go into /var/lib/glusterd/peers folder on all the nodes
> >> and manually correct the UUID file names and their content if required.
> >
> > If I understood properly, the only way to fix this is by bringing the
> > whole cluster down? "you'd need to stop all the running glusterd instances"
> >
> > I hope you are referring to all instances on the failed node...
>
> No, since the configuration is synced across all the nodes, any
> incorrect data gets replicated throughout. So in this case, to be on the
> safer side and validate correctness, all glusterd instances on *all*
> the nodes should be brought down. Having said that, this doesn't impact
> I/O, as the management path is different from the I/O path.
>
>
As a sanity check, one of the things I did last night was to reboot the
whole gluster system when I had downtime arranged. I thought this was
something that would be asked, as I had seen similar requests on the
mailing list previously.

Unfortunately though, it didn't fix the problem.

Any other suggestions are welcome
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Add bricks to sharded replicated volume

2016-06-15 Thread Gandalf Corvotempesta
The docs say that when adding bricks to a replicated volume, the number of
bricks must be a multiple of the replica count,
so if I have replica 3 I have to add 3 bricks every time.

What if I have to add a new node with multiple bricks?
Do I have to add 3 nodes every time to preserve redundancy for each replica
set?

Is there any plan to change this, allowing a single brick to be added (with
no exact ordering like now) and having Gluster manage the distribution by
itself like Ceph does?
Ceph doesn't require any particular order when adding nodes/OSDs. It manages
redundancy automatically by distributing objects across multiple nodes.
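
For the record, this is the kind of expansion I mean today, as far as I
understand the current CLI (volume and brick paths are placeholder examples;
the three new bricks must form one complete replica set, followed by a
rebalance):

gluster volume add-brick myvol node4:/bricks/b1 node5:/bricks/b1 node6:/bricks/b1
gluster volume rebalance myvol start
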
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Bit rot disabled as default

2016-06-15 Thread Gandalf Corvotempesta
I was looking at the docs.
Why is bit rot protection disabled by default?
With huge files like a qcow image, a bit rot could lead to the whole image
being corrupted and replicated to the whole cluster.

Are there any drawbacks with bit rot detection that explain the default of off?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users