Re: [Gluster-users] Makefile:90: *** missing separator (did you mean TAB instead of 8 spaces?). Stop.

2016-12-07 Thread Anoop C S
On Wed, 2016-12-07 at 12:02 -0500, mabi wrote:
> Hi,
> 
> I would like to compile GlusterFS 3.8.6 manually on a Linux Debian 8 server.
> The configure step works fine, but make immediately fails with the following error:
> 
> Makefile:90: *** missing separator (did you mean TAB instead of 8 spaces?).  
> Stop.
> 

Can you be more specific (as in, which Makefile is causing the error)?
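
A quick way to narrow this down (a rough sketch; it assumes the error comes
from the generated top-level Makefile, as the "Makefile:90" prefix suggests)
is to look at the offending lines with tabs made visible:

# print lines 85-95 of the generated Makefile; real TABs show up as ^I
sed -n '85,95p' Makefile | cat -A

# recipe lines must start with ^I; if they start with spaces, regenerating
# the build files often helps (assuming the source tree ships autogen.sh)
./autogen.sh && ./configure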

> Any idea what is wrong here and how to fix it? I suspect something is going
> wrong when the Makefile gets generated...
> 
> Regards
> M.
> 
> 
> 
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replica brick not working

2016-12-07 Thread Atin Mukherjee
On Thu, Dec 8, 2016 at 11:11 AM, Ravishankar N wrote:

> On 12/08/2016 10:43 AM, Atin Mukherjee wrote:
>
>> From the log snippet:
>>
>> [2016-12-07 09:15:35.677645] I [MSGID: 106482]
>> [glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management:
>> Received add brick req
>> [2016-12-07 09:15:35.677708] I [MSGID: 106062]
>> [glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management:
>> replica-count is 2
>> [2016-12-07 09:15:35.677735] E [MSGID: 106291]
>> [glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:
>>
>> The last log entry indicates that we hit the code path in
>> gd_addbr_validate_replica_count ()
>>
>> if (replica_count == volinfo->replica_count) {
>>         if (!(total_bricks % volinfo->dist_leaf_count)) {
>>                 ret = 1;
>>                 goto out;
>>         }
>> }
>>
>>
> It seems unlikely that this snippet was hit, because we print the E [MSGID:
> 106291] message above only if ret == -1.
> gd_addbr_validate_replica_count() returns -1 without populating err_str
> only when volinfo->type doesn't match any of the known volume types, so
> perhaps volinfo->type is corrupted?
>

You are right, I missed that ret is set to 1 here in the above snippet.

@Milos - Can you please provide us the volume info file from
/var/lib/glusterd/vols/<volname>/ from all three nodes so we can continue
the analysis?
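
For reference, a minimal way to collect it (a sketch; "storage" is the volume
name used elsewhere in this thread, substitute yours if it differs):

# run on each of the three nodes and attach the output
cat /var/lib/glusterd/vols/storage/info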


>
> -Ravi
>
>> @Pranith, Ravi - Milos was trying to convert a dist (1 X 1) volume to a
>> replicate (1 X 2) using add-brick and hit this issue where add-brick
>> failed. The cluster is operating with 3.7.6. Could you help figure out in
>> what scenario this code path can be hit? One straightforward issue I see
>> here is the missing err_str in this path.
>>
>>
>>
>


-- 

~ Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replica brick not working

2016-12-07 Thread Ravishankar N

On 12/08/2016 10:43 AM, Atin Mukherjee wrote:

From the log snippet:

[2016-12-07 09:15:35.677645] I [MSGID: 106482] 
[glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management: 
Received add brick req
[2016-12-07 09:15:35.677708] I [MSGID: 106062] 
[glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management: 
replica-count is 2
[2016-12-07 09:15:35.677735] E [MSGID: 106291] 
[glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:


The last log entry indicates that we hit the code path in 
gd_addbr_validate_replica_count ()


if (replica_count == volinfo->replica_count) {
        if (!(total_bricks % volinfo->dist_leaf_count)) {
                ret = 1;
                goto out;
        }
}



It seems unlikely that this snippet was hit, because we print the E
[MSGID: 106291] message above only if ret == -1.
gd_addbr_validate_replica_count() returns -1 without populating err_str
only when volinfo->type doesn't match any of the known volume types, so
perhaps volinfo->type is corrupted?


-Ravi
@Pranith, Ravi - Milos was trying to convert a dist (1 X 1) volume to
a replicate (1 X 2) using add-brick and hit this issue where add-brick
failed. The cluster is operating with 3.7.6. Could you help figure out in
what scenario this code path can be hit? One straightforward issue I see
here is the missing err_str in this path.





___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Replica brick not working

2016-12-07 Thread Atin Mukherjee
From the log snippet:

[2016-12-07 09:15:35.677645] I [MSGID: 106482]
[glusterd-brick-ops.c:442:__glusterd_handle_add_brick] 0-management:
Received add brick req
[2016-12-07 09:15:35.677708] I [MSGID: 106062]
[glusterd-brick-ops.c:494:__glusterd_handle_add_brick] 0-management:
replica-count is 2
[2016-12-07 09:15:35.677735] E [MSGID: 106291]
[glusterd-brick-ops.c:614:__glusterd_handle_add_brick] 0-management:

The last log entry indicates that we hit the code path in
gd_addbr_validate_replica_count ()

if (replica_count == volinfo->replica_count) {
        if (!(total_bricks % volinfo->dist_leaf_count)) {
                ret = 1;
                goto out;
        }
}

@Pranith, Ravi - Milos was trying to convert a dist (1 X 1) volume to a
replicate (1 X 2) using add-brick and hit this issue where add-brick
failed. The cluster is operating with 3.7.6. Could you help figure out in
what scenario this code path can be hit? One straightforward issue I see
here is the missing err_str in this path.



On Wed, Dec 7, 2016 at 7:56 PM, Miloš Čučulović - MDPI wrote:

> Sure Atin, logs are attached.
>
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculo...@mdpi.com
> Skype: milos.cuculovic.mdpi
>
> On 07.12.2016 11:32, Atin Mukherjee wrote:
>
>> Milos,
>>
>> Giving snippets wouldn't help much; could you get me all the log files
>> (/var/log/glusterfs/*) from both nodes?
>>
>> On Wed, Dec 7, 2016 at 3:54 PM, Miloš Čučulović - MDPI wrote:
>>
>> Thanks, here is the log after volume force:
>>
>> [2016-12-07 10:23:39.157234] I [MSGID: 115036]
>> [server.c:552:server_rpc_notify] 0-storage-server: disconnecting
>> connection from
>> storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
>> [2016-12-07 10:23:39.157301] I [MSGID: 101055]
>> [client_t.c:419:gf_client_unref] 0-storage-server: Shutting down
>> connection
>> storage2-23175-2016/12/07-10:14:56:951307-storage-client-0-0-0
>> [2016-12-07 10:23:40.187805] I [login.c:81:gf_auth] 0-auth/login:
>> allowed user names: ef4e608d-487b-49a3-85dd-0b36b3554312
>> [2016-12-07 10:23:40.187848] I [MSGID: 115029]
>> [server-handshake.c:612:server_setvolume] 0-storage-server: accepted
>> client from
>> storage2-23679-2016/12/07-10:23:40:160327-storage-client-0-0-0
>> (version: 3.7.6)
>> [2016-12-07 10:23:52.817529] E [MSGID: 113001]
>> [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix:
>> /data/data-cluster/dms/submissions/User - 226485:
>> key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not
>> supported]
>> [2016-12-07 10:23:52.817598] E [MSGID: 113001]
>> [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on
>> /data/data-cluster/dms/submissions/User - 226485 failed [Operation
>> not supported]
>> [2016-12-07 10:23:52.821388] E [MSGID: 113001]
>> [posix-helpers.c:1177:posix_handle_pair] 0-storage-posix:
>> /data/data-cluster/dms/submissions/User -
>> 226485/815a39ccc2cb41dadba45fe7c1e226d4:
>> key:glusterfs.preop.parent.keyflags: 1 length:22 [Operation not
>> supported]
>> [2016-12-07 10:23:52.821434] E [MSGID: 113001]
>> [posix.c:1384:posix_mkdir] 0-storage-posix: setting xattrs on
>> /data/data-cluster/dms/submissions/User -
>> 226485/815a39ccc2cb41dadba45fe7c1e226d4 failed [Operation not
>> supported]
>>
>> - Kindest regards,
>>
>> Milos Cuculovic
>> IT Manager
>>
>> ---
>> MDPI AG
>> Postfach, CH-4020 Basel, Switzerland
>> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
>> Tel. +41 61 683 77 35
>> Fax +41 61 302 89 18
>> Email: cuculo...@mdpi.com 
>> Skype: milos.cuculovic.mdpi
>>
>> On 07.12.2016 11:19, Atin Mukherjee wrote:
>>
>> You are referring to the wrong log file, which is for the self-heal
>> daemon. You'd need to get back with the brick log file.
>>
>> On Wed, Dec 7, 2016 at 3:45 PM, Miloš Čučulović - MDPI wrote:
>>
>> This is the log file after force command:
>>
>>
>> [2016-12-07 10:14:55.945937] W
>> [glusterfsd.c:1236:cleanup_and_exit]
>> (-->/lib/x86_64-linux-gnu/libpthread.so.0(+0x770a)
>> [0x7fe9d905570a]
>> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xdd) [0x40810d]
>> -->/usr/sbin/glusterfs(cleanup_and_exit+0x4d) [0x407f8d] )
>> 0-:
>> received signum (15), shutting down
>> [2016-12-07 10:14:56.960573] I [MSGID: 100030]
>> 

Re: [Gluster-users] RE : Frequent connect and disconnect messages flooded in logs

2016-12-07 Thread Mohammed Rafi K C
Hi Micha,

This is great. I will provide you a debug build with two fixes that I
suspect may be behind the frequent disconnect issue, though I don't
have much data to validate my theory. So I will take one more day to dig
into that.

Thanks for your support, and opensource++ 
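
In the meantime, one quick check worth running on the affected volume (a
sketch; gv0 is the volume name from your fstab below, and 42 seconds is the
default network.ping-timeout, which matches the disconnect message):

# show the currently effective ping timeout for the volume
gluster volume get gv0 network.ping-timeout

# raising it only hides the symptom, but can help confirm the theory, e.g.:
# gluster volume set gv0 network.ping-timeout 60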

Regards

Rafi KC

On 12/07/2016 05:02 AM, Micha Ober wrote:
> Hi,
>
> thank you for your answer and even more for the question!
> Until now, I was using FUSE. Today I changed all mounts to NFS using
> the same 3.7.17 version.
>
> But: The problem is still the same. Now, the NFS logfile contains
> lines like these:
>
> [2016-12-06 15:12:29.006325] C
> [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gv0-client-7:
> server X.X.18.62:49153 has not responded in the last 42 seconds,
> disconnecting.
>
> Interestingly enough,  the IP address X.X.18.62 is the same machine!
> As I wrote earlier, each node serves both as a server and a client, as
> each node contributes bricks to the volume. Every server is connecting
> to itself via its hostname. For example, the fstab on the node
> "giant2" looks like:
>
> #giant2:/gv0  /shared_data   glusterfs  defaults,noauto          0  0
> #giant2:/gv2  /shared_slurm  glusterfs  defaults,noauto          0  0
>
> giant2:/gv0   /shared_data   nfs        defaults,_netdev,vers=3  0  0
> giant2:/gv2   /shared_slurm  nfs        defaults,_netdev,vers=3  0  0
>
> So I understand the disconnects even less.
>
> I don't know if it's possible to create a dummy cluster which exposes
> the same behaviour, because the disconnects only happen when there are
> compute jobs running on those nodes - and they are GPU compute jobs,
> so that's something which cannot be easily emulated in a VM.
>
> As we have more clusters (which are running fine with an ancient 3.4
> version :-)) and we are currently not dependent on this particular
> cluster (which may stay like this for this month, I think) I should be
> able to deploy the debug build on the "real" cluster, if you can
> provide a debug build.
>
> Regards and thanks,
> Micha
>
>
>
> On 06.12.2016 at 08:15, Mohammed Rafi K C wrote:
>>
>>
>>
>> On 12/03/2016 12:56 AM, Micha Ober wrote:
>>> ** Update: ** I have downgraded from 3.8.6 to 3.7.17 now, but the
>>> problem still exists.
>>>
>>> Client log: http://paste.ubuntu.com/23569065/
>>> Brick log: http://paste.ubuntu.com/23569067/
>>>
>>> Please note that each server has two bricks. According to the logs, one
>>> brick loses the connection to all other hosts:
>>> [2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
>>> [2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 
>>> 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)
>>>
>>> The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
>>> As I said, the network connection is fine and the disks are idle.
>>> The CPU always has 2 free cores.
>>>
>>> It looks like I have to downgrade to 3.4 now in order for the disconnects 
>>> to stop.
>>
>> Hi Micha,
>>
>> Thanks for the update, and sorry for what happened with the newer gluster
>> versions. I can understand the need to downgrade as it is a
>> production setup.
>>
>> Can you tell me which clients are used here? Whether it is
>> FUSE, NFS, NFS-Ganesha, SMB or libgfapi?
>>
>> Since I'm not able to reproduce the issue (I have been trying for the
>> last 3 days) and the logs are not very helpful here (we don't have
>> many logs in the socket layer), could you please create a dummy cluster
>> and try to reproduce the issue? Then we can play with that volume
>> and I could provide a debug build which we can use for further
>> debugging.
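>>
>> (For reference, a rough sketch of such a dummy setup; the hostnames, brick
>> paths and volume name below are placeholders:)
>>
>> # from node1: form a two-node pool and create a small replica 2 test volume
>> gluster peer probe node2
>> gluster volume create gv-test replica 2 node1:/bricks/test node2:/bricks/test force
>> gluster volume start gv-test
>> # mount it the same way as the real volume (FUSE shown here)
>> mount -t glusterfs node1:/gv-test /mnt/gv-test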
>>
>> If you don't have bandwidth for this, please leave it ;).
>>
>> Regards
>> Rafi KC
>>
>>> - Micha
>>>
>>> On 30.11.2016 at 06:57, Mohammed Rafi K C wrote:

 Hi Micha,

 I have changed the thread and subject so that your original thread
 remain same for your query. Let's try to fix the problem what you
 observed with 3.8.4, So I have started a new thread to discuss the
 frequent disconnect problem.

 *If anyone else has experienced the same problem, please respond
 to the mail.*

 It would be very helpful if you could give us some more logs from
 clients and bricks.  Also any reproducible steps will surely help
 to chase the problem further.

 Regards

 Rafi KC

 On 11/30/2016 04:44 AM, Micha Ober wrote:
> I had opened another thread on this mailing list 

[Gluster-users] Makefile:90: *** missing separator (did you mean TAB instead of 8 spaces?). Stop.

2016-12-07 Thread mabi
Hi,

I would like to compile GlusterFS 3.8.6 manually on a Linux Debian 8 server.
The configure step works fine, but make immediately fails with the following
error:

Makefile:90: *** missing separator (did you mean TAB instead of 8 spaces?). 
Stop.

Any idea what is wrong here and how to fix it? I suspect something is going
wrong when the Makefile gets generated...

Regards
M.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] How to properly set ACLs in GlusterFS?

2016-12-07 Thread Alexandr Porunov
Hello,

I am trying to use ACLs, but it seems that they are applied by numeric user
ID rather than by user name.
I.e. I have 2 machines with the following users: test1, test2.
On the first machine I have created the users like this:
useradd test1
useradd test2

On the second machine I have created them in the opposite order:
useradd test2
useradd test1

Now I can see the IDs of the users. Here is what I see:

Machine 1:
# id test1
uid=1002(test1) gid=1003(test1) groups=1003(test1)
# id test2
uid=1003(test2) gid=1004(test2) groups=1004(test2)

Machine 2:
# id test1
uid=1003(test1) gid=1004(test1) groups=1004(test1)
# id test2
uid=1002(test2) gid=1003(test2) groups=1003(test2)

So, on machine1 the test1 user has UID 1002 and on machine2 the test1 user
has UID 1003.

Now if on machine1 I set a permission on a file like this:
setfacl -R -m u:test1:rwx /repositories/test

On machine2 the test1 user won't have any access to the file, but the user
test2 will! How can I set permissions so they follow the user/group name
rather than the numeric ID?

Here is how I mount a gluster volume:
mount -t glusterfs -o acl 192.168.0.120:/gv0 /repositories/
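
One common workaround (a sketch; it assumes the accounts can be recreated, and
the numeric IDs below are only examples) is to create the users with identical
UIDs/GIDs on every machine that mounts the volume, so an ACL resolves to the
same account everywhere:

# on every client, create the group/user with explicit, matching IDs
groupadd -g 2001 test1
useradd  -u 2001 -g 2001 test1
groupadd -g 2002 test2
useradd  -u 2002 -g 2002 test2

# the ACL then refers to the same numeric UID on all machines
setfacl -R -m u:test1:rwx /repositories/test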

Sincerely,
Alexandr
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Cancelled: Weekly Community Meeting 2016-12-07

2016-12-07 Thread Kaushal M
Hi All,

This week's meeting has been cancelled. There was neither enough
attendance (is it the holidays already?), nor any topics for
discussion.

The next meeting is still on track for next week. The meeting pad [1]
will be carried over to the next week. Please add updates and topics
to it.

Thanks,
Kaushal

[1]: https://bit.ly/gluster-community-meetings
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Dispersed volume and auto-heal

2016-12-07 Thread Serkan Çoban
No, you should replace the brick.
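
For completeness, a rough sketch of what replacing a failed brick looks like
(the volume name and brick paths are placeholders; check the documentation for
the exact syntax of your release):

# swap the dead brick for a new, empty one; self-heal then rebuilds the
# missing data/parity fragments onto it
gluster volume replace-brick myvol server3:/bricks/old server3:/bricks/new commit force

# watch the heal progress
gluster volume heal myvol info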

On Wed, Dec 7, 2016 at 1:02 PM, Cedric Lemarchand wrote:
> Hello,
>
> Is gluster able to auto-heal when some bricks are lost? By auto-heal I mean
> that the lost parity is re-generated on the bricks that are still available,
> in order to recover the level of redundancy without replacing the failed bricks.
>
> I am still on the learning curve; apologies if the question is trivial.
>
> Cheers,
>
> Cédric
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Dispersed volume and auto-heal

2016-12-07 Thread Cedric Lemarchand
Hello,

Is gluster able to auto-heal when some bricks are lost? By auto-heal I mean
that the lost parity is re-generated on the bricks that are still available,
in order to recover the level of redundancy without replacing the failed bricks.

I am still on the learning curve; apologies if the question is trivial.

Cheers,

Cédric


___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Replica brick not working

2016-12-07 Thread Atin Mukherjee
On Tue, Dec 6, 2016 at 10:08 PM, Miloš Čučulović - MDPI wrote:

> Dear All,
>
> I have two servers, storage and storage2.
> storage2 had a volume called storage.
> I then decided to add a replica brick (storage).
>
> I did this in the following way:
>
> 1. sudo gluster peer probe storage (on storage server2)
> 2. sudo gluster volume add-brick storage replica 2
> storage:/data/data-cluster
>
> Then I was getting the following error:
> volume add-brick: failed: Operation failed
>
> But, it seems the brick was somehow added, as when checking on storage2:
> sudo gluster volume info storage
> I am getting:
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: storage2:/data/data-cluster
> Brick2: storage:/data/data-cluster
>

> So, seems ok here, however, when doing:
> sudo gluster volume heal storage info
> I am getting:
> Volume storage is not of type replicate/disperse
> Volume heal failed.
>
>
> Also, when doing
> sudo gluster volume status all
>
> I am getting:
> Status of volume: storage
> Gluster process                      TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick storage2:/data/data-cluster    49152     0          Y       2160
> Brick storage:/data/data-cluster     N/A       N/A        N       N/A
> Self-heal Daemon on localhost        N/A       N/A        Y       7906
> Self-heal Daemon on storage          N/A       N/A        N       N/A
>
> Task Status of Volume storage
> ------------------------------------------------------------------------------
>
> Any idea please?
>

It looks like the brick didn't come up during the add-brick. Could you share
the cmd_history, glusterd and new brick log files from both nodes? As a
workaround, could you try 'gluster volume start storage force' and see if
the issue persists?
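
If you try the workaround, something along these lines should show whether the
second brick process comes up (a sketch; the brick log file name is derived
from the brick path, so adjust it if yours differs):

gluster volume start storage force

# the brick on "storage" should now show Online = Y
gluster volume status storage

# if it still does not start, the brick log usually says why
tail -n 50 /var/log/glusterfs/bricks/data-data-cluster.log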


>
> --
> - Kindest regards,
>
> Milos Cuculovic
> IT Manager
>
> ---
> MDPI AG
> Postfach, CH-4020 Basel, Switzerland
> Office: St. Alban-Anlage 66, 4052 Basel, Switzerland
> Tel. +41 61 683 77 35
> Fax +41 61 302 89 18
> Email: cuculo...@mdpi.com
> Skype: milos.cuculovic.mdpi
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 

~ Atin (atinm)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users