Re: [Gluster-users] server.allow-insecure doesn't work in 3.4.2?

2014-04-21 Thread Mingfan Lu
Yes. After I restarted the volume, it works.  But that is only a workaround,
since it is sometimes impossible to restart a volume in a production
environment.
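For anyone hitting the same issue, the sequence that finally worked here is
roughly the following (a sketch; <volname> is a placeholder):

# 1) allow unprivileged (>1024) client ports on the volume
gluster volume set <volname> server.allow-insecure on
# 2) allow insecure connections to glusterd itself:
#    add "option rpc-auth-allow-insecure on" to /etc/glusterfs/glusterd.vol
# 3) restart glusterd (the command depends on the distro)
service glusterd restart
# 4) the running brick processes only pick up server.allow-insecure after a
#    volume restart, which unfortunately interrupts clients
gluster volume stop <volname>
gluster volume start <volname>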


On Tue, Apr 22, 2014 at 2:09 PM, Humble Devassy Chirammal <
humble.deva...@gmail.com> wrote:

> Hi Mingfan,
>
> Can you please try restarting the affected volume [1] and report the
> result?
>
> http://review.gluster.org/#/c/7412/7/doc/release-notes/3.5.0.md
>
> --Humble
>
>
> On Tue, Apr 22, 2014 at 10:46 AM, Mingfan Lu  wrote:
>
>> I saw something in
>> https://forge.gluster.org/gluster-docs-project/pages/GlusterFS_34_Release_Notes
>>  I wonder whether I should restart the glusterd?
>> Known Issues:
>>
>>-
>>
>>The following configuration changes are necessary for qemu and samba
>>integration with libgfapi to work seamlessly:
>>
>> 1) gluster volume set  server.allow-insecure on
>>
>> 2) Edit /etc/glusterfs/glusterd.vol to contain this line:
>>option rpc-auth-allow-insecure on
>>
>>Post 2), restarting glusterd would be necessary.
>>
>>
>>
>>
>>
>> On Tue, Apr 22, 2014 at 11:55 AM, Mingfan Lu wrote:
>>
>>> I have created a volume named test_auth and set server.allow-insecure on
>>>
>>> Volume Name: test_auth
>>> Type: Distribute
>>> Volume ID: d9bdc43e-15ce-4072-8d89-a34063e82427
>>> Status: Started
>>> Number of Bricks: 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: server1:/mnt/xfsd/test_auth
>>> Brick2: server2:/mnt/xfsd/test_auth
>>> Brick3: server3:/mnt/xfsd/test_auth
>>> Options Reconfigured:
>>> server.allow-insecure: on
>>>
>>> and then I tried to mount the volume using the client-bind-insecure option,
>>> but the mount failed.
>>>
>>> /usr/sbin/glusterfs --volfile-id=test_auth --volfile-server=server1
>>> /mnt/test_auth_bind_insecure --client-bind-insecure
>>>
>>> I got these error messages in the servers' logs:
>>> server1 : [2014-04-22 03:44:52.817165] E [addr.c:143:gf_auth]
>>> 0-auth/addr: client is bound to port 37756 which is not privileged
>>> server2: [2014-04-22 03:44:52.810565] E [addr.c:143:gf_auth]
>>> 0-auth/addr: client is bound to port 16852 which is not privileged
>>> server3: [2014-04-22 03:44:52.811844] E [addr.c:143:gf_auth]
>>> 0-auth/addr: client is bound to port 17733 which is not privileged
>>>
>>> I got the error messages like:
>>>
>>> [2014-04-22 03:43:59.757024] W
>>> [client-handshake.c:1365:client_setvolume_cbk] 0-test_auth-client-1: failed
>>> to set the volume (Permission denied)
>>> [2014-04-22 03:43:59.757024] W
>>> [client-handshake.c:1391:client_setvolume_cbk] 0-test_auth-client-1: failed
>>> to get 'process-uuid' from reply dict
>>> [2014-04-22 03:43:59.757102] E
>>> [client-handshake.c:1397:client_setvolume_cbk] 0-test_auth-client-1:
>>> SETVOLUME on remote-host failed: Authentication failed
>>> [2014-04-22 03:43:59.757109] I
>>> [client-handshake.c:1483:client_setvolume_cbk] 0-test_auth-client-1:
>>> sending AUTH_FAILED event
>>> [2014-04-22 03:43:59.757116] E [fuse-bridge.c:4834:notify] 0-fuse:
>>> Server authenication failed. Shutting down.
>>>
>>>
>>> Could anyone give some comments on this issue?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] server.allow-insecure doesn't work in 3.4.2?

2014-04-21 Thread Humble Devassy Chirammal
Hi Mingfan,

Can you please try restarting the affected volume [1] and report the result?

http://review.gluster.org/#/c/7412/7/doc/release-notes/3.5.0.md

--Humble


On Tue, Apr 22, 2014 at 10:46 AM, Mingfan Lu  wrote:

> I saw something in
> https://forge.gluster.org/gluster-docs-project/pages/GlusterFS_34_Release_Notes
> I wonder whether I should restart the glusterd?
> Known Issues:
>
>-
>
>The following configuration changes are necessary for qemu and samba
>integration with libgfapi to work seamlessly:
>
> 1) gluster volume set  server.allow-insecure on
>
> 2) Edit /etc/glusterfs/glusterd.vol to contain this line:
>option rpc-auth-allow-insecure on
>
>Post 2), restarting glusterd would be necessary.
>
>
>
>
>
> On Tue, Apr 22, 2014 at 11:55 AM, Mingfan Lu  wrote:
>
>> I have created a volume named test_auth and set server.allow-insecure on
>>
>> Volume Name: test_auth
>> Type: Distribute
>> Volume ID: d9bdc43e-15ce-4072-8d89-a34063e82427
>> Status: Started
>> Number of Bricks: 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: server1:/mnt/xfsd/test_auth
>> Brick2: server2:/mnt/xfsd/test_auth
>> Brick3: server3:/mnt/xfsd/test_auth
>> Options Reconfigured:
>> server.allow-insecure: on
>>
>> and then I tried to mount the volume using the client-bind-insecure option,
>> but the mount failed.
>>
>> /usr/sbin/glusterfs --volfile-id=test_auth --volfile-server=server1
>> /mnt/test_auth_bind_insecure --client-bind-insecure
>>
>> I got these error messages in the servers' logs:
>> server1 : [2014-04-22 03:44:52.817165] E [addr.c:143:gf_auth]
>> 0-auth/addr: client is bound to port 37756 which is not privileged
>> server2: [2014-04-22 03:44:52.810565] E [addr.c:143:gf_auth] 0-auth/addr:
>> client is bound to port 16852 which is not privileged
>> server3: [2014-04-22 03:44:52.811844] E [addr.c:143:gf_auth] 0-auth/addr:
>> client is bound to port 17733 which is not privileged
>>
>> I got the error messages like:
>>
>> [2014-04-22 03:43:59.757024] W
>> [client-handshake.c:1365:client_setvolume_cbk] 0-test_auth-client-1: failed
>> to set the volume (Permission denied)
>> [2014-04-22 03:43:59.757024] W
>> [client-handshake.c:1391:client_setvolume_cbk] 0-test_auth-client-1: failed
>> to get 'process-uuid' from reply dict
>> [2014-04-22 03:43:59.757102] E
>> [client-handshake.c:1397:client_setvolume_cbk] 0-test_auth-client-1:
>> SETVOLUME on remote-host failed: Authentication failed
>> [2014-04-22 03:43:59.757109] I
>> [client-handshake.c:1483:client_setvolume_cbk] 0-test_auth-client-1:
>> sending AUTH_FAILED event
>> [2014-04-22 03:43:59.757116] E [fuse-bridge.c:4834:notify] 0-fuse: Server
>> authenication failed. Shutting down.
>>
>>
>> Could anyone give some comments on this issue?
>>
>>
>>
>>
>>
>>
>>
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] server.allow-insecure doesn't work in 3.4.2?

2014-04-21 Thread Mingfan Lu
I saw something in
https://forge.gluster.org/gluster-docs-project/pages/GlusterFS_34_Release_Notes
I wonder whether I should restart the glusterd?
Known Issues:

   -

   The following configuration changes are necessary for qemu and samba
   integration with libgfapi to work seamlessly:

1) gluster volume set  server.allow-insecure on

2) Edit /etc/glusterfs/glusterd.vol to contain this line:
   option rpc-auth-allow-insecure on

   Post 2), restarting glusterd would be necessary.





On Tue, Apr 22, 2014 at 11:55 AM, Mingfan Lu  wrote:

> I have created a volume named test_auth and set server.allow-insecure on
>
> Volume Name: test_auth
> Type: Distribute
> Volume ID: d9bdc43e-15ce-4072-8d89-a34063e82427
> Status: Started
> Number of Bricks: 3
> Transport-type: tcp
> Bricks:
> Brick1: server1:/mnt/xfsd/test_auth
> Brick2: server2:/mnt/xfsd/test_auth
> Brick3: server3:/mnt/xfsd/test_auth
> Options Reconfigured:
> server.allow-insecure: on
>
> and then I tried to mount the volume using the client-bind-insecure option,
> but the mount failed.
>
> /usr/sbin/glusterfs --volfile-id=test_auth --volfile-server=server1
> /mnt/test_auth_bind_insecure --client-bind-insecure
>
> I got these error messages in the servers' logs:
> server1 : [2014-04-22 03:44:52.817165] E [addr.c:143:gf_auth] 0-auth/addr:
> client is bound to port 37756 which is not privileged
> server2: [2014-04-22 03:44:52.810565] E [addr.c:143:gf_auth] 0-auth/addr:
> client is bound to port 16852 which is not privileged
> server3: [2014-04-22 03:44:52.811844] E [addr.c:143:gf_auth] 0-auth/addr:
> client is bound to port 17733 which is not privileged
>
> I got the error messages like:
>
> [2014-04-22 03:43:59.757024] W
> [client-handshake.c:1365:client_setvolume_cbk] 0-test_auth-client-1: failed
> to set the volume (Permission denied)
> [2014-04-22 03:43:59.757024] W
> [client-handshake.c:1391:client_setvolume_cbk] 0-test_auth-client-1: failed
> to get 'process-uuid' from reply dict
> [2014-04-22 03:43:59.757102] E
> [client-handshake.c:1397:client_setvolume_cbk] 0-test_auth-client-1:
> SETVOLUME on remote-host failed: Authentication failed
> [2014-04-22 03:43:59.757109] I
> [client-handshake.c:1483:client_setvolume_cbk] 0-test_auth-client-1:
> sending AUTH_FAILED event
> [2014-04-22 03:43:59.757116] E [fuse-bridge.c:4834:notify] 0-fuse: Server
> authenication failed. Shutting down.
>
>
> Could anyone give some comments on this issue?
>
>
>
>
>
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow healing times on large cinder and nova volumes

2014-04-21 Thread Schmid, Larry
Thanks for your reply.  I will attach logs as soon as I can.

We are on 3.4.1 and I followed this process: http://goo.gl/hFwCcB

which essentially details how to set attributes so the new disk space is 
identified as the brick, after which full heal commands are issued on the 
volumes to start replication.

This is the same process I used to successfully heal several other volumes 
critical to our openstack environment.

In addition, I bumped up the background heal threads to 32 per volume (the 
exact parameter name escapes me at the moment.)  This has been a try-and-see 
exercise while I watch disk I/O on the surviving node.  I'm not sure how high 
this parameter can go before bad things happen.
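(For reference, the option name I could not remember is probably
cluster.background-self-heal-count; a rough sketch, with the volume name as a
placeholder:)

# assumed name of the background-heal knob mentioned above (3.4-era AFR option)
gluster volume set <volname> cluster.background-self-heal-count 32
# the full heal that the replace-brick procedure ends with
gluster volume heal <volname> full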

Thx,
Larry Schmid

IO

M +1.602.316.8639
E   lsch...@io.com | io.com


From: Pranith Kumar Karampuri 
Sent: Monday, April 21, 2014 6:43 PM
To: Schmid, Larry
Cc: gluster-users@gluster.org
Subject: Re: [Gluster-users] Slow healing times on large cinder and nova volumes

Could you attach log files please.
You said the bricks are replaced. In case of brick-replacement, index based 
self-heal doesn't work so full self-heal needs to be triggered using "gluster 
volume heal  full". Could you confirm if that command is issued?

Pranith
- Original Message -
> From: "Larry Schmid" 
> To: gluster-users@gluster.org
> Sent: Tuesday, April 22, 2014 4:07:39 AM
> Subject: [Gluster-users] Slow healing times on large cinder and nova volumes
>
>
>
> Hi guys,
>
>
>
> x-posted from irc.
>
>
>
> We're having an issue on our prod openstack environment, which is backed by
> gluster using two replicas (I know. I wasn't given a choice.)
>
>
>
> We lost storage on one of the replica servers and so had to replace failed
> bricks. The heal operation on Cinder and Nova volumes is coming up on the
> two-week mark and it seems as if it will never catch up and finish.
>
>
>
> Nova heal info shows a constantly fluctuating list with multiple heals on
> many of the files, as if it's trying to keep up with deltas. It’s at 860GB
> of 1.1TB.
>
>
>
> Cinder doesn't really seem to progress. It's at about 1.9T out of 6T
> utilized, though the total sparse file size totals about 30T. It also has
> done multiple heals on the same files.
>
>
>
> I seem to be down to just watching it spin. Any help or tips?
>
>
>
> Thanks,
>
>
>
> Larry Schmid | Principal Cloud Engineer
>
>
>
> IO
>
>
>
> M +1.602.316.8639 | O +1.602.273.5431
>
> E lsch...@io.com | io.com
>
>
>
>
> Founded in 2007, IO is a worldwide leader in software defined data center
> technology, services and solutions that enable businesses and governments to
> intelligently control their information.
>
> The communication contained in this e-mail is confidential and is intended
> only for the named recipient(s) and may contain information that is
> privileged, proprietary, attorney work product or exempt from disclosure
> under applicable law. If you have received this message in error, or are not
> the named recipient(s), please note that any form of distribution, copying
> or use of this communication or the information in it is strictly prohibited
> and may be unlawful. Please immediately notify the sender of the error, and
> delete this communication including any attached files from your system.
> Thank you for your cooperation.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users


Founded in 2007, IO is a worldwide leader in software defined data center 
technology, services and solutions that enable businesses and governments to 
intelligently control their information.

The communication contained in this e-mail is confidential and is intended only 
for the named recipient(s) and may contain information that is privileged, 
proprietary, attorney work product or exempt from disclosure under applicable 
law. If you have received this message in error, or are not the named 
recipient(s), please note that any form of distribution, copying or use of this 
communication or the information in it is strictly prohibited and may be 
unlawful. Please immediately notify the sender of the error, and delete this 
communication including any attached files from your system. Thank you for your 
cooperation.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] server.allow-insecure doesn't work in 3.4.2?

2014-04-21 Thread Mingfan Lu
I have created a volume named test_auth and set server.allow-insecure on

Volume Name: test_auth
Type: Distribute
Volume ID: d9bdc43e-15ce-4072-8d89-a34063e82427
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: server1:/mnt/xfsd/test_auth
Brick2: server2:/mnt/xfsd/test_auth
Brick3: server3:/mnt/xfsd/test_auth
Options Reconfigured:
server.allow-insecure: on

and then I tried to mount the volume using the client-bind-insecure option,
but the mount failed.

/usr/sbin/glusterfs --volfile-id=test_auth --volfile-server=server1
/mnt/test_auth_bind_insecure --client-bind-insecure

I got these error messages in the servers' logs:
server1 : [2014-04-22 03:44:52.817165] E [addr.c:143:gf_auth] 0-auth/addr:
client is bound to port 37756 which is not privileged
server2: [2014-04-22 03:44:52.810565] E [addr.c:143:gf_auth] 0-auth/addr:
client is bound to port 16852 which is not privileged
server3: [2014-04-22 03:44:52.811844] E [addr.c:143:gf_auth] 0-auth/addr:
client is bound to port 17733 which is not privileged

I got the error messages like:

[2014-04-22 03:43:59.757024] W
[client-handshake.c:1365:client_setvolume_cbk] 0-test_auth-client-1: failed
to set the volume (Permission denied)
[2014-04-22 03:43:59.757024] W
[client-handshake.c:1391:client_setvolume_cbk] 0-test_auth-client-1: failed
to get 'process-uuid' from reply dict
[2014-04-22 03:43:59.757102] E
[client-handshake.c:1397:client_setvolume_cbk] 0-test_auth-client-1:
SETVOLUME on remote-host failed: Authentication failed
[2014-04-22 03:43:59.757109] I
[client-handshake.c:1483:client_setvolume_cbk] 0-test_auth-client-1:
sending AUTH_FAILED event
[2014-04-22 03:43:59.757116] E [fuse-bridge.c:4834:notify] 0-fuse: Server
authenication failed. Shutting down.


Could anyone give some comments on this issue?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow healing times on large cinder and nova volumes

2014-04-21 Thread Pranith Kumar Karampuri
Could you attach log files please.
You said the bricks are replaced. In case of brick-replacement, index based 
self-heal doesn't work so full self-heal needs to be triggered using "gluster 
volume heal  full". Could you confirm if that command is issued?
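For completeness, the trigger plus a way to watch progress (volume name is a
placeholder):

gluster volume heal <volname> full
gluster volume heal <volname> info           # repeat to watch the backlog shrink
tail -f /var/log/glusterfs/glustershd.log    # self-heal daemon activity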

Pranith
- Original Message -
> From: "Larry Schmid" 
> To: gluster-users@gluster.org
> Sent: Tuesday, April 22, 2014 4:07:39 AM
> Subject: [Gluster-users] Slow healing times on large cinder and nova volumes
> 
> 
> 
> Hi guys,
> 
> 
> 
> x-posted from irc.
> 
> 
> 
> We're having an issue on our prod openstack environment, which is backed by
> gluster using two replicas (I know. I wasn't given a choice.)
> 
> 
> 
> We lost storage on one of the replica servers and so had to replace failed
> bricks. The heal operation on Cinder and Nova volumes is coming up on the
> two-week mark and it seems as if it will never catch up and finish.
> 
> 
> 
> Nova heal info shows a constantly fluctuating list with multiple heals on
> many of the files, as if it's trying to keep up with deltas. It’s at 860GB
> of 1.1TB.
> 
> 
> 
> Cinder doesn't really seem to progress. It's at about 1.9T out of 6T
> utilized, though the total sparse file size totals about 30T. It also has
> done multiple heals on the same files.
> 
> 
> 
> I seem to be down to just watching it spin. Any help or tips?
> 
> 
> 
> Thanks,
> 
> 
> 
> Larry Schmid | Principal Cloud Engineer
> 
> 
> 
> IO
> 
> 
> 
> M +1.602.316.8639 | O +1.602.273.5431
> 
> E lsch...@io.com | io.com
> 
> 
> 
> 
> Founded in 2007, IO is a worldwide leader in software defined data center
> technology, services and solutions that enable businesses and governments to
> intelligently control their information.
> 
> The communication contained in this e-mail is confidential and is intended
> only for the named recipient(s) and may contain information that is
> privileged, proprietary, attorney work product or exempt from disclosure
> under applicable law. If you have received this message in error, or are not
> the named recipient(s), please note that any form of distribution, copying
> or use of this communication or the information in it is strictly prohibited
> and may be unlawful. Please immediately notify the sender of the error, and
> delete this communication including any attached files from your system.
> Thank you for your cooperation.
> 
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Slow healing times on large cinder and nova volumes

2014-04-21 Thread Schmid, Larry
Hi guys,

x-posted from irc.

We're having an issue on our prod openstack environment, which is backed by 
gluster using two replicas (I know.  I wasn't given a choice.)

We lost storage on one of the replica servers and so had to replace failed 
bricks.  The heal operation on Cinder and Nova volumes is coming up on the 
two-week mark and it seems as if it will never catch up and finish.

Nova heal info shows a constantly fluctuating list with multiple heals on many 
of the files, as if it's trying to keep up with deltas.  It's at 860GB of 1.1TB.

Cinder doesn't really seem to progress. It's at about 1.9T out of 6T utilized, 
though the total sparse file size totals about 30T.  It also has done multiple 
heals on the same files.

I seem to be down to just watching it spin.  Any help or tips?

Thanks,

Larry Schmid | Principal Cloud Engineer

IO

M +1.602.316.8639  |  O +1.602.273.5431
E   lsch...@io.com | io.com



Founded in 2007, IO is a worldwide leader in software defined data center 
technology, services and solutions that enable businesses and governments to 
intelligently control their information.

The communication contained in this e-mail is confidential and is intended only 
for the named recipient(s) and may contain information that is privileged, 
proprietary, attorney work product or exempt from disclosure under applicable 
law. If you have received this message in error, or are not the named 
recipient(s), please note that any form of distribution, copying or use of this 
communication or the information in it is strictly prohibited and may be 
unlawful. Please immediately notify the sender of the error, and delete this 
communication including any attached files from your system. Thank you for your 
cooperation.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] New volume with existing glusterfs data?

2014-04-21 Thread Peter B.
Hello,

I'm looking for a way to migrate an existing glusterfs volume from a KVM
client to the KVM host.
The data on the disks was written and used by glusterfs previously, so I
guess it should be possible to just use it as-is.
Unfortunately I can't find any documentation on how to "re-create" that
volume using that data - somewhat like "importing" it?

If anyone could point me into the right direction, I'd be very grateful.

Thanks in advance,
Pb
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Paul Penev
2014-04-21 21:12 GMT+02:00 Joe Julian :
> qemu outputs the glusterfs log to stdout:
> https://github.com/qemu/qemu/blob/stable-1.7/block/gluster.c#L211

I modified that line so that it would log to a file instead (as stdout
was not available in my case).

At the end of the logs, there is a pointer to afr trying to heal an
already-healed VM image (?). That is not possible because the brick
was killed.

Start KVM

[2014-04-21 20:18:09.219715] I [socket.c:3480:socket_init] 0-gfapi:
SSL support is NOT enabled
[2014-04-21 20:18:09.219758] I [socket.c:3495:socket_init] 0-gfapi:
using system polling thread
[2014-04-21 20:18:09.227508] I [socket.c:3480:socket_init]
0-gtest-client-1: SSL support is NOT enabled
[2014-04-21 20:18:09.227530] I [socket.c:3495:socket_init]
0-gtest-client-1: using system polling thread
[2014-04-21 20:18:09.228142] I [socket.c:3480:socket_init]
0-gtest-client-0: SSL support is NOT enabled
[2014-04-21 20:18:09.228155] I [socket.c:3495:socket_init]
0-gtest-client-0: using system polling thread
[2014-04-21 20:18:09.228178] I [glfs-master.c:92:notify] 0-gfapi: New
graph 7631342d-3334-3636-3131-2d323031342f (0) coming up
[2014-04-21 20:18:09.228193] I [client.c:2155:notify]
0-gtest-client-0: parent translators are ready, attempting connect on
transport
[2014-04-21 20:18:09.228646] I [client.c:2155:notify]
0-gtest-client-1: parent translators are ready, attempting connect on
transport
[2014-04-21 20:18:09.229356] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
0-gtest-client-0: changing port to 49153 (from 0)
[2014-04-21 20:18:09.229416] W [socket.c:514:__socket_rwv]
0-gtest-client-0: readv failed (No data available)
[2014-04-21 20:18:09.229960] I
[client-handshake.c:1659:select_server_supported_programs]
0-gtest-client-0: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2014-04-21 20:18:09.230060] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
0-gtest-client-1: changing port to 49153 (from 0)
[2014-04-21 20:18:09.230099] W [socket.c:514:__socket_rwv]
0-gtest-client-1: readv failed (No data available)
[2014-04-21 20:18:09.230380] I
[client-handshake.c:1456:client_setvolume_cbk] 0-gtest-client-0:
Connected to 10.0.0.14:49153, attached to remote volume
'/var/gtest/brick'.
[2014-04-21 20:18:09.230395] I
[client-handshake.c:1468:client_setvolume_cbk] 0-gtest-client-0:
Server and Client lk-version numbers are not same, reopening the fds
[2014-04-21 20:18:09.230443] I [afr-common.c:3698:afr_notify]
0-gtest-replicate-0: Subvolume 'gtest-client-0' came back up; going
online.
[2014-04-21 20:18:09.230598] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-gtest-client-0:
Server lk version = 1
[2014-04-21 20:18:09.230922] I
[client-handshake.c:1659:select_server_supported_programs]
0-gtest-client-1: Using Program GlusterFS 3.3, Num (1298437), Version
(330)
[2014-04-21 20:18:09.231290] I
[client-handshake.c:1456:client_setvolume_cbk] 0-gtest-client-1:
Connected to 10.0.0.15:49153, attached to remote volume
'/var/gtest/brick'.
[2014-04-21 20:18:09.231311] I
[client-handshake.c:1468:client_setvolume_cbk] 0-gtest-client-1:
Server and Client lk-version numbers are not same, reopening the fds
[2014-04-21 20:18:09.247979] I
[client-handshake.c:450:client_set_lk_version_cbk] 0-gtest-client-1:
Server lk version = 1
[2014-04-21 20:18:09.248451] I
[afr-common.c:2057:afr_set_root_inode_on_first_lookup]
0-gtest-replicate-0: added root inode
[2014-04-21 20:18:09.249045] I [afr-common.c:2120:afr_discovery_cbk]
0-gtest-replicate-0: selecting local read_child gtest-client-0
[2014-04-21 20:18:09.249125] I
[glfs-resolve.c:788:__glfs_active_subvol] 0-gtest: switched to graph
7631342d-3334-3636-3131-2d323031342f (0)


Kill glusterfsd:

[2014-04-21 20:21:35.291871] W [socket.c:514:__socket_rwv]
0-gtest-client-0: readv failed (No data available)
[2014-04-21 20:21:35.291914] W
[socket.c:1962:__socket_proto_state_machine] 0-gtest-client-0: reading
from socket failed. Error (No data available), peer (10.0.0.14:49153)
[2014-04-21 20:21:35.291972] I [client.c:2098:client_rpc_notify]
0-gtest-client-0: disconnected
[2014-04-21 20:21:46.242636] I [rpc-clnt.c:1676:rpc_clnt_reconfig]
0-gtest-client-0: changing port to 49153 (from 0)
[2014-04-21 20:21:46.242721] W [socket.c:514:__socket_rwv]
0-gtest-client-0: readv failed (No data available)
[2014-04-21 20:21:46.243047] E [socket.c:2157:socket_connect_finish]
0-gtest-client-0: connection to 10.0.0.14:49153 failed (Connection
refused)
[2014-04-21 20:21:46.243073] W [socket.c:514:__socket_rwv]
0-gtest-client-0: readv failed (No data available)
[2014-04-21 20:21:50.243299] W [socket.c:514:__socket_rwv]
0-gtest-client-0: readv failed (No data available)

On restart of gluster server:

[2014-04-21 20:23:26.266552] W [socket.c:514:__socket_rwv]
0-gtest-client-0: readv failed (No data available)
[2014-04-21 20:23:27.566449] W [socket.c:514:__socket_rwv] 0-gfapi:
readv failed (No data available)
[2014-04-21 20:23:27.566483] W
[socket.c:1962:__socket_proto_state_machine] 0-gfapi: reading from
socket f

Re: [Gluster-users] Scaling for repository purposes

2014-04-21 Thread Joop

Peter Milanese wrote:

Greeting-

 I'm relatively new to the Gluster community, and would like to 
investigate Gluster as a solution to augment our current storage 
systems. My use of Gluster has been limited to niche use cases. Is 
there anybody in the Library/Digital Repository space that has 
implemented this for mass storage (multi-petabyte)? I'd be interested 
in having a discussion via email if that's ok.


I attended a gluster meeting and there was someone who was into Digital 
Archiving and using gluster for that. The following is a copy of part of 
the email with slides from those talks. I think it was one of the last 
two but the others might be interesting too.


---

Presentations during the Gluster Community Seminar

The State of the Gluster Community
John Mark Walker, Gluster Community Leader, Red Hat

GlusterFS for SysAdmins
Niels de Vos, Senior Software Maintenance Engineer, Red Hat

Network PVR for real-time broadcast TV to 1 mln subscribers homes, powered by GlusterFS
Tycho Klitsee, Technical Consultant and Co-owner, Kratz Business Solutions

Gluster Forge Demos presentation 1
Fred van Zwieten, Technical Engineer, VX Company

Gluster Forge Demos presentation 2
Marcel Hergaarden, Solution Architect Storage, Red Hat

---

Joop

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Scaling for repository purposes

2014-04-21 Thread James
On Mon, Apr 21, 2014 at 9:54 AM, Peter Milanese  wrote:
> Greeting-
Hey,

>
>  I'm relatively new to the Gluster community, and would like to investigate
> Gluster as a solution to augment our current storage systems. My use of
> Gluster has been limited to niche use cases. Is there anybody in the
> Library/Digital Repository space that has implemented this for mass storage
> (multi-petabyte)? I'd be interested in having a discussion via email if
> that's ok.

TBH, the best way to learn about Gluster is to start playing with it.
It's fairly easy to do by hand, but there is also a Puppet-Gluster
module to automate it, and it integrates with Vagrant. Disclaimer: I'm
the author of this code, so of course, I think it's a great idea to
use it.
When I first started testing gluster, this enabled me to try different
configurations and get familiar with how it works.

# the code
https://github.com/purpleidea/puppet-gluster

# some related articles
https://ttboj.wordpress.com/
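If you prefer to kick the tires by hand first, a minimal two-node replicated
volume looks roughly like this (hostnames and brick paths are placeholders):

# on node1, with glusterd already running on both nodes
gluster peer probe node2
gluster volume create testvol replica 2 node1:/data/brick node2:/data/brick
gluster volume start testvol
# on any machine with the glusterfs client bits installed
mount -t glusterfs node1:/testvol /mnt/testvol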

Try out GlusterFS, and once you've used it for a week, come back with
the harder questions :)
If you want professional support, you can also get it from Red Hat
(Red Hat Storage)!

HTH,
James

>
> Thanks.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 3.3 to 3.4.3 now can't add bricks

2014-04-21 Thread Brandon Mackie
A conversation with myself, but JoeJulian was kind enough to point out that it 
was the info file and not the vol file that didn't match. I confirmed that the new 
servers have two extra lines:

op-version=2
client-op-version=2

Which of course would cause a mismatch. I'll take everything down tonight and 
see whether that corrects it, as it does correct it on some test servers.
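A quick way to spot the mismatch, for anyone else chasing this (a sketch; I am
assuming the peer checksum is computed over this info file):

grep -H 'op-version' /var/lib/glusterd/vols/*/info
md5sum /var/lib/glusterd/vols/<volname>/info     # should be identical on every server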

From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Brandon Mackie
Sent: April 21, 2014 1:13 PM
To: gluster-users@gluster.org
Subject: Re: [Gluster-users] Upgrade from 3.3 to 3.4.3 now can't add bricks

Replace “bricks” with “servers”; IRC just informed me that my vocabulary was crossed. 
Anyway, I wanted to add that by following 
http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
I can get one brick to see it as not rejected (the brick that I peer probe in step 4),
but the rest still see it as rejected.

From: 
gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Brandon Mackie
Sent: April 21, 2014 12:40 PM
To: gluster-users@gluster.org
Subject: [Gluster-users] Upgrade from 3.3 to 3.4.3 now can't add bricks

Good afternoon folks,

Not sure if this is an occurrence of 
https://bugzilla.redhat.com/show_bug.cgi?id=1072720 but cannot add new bricks 
to my existing cluster running under Ubuntu 12. New bricks are completely blank.

I peer probed the new one from a trusted member. Both sides say “Peer Rejected 
(Connected)”, but the peer does not migrate to the other bricks. If I then clear 
out the newly created /var/lib/glusterd on the new brick and restart the 
server, it migrates to all bricks, but then all of them say rejected (connected) instead 
of just one. I confirmed it is not a firewall issue, as I can open a TCP port in both directions.

A quick snap of the log on the new brick:
http://pastie.org/9098095

I’ve heard this means the vol file doesn’t match, but this is a brand new clean 
server and therefore has no vol file.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Joe Julian
As was pointed out to me by Andy5_ on the IRC channel, qemu outputs the 
glusterfs log on stdout: 
https://github.com/qemu/qemu/blob/stable-1.7/block/gluster.c#L211


On 04/21/2014 09:51 AM, Paul Penev wrote:

I sent the brick logs earlier. But I'm not able to produce logs from
events in KVM. I can't find any logging or debugging interface. It is
somewhat weird.

Paul

2014-04-21 18:30 GMT+02:00 Joe Julian :

I don't expect much from the bricks either, but in combination with the
client log they might tell us something.

On April 21, 2014 9:21:56 AM PDT, Paul Penev  wrote:

Joe,
it will take some time to redo the logs (I'm doing them now).
While waiting for the heal to complete I did some research on the
version history of qemu.

There's a patch made two months ago that intrigues me:


https://github.com/qemu/qemu/commit/adccfbcd6020e928db93b2b4faf0dbd05ffbe016

Also the changelog
https://github.com/qemu/qemu/commits/master/block/gluster.c seems to
support some work done recently in gluster.c

Stay tuned for the logs from the bricks (but I don't expect much there).


--
Sent from my Android device with K-9 Mail. Please excuse my brevity.


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Upgrade from 3.3 to 3.4.3 now can't add bricks

2014-04-21 Thread Brandon Mackie
Replace “bricks” with “servers”; IRC just informed me that my vocabulary was crossed. 
Anyway, I wanted to add that by following 
http://www.gluster.org/community/documentation/index.php/Resolving_Peer_Rejected
I can get one brick to see it as not rejected (the brick that I peer probe in step 4),
but the rest still see it as rejected.
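For reference, that page boils down to roughly the following on the rejected
server (a from-memory sketch; keep glusterd.info, which holds the node's UUID):

service glusterd stop
cd /var/lib/glusterd
# remove everything except glusterd.info
find . -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
service glusterd start
gluster peer probe <good-server>     # re-sync the volume configuration
service glusterd restart
gluster peer status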

From: gluster-users-boun...@gluster.org 
[mailto:gluster-users-boun...@gluster.org] On Behalf Of Brandon Mackie
Sent: April 21, 2014 12:40 PM
To: gluster-users@gluster.org
Subject: [Gluster-users] Upgrade from 3.3 to 3.4.3 now can't add bricks

Good afternoon folks,

Not sure if this is an occurrence of 
https://bugzilla.redhat.com/show_bug.cgi?id=1072720 but cannot add new bricks 
to my existing cluster running under Ubuntu 12. New bricks are completely blank.

I peer probed the new one from a trusted member. Both sides say “Peer Rejected 
(Connected)”, but the peer does not migrate to the other bricks. If I then clear 
out the newly created /var/lib/glusterd on the new brick and restart the 
server, it migrates to all bricks, but then all of them say rejected (connected) instead 
of just one. I confirmed it is not a firewall issue, as I can open a TCP port in both directions.

A quick snap of the log on the new brick:
http://pastie.org/9098095

I’ve heard this means the vol file doesn’t match, but this is a brand new clean 
server and therefore has no vol file.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Upgrade from 3.3 to 3.4.3 now can't add bricks

2014-04-21 Thread Brandon Mackie
Good afternoon folks,

Not sure if this is an occurrence of 
https://bugzilla.redhat.com/show_bug.cgi?id=1072720 but cannot add new bricks 
to my existing cluster running under Ubuntu 12. New bricks are completely blank.

I peer probed the new one from a trusted member. Both sides say “Peer Rejected 
(Connected)”, but the peer does not migrate to the other bricks. If I then clear 
out the newly created /var/lib/glusterd on the new brick and restart the 
server, it migrates to all bricks, but then all of them say rejected (connected) instead 
of just one. I confirmed it is not a firewall issue, as I can open a TCP port in both directions.

A quick snap of the log on the new brick:
http://pastie.org/9098095

I’ve heard this means the vol file doesn’t match, but this is a brand new clean 
server and therefore has no vol file.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Paul Penev
I sent the brick logs earlier. But I'm not able to produce logs from
events in KVM. I can't find any logging or debugging interface. It is
somewhat weird.

Paul

2014-04-21 18:30 GMT+02:00 Joe Julian :
> I don't expect much from the bricks either, but in combination with the
> client log they might tell us something.
>
> On April 21, 2014 9:21:56 AM PDT, Paul Penev  wrote:
>>
>> Joe,
>> it will take some time to redo the logs (I'm doing them now).
>> While waiting for the heal to complete I did some research on the
>> version history of qemu.
>>
>> There's a patch made two months ago that intrigues me:
>>
>>
>> https://github.com/qemu/qemu/commit/adccfbcd6020e928db93b2b4faf0dbd05ffbe016
>>
>> Also the changelog
>> https://github.com/qemu/qemu/commits/master/block/gluster.c seems to
>> support some work done recently in gluster.c
>>
>> Stay tuned for the logs from the bricks (but I don't expect much there).
>
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Conflicting entries for symlinks between bricks (Trusted.gfid not consistent)

2014-04-21 Thread PEPONNET, Cyril N (Cyril)
Hi,

I will try to reproduce this in a vagrant-cluster environment.

If it helps, here is the timeline of the events.

t0: 2 servers in replicate mode, no issue
t1: power down server1 due to a hardware issue
t2: server2 still continues to serve files through NFS and fuse, and continues 
to be updated by automated build processes / copies from other places
t3: power up server1, which has been fixed; auto healing starts for the files; in the 
meantime we had a Jenkins job deploying files and removing/creating symlinks to 
the proper targets (through NFS and fuse)
t4: heal failed on some directories; in fact, some symlinks in those directories had 
not been updated from the ones located on server2, and we could not access those 
directories anymore (I/O error)
t5: we removed the symlinks from server1 (directly on the brick) and started a new 
replication from server2; the symlinks are now consistent between the two servers

We didn't add/remove bricks during this process.

As far as I can see, splitmount helps to remove the bad files from the bricks in a 
split-brain-like event. (That is what we have done by hand so far.)
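Roughly, the by-hand cleanup looked like this (a sketch with placeholder paths;
the .glusterfs step is an assumption about where the gfid entry for the stale
symlink lives):

# on the brick holding the stale entry (server1 in our case)
getfattr -n trusted.gfid -e hex /export/raid/myVol/path/to/stale-symlink
rm /export/raid/myVol/path/to/stale-symlink
# also remove the matching entry under .glusterfs/<aa>/<bb>/<full-gfid>   (assumption)
rm /export/raid/myVol/.glusterfs/aa/bb/aabbccdd-....
# then let self-heal copy the good one back from server2
gluster volume heal myVol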

Thanks

On Apr 19, 2014, at 9:42 AM, Joe Julian <j...@julianfamily.org> wrote:

What would really help is a clear list of steps to reproduce this issue. It 
sounds like a bug but I can't repro.

In your questions you ask in relation to adding or removing bricks whether you 
can continue to read and write. My understanding is that you're not doing that 
(gluster volume (add|remove)-brick) but rather just shutting down. If my 
understanding is correct, then yes. You should be able to continue normal 
operation.
Repairing this issue is the same as healing split-brain. The easiest way is to 
use splitmount[1] to delete one of them.

[1] https://forge.gluster.org/splitmount



On April 17, 2014 4:37:32 PM PDT, "PEPONNET, Cyril (Cyril)" 
<cyril.pepon...@alcatel-lucent.com> wrote:
Hi gluster people !

I would like some help regarding an issue we have with our early production 
glusterfs setup.

Our Topology:

2 Bricks in Replicate mode:

[root@myBrick1 /]# cat /etc/redhat-release
CentOS release 6.5 (Final)
[root@myBrick1 /]# glusterfs --version
glusterfs 3.4.2 built on Jan  3 2014 12:38:05
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. 


[root@myBrick1 /]# gluster volume info

Volume Name: myVol
Type: Replicate
Volume ID: 58f5d775-acb5-416d-bee6-5209f7b20363
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: myBrick1.company.lan:/export/raid/myVol
Brick2: myBrick2.company.lan:/export/raid/myVol
Options Reconfigured:
nfs.enable-ino32: on

The issue:

We powered down a brick (myBrick1) for hardware maintenance; when we powered it up, 
issues started with some files (symlinks, in fact), and auto healing does not seem 
to be working for all the files…

Let's take a look at one faulty symlink:

Using fuse.glusterfs (sometimes it works, sometimes not):

[root@myBrick2 /]mount
...
myBrick2.company.lan:/myVol on /images type fuse.glusterfs 
(rw,default_permissions,allow_other,max_read=131072)
...

[root@myBrick2 /]# stat /images/myProject1/2.1_stale/current
  File: `/images/myProject1/2.1_stale/current' -> `current-59a77422'
  Size: 16 Blocks: 0  IO Block: 131072 symbolic link
Device: 13h/19d Inode: 11422905275486058235  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (  499/ testlab)   Gid: (  499/ testlab)
Access: 2014-04-17 14:05:54.488238322 -0700
Modify: 2014-04-16 19:46:05.033299589 -0700
Change: 2014-04-17 14:05:54.487238322 -0700

[root@myBrick2 /]# stat /images/myProject1/2.1_stale/current
stat: cannot stat `/images/myProject1/2.1_stale/current': Input/output error

I typed the above commands a few seconds apart.

Let's try with the other brick

[root@myBrick1 ~]mount
...
myBrick1.company.lan:/myVol on /images type fuse.glusterfs 
(rw,default_permissions,allow_other,max_read=131072)
...

[root@myBrick1 ~]# stat /images/myProject1/2.1_stale/current
stat: cannot stat `/images/myProject1/2.1_stale/current': Input/output error

With this one it always fails… (myBrick1 is the server we powered up after 
maintenance).

Using nfs:

It never works (tested with two bricks)

[root@station-localdomain myProject1]# mount
...
myBrick1:/myVol on /images type nfs 
(rw,relatime,vers=3,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,mountaddr=10.0.0.57,mountvers=3,mountport=38465,mountproto=tcp,local_lock=none,addr=10.0.0.57)
...

[root@station-localdomain myProject1]# ls 2.1_stale
ls: cannot access 2.1_stale: Input/output error

In both cases here are the logs:

==> /var/log/glusterfs/glustershd.log <==
[2014-04-17 10:20:25.861003] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 
0-myVol-replicate-0: : Performing 
conservative merge
[2014-04-17 10:20:25.895143] I [afr-self-heal-entry.c:2253:afr_sh_entry_fix] 
0-myVol-replicate-0: : Performing 
conservative merge
[2014-04-17 10:20:25.949176] I [afr-se

Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Paul Penev
Sorry, I forgot to mention that at this point, restarting gluster on
server 15 leads to reconnect of the KVM client.

[2014-04-21 16:29:40.634713] I
[server-handshake.c:567:server_setvolume] 0-gtest-server: accepted
client from s15-213620-2014/04/21-09:53:17:688030-gtest-client-1-0
(version: 3.4.3)

From these traces, I'm inclined to point towards KVM's handling of the errors.

The 1.7 trunk that I'm using (stable development) is here:

https://github.com/qemu/qemu/blob/stable-1.7/block/gluster.c

I don't see anything dealing with connection problems and reconnects.
This is why I assumed that libgfapi is responsible for maintaining
connections to the bricks and re-establishing them as needed (that makes
sense, but feel free to prove me wrong).

Paul
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Paul Penev
Here is the session log from the testing. Unfortunately there is
little in the logs.

1. KVM running on server 15. Bricks are on servers 14 and 15. Killing
glusterfsd on server 14:
1.1. killall -KILL glusterfsd

It has no chance to log anything, so the logs below are from the
gluster server restart.
The KVM machine has process PID 213620


[2014-04-21 16:01:13.434638] I [glusterfsd.c:1910:main]
0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
3.4.3 (/usr/sbin/glusterfsd -s s14 --volfile-id
gtest.s14.var-gtest-brick -p
/var/lib/glusterd/vols/gtest/run/s14-var-gtest-brick.pid -S
/var/run/358a7d1bdc05c7a60f4379b9e54fac65.socket --brick-name
/var/gtest/brick -l /var/log/glusterfs/bricks/var-gtest-brick.log
--xlator-option
*-posix.glusterd-uuid=84eb547a-6086-4795-a543-e9dad6fe78d1
--brick-port 49153 --xlator-option gtest-server.listen-port=49153)
[2014-04-21 16:01:13.436112] I [socket.c:3480:socket_init]
0-socket.glusterfsd: SSL support is NOT enabled
[2014-04-21 16:01:13.436144] I [socket.c:3495:socket_init]
0-socket.glusterfsd: using system polling thread
[2014-04-21 16:01:13.436245] I [socket.c:3480:socket_init]
0-glusterfs: SSL support is NOT enabled
[2014-04-21 16:01:13.436256] I [socket.c:3495:socket_init]
0-glusterfs: using system polling thread
[2014-04-21 16:01:16.982974] I [graph.c:239:gf_add_cmdline_options]
0-gtest-server: adding option 'listen-port' for volume 'gtest-server'
with value '49153'
[2014-04-21 16:01:16.982992] I [graph.c:239:gf_add_cmdline_options]
0-gtest-posix: adding option 'glusterd-uuid' for volume 'gtest-posix'
with value '84eb547a-6086-4795-a543-e9dad6fe78d1'
[2014-04-21 16:01:16.984083] W [options.c:848:xl_opt_validate]
0-gtest-server: option 'listen-port' is deprecated, preferred is
'transport.socket.listen-port', continuing with correction
[2014-04-21 16:01:16.984113] I [socket.c:3480:socket_init]
0-tcp.gtest-server: SSL support is NOT enabled
[2014-04-21 16:01:16.984121] I [socket.c:3495:socket_init]
0-tcp.gtest-server: using system polling thread
Given volfile:
+--+
  1: volume gtest-posix
  2: type storage/posix
  3: option volume-id 083fea96-47e7-4904-a3d3-500e226786d2
  4: option directory /var/gtest/brick
  5: end-volume
  6:
  7: volume gtest-access-control
  8: type features/access-control
  9: subvolumes gtest-posix
 10: end-volume
 11:
 12: volume gtest-locks
 13: type features/locks
 14: subvolumes gtest-access-control
 15: end-volume
 16:
 17: volume gtest-io-threads
 18: type performance/io-threads
 19: subvolumes gtest-locks
 20: end-volume
 21:
 22: volume gtest-index
 23: type features/index
 24: option index-base /var/gtest/brick/.glusterfs/indices
 25: subvolumes gtest-io-threads
 26: end-volume
 27:
 28: volume gtest-marker
 29: type features/marker
 30: option quota off
 31: option xtime off
 32: option timestamp-file /var/lib/glusterd/vols/gtest/marker.tstamp
 33: option volume-uuid 083fea96-47e7-4904-a3d3-500e226786d2
 34: subvolumes gtest-index
 35: end-volume
 36:
 37: volume /var/gtest/brick
 38: type debug/io-stats
 39: option count-fop-hits off
 40: option latency-measurement off
 41: option log-level INFO
 42: subvolumes gtest-marker
 43: end-volume
 44:
 45: volume gtest-server
 46: type protocol/server
 47: option statedump-path /tmp
 48: option auth.addr./var/gtest/brick.allow *
 49: option
auth.login.734ec716-f516-4b95-93b0-2714373f25c4.password
700527a5-a1d1-4c14-97ac-0c640c9467b1
 50: option auth.login./var/gtest/brick.allow
734ec716-f516-4b95-93b0-2714373f25c4
 51: option transport-type tcp
 52: subvolumes /var/gtest/brick
 53: end-volume

+--+
[2014-04-21 16:01:17.142466] I
[server-handshake.c:567:server_setvolume] 0-gtest-server: accepted
client from s15-213620-2014/04/21-09:53:17:688030-gtest-client-0-0
(version: 3.4.3)

This is all. I now wait for heal to finish.

2. Heal is finished. I kill the glusterfsd on server 15:

This kills the KVM machine. Nothing is in the logs. dmesg from the KVM shows:

Write-error on swap-device (252:0:96627104)
Write-error on swap-device (252:0:96627112)
Write-error on swap-device (252:0:96627120)
Write-error on swap-device (252:0:96627128)
Write-error on swap-device (252:0:96627136)
Write-error on swap-device (252:0:96627144)
Write-error on swap-device (252:0:96627152)
Write-error on swap-device (252:0:96627160)
Write-error on swap-device (252:0:96627168)
Write-error on swap-device (252:0:96627176)
Write-error on swap-device (252:0:96627184)
Write-error on swap-device (252:0:96627192)
Write-error on swap-device (252:0:96627200)
Write-error on swap-device (252:0:96627208)
Write-error on swap-device (252:0:96627216)
Write-error on swap-device (252:0:96627224)
Write-error on swap-device (252:0:96627232)
Write-error on swap-devic

[Gluster-users] Scaling for repository purposes

2014-04-21 Thread Peter Milanese
Greeting-

 I'm relatively new to the Gluster community, and would like to investigate
Gluster as a solution to augment our current storage systems. My use of
Gluster has been limited to niche use cases. Is there anybody in the
Library/Digital Repository space that has implemented this for mass storage
(multi-petabyte)? I'd be interested in having a discussion via email if
that's ok.

Thanks.

-- 
Peter J. Milanese, Lead Systems Engineer
NYPL Technology
The New York Public Library 
petermilan...@nypl.org - 212.621.0203
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Joe Julian

Now that you have a simple repro, let's get clean logs for this failure.

Truncate logs, produce the error, post logs. Let's see if it's already 
telling us what the problem is.
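Something along these lines should do it (assuming the default log locations;
adjust the paths for your install):

# on each gluster server, before reproducing
truncate -s 0 /var/log/glusterfs/bricks/*.log /var/log/glusterfs/glustershd.log
# reproduce the failure, then collect everything
tar czf gluster-logs.tar.gz /var/log/glusterfs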


On 4/21/2014 12:47 AM, Paul Penev wrote:

Ok, here is one more hint that points in the direction of the libgfapi *client* not
re-establishing the connections to the bricks after they come back
online: if I migrate the KVM machine (live) from one node to another
after the bricks are back online, and I kill the second brick, the KVM
will not suffer from disk problems. It is obvious that during
migration, the new process on the new node is forced to reconnect to
the gluster volume, hence re-establishing both links. After this it is
ready to lose one of the links without problems.

Steps to replicate:

1. Start KVM VM and boot from a replicated volume
2. killall -KILL glusterfsd on one brick (brick1). Verify that the KVM
is still working.
3. Bring back the glusterfsd on brick1.
4. heal the volume (gluster vol heal ) and wait until gluster vol
heal  info shows no self-heal backlog.
5. Now migrate the KVM from one node to another node.
6. killall -KILL glusterfsd on the second brick (brick2).
7. Verify that the KVM is still working (!). It would have died from disk errors
at this point if step 5 had not been executed.
8. Bring back glusterfsd on brick2, heal and enjoy.
9. Repeat at will: the KVM will never die again, provided you migrate
it once before a brick failure.

What this means to me: there's a problem in libgfapi, gluster 3.4.2
and 3.4.3 (at least) and/or kvm 1.7.1 (I'm running the latest 1.7
source tree in production).

Joe: we're in your hands. I hope you find the problem somewhere.

Paul.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Samuli Heinonen
Hi,

Could you send the output of gluster volume info, the exact command you are 
using to start the VMs, and the cache settings you are using with KVM?
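For example, something along these lines (purely illustrative values; the
gluster:// URL is qemu's libgfapi drive syntax):

qemu-system-x86_64 -enable-kvm -m 2048 \
  -drive file=gluster://10.0.0.14/gtest/vm1.img,if=virtio,format=raw,cache=none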

-samuli
 
Paul Penev  wrote on 21.4.2014 at 10.47:

> Ok, here is one more hint that points in the direction of libgfapi not
> re-establishing the connections to the bricks after they come back
> online: if I migrate the KVM machine (live) from one node to another
> after the bricks are back online, and I kill the second brick, the KVM
> will not suffer from disk problems. It is obvious that during
> migration, the new process on the new node is forced to reconnect to
> the gluster volume, hence re-establishing both links. After this it is
> ready to lose one of the links without problems.
> 
> Steps to replicate:
> 
> 1. Start KVM VM and boot from a replicated volume
> 2. killall -KILL glusterfsd on one brick (brick1). Verify that the KVM
> is still working.
> 3. Bring back the glusterfsd on brick1.
> 4. heal the volume (gluster vol heal ) and wait until gluster vol
> heal  info shows no self-heal backlog.
> 5. Now migrate the KVM from one node to another node.
> 6. killall -KILL glusterfsd on the second brick (brick2).
> 7. Verify that the KVM is still working (!). It would have died from disk errors
> at this point if step 5 had not been executed.
> 8. Bring back glusterfsd on brick2, heal and enjoy.
> 9. Repeat at will: the KVM will never die again, provided you migrate
> it once before a brick failure.
> 
> What this means to me: there's a problem in libgfapi, gluster 3.4.2
> and 3.4.3 (at least) and/or kvm 1.7.1 (I'm running the latest 1.7
> source tree in production).
> 
> Joe: we're in your hands. I hope you find the problem somewhere.
> 
> Paul.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://supercolony.gluster.org/mailman/listinfo/gluster-users

___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] libgfapi failover problem on replica bricks

2014-04-21 Thread Paul Penev
Ok, here is one more hint that points in the direction of libgfapi not
re-establishing the connections to the bricks after they come back
online: if I migrate the KVM machine (live) from one node to another
after the bricks are back online, and I kill the second brick, the KVM
will not suffer from disk problems. It is obvious that during
migration, the new process on the new node is forced to reconnect to
the gluster volume, hence re-establishing both links. After this it is
ready to lose one of the links without problems.

Steps to replicate:

1. Start KVM VM and boot from a replicated volume
2. killall -KILL glusterfsd on one brick (brick1). Verify that the KVM
is still working.
3. Bring back the glusterfsd on brick1.
4. heal the volume (gluster vol heal ) and wait until gluster vol
heal  info shows no self-heal backlog.
5. Now migrate the KVM from one node to another node.
6. killall -KILL glusterfsd on the second brick (brick2).
7. Verify that the KVM is still working (!). It would have died from disk errors
at this point if step 5 had not been executed.
8. Bring back glusterfsd on brick2, heal and enjoy.
9. Repeat at will: the KVM will never die again, provided you migrate
it once before a brick failure.
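A condensed command sketch of the steps above (volume and VM names are
placeholders; the restart and migration commands depend on your distro and on
whether the guest is managed by libvirt):

killall -KILL glusterfsd                           # step 2, on brick1's server
gluster volume start <vol> force                   # step 3, respawn the killed brick process
gluster volume heal <vol>                          # step 4
watch gluster volume heal <vol> info               #   ...until the backlog is empty
virsh migrate --live <vm> qemu+ssh://node2/system  # step 5
killall -KILL glusterfsd                           # step 6, on brick2's server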

What this means to me: there's a problem in libgfapi, gluster 3.4.2
and 3.4.3 (at least) and/or kvm 1.7.1 (I'm running the latest 1.7
source tree in production).

Joe: we're in your hands. I hope you find the problem somewhere.

Paul.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users