Re: [Gluster-users] usage of harddisks: each hdd a brick? raid?

2019-01-09 Thread Serkan Çoban
We are also using 10TB disks; heal takes 7-8 days.
You can play with the "cluster.shd-max-threads" setting. Its default is
1, I think; I am using it with 4.
Below you can find more info:
https://access.redhat.com/solutions/882233
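
Something like this (a sketch only, assuming your volume is named "shared"
as elsewhere in this thread; adjust the volume name and thread count):

  # raise the number of self-heal daemon threads (default is 1)
  gluster volume set shared cluster.shd-max-threads 4

  # verify the current value
  gluster volume get shared cluster.shd-max-threads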

On Thu, Jan 10, 2019 at 9:53 AM Hu Bert  wrote:
>
> Hi Mike,
>
> > We have a similar setup, and I have not tested restoring...
> > How many volumes do you have - one volume per (*3) 10 TB disk, i.e.
> > 4 volumes?
>
> Testing could be quite easy: reset-brick start, then delete&re-create
> partition/fs/etc., reset-brick commit force - and then watch.
>
> We only have 1 big volume over all bricks. Details:
>
> Volume Name: shared
> Type: Distributed-Replicate
> Number of Bricks: 4 x 3 = 12
> Brick1: gluster11:/gluster/bricksda1/shared
> Brick2: gluster12:/gluster/bricksda1/shared
> Brick3: gluster13:/gluster/bricksda1/shared
> Brick4: gluster11:/gluster/bricksdb1/shared
> Brick5: gluster12:/gluster/bricksdb1/shared
> Brick6: gluster13:/gluster/bricksdb1/shared
> Brick7: gluster11:/gluster/bricksdc1/shared
> Brick8: gluster12:/gluster/bricksdc1/shared
> Brick9: gluster13:/gluster/bricksdc1/shared
> Brick10: gluster11:/gluster/bricksdd1/shared
> Brick11: gluster12:/gluster/bricksdd1_new/shared
> Brick12: gluster13:/gluster/bricksdd1_new/shared
>
> Didn't think about creating more volumes (in order to split data),
> e.g. 4 volumes with 3*10TB each, or 2 volumes with 6*10TB each.
>
> Just curious: after splitting into 2 or more volumes - would that make
> the volume with the healthy/non-restoring disks better accessible, with
> only the volume holding the once faulty and now restoring disk being
> in a "bad mood"?
>
> > > Any opinions on that? Maybe it would be better to use more servers and
> > > smaller disks, but this isn't possible at the moment.
> > Also interested. We can swap SSDs to HDDs for RAID10, but is it worthless?
>
> Yeah, I would be interested in how the glusterfs professionals deal
> with faulty disks, especially when they are as big as ours.
>
>
> Thx
> Hubert
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] usage of harddisks: each hdd a brick? raid?

2019-01-09 Thread Hu Bert
Hi Mike,

> We have a similar setup, and I have not tested restoring...
> How many volumes do you have - one volume per (*3) 10 TB disk, i.e.
> 4 volumes?

Testing could be quite easy: reset-brick start, then delete&re-create
partition/fs/etc., reset-brick commit force - and then watch.
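
Roughly, the sequence is (a sketch only - the brick
gluster11:/gluster/bricksdd1/shared and volume "shared" are taken from the
details below; use the disk that actually failed):

  gluster volume reset-brick shared gluster11:/gluster/bricksdd1/shared start
  # wipe and re-create the partition/filesystem behind the brick, then:
  gluster volume reset-brick shared gluster11:/gluster/bricksdd1/shared \
      gluster11:/gluster/bricksdd1/shared commit force
  # and watch the heal progress
  gluster volume heal shared info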

We only have 1 big volume over all bricks. Details:

Volume Name: shared
Type: Distributed-Replicate
Number of Bricks: 4 x 3 = 12
Brick1: gluster11:/gluster/bricksda1/shared
Brick2: gluster12:/gluster/bricksda1/shared
Brick3: gluster13:/gluster/bricksda1/shared
Brick4: gluster11:/gluster/bricksdb1/shared
Brick5: gluster12:/gluster/bricksdb1/shared
Brick6: gluster13:/gluster/bricksdb1/shared
Brick7: gluster11:/gluster/bricksdc1/shared
Brick8: gluster12:/gluster/bricksdc1/shared
Brick9: gluster13:/gluster/bricksdc1/shared
Brick10: gluster11:/gluster/bricksdd1/shared
Brick11: gluster12:/gluster/bricksdd1_new/shared
Brick12: gluster13:/gluster/bricksdd1_new/shared

Didn't think about creating more volumes (in order to split data),
e.g. 4 volumes with 3*10TB each, or 2 volumes with 6*10TB each.

Just curious: after splitting into 2 or more volumes - would that make
the volume with the healthy/non-restoring disks better accessible, with
only the volume holding the once faulty and now restoring disk being
in a "bad mood"?

> > Any opinions on that? Maybe it would be better to use more servers and
> > smaller disks, but this isn't possible at the moment.
> Also interested. We can swap SSDs to HDDs for RAID10, but is it worthless?

Yeah, I would be interested in how the glusterfs professionals deal
with faulty disks, especially when they are as big as ours.


Thx
Hubert
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Input/output error on FUSE log

2019-01-09 Thread Matt Waymack
Has anyone any other ideas where to look?  This is only affecting FUSE clients. 
 SMB clients are unaffected by this problem.

Thanks!

From: gluster-users-boun...@gluster.org  On 
Behalf Of Matt Waymack
Sent: Monday, January 7, 2019 1:19 PM
To: Raghavendra Gowdappa 
Cc: gluster-users@gluster.org List 
Subject: Re: [Gluster-users] Input/output error on FUSE log

Attached are the logs from when a failure occurred with diagnostics set to 
trace.

Thank you!

From: Raghavendra Gowdappa <rgowd...@redhat.com>
Sent: Saturday, January 5, 2019 8:32 PM
To: Matt Waymack <mwaym...@nsgdv.com>
Cc: gluster-users@gluster.org List <gluster-users@gluster.org>
Subject: Re: [Gluster-users] Input/output error on FUSE log



On Sun, Jan 6, 2019 at 7:58 AM Raghavendra Gowdappa
<rgowd...@redhat.com> wrote:


On Sun, Jan 6, 2019 at 4:19 AM Matt Waymack
<mwaym...@nsgdv.com> wrote:

Hi all,



I'm having a problem writing to our volume.  When writing files larger than 
about 2GB, I get an intermittent issue where the write will fail and return 
Input/Output error.  This is also shown in the FUSE log of the client (this is 
affecting all clients).  A snip of a client log is below:

[2019-01-05 22:39:44.581371] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51040978: WRITE => -1 
gfid=82a0b5c4-7ef3-43c2-ad86-41e16673d7c2 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:44.598392] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51040979: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:47.420920] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041266: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949809b7f8 (Input/output error)

[2019-01-05 22:39:47.433377] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041267: FLUSH() ERR => -1 (Input/output error)

[2019-01-05 22:39:50.441531] W [fuse-bridge.c:2474:fuse_writev_cbk] 
0-glusterfs-fuse: 51041548: WRITE => -1 
gfid=0e8e1e13-97a5-478a-bc58-e81ddf3698a3 fd=0x7f949839a368 (Input/output error)

[2019-01-05 22:39:50.451914] W [fuse-bridge.c:1441:fuse_err_cbk] 
0-glusterfs-fuse: 51041549: FLUSH() ERR => -1 (Input/output error)

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1311504267" repeated 1721 times between 
[2019-01-05 22:39:33.906241] and [2019-01-05 22:39:44.598371]

The message "E [MSGID: 101046] [dht-common.c:1502:dht_lookup_dir_cbk] 
0-gv1-dht: dict is null" repeated 1714 times between [2019-01-05 
22:39:33.925981] and [2019-01-05 22:39:50.451862]

The message "W [MSGID: 109011] [dht-layout.c:163:dht_layout_search] 0-gv1-dht: 
no subvolume for hash (value) = 1137142622" repeated 1707 times between 
[2019-01-05 22:39:39.636552] and [2019-01-05 22:39:50.451895]

This looks to be a DHT issue. Some questions:
* Are all subvolumes of DHT up, and is the client connected to them?
Particularly the subvolume which contains the file in question.
* Can you get all extended attributes of the parent directory of the file
from all bricks?
* Set diagnostics.client-log-level to TRACE, capture these errors again and
attach the client log file (example commands below).
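
For example (a sketch - the brick path and parent directory below are
placeholders; replace them with the real ones):

  # on every brick: extended attributes of the parent directory of the file
  getfattr -d -m . -e hex /exp/b1/gv1/path/to/parent-dir

  # raise the client log level, reproduce the error, then lower it again
  gluster volume set gv1 diagnostics.client-log-level TRACE
  gluster volume set gv1 diagnostics.client-log-level INFO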

I spoke a bit too early. dht_writev doesn't search the hashed subvolume, as it
has already been looked up in lookup. So these messages look to be a different
issue - not the writev failure.


This is intermittent for most files, but eventually if a file is large enough
it will not write.  The workflow is SFTP to the client, which then writes to
the volume over FUSE.  When files get to a certain point, we can no longer
write to them.  The file sizes are different as well, so it's not like they
all get to the same size and just stop either.  I've ruled out a free space
issue; our files at their largest are only a few hundred GB and we have tens
of terabytes free on each brick.  We are also sharding at 1GB.
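
For reference, the shard size and per-brick free space can be double-checked
with something like this ("gv1" and the brick path are taken from the volume
info below):

  gluster volume get gv1 features.shard-block-size
  df -h /exp/b1/gv1        # repeat on each brick / node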

I'm not sure where to go from here, as the error seems vague and I can only
see it in the client log.  I'm not seeing these errors on the nodes themselves.
This is also seen if I mount the volume via FUSE on any of the nodes, and
again it is only reflected in the FUSE log.

Here is the volume info:
Volume Name: gv1
Type: Distributed-Replicate
Volume ID: 1472cc78-e2a0-4c3f-9571-dab840239b3c
Status: Started
Snapshot Count: 0
Number of Bricks: 8 x (2 + 1) = 24
Transport-type: tcp
Bricks:
Brick1: tpc-glus4:/exp/b1/gv1
Brick2: tpc-glus2:/exp/b1/gv1
Brick3: tpc-arbiter1:/exp/b1/gv1 (arbiter)
Brick4: tpc-glus2:/exp/b2/gv1
Brick5: tpc-glus4:/exp/b2/gv1
Brick6: tpc-arbiter1:/exp/b2/gv1 (arbiter)
Brick7: tpc-glus4:/exp/b3/gv1
Brick8: tpc-glus2:/exp/b3/gv1
Brick9: tpc-arbiter1:/exp/b3/gv1 (arbiter)
Brick10: tpc-glus4:/exp/b4/gv1
Brick11: tpc-glus2:/exp/b4/gv1
Brick12: tpc-arbiter1:/exp/b4/gv1 (arbiter)
Brick13: tpc-glus1:/exp/b5/gv1
Brick14: tpc-glus3:/exp/b5/gv1
Brick15: tpc-arbiter2:/exp/b5/gv1 (arbiter)
Brick16: tpc-glus1:/exp/b6/gv1
Brick17: tpc-glus3:/exp/b6/gv1

Re: [Gluster-users] usage of harddisks: each hdd a brick? raid?

2019-01-09 Thread Mike

09.01.2019 17:38, Hu Bert wrote:

Hi @all,

we have 3 servers, 4 disks (10TB) each, in a replicate 3 setup. We're
having some problems after a disk failed; the restore via reset-brick
takes way too long (way over a month)


terrible.

We have a similar setup, and I have not tested restoring...
How many volumes do you have - one volume per (*3) 10 TB disk, i.e.
4 volumes?




We were thinking about migrating to 3 servers with a RAID10 (HW or
SW), again in a replicate 3 setup. We would waste a lot of space, but
the idea is that, if a hdd fails:

- the data are still available on the hdd copy
- performance is better than with a failed/restoring hdd
- the restore via SW/HW RAID is faster than the restore via glusterfs


Our setup is worse in terms of wasted space - we have 3 x 10TB disks in each
server + 1 SSD "for raid controller cache".
There is no way to create RAID10, RAID5 is a no-no on HDDs, and the only
viable variant that uses all disks is 1ADM (3*RAID1 + 1 SSD cache).


Or 2*RAID1 + 1 HDD for gluster ...


Any opinions on that? Maybe it would be better to use more servers and
smaller disks, but this isn't possible at the moment.


Also interested. We can swap SSDs to HDDs for RAID10, but is it worthless?

And what is the right way to restore for configs like replica 3 with 10TB
volumes?




___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users



___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] A broken file that can not be deleted

2019-01-09 Thread Dmitry Isakbayev
I am seeing a broken file that exists on 2 out of 3 nodes.  The application
trying to use the file throws a file permissions error.  ls, rm, mv and touch
all throw "Input/output error".

$ ls -la
ls: cannot access .download_suspensions.memo: Input/output error
drwxrwxr-x. 2 ossadmin ossadmin  4096 Jan  9 08:06 .
drwxrwxr-x. 5 ossadmin ossadmin  4096 Jan  3 11:36 ..
-?? ? ???? .download_suspensions.memo

$ rm ".download_suspensions.memo"
rm: cannot remove ‘.download_suspensions.memo’: Input/output error
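
A possible first check would be something like this (a sketch only - the
volume name and the brick-side path are placeholders, since they are not
given here):

  # look for split-brain entries on the volume
  gluster volume heal VOLNAME info split-brain

  # on each node, compare the gfid of the file directly on the brick
  getfattr -n trusted.gfid -e hex /path/to/brick/dir/.download_suspensions.memo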
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] usage of harddisks: each hdd a brick? raid?

2019-01-09 Thread Hu Bert
Hi @all,

we have 3 servers, 4 disks (10TB) each, in a replicate 3 setup. We're
having some problems after a disk failed; the restore via reset-brick
takes way too long (way over a month), disk utilization is at 100%, it
doesn't get any faster, and some params have already been tweaked. Only
about 50GB per day are copied, and for 2.5TB this takes far too long...
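
The remaining heal backlog can be watched with something like this ("shared"
being the volume name used elsewhere in this thread):

  gluster volume heal shared statistics heal-count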

We were thinking about migrating to 3 servers with a RAID10 (HW or
SW), again in a replicate 3 setup. We would waste a lot of space, but
the idea is that, if a hdd fails:

- the data are still available on the hdd copy
- performance is better than with a failed/restoring hdd
- the restore via SW/HW RAID is faster than the restore via glusterfs

Any opinions on that? Maybe it would be better to use more servers and
smaller disks, but this isn't possible at the moment.

thx
Hubert
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] replace-brick operation issue...

2019-01-09 Thread Anand Malagi
Can I please get some help in understanding the issue mentioned below?

From: Anand Malagi
Sent: Monday, December 31, 2018 1:39 PM
To: 'Anand Malagi' ; gluster-users@gluster.org
Subject: RE: replace-brick operation issue...

Can someone please help here?

From: gluster-users-boun...@gluster.org
<gluster-users-boun...@gluster.org> On Behalf Of Anand Malagi
Sent: Friday, December 21, 2018 3:44 PM
To: gluster-users@gluster.org
Subject: [Gluster-users] replace-brick operation issue...

Hi Friends,

Please note that when a replace-brick operation was tried for one of the bad
bricks present in a distributed disperse (EC) volume, the command actually
failed, but the brick daemon of the new replacement brick came online.
Please help me understand in what situations this issue may arise, and suggest
a solution if possible:


glusterd.log  :



[2018-12-11 11:04:43.774120] I [MSGID: 106503] 
[glusterd-replace-brick.c:147:__glusterd_handle_replace_brick] 0-management: 
Received replace-brick commit force request.

[2018-12-11 11:04:44.784578] I [MSGID: 106504] 
[glusterd-utils.c:13079:rb_update_dstbrick_port] 0-glusterd: adding dst-brick 
port no 0

...

[2018-12-11 11:04:46.457537] E [MSGID: 106029] 
[glusterd-utils.c:7981:glusterd_brick_signal] 0-glusterd: Unable to open 
pidfile: 
/var/run/gluster/vols/AM6_HyperScale/am6sv0004sds.saipemnet.saipem.intranet-ws-disk3-ws_brick.pid
 [No such file or directory]

[2018-12-11 11:04:53.089810] I [glusterd-utils.c:5876:glusterd_brick_start] 
0-management: starting a fresh brick process for brick /ws/disk15/ws_brick

...

[2018-12-11 11:04:53.117935] W [socket.c:595:__socket_rwv] 0-socket.management: 
writev on 127.0.0.1:864 failed (Broken pipe)

[2018-12-11 11:04:54.014023] I [socket.c:2465:socket_event_handler] 
0-transport: EPOLLERR - disconnecting now

[2018-12-11 11:04:54.273190] I [MSGID: 106005] 
[glusterd-handler.c:6120:__glusterd_brick_rpc_notify] 0-management: Brick 
am6sv0004sds.saipemnet.saipem.intranet:/ws/disk15/ws_brick has disconnected 
from glusterd.

[2018-12-11 11:04:54.297603] E [MSGID: 106116] 
[glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Commit failed on 
am6sv0006sds.saipemnet.saipem.intranet. Please check log file for details.

[2018-12-11 11:04:54.350666] I [MSGID: 106143] 
[glusterd-pmap.c:278:pmap_registry_bind] 0-pmap: adding brick 
/ws/disk15/ws_brick on port 49164

[2018-12-11 11:05:01.137449] E [MSGID: 106123] 
[glusterd-mgmt.c:1519:glusterd_mgmt_v3_commit] 0-management: Commit failed on 
peers

[2018-12-11 11:05:01.137496] E [MSGID: 106123] 
[glusterd-replace-brick.c:660:glusterd_mgmt_v3_initiate_replace_brick_cmd_phases]
 0-management: Commit Op Failed

[2018-12-11 11:06:12.275867] I [MSGID: 106499] 
[glusterd-handler.c:4370:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume AM6_HyperScale

[2018-12-11 13:35:51.529365] I [MSGID: 106499] 
[glusterd-handler.c:4370:__glusterd_handle_status_volume] 0-management: 
Received status volume req for volume AM6_HyperScale



gluster volume replace-brick AM6_HyperScale 
am6sv0004sds.saipemnet.saipem.intranet:/ws/disk3/ws_brick 
am6sv0004sds.saipemnet.saipem.intranet:/ws/disk15/ws_brick commit force
Replace brick failure, brick [/ws/disk3], volume [AM6_HyperScale]

"gluster volume status" now shows a new disk active /ws/disk15

The replacement appears to be successful; it looks like healing has started.
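
One way to sanity-check that (a sketch, not verified against this cluster):

  gluster volume status AM6_HyperScale
  gluster volume heal AM6_HyperScale info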



Thanks and Regards,
--Anand