Re: [Gluster-users] Proposal: Changes in Gluster Community meetings

2019-05-21 Thread FNU Raghavendra Manjunath
Today's meeting will happen a couple of hours from now, i.e. 1PM EST, at (
https://bluejeans.com/486278655)

I am not able to see the meeting in my calendar. I am not sure whether this
is just me or whether it is not visible to others as well.
Either way, I will be waiting at the above-mentioned BlueJeans link.

Regards,
Raghavendra

On Wed, May 1, 2019 at 8:37 AM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

>
>
> On Tue, Apr 23, 2019 at 8:47 PM Darrell Budic 
> wrote:
>
>> I was one of the folk who wanted a NA/EMEA scheduled meeting, and I’m
>> going to have to miss it due to some real life issues (clogged sewer I’m
>> going to have to be dealing with at the time). Apologies, I’ll work on
>>  making the next one.
>>
>>
> No problem. We will continue to have these meetings every week (i.e.,
> bi-weekly in each timezone). Feel free to join when possible. We would surely
> like to see more community participation, but everyone has
> their day jobs, so no pressure :-)
>
> -Amar
>
>
>>   -Darrell
>>
>> On Apr 22, 2019, at 4:20 PM, FNU Raghavendra Manjunath 
>> wrote:
>>
>>
>> Hi,
>>
>> This is the agenda for tomorrow's community meeting for NA/EMEA timezone.
>>
>> https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both
>> 
>>
>>
>>
>> On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan <
>> atumb...@redhat.com> wrote:
>>
>>> Hi All,
>>>
>>> Below are the final details of our community meeting, and I will be
>>> sending invites to the mailing list following this email. You can add the Gluster
>>> Community Calendar so you can get notifications on the meetings.
>>>
>>> We are starting the meetings from next week. For the first meeting, we
>>> need 1 volunteer from users to discuss their use case / what went well, what
>>> went badly, etc., preferably in the APAC region. NA/EMEA region, next week.
>>>
>>> Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g
>>> 
>>> Gluster Community Meeting
>>> Previous Meeting minutes:
>>>
>>>- http://github.com/gluster/community
>>>
>>>
>>> Date/Time: Check the community calendar
>>> <https://calendar.google.com/calendar/b/1?cid=dmViajVibDBrbnNiOWQwY205ZWg5cGJsaTRAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ>
>>> Bridge
>>>
>>>- APAC friendly hours
>>>   - Bridge: https://bluejeans.com/836554017
>>>- NA/EMEA
>>>   - Bridge: https://bluejeans.com/486278655
>>>
>>> --
>>> Attendance
>>>
>>>- Name, Company
>>>
>>> Host
>>>
>>>- Who will host next meeting?
>>>   - Host will need to send out the agenda 24hr - 12hrs in advance
>>>   to mailing list, and also make sure to send the meeting minutes.
>>>   - Host will need to reach out to one user at least who can talk
>>>   about their usecase, their experience, and their needs.
>>>   - Host needs to send meeting minutes as PR to
>>>   http://github.com/gluster/community
>>>
>>> User stories
>>>
>>>- Discuss 1 usecase from a user.
>>>   - How was the architecture derived, what volume type used,
>>>   options, etc?
>>>   - What were the major issues faced? How to improve them?
>>>   - What worked well?
>>>   - How can we all collaborate well, so it is win-win for the
>>>   community and the user? How can we
>>>
>>> Community
>>>
>>>-
>>>
>>>Any release updates?
>>>-
>>>
>>>Blocker issues across the project?
>>>-
>>>
>>>Metrics
>>>- Number of new bugs since previous meeting. How many are not
>>>   triaged?
>>>   - Number of emails, anything unanswered?
>>>
>>> Conferences / Meetups
>>>
>>>

Re: [Gluster-users] Meeting Details on footer of the gluster-devel and gluster-user mailing list

2019-05-07 Thread FNU Raghavendra Manjunath
+ 1 to this.

There is also one more thing. For some reason, the community meeting is not
visible in my calendar (especially the NA region one). I am not sure if anyone else
is also facing this issue.

Regards,
Raghavendra

On Tue, May 7, 2019 at 5:19 AM Ashish Pandey  wrote:

> Hi,
>
> When we send a mail on the gluster-devel or gluster-users mailing list, the
> following content gets auto-generated and placed at the end of the mail.
>
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
> Gluster-devel mailing list
> gluster-de...@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-devel
>
> In a similar way, is it possible to attach the meeting schedule and link at the
> end of every such mail?
> Like this -
>
> Meeting schedule -
>
>
>- APAC friendly hours
>   - Tuesday 14th May 2019, 11:30AM IST
>   - Bridge: https://bluejeans.com/836554017
>   - NA/EMEA
>   - Tuesday 7th May 2019, 01:00 PM EDT
>   - Bridge: https://bluejeans.com/486278655
>
> Or just a link to the meeting minutes details?
>  
> https://github.com/gluster/community/tree/master/meetings
>
> This will help developers and users of the community to know when and where
> the meetings happen and how to attend them.
>
> ---
> Ashish
>
>
>
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Proposal: Changes in Gluster Community meetings

2019-04-22 Thread FNU Raghavendra Manjunath
Hi,

This is the agenda for tomorrow's community meeting for NA/EMEA timezone.

https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g?both




On Thu, Apr 11, 2019 at 4:56 AM Amar Tumballi Suryanarayan <
atumb...@redhat.com> wrote:

> Hi All,
>
> Below are the final details of our community meeting, and I will be sending
> invites to the mailing list following this email. You can add the Gluster Community
> Calendar so you can get notifications on the meetings.
>
> We are starting the meetings from next week. For the first meeting, we
> need 1 volunteer from users to discuss their use case / what went well, what
> went badly, etc., preferably in the APAC region. NA/EMEA region, next week.
>
> Draft Content: https://hackmd.io/OqZbh7gfQe6uvVUXUVKJ5g
> 
> Gluster Community Meeting
> Previous Meeting minutes:
>
>- http://github.com/gluster/community
>
>
> Date/Time: Check the community calendar
> 
> Bridge
>
>- APAC friendly hours
>   - Bridge: https://bluejeans.com/836554017
>- NA/EMEA
>   - Bridge: https://bluejeans.com/486278655
>
> --
> Attendance
>
>- Name, Company
>
> Host
>
>- Who will host next meeting?
>   - Host will need to send out the agenda 24hr - 12hrs in advance to
>   mailing list, and also make sure to send the meeting minutes.
>   - Host will need to reach out to one user at least who can talk
>   about their usecase, their experience, and their needs.
>   - Host needs to send meeting minutes as PR to
>   http://github.com/gluster/community
>
> User stories
>
>- Discuss 1 usecase from a user.
>   - How was the architecture derived, what volume type used, options,
>   etc?
>   - What were the major issues faced? How to improve them?
>   - What worked well?
>   - How can we all collaborate well, so it is win-win for the
>   community and the user? How can we
>
> Community
>
>-
>
>Any release updates?
>-
>
>Blocker issues across the project?
>-
>
>Metrics
>- Number of new bugs since previous meeting. How many are not triaged?
>   - Number of emails, anything unanswered?
>
> Conferences / Meetups
>
>- Any conference in next 1 month where gluster-developers are going?
>gluster-users are going? So we can meet and discuss.
>
> Developer focus
>
>-
>
>Any design specs to discuss?
>-
>
>Metrics of the week?
>- Coverity
>   - Clang-Scan
>   - Number of patches from new developers.
>   - Did we increase test coverage?
>   - [Atin] Also talk about most frequent test failures in the CI and
>   carve out an AI to get them fixed.
>
> RoundTable
>
>- 
>
> 
>
> Regards,
> Amar
>
> On Mon, Mar 25, 2019 at 8:53 PM Amar Tumballi Suryanarayan <
> atumb...@redhat.com> wrote:
>
>> Thanks for the feedback Darrell,
>>
>> The new proposal is to have one in North America 'morning' time (10AM
>> PST), and another in Asia day time, which is 7pm/6pm in the evening in Australia,
>> 9pm in New Zealand, 5pm in Tokyo, and 4pm in Beijing.
>>
>> For example, if we choose every other Tuesday for the meeting, and the 1st of the
>> month is a Tuesday, we would have the North America time slot on the 1st, and on
>> the 15th it would be the Asia/Pacific time slot.
>>
>> Hopefully, this way, we can cover all the timezones. The meeting minutes
>> will be committed to the GitHub repo, so it will be easier for
>> everyone to be aware of what is happening.
>>
>> Regards,
>> Amar
>>
>> On Mon, Mar 25, 2019 at 8:40 PM Darrell Budic 
>> wrote:
>>
>>> As a user, I’d like to attend more of these, but the time slot is my 3AM.
>>> Any possibility of a rolling schedule (move the meeting +6 hours each week,
>>> with rolling attendance from maintainers?) or an occasional regional
>>> meeting 12 hours offset from the one you’re proposing?
>>>
>>>   -Darrell
>>>
>>> On Mar 25, 2019, at 4:25 AM, Amar Tumballi Suryanarayan <
>>> atumb...@redhat.com> wrote:
>>>
>>> All,
>>>
>>> We currently have 3 meetings which are public:
>>>
>>> 1. Maintainer's Meeting
>>>
>>> - Runs once in 2 weeks (on Mondays), and current attendance is around
>>> 3-5 on average, and not much is discussed.
>>> - Without majority attendance, we can't take any decisions either.
>>>
>>> 2. Community meeting
>>>
>>> - 

Re: [Gluster-users] Questions and notes to "Simplify recovery steps of corrupted files"

2019-03-04 Thread FNU Raghavendra Manjunath
Hi David,

Doing a full heal after deleting the GFID entries (and the bad copy) is fine.
It is not dangerous.
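
For reference, a rough sketch of the flow you describe below, using the brick
path and GFID from your example (adjust the names to your own setup):

1. Locate both GFID entries for the corrupted copy on the affected brick:
   find /gluster/brick1/glusterbrick/.glusterfs -name fc36e347-53c7-4a0a-8150-c070143d3b34
2. Remove the two entries reported by find (the one under .glusterfs/quarantine
   and the one under .glusterfs/fc/36/) as well as the corrupted copy of the
   file itself on that brick.
3. Trigger a full heal:
   gluster volume heal VOLNAME full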

Regards,
Raghavendra

On Mon, Mar 4, 2019 at 9:44 AM David Spisla  wrote:

> Hello Gluster Community,
>
> I have questions and notes concerning the steps mentioned in
> https://github.com/gluster/glusterfs/issues/491
>
> " *2. Delete the corrupted files* ":
> In my experience there are two GFID files if a copy gets corrupted.
> Example:
>
>
>
> $ find /gluster/brick1/glusterbrick/.glusterfs -name fc36e347-53c7-4a0a-8150-c070143d3b34
> /gluster/brick1/glusterbrick/.glusterfs/quarantine/fc36e347-53c7-4a0a-8150-c070143d3b34
> /gluster/brick1/glusterbrick/.glusterfs/fc/36/fc36e347-53c7-4a0a-8150-c070143d3b34
>
> Both GFID files have to be deleted. If a copy is NOT corrupted, there seems
> to be no GFID file in .glusterfs/quarantine. Even if one executes scrub
> ondemand, the file is not there. The file in .glusterfs/quarantine appears
> if one executes "scrub status".
>
> " *3. Restore the file* ":
> One alternatively trigger self heal manually with
> *gluster vo heal VOLNAME*
> But in my experience this is not working. One have to trigger a full heal:
> *gluster vo heal VOLNAME* *full*
>
> Imagine one wants to restore a copy with manual self-heal. Is it necessary to
> set some volume options (stat-prefetch, dht.force-readdirp and
> performance.force-readdirp disabled) and mount via FUSE with some special
> parameters to heal the file?
> In my experience I only do a full heal after deleting the bad copy and the
> GFID files.
> This seems to be working. Or is it dangerous?
>
> Regards
> David Spisla
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Corrupted File readable via FUSE?

2019-02-05 Thread FNU Raghavendra Manjunath
Hi David,

Do you have any bricks down? Can you please share the output of the
following commands and also the logs of the server and the client nodes?

1) gluster volume info
2) gluster volume status
3) gluster volume bitrot <VOLNAME> scrub status

Few more questions

1) How many copies of the file were corrupted? (All? Or Just one?)

2 things I am trying to understand

A) IIUC, if only one copy is corrupted, then the replication module in the
gluster client should serve the data from the remaining good copy.
B) If all the copies were corrupted (or, say, more than quorum copies were
corrupted, which means 2 in the case of 3-way replication), then there will
be an error reported to the application. But the error reported should be
'Input/output error', not 'Transport endpoint is not connected'.
   The 'Transport endpoint is not connected' error usually comes when the brick
to which the operation is being directed is not connected to the client.
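
If it helps, you can also check which clients are connected to which bricks
with the following command (using the volume name archive1 from your logs;
adjust if yours is different):

gluster volume status archive1 clients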



Regards,
Raghavendra

On Mon, Feb 4, 2019 at 6:02 AM David Spisla  wrote:

> Hello Amar,
> sounds good. Until now this patch is only merged into master. I think it
> should be part of the next v5.x patch release!
>
> Regards
> David
>
> Am Mo., 4. Feb. 2019 um 09:58 Uhr schrieb Amar Tumballi Suryanarayan <
> atumb...@redhat.com>:
>
>> Hi David,
>>
>> I guess https://review.gluster.org/#/c/glusterfs/+/21996/ helps to fix
>> the issue. I will leave it to Raghavendra Bhat to reconfirm.
>>
>> Regards,
>> Amar
>>
>> On Fri, Feb 1, 2019 at 8:45 PM David Spisla  wrote:
>>
>>> Hello Gluster Community,
>>> I have got a 4-node cluster with a replica 4 volume, so each node has a
>>> brick with a copy of a file. Now I tried out the bitrot functionality and
>>> corrupted the copy on the brick of node1. After this I ran scrub ondemand and
>>> the file is correctly marked as corrupted.
>>>
>>> Now I try to read that file from FUSE on node1 (with the corrupt copy):
>>> $ cat file1.txt
>>> cat: file1.txt: Transport endpoint is not connected
>>> FUSE log says:
>>>
>>> *[2019-02-01 15:02:19.191984] E [MSGID: 114031]
>>> [client-rpc-fops_v2.c:281:client4_0_open_cbk] 0-archive1-client-0: remote
>>> operation failed. Path: /data/file1.txt
>>> (b432c1d6-ece2-42f2-8749-b11e058c4be3) [Input/output error]*
>>> [2019-02-01 15:02:19.192269] W [dict.c:761:dict_ref]
>>> (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329)
>>> [0x7fc642471329]
>>> -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5)
>>> [0x7fc642682af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58)
>>> [0x7fc64a78d218] ) 0-dict: dict is NULL [Invalid argument]
>>> [2019-02-01 15:02:19.192714] E [MSGID: 108009]
>>> [afr-open.c:220:afr_openfd_fix_open_cbk] 0-archive1-replicate-0: Failed to
>>> open /data/file1.txt on subvolume archive1-client-0 [Input/output error]
>>> *[2019-02-01 15:02:19.193009] W [fuse-bridge.c:2371:fuse_readv_cbk]
>>> 0-glusterfs-fuse: 147733: READ => -1
>>> gfid=b432c1d6-ece2-42f2-8749-b11e058c4be3 fd=0x7fc60408bbb8 (Transport
>>> endpoint is not connected)*
>>> [2019-02-01 15:02:19.193653] W [MSGID: 114028]
>>> [client-lk.c:347:delete_granted_locks_owner] 0-archive1-client-0: fdctx not
>>> valid [Invalid argument]
>>>
>>> And from FUSE on node2 (with a healthy copy):
>>> $ cat file1.txt
>>> file1
>>>
>>> It seems that node1 wants to get the file from its own brick, but
>>> the copy there is broken. Node2 gets the file from its own brick, which has a
>>> healthy copy, so reading the file succeeds.
>>> But I am wondering, because sometimes reading the file from node1
>>> with the broken copy succeeds
>>>
>>> What is the expected behaviour here? Is it possible to read files with a
>>> corrupted copy from any client?
>>>
>>> Regards
>>> David Spisla
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster snapshot & geo-replication

2018-11-20 Thread FNU Raghavendra Manjunath
Hi Marcus,

/var/log/glusterfs/snaps/urd-gds-volume/snapd.log is the log file of the
snapview daemon that is mainly used for user serviceable snapshots. Are you
using that feature? i.e. are you accessing the snapshots of your volume
from the main volume's mount point?

A few other pieces of information that might be helpful are:

1) output of "gluster volume info"
2) log files from the gluster nodes (/var/log/glusterfs)

Can you please provide the above information?

NOTE: If you are not using the user serviceable snapshots feature, then you can
turn it off. This will stop the snapview daemon and thus prevent its log
file from growing.
The command to turn off user serviceable snapshots is "gluster volume set
<VOLNAME> features.uss disable"

Regards

On Fri, Nov 16, 2018 at 5:41 PM Marcus Pedersén 
wrote:

> Hi all,
>
> I am using CentOS 7 and Gluster version 4.1.3
>
>
> I am using thin LVM and create snapshots once a day, of
> course deleting the oldest ones after a while.
>
> Creating a snap fails every now and then with the following different
> errors:
> Error : Request timed out
>
> or
>
> failed: Brick ops failed on urd-gds-002. changelog notify failed
>
> (Where the server name are different hosts in the gluster cluster all the
> time)
>
>
> I have discovered that the log for snaps grows large, endlessly?
>
> The log:
>
> /var/log/glusterfs/snaps/urd-gds-volume/snapd.log
>
> It is now 21G in size and continues to grow.
>
> I removed the file about 2 weeks ago and it was about the same size.
>
> Is this the way it should be?
>
> See a part of the log below.
>
>
>
>
> Second of all I have stopped the geo-replication as I never managed to
> make it work.
>
> Even when it is stopped and you try to pause geo-replication, you still
> get the respond:
>
> Geo-replication paused successfully.
>
> Should there be an error instead?
>
>
> Resuming gives an error:
>
> geo-replication command failed
> Geo-replication session between urd-gds-volume and 
> geouser@urd-gds-geo-001::urd-gds-volume
> is not Paused.
>
>
> This is related to bug 1547446
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1547446
>
> The fix should be present from 4.0 and onwards
>
> Should I report this in the same bug?
>
>
> Thanks a lot!
>
>
> Best regards
>
> Marcus Pedersén
>
>
> /var/log/glusterfs/snaps/urd-gds-volume/snapd.log:
>
> [2018-11-13 18:51:16.498206] E
> [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first
> lookup on subdir (/interbull/common) failed: Invalid argument
> [2018-11-13 18:51:16.498752] I [MSGID: 115036]
> [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting
> connection from
> iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
> [2018-11-13 18:51:16.502120] I [MSGID: 101055]
> [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down
> connection
> iqn-A003.iqnet.org-2653-2018/08/14-18:53:49:637444-urd-gds-volume-snapd-client-0-1638773
> [2018-11-13 18:51:16.589263] I [addr.c:55:compare_addr_and_update]
> 0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.118"
> [2018-11-13 18:51:16.589324] I [MSGID: 115029]
> [server-handshake.c:763:server_setvolume] 0-urd-gds-volume-server: accepted
> client from
> iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
> (version: 3.13.1)
> [2018-11-13 18:51:16.593003] E
> [server-handshake.c:385:server_first_lookup] 0-snapd-urd-gds-volume: lookup
> on root failed: Permission denied
> [2018-11-13 18:51:16.593177] E [server-handshake.c:342:do_path_lookup]
> 0-snapd-urd-gds-volume: first lookup on subdir (interbull) failed:
> Permission denied
> [2018-11-13 18:51:16.593206] E
> [server-handshake.c:402:server_first_lookup] 0-urd-gds-volume-server: first
> lookup on subdir (/interbull/home) failed: Invalid argument
> [2018-11-13 18:51:16.593678] I [MSGID: 115036]
> [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting
> connection from
> iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
> [2018-11-13 18:51:16.597201] I [MSGID: 101055]
> [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down
> connection
> iqn-D001.iqnet.org-20166-2018/08/14-19:10:55:360137-urd-gds-volume-snapd-client-0-1638735
> [root@urd-gds-001 ~]# tail -n 100
> /var/log/glusterfs/snaps/urd-gds-volume/snapd.log
> [2018-11-13 18:52:09.782058] I [MSGID: 115036]
> [server.c:483:server_rpc_notify] 0-urd-gds-volume-server: disconnecting
> connection from
> iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
> [2018-11-13 18:52:09.785473] I [MSGID: 101055]
> [client_t.c:444:gf_client_unref] 0-urd-gds-volume-server: Shutting down
> connection
> iqn-A002.iqnet.org-24786-2018/08/14-18:39:54:890651-urd-gds-volume-snapd-client-0-1638767
> [2018-11-13 18:52:09.821147] I [addr.c:55:compare_addr_and_update]
> 0-snapd-urd-gds-volume: allowed = "*", received addr = "192.168.67.115"
> [2018-11-13 

Re: [Gluster-users] Bitrot strange behavior

2018-04-18 Thread FNU Raghavendra Manjunath
Hi Cedric,

The 120 seconds is given to allow a window for things to settle, i.e.
imagine the following situation:

1) open file (fd1 as file descriptor)
2) modify the file via fd1
3) close the file descriptor (fd1)
4) Again open the file (fd2)
5) modify

In the above set of operations, by the time the bitrot daemon tries to
calculate the signature after the 1st fd (fd1) is closed, active IO could be
happening again on the new file descriptor (fd2), and the signature
calculated might not be correct while active IO is happening.
So in gluster the bitrot daemon waits for 120 seconds after all the file
descriptors associated with that file are closed before signing the file.

What happens with the 120-second window is this: once all the file descriptors
associated with a file are closed (by the application), a notification is sent
to the bitrot daemon that an object (a file, to be precise, with details about
that file) has been modified. When all the file descriptors of a file are
closed, an operation called "release" is received by the brick. So the brick
process sends a notification to the bitrot daemon about an object (i.e. a file)
when the release operation is received on that file (meaning all the file
descriptors are closed). The bitrot daemon then waits for 120 seconds after
receiving that notification. If, before the file is signed (i.e. within the
120 seconds of wait time), someone opens and modifies it again, the brick
process will let the bitrot daemon know about it, so that the bitrot daemon
won't attempt to sign the file (as it is actively being modified).

The above value is configurable and can be changed to some other value.
You can use the command below to change it:

"gluster volume set <VOLNAME> features.expiry-time <seconds>"

But as you said, currently the comparison of the signature by the scrubber
is local, i.e. while scrubbing, it calculates the checksum of the file and
compares it with the stored checksum (an extended attribute) to determine
whether the object is corrupted or not.
So yes, if the object is corrupted before the signing happens, then as of
now the scrubber does not have a mechanism to know that.
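
If you want to observe when a file actually gets signed, one rough way (run as
root directly against the brick; the path below is only illustrative, based on
your volume info) is:

getfattr -d -m . -e hex /data/brick1/file1

and watch for the trusted.bit-rot.signature attribute to appear roughly 120
seconds after the last file descriptor is closed.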

Regards,
Raghavendra


On Wed, Apr 18, 2018 at 2:20 PM, Cedric Lemarchand 
wrote:

> Hi Sweta,
>
> Thanks, this drive me some more questions:
>
> 1. What is the reason for delaying signature creation?
>
> 2. As the same file (replicated or dispersed) having different signatures
> across bricks is by definition an error, it would be good to detect it
> during a scrub, or with a different tool. Is something like this planned?
>
> Cheers
>
> —
> Cédric Lemarchand
>
> On 18 Apr 2018, at 07:53, Sweta Anandpara  wrote:
>
> Hi Cedric,
>
> Any file is picked up for signing by the bitd process after the
> predetermined wait of 120 seconds. This default value is captured in the
> volume option 'features.expiry-time' and is configurable - in your case, it
> can be set to 0 or 1.
>
> Point 2 is correct. A file corrupted before the bitrot signature is
> generated will not be successfully detected by the scrubber. That would
> require admin/manual intervention to explicitly heal the corrupted file.
>
> -Sweta
>
> On 04/16/2018 10:42 PM, Cedric Lemarchand wrote:
>
> Hello,
>
> I am playing around with the bitrot feature and have some questions:
>
> 1. when a file is created, the "trusted.bit-rot.signature” attribute
> seems to be created only approximately 120 seconds after its creation
> (the cluster is idle and there is only one file living on it). Why?
> Is there a way to have this attribute generated at the same time as
> the file creation?
>
> 2. corrupting a file (adding a 0 locally on a brick) before the
> creation of the "trusted.bit-rot.signature” does not produce any
> warning: its signature is different from the 2 other copies on the other
> bricks. Starting a scrub did not show up anything. I would think that
> Gluster compares signatures between bricks for this particular use
> case, but it seems the check is only local, so a file corrupted
> before its bitrot signature creation stays corrupted, and thus could
> be served to clients with bad data?
>
> Gluster 3.12.8 on Debian Stretch, bricks on ext4.
>
> Volume Name: vol1
> Type: Replicate
> Volume ID: 85ccfaf2-5793-46f2-bd20-3f823b0a2232
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster-01:/data/brick1
> Brick2: gluster-02:/data/brick2
> Brick3: gluster-03:/data/brick3
> Options Reconfigured:
> storage.build-pgfid: on
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> features.bitrot: on
> features.scrub: Active
> features.scrub-throttle: aggressive
> features.scrub-freq: hourly
>
> Cheers,
>
> Cédric
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> ___
> Gluster-users 

Re: [Gluster-users] Bitrot - Restoring bad file

2018-04-18 Thread FNU Raghavendra Manjunath
Hi,

Patch [1] has been sent for review. The patch also prints the brick to which
the corrupted object (file) belongs.

With the patch the output of the scrub status command looks like this.

"
# gluster volume bitrot repl scrub status

Volume name : repl

State of scrub: Active (Idle)

Scrub impact: lazy

Scrub frequency: biweekly

Bitrot error log location: /var/log/glusterfs/bitd.log

Scrubber error log location: /var/log/glusterfs/scrub.log


=

Node: localhost

Number of Scrubbed files: 0

Number of Skipped files: 0

Last completed scrub time: Scrubber pending to complete.

Duration of last scrub (D:M:H:M:S): 0:0:0:0

Error count: 2

Corrupted object's [GFID]:

0f9818b8-b762-4e5b-b3c9-bdd53b5fb1cb ==> BRICK: /export1/repl

7ce298fb-290f-4c2b-abaf-1dd0fca9bbb1 ==> BRICK: /export2/repl

=
"

Still, some effort is needed to find the file to which the GFID (shown in
the corrupted objects field of the scrub status command) belongs.
But in situations where there are multiple bricks running on the same node
for a particular volume, checking in all the bricks is no longer
needed.

Please check whether this makes things better compared to now.
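
In the meantime, to map a GFID from the scrub status output back to a file
path, something like the below should work (a rough sketch, building on the
glusterfs.pathinfo approach Aravinda mentions below; it assumes the volume is
mounted with the aux-gfid-mount option, and the mount point is illustrative):

mount -t glusterfs -o aux-gfid-mount localhost:/repl /mnt/repl
getfattr -n glusterfs.pathinfo -e text /mnt/repl/.gfid/0f9818b8-b762-4e5b-b3c9-bdd53b5fb1cb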

[1] https://review.gluster.org/#/c/19901/1

On Tue, Apr 17, 2018 at 11:59 PM, Aravinda  wrote:

> On 04/17/2018 06:25 PM, Omar Kohl wrote:
>
>> Hi,
>>
>> I have a question regarding bitrot detection.
>>
> Following the Red Hat manual
> (https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/bitrot-restore_corrupt_file)
> I am trying out bad-file-restoration after bitrot.
>>
>> "gluster volume bitrot VOLNAME status" gets me the GFIDs that are corrupt
>> and on which Host this happens.
>>
>> As far as I can tell the preferred way of finding out what file maps to
>> that GFID is using "getfattr" (assuming all Gluster and mount options were
>> set as described in the link).
>>
>> My problem is that "getfattr" does not tell me what Brick contains the
>> corrupt file. It only gives me the path according to the FUSE mount. So how
>> do I find out what brick the file is on?
>>
>> If we assume that every brick is on a distinct host then we have no
>> problem because "bitrot status" gave us the hostname. So we can infer what
>> brick is meant. But in general you can't assume there are not several
>> bricks per host, right?
>>
>> With "find" (as described in the link above) it is possible to find the
>> correct brick. But the command is possibly expensive and I get the feeling
>> that "getfattr" is the recommended way.
>>
>> Any thoughts?
>>
>
> Another getfattr can give details of the brick where that file is residing:
>
> getfattr -n glusterfs.pathinfo <path>
>
> where <path> can be <mountpoint>/.gfid/<gfid> or the absolute path of the file.
>
>
>> Thanks!
>> Omar
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
> --
> regards
> Aravinda VK
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] pausing scrub crashed scrub daemon on nodes

2017-09-13 Thread FNU Raghavendra Manjunath
Hi Amudhan,

Replies inline.
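
For reference, the scrub operations being discussed below are driven through
the CLI roughly as follows (the volume name is a placeholder):

gluster volume bitrot <VOLNAME> scrub pause
gluster volume bitrot <VOLNAME> scrub resume
gluster volume bitrot <VOLNAME> scrub ondemand
gluster volume bitrot <VOLNAME> scrub status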

On Fri, Sep 8, 2017 at 6:37 AM, Amudhan P  wrote:

> Hi,
>
> I am using glusterfs 3.10.1 with 30 nodes each with 36 bricks and 10 nodes
> each with 16 bricks in a single cluster.
>
> By default I have paused the scrub process to run it manually. For the
> first time, I was trying to run scrub-on-demand and it was running fine,
> but after some time, I decided to pause the scrub process due to high CPU
> usage and users reporting that folder listing was taking time.
> But the scrub pause resulted in the below message on some of the nodes.
> Also, I can see that the scrub daemon is not showing in volume status for some
> nodes.
>
> Error msg type 1
> --
>
> [2017-09-01 10:04:45.840248] I [bit-rot.c:1683:notify]
> 0-glustervol-bit-rot-0: BitRot scrub ondemand called
> [2017-09-01 10:05:05.094948] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2017-09-01 10:05:06.401792] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2017-09-01 10:05:07.544524] I [MSGID: 118035] 
> [bit-rot-scrub.c:1297:br_scrubber_scale_up]
> 0-glustervol-bit-rot-0: Scaling up scrubbe
> rs [0 => 36]
> [2017-09-01 10:05:07.552893] I [MSGID: 118048] 
> [bit-rot-scrub.c:1547:br_scrubber_log_option]
> 0-glustervol-bit-rot-0: SCRUB TUNABLES::
>  [Frequency: biweekly, Throttle: lazy]
> [2017-09-01 10:05:07.552942] I [MSGID: 118038] 
> [bit-rot-scrub.c:948:br_fsscan_schedule]
> 0-glustervol-bit-rot-0: Scrubbing is schedule
> d to run at 2017-09-15 10:05:07
> [2017-09-01 10:05:07.553457] I [glusterfsd-mgmt.c:1778:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2017-09-01 10:05:20.953815] I [bit-rot.c:1683:notify]
> 0-glustervol-bit-rot-0: BitRot scrub ondemand called
> [2017-09-01 10:05:20.953845] I [MSGID: 118038] 
> [bit-rot-scrub.c:1085:br_fsscan_ondemand]
> 0-glustervol-bit-rot-0: Ondemand Scrubbing s
> cheduled to run at 2017-09-01 10:05:21
> [2017-09-01 10:05:22.216937] I [MSGID: 118044] 
> [bit-rot-scrub.c:615:br_scrubber_log_time]
> 0-glustervol-bit-rot-0: Scrubbing started a
> t 2017-09-01 10:05:22
> [2017-09-01 10:05:22.306307] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2017-09-01 10:05:24.684900] I [glusterfsd-mgmt.c:1778:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2017-09-06 08:37:26.422267] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2017-09-06 08:37:28.351821] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec]
> 0-mgmt: Volume file changed
> [2017-09-06 08:37:30.350786] I [MSGID: 118034] 
> [bit-rot-scrub.c:1342:br_scrubber_scale_down]
> 0-glustervol-bit-rot-0: Scaling down scr
> ubbers [36 => 0]
> pending frames:
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> frame : type(0) op(0)
> patchset: git://git.gluster.org/glusterfs.git
> signal received: 11
> time of crash:
> 2017-09-06 08:37:30
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 3.10.1
> /usr/lib/libglusterfs.so.0(_gf_msg_backtrace_nomem+0x78)[0x7fda0ab0b4f8]
> /usr/lib/libglusterfs.so.0(gf_print_trace+0x324)[0x7fda0ab14914]
> /lib/x86_64-linux-gnu/libc.so.6(+0x36d40)[0x7fda09ef9d40]
> /usr/lib/libglusterfs.so.0(syncop_readv_cbk+0x17)[0x7fda0ab429e7]
> /usr/lib/glusterfs/3.10.1/xlator/protocol/client.so(+
> 0x2db4b)[0x7fda04986b4b]
> /usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x90)[0x7fda0a8d5490]
> /usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x1e7)[0x7fda0a8d5777]
> /usr/lib/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fda0a8d17d3]
> /usr/lib/glusterfs/3.10.1/rpc-transport/socket.so(+0x7194)[0x7fda05826194]
> /usr/lib/glusterfs/3.10.1/rpc-transport/socket.so(+0x9635)[0x7fda05828635]
> /usr/lib/libglusterfs.so.0(+0x83db0)[0x7fda0ab64db0]
> /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7fda0a290182]
> /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fda09fbd47d]
> --
>
> Error msg type 2
>
> [2017-09-01 10:01:20.387248] I [MSGID: 118035] 
> [bit-rot-scrub.c:1297:br_scrubber_scale_up]
> 0-glustervol-bit-rot-0: Scaling up scrubbe
> rs [0 => 36]
> [2017-09-01 10:01:20.392544] I [MSGID: 118048] 
> [bit-rot-scrub.c:1547:br_scrubber_log_option]
> 0-glustervol-bit-rot-0: SCRUB TUNABLES::
>  [Frequency: biweekly, Throttle: lazy]
> [2017-09-01 10:01:20.392571] I [MSGID: 118038] 
> 

Re: [Gluster-users] [Gluster-devel] Fuse client hangs on doing multithreading IO tests

2016-06-24 Thread FNU Raghavendra Manjunath
Hi,

Any idea how big were the files that were being read?

Can you please attach the logs from all the gluster server and client
nodes? (the logs can be found in /var/log/glusterfs)

Also please provide the /var/log/messages from all the server and client
nodes.

Regards,
Raghavendra


On Fri, Jun 24, 2016 at 10:32 AM, 冷波  wrote:

> Hi,
>
>
> We found a problem when doing traffic tests. We created a replicated
> volume with two storage nodes (CentOS 6.5). There was one FUSE client
> (CentOS 6.7) which did multi-threaded reads and writes. Most of the IOs are
> reads of big files. All machines used 10GbE NICs, and the typical read
> throughput was 4-6Gbps (0.5-1.5GB/s).
>
>
> After the test ran several minutes, the test program hung. The throughput
> suddenly dropped to zero. Then there was no traffic any more. If we ran df,
> df would hang, too. But we could still read or write the volume from other
> clients.
>
>
> We tried several GlusterFS versions from 3.7.5 to 3.8.0. Each version had
> this problem. We also tried restoring the default GlusterFS options, but the
> problem still existed.
>
>
> The GlusterFS version was 3.7.11 for the following stacks.
>
>
> This was the stack of dd when hanging:
>
> [] wait_answer_interruptible+0x81/0xc0 [fuse]
>
> [] __fuse_request_send+0x1db/0x2b0 [fuse]
>
> [] fuse_request_send+0x12/0x20 [fuse]
>
> [] fuse_statfs+0xda/0x150 [fuse]
>
> [] statfs_by_dentry+0x74/0xa0
>
> [] vfs_statfs+0x1b/0xb0
>
> [] user_statfs+0x47/0xb0
>
> [] sys_statfs+0x2a/0x50
>
> [] system_call_fastpath+0x16/0x1b
>
> [] 0x
>
>
> This was the stack of gluster:
>
> [] futex_wait_queue_me+0xba/0xf0
>
> [] futex_wait+0x1c0/0x310
>
> [] do_futex+0x121/0xae0
>
> [] sys_futex+0x7b/0x170
>
> [] system_call_fastpath+0x16/0x1b
>
> [] 0x
>
>
> This was the stack of the test program:
>
> [] hrtimer_nanosleep+0xc4/0x180
>
> [] sys_nanosleep+0x6e/0x80
>
> [] system_call_fastpath+0x16/0x1b
>
> [] 0x
>
>
> Any clue?
>
> Thanks,
> Paul
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] self service snapshot access broken with 3.7.11

2016-04-26 Thread FNU Raghavendra Manjunath
Hi,

Thanks for the snapd log. Can you please attach all the gluster logs? i.e.
the contents of /var/log/glusterfs.

Regards,
Raghavendra


On Mon, Apr 25, 2016 at 11:27 AM, Alastair Neil <ajneil.t...@gmail.com>
wrote:

> attached compressed log
>
> On 22 April 2016 at 20:15, FNU Raghavendra Manjunath <rab...@redhat.com>
> wrote:
>
>>
>> Hi Alastair,
>>
>> Can you please provide the snap daemon logs. It is present in
>> /var/log/glusterfs/snaps/snapd.log.
>>
>> Provide the snapd logs of the node from which you have mounted the volume
>> (i.e. the node whose ip address/hostname you have given while mounting the
>> volume).
>>
>> Regards,
>> Raghavendra
>>
>>
>>
>> On Fri, Apr 22, 2016 at 5:19 PM, Alastair Neil <ajneil.t...@gmail.com>
>> wrote:
>>
>>> I just upgraded my cluster to 3.7.11 from 3.7.10 and access to the
>>> .snaps directories now fail with
>>>
>>> bash: cd: .snaps: Transport endpoint is not connected
>>>
>>>
>>> in the volume log file on the client I see:
>>>
>>> 016-04-22 21:08:28.005854] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
>>>> 2-homes-snapd-client: changing port to 49493 (from 0)
>>>> [2016-04-22 21:08:28.009558] E [socket.c:2278:socket_connect_finish]
>>>> 2-homes-snapd-client: connection to xx.xx.xx.xx.xx:49493 failed (No route
>>>> to host)
>>>
>>>
>>> I'm quite perplexed; it's not a network issue or DNS as far as I can
>>> tell, the glusterfs client is working fine, and the gluster servers all
>>> resolve ok.  It seems to be happening on all the clients: I have tried
>>> different systems with 3.7.8, 3.7.10, and 3.7.11 version clients and see
>>> the same failure on all of them.
>>>
>>> On the servers the snapshots are being taken as expected and they are
>>> started:
>>>
>>> Snapshot  :
>>>> Scheduled-Homes_Hourly-homes_GMT-2016.04.22-16.00.01
>>>> Snap UUID : 91ba50b0-d8f2-4135-9ea5-edfdfe2ce61d
>>>> Created   : 2016-04-22 16:00:01
>>>> Snap Volumes:
>>>> Snap Volume Name  : 5170144102814026a34f8f948738406f
>>>> Origin Volume name: homes
>>>> Snaps taken for homes  : 16
>>>> Snaps available for homes  : 240
>>>> Status: Started
>>>
>>>
>>>
>>> the homes volume is replica 3 all the peers are up and so are all the
>>> bricks and services:
>>>
>>> glv status homes
>>>> Status of volume: homes
>>>> Gluster process TCP Port  RDMA Port  Online
>>>>  Pid
>>>>
>>>> --
>>>> Brick gluster-2:/export/brick2/home 49171 0  Y
>>>>   38298
>>>> Brick gluster0:/export/brick2/home  49154 0  Y
>>>>   23519
>>>> Brick gluster1.vsnet.gmu.edu:/export/brick2
>>>> /home   49154 0  Y
>>>>   23794
>>>> Snapshot Daemon on localhost49486 0  Y
>>>>   23699
>>>> NFS Server on localhost 2049  0  Y
>>>>   23486
>>>> Self-heal Daemon on localhost   N/A   N/AY
>>>>   23496
>>>> Snapshot Daemon on gluster-249261 0  Y
>>>>   38479
>>>> NFS Server on gluster-2 2049  0  Y
>>>>   39640
>>>> Self-heal Daemon on gluster-2   N/A   N/AY
>>>>   39709
>>>> Snapshot Daemon on gluster1 49480 0  Y
>>>>   23982
>>>> NFS Server on gluster1  2049  0  Y
>>>>   23766
>>>> Self-heal Daemon on gluster1N/A   N/AY
>>>>   23776
>>>>
>>>> Task Status of Volume homes
>>>>
>>>> --
>>>> There are no active volume tasks
>>>
>>>
>>> I'd appreciate any ideas about troubleshooting this.  I tried disabling
>>> .snaps access on the volume and re-enabling it, but it made no difference.
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] gluster 3.7.11 qemu+libgfapi problem

2016-04-26 Thread FNU Raghavendra Manjunath
Hi,

Can you please check if glusterd on the node "192.168.22.28" is running?

"service glusterd status" or "ps aux | grep glusterd".

Regards,
Raghavendra


On Tue, Apr 26, 2016 at 7:26 AM, Dmitry Melekhov  wrote:

> Hello!
>
> I have a 3-server setup - CentOS 7 and Gluster 3.7.11 -
> and don't know whether it worked with previous versions or not...
>
>
> The volume is replica 3.
>
> If I shut down the switch port for one of the nodes, then qemu can't start,
> because it can't connect to gluster:
>
>
>
> [2016-04-26 10:51:53.881654] I [MSGID: 114057]
> [client-handshake.c:1437:select_server_supported_programs] 0-pool-client-7:
> Using Program GlusterFS 3.3, Num (1298437), Version (330)
> [2016-04-26 10:51:53.882271] I [MSGID: 114046]
> [client-handshake.c:1213:client_setvolume_cbk] 0-pool-client-7: Connected
> to pool-client-7, attached to remote volume '/wall/pool/brick'.
> [2016-04-26 10:51:53.882299] I [MSGID: 114047]
> [client-handshake.c:1224:client_setvolume_cbk] 0-pool-client-7: Server and
> Client lk-version numbers are not same, reopening the fds
> [2016-04-26 10:51:53.882620] I [MSGID: 114035]
> [client-handshake.c:193:client_set_lk_version_cbk] 0-pool-client-7: Server
> lk version = 1
> [2016-04-26 10:51:55.373983] E [socket.c:2279:socket_connect_finish]
> 0-pool-client-8: connection to 192.168.22.28:24007 failed (No route to
> host)
> [2016-04-26 10:51:55.416522] I [MSGID: 108031]
> [afr-common.c:1900:afr_local_discovery_cbk] 0-pool-replicate-0: selecting
> local read_child pool-client-6
> [2016-04-26 10:51:55.416919] I [MSGID: 104041]
> [glfs-resolve.c:869:__glfs_active_subvol] 0-pool: switched to graph
> 66617468-6572-2d35-3334-372d32303136 (0)
> qemu: terminating on signal 15 from pid 9767
> [2016-04-26 10:53:36.418693] I [MSGID: 114021] [client.c:2115:notify]
> 0-pool-client-6: current graph is no longer active, destroying rpc_client
> [2016-04-26 10:53:36.418802] I [MSGID: 114021] [client.c:2115:notify]
> 0-pool-client-7: current graph is no longer active, destroying rpc_client
> [2016-04-26 10:53:36.418840] I [MSGID: 114021] [client.c:2115:notify]
> 0-pool-client-8: current graph is no longer active, destroying rpc_client
> [2016-04-26 10:53:36.418870] I [MSGID: 114018]
> [client.c:2030:client_rpc_notify] 0-pool-client-6: disconnected from
> pool-client-6. Client process will keep trying to connect to glusterd until
> brick's port is avai
> lable
> [2016-04-26 10:53:36.418880] I [MSGID: 114018]
> [client.c:2030:client_rpc_notify] 0-pool-client-7: disconnected from
> pool-client-7. Client process will keep trying to connect to glusterd until
> brick's port is avai
> lable
> [2016-04-26 10:53:36.418949] W [MSGID: 108001]
> [afr-common.c:4090:afr_notify] 0-pool-replicate-0: Client-quorum is not met
> [2016-04-26 10:53:36.419002] E [MSGID: 108006]
> [afr-common.c:4046:afr_notify] 0-pool-replicate-0: All subvolumes are down.
> Going offline until atleast one of them comes back up.
>
>
> 192.168.22.28 is the node which is not available.
>
> I don't see any errors in the brick logs, only
> [2016-04-26 10:53:41.807032] I [dict.c:473:dict_get]
> (-->/lib64/libglusterfs.so.0(default_getxattr_cbk+0xac) [0x7f7405415cbc]
> -->/usr/lib64/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)
> [0x
> 7f73f59da917] -->/lib64/libglusterfs.so.0(dict_get+0xac) [0x7f74054060fc]
> ) 0-dict: !this || key=() [Invalid argument]
>
> But I guess it is not related.
>
>
> Could you tell me what could cause this problem?
>
> Thank you!
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] self service snapshot access broken with 3.7.11

2016-04-22 Thread FNU Raghavendra Manjunath
Hi Alastair,

Can you please provide the snap daemon logs. It is present in
/var/log/glusterfs/snaps/snapd.log.

Provide the snapd logs of the node from which you have mounted the volume
(i.e. the node whose ip address/hostname you have given while mounting the
volume).

Regards,
Raghavendra



On Fri, Apr 22, 2016 at 5:19 PM, Alastair Neil 
wrote:

> I just upgraded my cluster to 3.7.11 from 3.7.10 and access to the .snaps
> directories now fail with
>
> bash: cd: .snaps: Transport endpoint is not connected
>
>
> in the volume log file on the client I see:
>
> 016-04-22 21:08:28.005854] I [rpc-clnt.c:1847:rpc_clnt_reconfig]
>> 2-homes-snapd-client: changing port to 49493 (from 0)
>> [2016-04-22 21:08:28.009558] E [socket.c:2278:socket_connect_finish]
>> 2-homes-snapd-client: connection to xx.xx.xx.xx.xx:49493 failed (No route
>> to host)
>
>
> I'm quite perplexed; it's not a network issue or DNS as far as I can
> tell, the glusterfs client is working fine, and the gluster servers all
> resolve ok.  It seems to be happening on all the clients: I have tried
> different systems with 3.7.8, 3.7.10, and 3.7.11 version clients and see
> the same failure on all of them.
>
> On the servers the snapshots are being taken as expected and they are
> started:
>
> Snapshot  :
>> Scheduled-Homes_Hourly-homes_GMT-2016.04.22-16.00.01
>> Snap UUID : 91ba50b0-d8f2-4135-9ea5-edfdfe2ce61d
>> Created   : 2016-04-22 16:00:01
>> Snap Volumes:
>> Snap Volume Name  : 5170144102814026a34f8f948738406f
>> Origin Volume name: homes
>> Snaps taken for homes  : 16
>> Snaps available for homes  : 240
>> Status: Started
>
>
>
> the homes volume is replica 3 all the peers are up and so are all the
> bricks and services:
>
> glv status homes
>> Status of volume: homes
>> Gluster process TCP Port  RDMA Port  Online
>>  Pid
>>
>> --
>> Brick gluster-2:/export/brick2/home 49171 0  Y
>> 38298
>> Brick gluster0:/export/brick2/home  49154 0  Y
>> 23519
>> Brick gluster1.vsnet.gmu.edu:/export/brick2
>> /home   49154 0  Y
>> 23794
>> Snapshot Daemon on localhost49486 0  Y
>> 23699
>> NFS Server on localhost 2049  0  Y
>> 23486
>> Self-heal Daemon on localhost   N/A   N/AY
>> 23496
>> Snapshot Daemon on gluster-249261 0  Y
>> 38479
>> NFS Server on gluster-2 2049  0  Y
>> 39640
>> Self-heal Daemon on gluster-2   N/A   N/AY
>> 39709
>> Snapshot Daemon on gluster1 49480 0  Y
>> 23982
>> NFS Server on gluster1  2049  0  Y
>> 23766
>> Self-heal Daemon on gluster1N/A   N/AY
>> 23776
>>
>> Task Status of Volume homes
>>
>> --
>> There are no active volume tasks
>
>
> I'd appreciate any ideas about troubleshooting this.  I tried disabling
> .snaps access on the volume and re-enabling it, but it made no difference.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How dynamic are volume options?

2016-04-21 Thread FNU Raghavendra Manjunath
Yes. They should be picked up by currently running heals.

Regards,
Raghavendra

On Thu, Apr 21, 2016 at 9:03 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 21/04/2016 10:44 PM, FNU Raghavendra Manjunath wrote:
>
>> Volume set operations do not require a volume restart. The changes will
>> be recognised automatically.
>>
>> If you are specifically referring to the options you have mentioned, then
>> it applies to them as well. If you set those options, volume restart is not
>> needed.
>>
>
> Thanks Raghavendra, do you know if they will be picked up by currently
> running heals?
>
> --
> Lindsay Mathieson
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] How dynamic are volume options?

2016-04-21 Thread FNU Raghavendra Manjunath
HI Lindsay,

Volume set operations do not require a volume restart. The changes will
be recognised automatically.

If you are specifically referring to the options you have mentioned, then
it applies to them as well. If you set those options, volume restart is not
needed.
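
For example (the volume name and value here are only illustrative):

gluster volume set myvol cluster.background-self-heal-count 16

takes effect as soon as the command completes, without restarting the volume.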

Regards,
Raghavendra


On Thu, Apr 21, 2016 at 8:36 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> As per the subject - will a change be recognised while the volume is
> running or does it have to be restarted?
>
> thinking of attributes like:
>
> cluster.background-self-heal-count
> cluster.heal-wait-queue-length
> cluster.self-heal-window-size
>
> thanks,
>
> --
> Lindsay Mathieson
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterfs-3.6.9 released

2016-03-04 Thread FNU Raghavendra Manjunath
Hi,
glusterfs-3.6.9 has been released and the packages for RHEL/Fedora/CentOS
can be found here: http://download.gluster.org/pub/gluster/glusterfs/3.6/LATEST/

Requesting people running 3.6.x to please try it out and let us know if
there are any issues.

This release fixes the bugs listed below, reported since 3.6.8 was made
available. Thanks to all who submitted patches and reviewed the changes.

1302541 - Problem when enabling quota : Could not start quota auxiliary mount
1302310 - log improvements: enabling quota on a volume reports numerous
entries of "contribution node list is empty which is an error" in brick logs

1308806 - tests : Modifying tests for crypt xlator

1304668 - Add missing release-notes on the 3.6 branch
1296931 - Installation of glusterfs-3.6.8 fails on CentOS-7


Regards,

Raghavendra Bhat
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fail of one brick lead to crash VMs

2016-02-09 Thread FNU Raghavendra Manjunath
Hi Dominique,

Thanks for the logs. I will go through the logs. I have also CCed Pranith
who is the maintainer of the replicate feature.


Regards,
Raghavendra


On Tue, Feb 9, 2016 at 11:45 AM, Dominique Roux <dominique.r...@ungleich.ch>
wrote:

> Logs are attached
>
> For clarification:
> vmhost1-cluster1 -> Brick 1
> vmhost2-cluster2 -> Brick 2
> entrance -> Peer
>
> Time of testing (31.01.2016 16:13)
>
> Thanks for your help
>
> Regards,
> Dominique
>
>
> Werde Teil des modernen Arbeitens im Glarnerland auf www.digitalglarus.ch!
> Lese Neuigkeiten auf Twitter: www.twitter.com/DigitalGlarus
> Diskutiere mit auf Facebook:  www.facebook.com/digitalglarus
>
> On 02/08/2016 04:40 PM, FNU Raghavendra Manjunath wrote:
> > + Pranith
> >
> > In the meantime, can you please provide the logs of all the gluster
> > server machines  and the client machines?
> >
> > Logs can be found in /var/log/glusterfs directory.
> >
> > Regards,
> > Raghavendra
> >
> > On Mon, Feb 8, 2016 at 9:20 AM, Dominique Roux
> > <dominique.r...@ungleich.ch <mailto:dominique.r...@ungleich.ch>> wrote:
> >
> > Hi guys,
> >
> > I faced a problem a week ago.
> > In our environment we have three servers in a quorum. The gluster volume
> > is spread over two bricks and is of type replicated.
> >
> > To simulate the failure of one brick, we isolated one of the two
> > bricks with iptables, so that communication to the other two peers
> > wasn't possible anymore.
> > After that, VMs (OpenNebula) which had I/O at that time crashed.
> > We stopped glusterfsd hard (kill -9) and restarted it, which made
> > things work again (of course we also had to restart the failed VMs). But
> > I think this shouldn't happen, since quorum was not lost (2/3 hosts
> > were still up and connected).
> >
> > Here some infos of our system:
> > OS: CentOS Linux release 7.1.1503
> > Glusterfs version: glusterfs 3.7.3
> >
> > gluster volume info:
> >
> > Volume Name: cluster1
> > Type: Replicate
> > Volume ID:
> > Status: Started
> > Number of Bricks: 1 x 2 = 2
> > Transport-type: tcp
> > Bricks:
> > Brick1: srv01:/home/gluster
> > Brick2: srv02:/home/gluster
> > Options Reconfigured:
> > cluster.self-heal-daemon: enable
> > cluster.server-quorum-type: server
> > network.remote-dio: enable
> > cluster.eager-lock: enable
> > performance.stat-prefetch: on
> > performance.io-cache: off
> > performance.read-ahead: off
> > performance.quick-read: off
> > server.allow-insecure: on
> > nfs.disable: 1
> >
> > Hope you can help us.
> >
> > Thanks a lot.
> >
> > Best regards
> > Dominique
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org <mailto:Gluster-users@gluster.org>
> > http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sparse files and heal full bug fix backport to 3.6.x

2016-02-09 Thread FNU Raghavendra Manjunath
Adding Pranith, maintainer of the replicate feature.


Regards,
Raghavendra


On Tue, Feb 9, 2016 at 3:33 PM, Steve Dainard  wrote:

> There is a thread from 2014 mentioning that the heal process on a
> replica volume was de-sparsing sparse files.(1)
>
> I've been experiencing the same issue on Gluster 3.6.x. I see there is
> a bug closed for a fix on Gluster 3.7 (2) and I'm wondering if this
> fix can be back-ported to Gluster 3.6.x?
>
> My experience has been:
> Replica 3 volume
> 1 brick went offline
> Brought brick back online
> Heal full on volume
> My 500G vm-storage volume went from ~280G used to >400G used.
>
> I've experienced this a couple of times previously, and used fallocate to
> re-sparse files, but this is cumbersome at best, and the lack of proper
> heal support for sparse files could be disastrous if I didn't have
> enough free space and ended up crashing my VMs when my storage domain
> ran out of space.
>
> Seeing as 3.6 is still a supported release, and 3.7 feels too bleeding
> edge for production systems, I think it makes sense to back-port this
> fix if possible.
>
> Thanks,
> Steve
>
>
>
> 1.
> https://www.gluster.org/pipermail/gluster-users/2014-November/019512.html
> 2. https://bugzilla.redhat.com/show_bug.cgi?id=1166020
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Fail of one brick lead to crash VMs

2016-02-08 Thread FNU Raghavendra Manjunath
+ Pranith

In the meantime, can you please provide the logs of all the gluster server
machines  and the client machines?

Logs can be found in /var/log/glusterfs directory.

Regards,
Raghavendra

On Mon, Feb 8, 2016 at 9:20 AM, Dominique Roux 
wrote:

> Hi guys,
>
> I faced a problem a week ago.
> In our environment we have three servers in a quorum. The gluster volume
> is spread over two bricks and is of type replicated.
>
> To simulate the failure of one brick, we isolated one of the two
> bricks with iptables, so that communication to the other two peers
> wasn't possible anymore.
> After that, VMs (OpenNebula) which had I/O at that time crashed.
> We stopped glusterfsd hard (kill -9) and restarted it, which made
> things work again (of course we also had to restart the failed VMs). But
> I think this shouldn't happen, since quorum was not lost (2/3 hosts
> were still up and connected).
>
> Here some infos of our system:
> OS: CentOS Linux release 7.1.1503
> Glusterfs version: glusterfs 3.7.3
>
> gluster volume info:
>
> Volume Name: cluster1
> Type: Replicate
> Volume ID:
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: srv01:/home/gluster
> Brick2: srv02:/home/gluster
> Options Reconfigured:
> cluster.self-heal-daemon: enable
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.stat-prefetch: on
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> server.allow-insecure: on
> nfs.disable: 1
>
> Hope you can help us.
>
> Thanks a lot.
>
> Best regards
> Dominique
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users