[Gluster-users] Gluster Community Newsletter, August 2016

2016-08-28 Thread Amye Scavarda
Important happenings for Gluster this month:
3.7.14 released
3.8.3 released

CFP for Gluster Developer Summit open until August 31st
http://www.gluster.org/pipermail/gluster-devel/2016-August/050435.html

Gluster-users:
[Gluster-users] release-3.6 end of life
http://www.gluster.org/pipermail/gluster-users/2016-August/028078.html
- Joe requests a review of the 3.6 EOL proposal

[Gluster-users] The out-of-order GlusterFS 3.8.3 release addresses a
usability regression
http://www.gluster.org/pipermail/gluster-users/2016-August/028155.html
Niels de Vos announces 3.8.3

[Gluster-users] GlusterFS-3.7.14 released
http://www.gluster.org/pipermail/gluster-users/2016-August/027755.html
Kaushal M announces 3.7.14

Gluster-devel:

[Gluster-devel] Proposing a framework to leverage existing Python unit test
standards for our testing
http://www.gluster.org/pipermail/gluster-devel/2016-August/050321.html
Sankarshan Mukhopadhyay proposes a framework to leverage existing Python
unit test standards for our testing

[Gluster-devel] Events API: Adding support for Client Events
http://www.gluster.org/pipermail/gluster-devel/2016-August/050324.html
Aravinda discusses changes in Eventing

[Gluster-devel] Geo-replication: Improving the performance during History
Crawl
http://www.gluster.org/pipermail/gluster-devel/2016-August/050372.html
Aravinda starts a conversation about history and geo-replication

[Gluster-devel] Backup support for GlusterFS
http://www.gluster.org/pipermail/gluster-devel/2016-August/050381.html
Alok Srivastava revives an older thread on backup support for GlusterFS

[Gluster-devel] md-cache improvements
http://www.gluster.org/pipermail/gluster-devel/2016-August/050402.html
- Dan makes suggestions for md-cache improvements

[Gluster-devel] Documentation Tooling Review
http://www.gluster.org/pipermail/gluster-devel/2016-August/050418.html
- Amye proposes a review of the documentation tooling

[Gluster-devel] CFP for Gluster Developer Summit
http://www.gluster.org/pipermail/gluster-devel/2016-August/050435.html
- Call for Proposals for Gluster Developer Summit

Gluster-infra:
[Gluster-infra] Please welcome Worker Ant
http://www.gluster.org/pipermail/gluster-infra/2016-August/002671.html
Nigel announces a new Bugzilla bot

[Gluster-infra] Idea: Failure Trends
http://www.gluster.org/pipermail/gluster-infra/2016-August/002636.html
Nigel proposes a test failure tracking website

-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glusterd daemon dead, but glusterfsd still running

2016-08-28 Thread Atin Mukherjee
On Friday 26 August 2016, Matthew Wade  wrote:

>
>
> Hi,
>
> We are currently running a three-node cluster on Gluster 3.6.4.
>
> On one of our nodes we noticed that the glusterd daemon is dead.
>
> But the glusterfsd daemons are still running, and we believe clients are
> connecting and retrieving data.
>
> The daemon appears to have been dead for a week without our noticing.
>
> We would like to know whether it is safe to just go ahead and start the
> glusterd service again.
>

It can be started safely, as the management and I/O paths do not interfere
with each other.
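
A minimal sketch of what that might look like on the affected node, assuming
a systemd-based distribution (on SysV-style init it would be "service
glusterd start" instead):

    # confirm glusterd is down while the brick processes are still up
    pgrep -l glusterd
    pgrep -l glusterfsd

    # start the management daemon again
    systemctl start glusterd

    # verify the node has rejoined the cluster and the bricks are seen
    gluster peer status
    gluster volume status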

>
>
> If so, would this trigger a self-heal on all volumes? That would cause a
> performance issue.
>

Why would you need a self-heal trigger if all of your bricks are running?
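
If you want to double-check before and after the restart, the heal info
commands give a quick view of anything pending (the volume name below is
just a placeholder):

    gluster volume heal VOLNAME info
    gluster volume heal VOLNAME statistics heal-count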

>
>
> The logs for this node are as follows:
>
> [2016-08-19 18:01:52.804453] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f4f3ffca550]
> (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7f4f3fd9f787]
> (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4f3fd9f89e]
> (--> 
> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4f3fd9f951]
> (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7f4f3fd9ff1f]
> ) 0-DAOS-client-4: forced unwinding frame type(GF-DUMP) op(DUMP(1))
> called at 2016-08-19 18:01:51.886737 (xid=0x144a1d)
> [2016-08-19 18:01:52.804480] W 
> [client-handshake.c:1588:client_dump_version_cbk]
> 0-DAOS-client-4: received RPC status error
> [2016-08-19 18:01:52.804504] W [socket.c:620:__socket_rwv] 0-glusterfs:
> readv on 127.0.0.1:24007 failed (No data available)
> [2016-08-19 18:02:02.900863] E [socket.c:2276:socket_connect_finish]
> 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)
>
> If it isn't safe to do so, what else should we do to resolve this?
>
> *Matt Wade*
> IT Operations Analyst
>
> IOP Publishing
> Temple Circus, Temple Way, Bristol
> BS1 6HG, UK
>
> Direct line +44 (0)117 930 1136
>
> ioppublishing.org
>
>

-- 
--Atin
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] So what are people using for 10G nics

2016-08-28 Thread Doug Ingham
Switch-wise, have a look at the HP FlexFabric 5700-32XGT-8XG-2QSFP+ and the
Cisco SG550XG-24T.

For what it's worth, you can minimise the bandwidth required to the third
node whilst maintaining quorum if you use arbiters.

https://gluster.readthedocs.io/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
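
As a rough sketch (host names, volume name and brick paths below are
placeholders), a replica volume with an arbiter as the third brick is
created along these lines:

    gluster volume create myvol replica 3 arbiter 1 \
        server1:/bricks/data1 \
        server2:/bricks/data1 \
        server3:/bricks/arbiter1

The arbiter brick holds only file names and metadata, not data, which is why
the third node needs far less bandwidth than a full replica would.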

On 26 August 2016 at 16:04, WK  wrote:

> Prices seem to be dropping online at NewEgg etc., and going from 2 nodes
> to 3 nodes for quorum implies a lot more traffic than would be comfortable
> with 1G.
>
> Any NIC/Switch recommendations for RH/Cent 7.x and Ubuntu 16?
>
>
> -wk
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Doug
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] glusterd daemon dead, but glusterfsd still running

2016-08-28 Thread Matthew Wade
Hi,

We are currently running a three-node cluster on Gluster 3.6.4.

On one of our nodes we noticed that the glusterd daemon is dead.

But the glusterfsd daemons are still running, and we believe clients are
connecting and retrieving data.

The daemon appears to have been dead for a week without our noticing.

We would like to know whether it is safe to just go ahead and start the
glusterd service again.

If so, would this trigger a self-heal on all volumes? That would cause a
performance issue.

The logs for this node are as follows:

[2016-08-19 18:01:52.804453] E [rpc-clnt.c:362:saved_frames_unwind] (--> 
/usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f4f3ffca550] (--> 
/usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7f4f3fd9f787] (--> 
/usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4f3fd9f89e] (--> 
/usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4f3fd9f951] 
(--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7f4f3fd9ff1f] ) 
0-DAOS-client-4: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called 
at 2016-08-19 18:01:51.886737 (xid=0x144a1d)
[2016-08-19 18:01:52.804480] W 
[client-handshake.c:1588:client_dump_version_cbk] 0-DAOS-client-4: 
received RPC status error
[2016-08-19 18:01:52.804504] W [socket.c:620:__socket_rwv] 0-glusterfs: 
readv on 127.0.0.1:24007 failed (No data available)
[2016-08-19 18:02:02.900863] E [socket.c:2276:socket_connect_finish] 
0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)

If it isn't safe to do so, what else should we do to resolve this?

Matt Wade
IT Operations Analyst

IOP Publishing
Temple Circus, Temple Way, Bristol
BS1 6HG, UK

Direct line +44 (0)117 930 1136

ioppublishing.org

This email (and attachments) are confidential and intended for the addressee(s) 
only. If you are not the intended recipient please notify the sender, 
delete any copies and do not take action in reliance on it. Any views expressed 
are the author's and do not represent those of IOP, except where specifically 
stated. IOP takes reasonable precautions to protect against viruses but accepts 
no responsibility for loss or damage arising from virus infection. 
For the protection of IOP's systems and staff emails are scanned automatically. 

Institute of Physics. Registered charity no. 293851 (England & Wales) and 
SCO40092 (Scotland)
Registered Office:  76 Portland Place, London W1B 1NT  
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] canceling full heal 3.8

2016-08-28 Thread David Gossage
On Sat, Aug 27, 2016 at 11:01 PM, David Gossage  wrote:

> On Sat, Aug 27, 2016 at 9:55 PM, David Gossage <
> dgoss...@carouselchecks.com> wrote:
>
>> On Sat, Aug 27, 2016 at 5:35 PM, David Gossage <
>> dgoss...@carouselchecks.com> wrote:
>>
>>> On Aug 27, 2016 4:37 PM, "Lindsay Mathieson" <
>>> lindsay.mathie...@gmail.com> wrote:
>>> >
>>> > On 28/08/2016 6:07 AM, David Gossage wrote:
>>> >>
>>> >> Seven hours after starting the full heal, shards still haven't started
>>> healing, and the count from heal statistics heal-count has only reached
>>> 1800 out of 19000 shards. The .shard dir hasn't even been recreated yet.
>>> Creation of the non-sharded stubs (do they have a more official term?) in
>>> the visible mount point was as speedy as expected. Shards are painfully
>>> slow.
>>> >
>>> >
>>> >
>>> > Is your CPU usage through the roof?
>>>
>>> Currently it has almost no activity. The first node got a bit high
>>> yesterday, but the 2nd node, the one with issues today, is pretty low.
>>> >
>>> > If you haven't already, I'd suggest
>>> >
>>> > - changing "cluster.data-self-heal-algorithm" to "full"
>>> >
>>> > - And restarting the gluster volume if possible
>>> >
>>>
>>> I'll shut down the VMs later tonight and see if that helps at all.
>>>
>>
>> Set "cluster.data-self-heal-algorithm" to "full".
>>
>> Stopped the volume, then started it.
>>
>> CPU activity is barely noticeable. The heal count is crawling, with one
>> new addition to the list every minute or two, and it's still just building
>> the list; it hasn't started making a .shard directory.
>>
>
> Logging into each Linux VM and running 'find . | xargs stat' from / seemed
> to make the count jump a bit faster. I wasn't quite sure of the best way to
> repeat that for the Windows VMs, so I just ran full system virus scans.
>
> Still, after 15+ hours it has now listed 3600/19000 shards in the
> to-be-healed list and started healing none.
>

24 hours in, and it's added 25% of the shards to the list and not started
healing any. I sense this will be a pleasant Monday morning tomorrow.
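
For anyone following along, the steps discussed above roughly translate to
the commands below; this is only a sketch of what was tried, with a
placeholder volume name:

    gluster volume set myvol cluster.data-self-heal-algorithm full

    gluster volume stop myvol
    gluster volume start myvol

    # kick off a full heal and watch the counters
    gluster volume heal myvol full
    gluster volume heal myvol statistics heal-count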

> >
>>> > I have a suspicion something changed recently with heal; I've noticed
>>> that it takes a long time (hours) to kick in when the diff algorithm is
>>> used. I don't recall it doing this with 3.7.11.
>>> >
>>> >
>>> > --
>>> > Lindsay Mathieson
>>> >
>>>
>>
>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users