Re: [Gluster-users] [Gluster-devel] Don't allow data loss via add-brick (was Re: Add single server)

2017-05-01 Thread Pranith Kumar Karampuri
Yeah it is a good idea. I asked him to raise a bug and we can move forward
with it.

On Mon, May 1, 2017 at 9:07 PM, Joe Julian  wrote:

>
> On 04/30/2017 01:13 AM, lemonni...@ulrar.net wrote:
>
>> So I was a little but luck. If I has all the hardware part, probably i
>>> would be firesd after causing data loss by using a software marked as
>>> stable
>>>
>> Yes, we lost our data last year to this bug, and it wasn't a test cluster.
>> We still hear from it from our clients to this day.
>>
>> Is known that this feature is causing data loss and there is no evidence
>>> or
>>> no warning in official docs.
>>>
>>> I was (I believe) the first one to run into the bug, it happens and I
>> knew it
>> was a risk when installing gluster.
>> But since then I didn't see any warnings anywhere except here, I agree
>> with you that it should be mentionned in big bold letters on the site.
>>
>> Might even be worth adding a warning directly on the cli when trying to
>> add bricks if sharding is enabled, to make sure no-one will destroy a
>> whole cluster for a known bug.
>>
>
> I absolutely agree - or, just disable the ability to add-brick with
> sharding enabled. Losing data should never be allowed.
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster Monthly Newsletter, April 2017

2017-05-01 Thread Amye Scavarda
Gluster Monthly Newsletter, April 2017

Release 3.11 has been branched and tagged! More details on the mailing list.

http://lists.gluster.org/pipermail/gluster-users/2017-April/030764.html


Our weekly community meeting has changed: we'll be meeting every other
week instead of weekly, moving the time to 15:00 UTC, and our agenda
is at: https://bit.ly/gluster-community-meetings

We hope this means that more people can join us. Kaushal outlines the
changes on the mailing list:
http://lists.gluster.org/pipermail/gluster-devel/2017-January/051918.html


New meetup! We’re delighted to welcome the first Seattle Storage
meetup, run by our very own Joe Julian.

https://www.meetup.com/Seattle-Storage-Meetup/


Coming to Red Hat Summit?

Come find us at the Gluster Community Booth in our Community Central area!


Upcoming Talks:

Red Hat Summit:

Container-Native Storage for Modern Applications with OpenShift and
Red Hat Gluster Storage

http://bit.ly/2qpLVP0

Architecting and Performance-Tuning Efficient Gluster Storage Pools

http://bit.ly/2qpMgkK


Noteworthy threads:

Gluster-users:

Announcing release 3.11 : Scope, schedule and feature tracking

http://lists.gluster.org/pipermail/gluster-users/2017-April/030561.html

Usability Initiative for Gluster: Documentation

http://lists.gluster.org/pipermail/gluster-users/2017-April/030567.html

How do you oVirt? Here the answers!

http://lists.gluster.org/pipermail/gluster-users/2017-April/030592.html

Revisiting Quota functionality in GlusterFS

http://lists.gluster.org/pipermail/gluster-users/2017-April/030676.html


Gluster-devel:

Back porting guidelines: Change-ID consistency across branches

http://lists.gluster.org/pipermail/gluster-devel/2017-April/052495.html

GlusterFS+NFS-Ganesha longevity cluster

http://lists.gluster.org/pipermail/gluster-devel/2017-April/052503.html

GFID2 - Proposal to add extra byte to existing GFID

http://lists.gluster.org/pipermail/gluster-devel/2017-April/052520.html

[Gluster-Maintainers] Maintainers 2.0 Proposal

http://lists.gluster.org/pipermail/gluster-devel/2017-April/052551.html

Proposal for an extended READDIRPLUS operation via gfAPI

http://lists.gluster.org/pipermail/gluster-devel/2017-April/052596.html


Gluster-infra:

Jenkins Upgrade

http://lists.gluster.org/pipermail/gluster-infra/2017-April/003495.html


Gluster Top 5 Contributors in the last 30 days:

Krutika Dhananjay, Michael Scherer, Kaleb S. Keithley, Nigel Babu,
Xavier Hernandez


Upcoming CFPs:

Open Source Summit North America -
http://events.linuxfoundation.org/events/open-source-summit-north-america/program/cfp
 - May 6

Open Source Summit Europe -

http://events.linuxfoundation.org/events/open-source-summit-europe/program/cfp
- July 8



-- 
Amye Scavarda | a...@redhat.com | Gluster Community Lead
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Don't allow data loss via add-brick (was Re: Add single server)

2017-05-01 Thread Mahdi Adnan
I first encountered this bug about a year ago, and lost more than 100 VM.

Sharding is essential to VM datastores and i think Gluster is't that useful 
without this feature for VMs.

I appreciate all the hard work that the developers putting on this bug, but i 
think a warning in CLI or something would be really helpful for Gluster users 
until the fix is ready.


--

Respectfully
Mahdi A. Mahdi


From: gluster-users-boun...@gluster.org  on 
behalf of Joe Julian 
Sent: Monday, May 1, 2017 6:37:00 PM
To: gluster-users@gluster.org; Gluster Devel
Subject: [Gluster-users] Don't allow data loss via add-brick (was Re: Add 
single server)


On 04/30/2017 01:13 AM, lemonni...@ulrar.net wrote:
>> So I was a little but luck. If I has all the hardware part, probably i
>> would be firesd after causing data loss by using a software marked as stable
> Yes, we lost our data last year to this bug, and it wasn't a test cluster.
> We still hear from it from our clients to this day.
>
>> Is known that this feature is causing data loss and there is no evidence or
>> no warning in official docs.
>>
> I was (I believe) the first one to run into the bug, it happens and I knew it
> was a risk when installing gluster.
> But since then I didn't see any warnings anywhere except here, I agree
> with you that it should be mentionned in big bold letters on the site.
>
> Might even be worth adding a warning directly on the cli when trying to
> add bricks if sharding is enabled, to make sure no-one will destroy a
> whole cluster for a known bug.

I absolutely agree - or, just disable the ability to add-brick with
sharding enabled. Losing data should never be allowed.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Joe Julian



On 05/01/2017 11:47 AM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:14 AM, Shyam > wrote:


On 05/01/2017 02:42 PM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:07 AM, Shyam 
>> wrote:

On 05/01/2017 02:23 PM, Pranith Kumar Karampuri wrote:



On Mon, May 1, 2017 at 11:43 PM, Shyam

>
 

Re: [Gluster-users] Add single server

2017-05-01 Thread Joe Julian



On 05/01/2017 11:55 AM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta 
> wrote:


2017-05-01 20:43 GMT+02:00 Shyam >:
> I do agree that for the duration a brick is replaced its
replication count
> is down by 1, is that your concern? In which case I do note that
without (a)
> above, availability is at risk during the operation. Which needs
other
> strategies/changes to ensure tolerance to errors/faults.

Oh, yes, i've forgot this too.

I don't know Ceph, but Lizard, when moving chunks across the cluster,
does a copy, not a movement
During the whole operation you'll end with some files/chunks
replicated more than the requirement.


Replace-brick as a command is implemented with the goal of replacing a 
disk that went bad. So the availability was already less. In 2013-2014 
I proposed that we do it by adding brick to just the replica set and 
increase its replica-count just for that set once heal is complete we 
could remove this brick. But at the point I didn't see any benefit to 
that approach, because availability was already down by 1. But with 
all of this discussion it seems like a good time to revive this idea. 
I saw that Shyam suggested the same in the PR he mentioned before.


I've always been against the idea of being a replica down based on that 
supposition. I've never had to replace-brick because a brick failed. 
It's always been for reconfiguration reasons. Good monitoring and 
analysis can predict drive failures in plenty of time to replace a 
functioning brick.




If you have a replica 3, during the movement, some file get replica 4
In Gluster the same operation will bring you replica 2.

IMHO, this isn't a viable/reliable solution

Any change to change "replace-brick" to increase the replica count
during the operation ?

It can be done. We just need to find time to do this.


--
Pranith


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Gluster and NFS-Ganesha - cluster is down after reboot

2017-05-01 Thread Rudolf
Hi Gluster users,

First, I'd like to thank you all for this amazing open-source! Thank you!

I'm working on home project – three servers with Gluster and NFS-Ganesha.
My goal is to create HA NFS share with three copies of each file on each
server.

My systems are CentOS 7.3 Minimal install with the latest updates and the
most current RPMs from "centos-gluster310" repository.

I followed this tutorial:
http://blog.gluster.org/2015/10/linux-scale-out-nfsv4-using-nfs-ganesha-and-glusterfs-one-step-at-a-time/
(second half that describes multi-node HA setup)

with a few exceptions:

1. All RPMs are from "centos-gluster310" repo that is installed by "yum -y
install centos-release-gluster"
2. I have three nodes (not four) with "replica 3" volume.
3. I created empty ganesha.conf and not empty ganesha-ha.conf in
"/var/run/gluster/shared_storage/nfs-ganesha/" (referenced blog post is
outdated, this is now requirement)
4. ganesha-ha.conf doesn't have "HA_VOL_SERVER" since this isn't needed
anymore.

When I finish configuration, all is good. nfs-ganesha.service is active and
running and from client I can ping all three VIPs and I can mount NFS.
Copied files are replicated to all nodes.

But when I restart nodes (one by one, with 5 min. delay between) then I
cannot ping or mount (I assume that all VIPs are down). So my setup
definitely isn't HA.

I found that:
# pcs status
Error: cluster is not currently running on this node

and nfs-ganesha.service is in inactive state. Btw. I didn't enable
"systemctl enable nfs-ganesha" since I assume that this is something that
Gluster does.

I assume that my issue is that I followed instructions in blog post from
2015/10 that are outdated. Unfortunately I cannot find anything better – I
spent whole day by googling.

Would you be so kind and check the instructions in blog post and let me
know what steps are wrong / outdated? Or please do you have more current
instructions for Gluster+Ganesha setup?

Thank you.

Kind regards,
Adam
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 21:08 GMT+02:00 Vijay Bellur :
> We might also want to start thinking about spare bricks that can be brought
> into a volume based on some policy.  For example, if the posix health
> checker determines that underlying storage stack has problems, we can bring
> a spare brick into the volume to replace the failing brick. More policies
> can be evolved for triggering the action of bringing in a spare brick to a
> volume.

Something similiar to a global hotspare.
if Gluster detects some SMART issues (lot of reallocation, predictive
failures and so on) can
bring the hotspare in action, starting to replace the almost-failed disks.

If disk fails totally during the replace, healing should start from a
middle point and not from start (as some data were already synced
automatically) by using the "spare" disk as automatic replacement.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


[Gluster-users] glustershd: unable to get index-dir on myvolume-client-0

2017-05-01 Thread mabi
Hi,
I have a two nodes GlusterFS 3.8.11 replicated volume and just noticed today in 
the glustershd.log log file a lot of the following warning messages:

[2017-05-01 18:42:18.004747] W [MSGID: 108034] 
[afr-self-heald.c:479:afr_shd_index_sweep] 0-myvolume-replicate-0: unable to 
get index-dir on myvolume-client-0
[2017-05-01 18:52:19.004989] W [MSGID: 108034] 
[afr-self-heald.c:479:afr_shd_index_sweep] 0-myvolume-replicate-0: unable to 
get index-dir on myvolume-client-0
[2017-05-01 19:02:20.004827] W [MSGID: 108034] 
[afr-self-heald.c:479:afr_shd_index_sweep] 0-myvolume-replicate-0: unable to 
get index-dir on myvolume-client-0

Does someone understand what it means and if I should be concerned or not? 
Could it be related that I use ZFS and not XFS as filesystem?

Best regards,
M.___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Questions about the limitations on using Gluster Volume Tiering.

2017-05-01 Thread Jeff Byers
Hello,

We've been thinking about giving GlusterFS Tiering a try, but
had noticed the following limitations documented in the:

Red Hat Gluster Storage 3.2 Administration Guide

Limitations of arbitrated replicated volumes:

Tiering is not compatible with arbitrated replicated volumes.

17.3. Tiering Limitations

In this release, only Fuse and NFSv3 access is supported.
Server Message Block (SMB) and NFSv4 access to tiered
volume is not supported.

I don't quite understand the SMB restriction. Is the restriction
that you cannot use the GlusterFS 'gfapi' vfs interface to Samba,
but you can use Samba layered over a FUSE mount?

Is the problem here that with the 'gfapi' vfs interface, the
'tier-xlator' is not involved, or does not work properly?

BTW, my colleague did a quick test using SMB with 'libgfapi',
configured, and it seemed to work fine, but that doesn't mean
that it was working correctly.

The same questions regarding NFSv3 vs NFSv4. My understanding
is that NFSv3 is supported internally by GlusterFS, but NFSv4
is external. That would make me think that NFSv3 would have a
problem with tiering, but it is NFSv4 that is not supported, but
it is the opposite.

I guess I don't understand what's behind these limitations.

Related question, the tiering operates on volume files, not
brick files, so tiering should be compatible with sharding?

In a scale-out configuration, I assume that the heat
map/counters are shared globally so that no matter where the
client(s) read/write to/from, they get counted properly in the
heat counts, and get the correct file.

There must be some place that stores this meta-data. Is
this meta-data shared between all of the GlusterFS nodes,
does it go on a GlusterFS meta-data volume? I didn't see
any way to specify the storage location, but I suppose it
could go in a bricks .glusterfs/ directory, but isn't that is
per-brick, not per-volume.

Thanks.

~ Jeff Byers ~
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 21:00 GMT+02:00 Shyam :
> So, Gandalf, it will be part of the roadmap, just when we maybe able to pick
> and deliver this is not clear yet (as Pranith puts it as well).

I doesn't matter when. Knowing that adding a single brick will be made
possible is enough (at least for me)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:55 GMT+02:00 Pranith Kumar Karampuri :
> Replace-brick as a command is implemented with the goal of replacing a disk
> that went bad. So the availability was already less. In 2013-2014 I proposed
> that we do it by adding brick to just the replica set and increase its
> replica-count just for that set once heal is complete we could remove this
> brick. But at the point I didn't see any benefit to that approach, because
> availability was already down by 1. But with all of this discussion it seems
> like a good time to revive this idea. I saw that Shyam suggested the same in
> the PR he mentioned before.

Why availability is already less?
replace-brick is usefull for adding a new disks (as we are discussing
here) or if you
have to preventive replace/dismiss a disk.

If you have disks that are getting older and older, you can safely
replace them one by one
with replace-disks. Doing this way will keep you at desidered
redundancy for the whole phase.
If you just remove the older disk and let gluster to heal, you loose
one replica. During the heal
process another disk could fail and so on.

The same like with any RAID. If possible, adding the new disks and the
remove the older one is
better than brutally replace disks. "mdadm", with replace disks (and I
think also ZFS) will add the new
disks keeping full redundancy and after replacement is done, the older
disk is decommissioned.

I don't see any drawback in doing this even with gluster, only advantages.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Vijay Bellur
On Mon, May 1, 2017 at 2:55 PM, Pranith Kumar Karampuri  wrote:

>
>
> On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> 2017-05-01 20:43 GMT+02:00 Shyam :
>> > I do agree that for the duration a brick is replaced its replication
>> count
>> > is down by 1, is that your concern? In which case I do note that
>> without (a)
>> > above, availability is at risk during the operation. Which needs other
>> > strategies/changes to ensure tolerance to errors/faults.
>>
>> Oh, yes, i've forgot this too.
>>
>> I don't know Ceph, but Lizard, when moving chunks across the cluster,
>> does a copy, not a movement
>> During the whole operation you'll end with some files/chunks
>> replicated more than the requirement.
>>
>
> Replace-brick as a command is implemented with the goal of replacing a
> disk that went bad. So the availability was already less. In 2013-2014 I
> proposed that we do it by adding brick to just the replica set and increase
> its replica-count just for that set once heal is complete we could remove
> this brick. But at the point I didn't see any benefit to that approach,
> because availability was already down by 1. But with all of this discussion
> it seems like a good time to revive this idea. I saw that Shyam suggested
> the same in the PR he mentioned before.
>
>

The ability to increase and decrease the replication count within a replica
set would be pretty cool. In addition to replace-brick,  workloads that
need elasticity to serve reads can benefit from more replicas to provide
load balancing. Once the load is back to normal, we can cull the temporary
brick.

We might also want to start thinking about spare bricks that can be brought
 into a volume based on some policy.  For example, if the posix health
checker determines that underlying storage stack has problems, we can bring
a spare brick into the volume to replace the failing brick. More policies
can be evolved for triggering the action of bringing in a spare brick to a
volume.

-Vijay

-Vijay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Tue, May 2, 2017 at 12:30 AM, Shyam  wrote:

> On 05/01/2017 02:55 PM, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta
>> > > wrote:
>>
>> 2017-05-01 20:43 GMT+02:00 Shyam > >:
>> > I do agree that for the duration a brick is replaced its
>> replication count
>> > is down by 1, is that your concern? In which case I do note that
>> without (a)
>> > above, availability is at risk during the operation. Which needs
>> other
>> > strategies/changes to ensure tolerance to errors/faults.
>>
>> Oh, yes, i've forgot this too.
>>
>> I don't know Ceph, but Lizard, when moving chunks across the cluster,
>> does a copy, not a movement
>> During the whole operation you'll end with some files/chunks
>> replicated more than the requirement.
>>
>>
>> Replace-brick as a command is implemented with the goal of replacing a
>> disk that went bad. So the availability was already less. In 2013-2014 I
>> proposed that we do it by adding brick to just the replica set and
>> increase its replica-count just for that set once heal is complete we
>> could remove this brick. But at the point I didn't see any benefit to
>> that approach, because availability was already down by 1. But with all
>> of this discussion it seems like a good time to revive this idea. I saw
>> that Shyam suggested the same in the PR he mentioned before.
>>
>
> Ah! I did not know this, thanks. Yes, in essence this is what I suggest,
> but at that time (13-14) I guess we did not have EC, so in the current
> proposal I include EC and also on ways to deal with pure-distribute only
> environments, using the same/similar scheme.
>

Yeah this whole discussion came up because we wanted to deprecate pump
xlator which was doing what we are discussing here on the brick side for
both replicate and distribute(EC didn't exist at the time) but it had its
problems. So Avati at the time suggested we use replace-brick for replica
and remove-brick/add-brick for distribute and deprecate pump. I suggested
we could instead increase replica count for just that
brick(plain-distribute)/replica set alone. I just didn't have strong reason
to push for it because I never thought of this usecase at the time. If I
knew we would have had this feature in action ;-).


>
>
>>
>>
>> If you have a replica 3, during the movement, some file get replica 4
>> In Gluster the same operation will bring you replica 2.
>>
>> IMHO, this isn't a viable/reliable solution
>>
>> Any change to change "replace-brick" to increase the replica count
>> during the operation ?
>>
>> It can be done. We just need to find time to do this.
>>
>
> Agreed, to add to this point, and to reiterate. We are looking at "+1
> scaling", this discussion helps in attempting to converge on a lot of why's
> for the same at least, if not necessarily the how's.
>
> So, Gandalf, it will be part of the roadmap, just when we maybe able to
> pick and deliver this is not clear yet (as Pranith puts it as well).
>
>
>>
>> --
>> Pranith
>>
>


-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:46 GMT+02:00 Shyam :
> Fair point. If Gandalf concurs, we will add this to our "+1 scaling" feature
> effort (not yet on github as an issue).

Everything is ok for me as long that:

a) operation must be automated (this is what i've asked initially
[1]), maybe with a single command.

b) during the replace phase, replica count is not decreased but increased.

[1]
Joe's solution require a distributed-replicated volume. this shouldn't
be an issue, modern servers has multiple disk bays, starting with 1
disk per server (replica 3) is fine, If I need to expand i'll add 3
disks more (1 more per server). After this, will be possible to do a
replacement like in Joe's solution. The biggest drawback is that this
soulution isn't viable for server with just 1 brick and no more disk
bays available (ie: a single raid with all disks used as single brick)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:55 PM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta
> wrote:

2017-05-01 20:43 GMT+02:00 Shyam >:
> I do agree that for the duration a brick is replaced its replication count
> is down by 1, is that your concern? In which case I do note that without 
(a)
> above, availability is at risk during the operation. Which needs other
> strategies/changes to ensure tolerance to errors/faults.

Oh, yes, i've forgot this too.

I don't know Ceph, but Lizard, when moving chunks across the cluster,
does a copy, not a movement
During the whole operation you'll end with some files/chunks
replicated more than the requirement.


Replace-brick as a command is implemented with the goal of replacing a
disk that went bad. So the availability was already less. In 2013-2014 I
proposed that we do it by adding brick to just the replica set and
increase its replica-count just for that set once heal is complete we
could remove this brick. But at the point I didn't see any benefit to
that approach, because availability was already down by 1. But with all
of this discussion it seems like a good time to revive this idea. I saw
that Shyam suggested the same in the PR he mentioned before.


Ah! I did not know this, thanks. Yes, in essence this is what I suggest, 
but at that time (13-14) I guess we did not have EC, so in the current 
proposal I include EC and also on ways to deal with pure-distribute only 
environments, using the same/similar scheme.






If you have a replica 3, during the movement, some file get replica 4
In Gluster the same operation will bring you replica 2.

IMHO, this isn't a viable/reliable solution

Any change to change "replace-brick" to increase the replica count
during the operation ?

It can be done. We just need to find time to do this.


Agreed, to add to this point, and to reiterate. We are looking at "+1 
scaling", this discussion helps in attempting to converge on a lot of 
why's for the same at least, if not necessarily the how's.


So, Gandalf, it will be part of the roadmap, just when we maybe able to 
pick and deliver this is not clear yet (as Pranith puts it as well).





--
Pranith

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Tue, May 2, 2017 at 12:20 AM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-05-01 20:43 GMT+02:00 Shyam :
> > I do agree that for the duration a brick is replaced its replication
> count
> > is down by 1, is that your concern? In which case I do note that without
> (a)
> > above, availability is at risk during the operation. Which needs other
> > strategies/changes to ensure tolerance to errors/faults.
>
> Oh, yes, i've forgot this too.
>
> I don't know Ceph, but Lizard, when moving chunks across the cluster,
> does a copy, not a movement
> During the whole operation you'll end with some files/chunks
> replicated more than the requirement.
>

Replace-brick as a command is implemented with the goal of replacing a disk
that went bad. So the availability was already less. In 2013-2014 I
proposed that we do it by adding brick to just the replica set and increase
its replica-count just for that set once heal is complete we could remove
this brick. But at the point I didn't see any benefit to that approach,
because availability was already down by 1. But with all of this discussion
it seems like a good time to revive this idea. I saw that Shyam suggested
the same in the PR he mentioned before.


>
> If you have a replica 3, during the movement, some file get replica 4
> In Gluster the same operation will bring you replica 2.
>
> IMHO, this isn't a viable/reliable solution
>
> Any change to change "replace-brick" to increase the replica count
> during the operation ?
>
It can be done. We just need to find time to do this.


-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:47 PM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:14 AM, Shyam > wrote:

On 05/01/2017 02:42 PM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:07 AM, Shyam 
>> wrote:

On 05/01/2017 02:23 PM, Pranith Kumar Karampuri wrote:



On Mon, May 1, 2017 at 11:43 PM, Shyam

>



Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:43 GMT+02:00 Shyam :
> I do agree that for the duration a brick is replaced its replication count
> is down by 1, is that your concern? In which case I do note that without (a)
> above, availability is at risk during the operation. Which needs other
> strategies/changes to ensure tolerance to errors/faults.

Oh, yes, i've forgot this too.

I don't know Ceph, but Lizard, when moving chunks across the cluster,
does a copy, not a movement
During the whole operation you'll end with some files/chunks
replicated more than the requirement.

If you have a replica 3, during the movement, some file get replica 4
In Gluster the same operation will bring you replica 2.

IMHO, this isn't a viable/reliable solution

Any change to change "replace-brick" to increase the replica count
during the operation ?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:42 GMT+02:00 Joe Julian :
> Because it's done by humans.

Exactly. I forgot to mention this.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Tue, May 2, 2017 at 12:14 AM, Shyam  wrote:

> On 05/01/2017 02:42 PM, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Tue, May 2, 2017 at 12:07 AM, Shyam > > wrote:
>>
>> On 05/01/2017 02:23 PM, Pranith Kumar Karampuri wrote:
>>
>>
>>
>> On Mon, May 1, 2017 at 11:43 PM, Shyam > 
>> >> wrote:
>>
>> On 05/01/2017 02:00 PM, Pranith Kumar Karampuri wrote:
>>
>> Splitting the bricks need not be a post factum
>> decision, we can
>> start with larger brick counts, on a given node/disk
>> count, and
>> hence spread these bricks to newer nodes/bricks as
>> they are
>> added.
>>
>>
>> Let's say we have 1 disk, we format it with say XFS and
>> that
>> becomes a
>> brick at the moment. Just curious, what will be the
>> relationship
>> between
>> brick to disk in this case(If we leave out LVM for this
>> example)?
>>
>>
>> I would assume the relation is brick to provided FS
>> directory (not
>> brick to disk, we do not control that at the moment, other
>> than
>> providing best practices around the same).
>>
>>
>> Hmmm... as per my understanding, if we do this then 'df' I guess
>> will
>> report wrong values? available-size/free-size etc will be
>> counted more
>> than once?
>>
>>
>> This is true even today, if anyone uses 2 bricks from the same mount.
>>
>>
>> That is the reason why documentation is the way it is as far as I can
>> remember.
>>
>>
>>
>> I forgot a converse though, we could take a disk and partition it
>> (LVM thinp volumes) and use each of those partitions as bricks,
>> avoiding the problem of df double counting. Further thinp will help
>> us expand available space to other bricks on the same disk, as we
>> destroy older bricks or create new ones to accommodate the moving
>> pieces (needs more careful thought though, but for sure is a
>> nightmare without thinp).
>>
>> I am not so much a fan of large number of thinp partitions, so as
>> long as that is reasonably in control, we can possibly still use it.
>> The big advantage though is, we nuke a thinp volume when the brick
>> that uses that partition, moves out of that disk, and we get the
>> space back, rather than having or to something akin to rm -rf on the
>> backend to reclaim space.
>>
>>
>> Other way to achieve the same is to leverage the quota functionality of
>> counting how much size is used under a directory.
>>
>
> Yes, I think this is the direction to solve the 2 bricks on a single FS as
> well. Also, IMO, the weight of accounting at each directory level that
> quota brings in seems/is heavyweight to solve just *this* problem.


I saw some github issues where Sanoj is exploring XFS-quota integration.
Project Quota ideas which are a bit less heavy would be nice too. Actually
all these issues are very much interlinked.

It all seems to point that we basically need to increase granularity of
brick and solve problems that come up as we go along.


>
>
>
>>
>>
>>
>>
>>
>>
>> Today, gluster takes in a directory on host as a brick, and
>> assuming
>> we retain that, we would need to split this into multiple
>> sub-dirs
>> and use each sub-dir as a brick internally.
>>
>> All these sub-dirs thus created are part of the same volume
>> (due to
>> our current snapshot mapping requirements).
>>
>>
>>
>>
>> --
>> Pranith
>>
>>
>>
>>
>> --
>> Pranith
>>
>


-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:42 PM, Joe Julian wrote:



On 05/01/2017 11:36 AM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:04 AM, Gandalf Corvotempesta
> wrote:

2017-05-01 20:30 GMT+02:00 Shyam >:
> Yes, as a matter of fact, you can do this today using the CLI
and creating
> nx2 instead of 1x2. 'n' is best decided by you, depending on the
growth
> potential of your cluster, as at some point 'n' wont be enough
if you grow
> by some nodes.
>
> But, when a brick is replaced we will fail to address "(a)
ability to retain
> replication/availability levels" as we support only homogeneous
replication
> counts across all DHT subvols. (I could be corrected on this
when using
> replace-brick though)


Yes, but this is error prone.


Why?



Because it's done by humans.


Fair point. If Gandalf concurs, we will add this to our "+1 scaling" 
feature effort (not yet on github as an issue).






I'm still thinking that saving (I don't know where, I don't know how)
a mapping between
files and bricks would solve many issues and add much more
flexibility.




--
Pranith


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users




___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:36 GMT+02:00 Pranith Kumar Karampuri :
> Why?

Because you have to manually replace bricks with the newer one, format
the older one and add it back.
What happens if, by mistake, we replace the older brick with another
brick on the same disk ?

Currently you have to only check proper placement based on server,
with this workaround you also have
to check for brick placement on each disk. You add a level and thus
you are incrementing the moving parts and
operations that may  go wrong.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:42 PM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:07 AM, Shyam > wrote:

On 05/01/2017 02:23 PM, Pranith Kumar Karampuri wrote:



On Mon, May 1, 2017 at 11:43 PM, Shyam 
>> wrote:

On 05/01/2017 02:00 PM, Pranith Kumar Karampuri wrote:

Splitting the bricks need not be a post factum
decision, we can
start with larger brick counts, on a given node/disk
count, and
hence spread these bricks to newer nodes/bricks as
they are
added.


Let's say we have 1 disk, we format it with say XFS and that
becomes a
brick at the moment. Just curious, what will be the
relationship
between
brick to disk in this case(If we leave out LVM for this
example)?


I would assume the relation is brick to provided FS
directory (not
brick to disk, we do not control that at the moment, other than
providing best practices around the same).


Hmmm... as per my understanding, if we do this then 'df' I guess
will
report wrong values? available-size/free-size etc will be
counted more
than once?


This is true even today, if anyone uses 2 bricks from the same mount.


That is the reason why documentation is the way it is as far as I can
remember.



I forgot a converse though, we could take a disk and partition it
(LVM thinp volumes) and use each of those partitions as bricks,
avoiding the problem of df double counting. Further thinp will help
us expand available space to other bricks on the same disk, as we
destroy older bricks or create new ones to accommodate the moving
pieces (needs more careful thought though, but for sure is a
nightmare without thinp).

I am not so much a fan of large number of thinp partitions, so as
long as that is reasonably in control, we can possibly still use it.
The big advantage though is, we nuke a thinp volume when the brick
that uses that partition, moves out of that disk, and we get the
space back, rather than having or to something akin to rm -rf on the
backend to reclaim space.


Other way to achieve the same is to leverage the quota functionality of
counting how much size is used under a directory.


Yes, I think this is the direction to solve the 2 bricks on a single FS 
as well. Also, IMO, the weight of accounting at each directory level 
that quota brings in seems/is heavyweight to solve just *this* problem.










Today, gluster takes in a directory on host as a brick, and
assuming
we retain that, we would need to split this into multiple
sub-dirs
and use each sub-dir as a brick internally.

All these sub-dirs thus created are part of the same volume
(due to
our current snapshot mapping requirements).




--
Pranith




--
Pranith

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:36 PM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:04 AM, Gandalf Corvotempesta
> wrote:

2017-05-01 20:30 GMT+02:00 Shyam >:
> Yes, as a matter of fact, you can do this today using the CLI and creating
> nx2 instead of 1x2. 'n' is best decided by you, depending on the growth
> potential of your cluster, as at some point 'n' wont be enough if you grow
> by some nodes.
>
> But, when a brick is replaced we will fail to address "(a) ability to 
retain
> replication/availability levels" as we support only homogeneous 
replication
> counts across all DHT subvols. (I could be corrected on this when using
> replace-brick though)


Yes, but this is error prone.


Why?


To add to Pranith's question, (and to touch a raw nerve, my apologies) 
there is no rebalance in this situation (yet), if you notice.


I do agree that for the duration a brick is replaced its replication 
count is down by 1, is that your concern? In which case I do note that 
without (a) above, availability is at risk during the operation. Which 
needs other strategies/changes to ensure tolerance to errors/faults.






I'm still thinking that saving (I don't know where, I don't know how)
a mapping between
files and bricks would solve many issues and add much more flexibility.




--
Pranith

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Joe Julian



On 05/01/2017 11:36 AM, Pranith Kumar Karampuri wrote:



On Tue, May 2, 2017 at 12:04 AM, Gandalf Corvotempesta 
> wrote:


2017-05-01 20:30 GMT+02:00 Shyam >:
> Yes, as a matter of fact, you can do this today using the CLI
and creating
> nx2 instead of 1x2. 'n' is best decided by you, depending on the
growth
> potential of your cluster, as at some point 'n' wont be enough
if you grow
> by some nodes.
>
> But, when a brick is replaced we will fail to address "(a)
ability to retain
> replication/availability levels" as we support only homogeneous
replication
> counts across all DHT subvols. (I could be corrected on this
when using
> replace-brick though)


Yes, but this is error prone.


Why?



Because it's done by humans.



I'm still thinking that saving (I don't know where, I don't know how)
a mapping between
files and bricks would solve many issues and add much more
flexibility.




--
Pranith


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Tue, May 2, 2017 at 12:07 AM, Shyam  wrote:

> On 05/01/2017 02:23 PM, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Mon, May 1, 2017 at 11:43 PM, Shyam > > wrote:
>>
>> On 05/01/2017 02:00 PM, Pranith Kumar Karampuri wrote:
>>
>> Splitting the bricks need not be a post factum decision, we
>> can
>> start with larger brick counts, on a given node/disk count,
>> and
>> hence spread these bricks to newer nodes/bricks as they are
>> added.
>>
>>
>> Let's say we have 1 disk, we format it with say XFS and that
>> becomes a
>> brick at the moment. Just curious, what will be the relationship
>> between
>> brick to disk in this case(If we leave out LVM for this example)?
>>
>>
>> I would assume the relation is brick to provided FS directory (not
>> brick to disk, we do not control that at the moment, other than
>> providing best practices around the same).
>>
>>
>> Hmmm... as per my understanding, if we do this then 'df' I guess will
>> report wrong values? available-size/free-size etc will be counted more
>> than once?
>>
>
> This is true even today, if anyone uses 2 bricks from the same mount.
>

That is the reason why documentation is the way it is as far as I can
remember.


>
> I forgot a converse though, we could take a disk and partition it (LVM
> thinp volumes) and use each of those partitions as bricks, avoiding the
> problem of df double counting. Further thinp will help us expand available
> space to other bricks on the same disk, as we destroy older bricks or
> create new ones to accommodate the moving pieces (needs more careful
> thought though, but for sure is a nightmare without thinp).
>
> I am not so much a fan of large number of thinp partitions, so as long as
> that is reasonably in control, we can possibly still use it. The big
> advantage though is, we nuke a thinp volume when the brick that uses that
> partition, moves out of that disk, and we get the space back, rather than
> having or to something akin to rm -rf on the backend to reclaim space.


Other way to achieve the same is to leverage the quota functionality of
counting how much size is used under a directory.


>
>
>
>>
>>
>> Today, gluster takes in a directory on host as a brick, and assuming
>> we retain that, we would need to split this into multiple sub-dirs
>> and use each sub-dir as a brick internally.
>>
>> All these sub-dirs thus created are part of the same volume (due to
>> our current snapshot mapping requirements).
>>
>>
>>
>>
>> --
>> Pranith
>>
>


-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:23 PM, Pranith Kumar Karampuri wrote:



On Mon, May 1, 2017 at 11:43 PM, Shyam > wrote:

On 05/01/2017 02:00 PM, Pranith Kumar Karampuri wrote:

Splitting the bricks need not be a post factum decision, we can
start with larger brick counts, on a given node/disk count, and
hence spread these bricks to newer nodes/bricks as they are
added.


Let's say we have 1 disk, we format it with say XFS and that
becomes a
brick at the moment. Just curious, what will be the relationship
between
brick to disk in this case(If we leave out LVM for this example)?


I would assume the relation is brick to provided FS directory (not
brick to disk, we do not control that at the moment, other than
providing best practices around the same).


Hmmm... as per my understanding, if we do this then 'df' I guess will
report wrong values? available-size/free-size etc will be counted more
than once?


This is true even today, if anyone uses 2 bricks from the same mount.

I forgot a converse though, we could take a disk and partition it (LVM 
thinp volumes) and use each of those partitions as bricks, avoiding the 
problem of df double counting. Further thinp will help us expand 
available space to other bricks on the same disk, as we destroy older 
bricks or create new ones to accommodate the moving pieces (needs more 
careful thought though, but for sure is a nightmare without thinp).


I am not so much a fan of large number of thinp partitions, so as long 
as that is reasonably in control, we can possibly still use it. The big 
advantage though is, we nuke a thinp volume when the brick that uses 
that partition, moves out of that disk, and we get the space back, 
rather than having or to something akin to rm -rf on the backend to 
reclaim space.






Today, gluster takes in a directory on host as a brick, and assuming
we retain that, we would need to split this into multiple sub-dirs
and use each sub-dir as a brick internally.

All these sub-dirs thus created are part of the same volume (due to
our current snapshot mapping requirements).




--
Pranith

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Tue, May 2, 2017 at 12:04 AM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-05-01 20:30 GMT+02:00 Shyam :
> > Yes, as a matter of fact, you can do this today using the CLI and
> creating
> > nx2 instead of 1x2. 'n' is best decided by you, depending on the growth
> > potential of your cluster, as at some point 'n' wont be enough if you
> grow
> > by some nodes.
> >
> > But, when a brick is replaced we will fail to address "(a) ability to
> retain
> > replication/availability levels" as we support only homogeneous
> replication
> > counts across all DHT subvols. (I could be corrected on this when using
> > replace-brick though)
>
>
> Yes, but this is error prone.
>

Why?


>
> I'm still thinking that saving (I don't know where, I don't know how)
> a mapping between
> files and bricks would solve many issues and add much more flexibility.
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:30 GMT+02:00 Shyam :
> Yes, as a matter of fact, you can do this today using the CLI and creating
> nx2 instead of 1x2. 'n' is best decided by you, depending on the growth
> potential of your cluster, as at some point 'n' wont be enough if you grow
> by some nodes.
>
> But, when a brick is replaced we will fail to address "(a) ability to retain
> replication/availability levels" as we support only homogeneous replication
> counts across all DHT subvols. (I could be corrected on this when using
> replace-brick though)


Yes, but this is error prone.

I'm still thinking that saving (I don't know where, I don't know how)
a mapping between
files and bricks would solve many issues and add much more flexibility.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 02:25 PM, Gandalf Corvotempesta wrote:

2017-05-01 20:22 GMT+02:00 Shyam :

Brick splitting (I think was first proposed by Jeff Darcy) is to create more
bricks out of given storage backends. IOW, instead of using a given brick as
is, create sub-dirs and use them as bricks.

Hence, given 2 local FS end points by the user (say), instead of creating a
1x2 volume, create a nx2 volume, with n sub-dirs within the given local FS
end points as the bricks themselves.

Hence, this gives us n units to work with than just one, helping with issues
like +1 scaling, among others.


So, with just one disk, you'll be able to do some replacement like
Joe's solution
for adding a single brick regardless the replica count



Yes, as a matter of fact, you can do this today using the CLI and 
creating nx2 instead of 1x2. 'n' is best decided by you, depending on 
the growth potential of your cluster, as at some point 'n' wont be 
enough if you grow by some nodes.


But, when a brick is replaced we will fail to address "(a) ability to 
retain replication/availability levels" as we support only homogeneous 
replication counts across all DHT subvols. (I could be corrected on this 
when using replace-brick though)

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:22 GMT+02:00 Shyam :
> Brick splitting (I think was first proposed by Jeff Darcy) is to create more
> bricks out of given storage backends. IOW, instead of using a given brick as
> is, create sub-dirs and use them as bricks.
>
> Hence, given 2 local FS end points by the user (say), instead of creating a
> 1x2 volume, create a nx2 volume, with n sub-dirs within the given local FS
> end points as the bricks themselves.
>
> Hence, this gives us n units to work with than just one, helping with issues
> like +1 scaling, among others.

So, with just one disk, you'll be able to do some replacement like
Joe's solution
for adding a single brick regardless the replica count
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 11:43 PM, Shyam  wrote:

> On 05/01/2017 02:00 PM, Pranith Kumar Karampuri wrote:
>
>> Splitting the bricks need not be a post factum decision, we can
>> start with larger brick counts, on a given node/disk count, and
>> hence spread these bricks to newer nodes/bricks as they are added.
>>
>>
>> Let's say we have 1 disk, we format it with say XFS and that becomes a
>> brick at the moment. Just curious, what will be the relationship between
>> brick to disk in this case(If we leave out LVM for this example)?
>>
>
> I would assume the relation is brick to provided FS directory (not brick
> to disk, we do not control that at the moment, other than providing best
> practices around the same).
>

Hmmm... as per my understanding, if we do this then 'df' I guess will
report wrong values? available-size/free-size etc will be counted more than
once?


>
> Today, gluster takes in a directory on host as a brick, and assuming we
> retain that, we would need to split this into multiple sub-dirs and use
> each sub-dir as a brick internally.
>
> All these sub-dirs thus created are part of the same volume (due to our
> current snapshot mapping requirements).
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:08 GMT+02:00 Pranith Kumar Karampuri :
> Filename can be renamed and then we lost the link because hash will be
> different. Anyways all these kinds of problems are already solved in
> distribute layer.

Filename can be renamed even with the current architecture.
How do you change the GFID after file rename ? In the same way you can
re-hash the file.

> I am sorry at the moment with the given information I am not able to wrap my
> head around the solution you are trying to suggest :-(.

mine was just a POC.

tl;dr: if you can save a mapping between files and brick, inside the
gluster cluster,
you'll get much more flexibility, no SPOF and no need for dedicated
metadata servers.

> At the moment, brick-splitting, inversion of afr/dht has some merit in my
> mind, with tilt towards any solution that avoids this inversion and still
> get the desired benefits.

What is brick-splitting ? Any docs about this ?
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 01:52 PM, Gandalf Corvotempesta wrote:

2017-05-01 19:50 GMT+02:00 Shyam :

Splitting the bricks need not be a post factum decision, we can start with
larger brick counts, on a given node/disk count, and hence spread these
bricks to newer nodes/bricks as they are added.

If I understand the ceph PG count, it works on a similar notion, till the
cluster grows beyond the initial PG count (set for the pool) at which point
there is a lot more data movement (as the pg count has to be increased, and
hence existing PGs need to be further partitioned)


Exactly.
Last time i've used ceph, the PGs worked in a similiar way.



Expanding on this notion, the considered brick-splitting needs some 
other enhancements that can retain the replication/availability count, 
when moving existing bricks from one place to another. Thoughts on this 
are posted here [1].


In essence we are looking at "+1 scaling" (what that +1 is, a disk, a 
node,... is not set in stone yet, but converging at a disk is fine as an 
example). +1 scaling involves,

 a) ability to retain replication/availability levels
 b) optimal data movement
 c) optimal/acceptable time before which added capacity is available 
for use (by the consumer of the volume)

 d) is there a (d)? Would help in getting the requirement clear...

Brick splitting can help with (b) and (c), with strategies like [1] for 
(a), IMO.


Brick splitting also brings in complexities in DHT (like looking up 
everywhere, or the scale count of distribution that would increase). 
Such complexities have some solutions (like lookup optimize), and 
possibly needs some testing and bench marking to ensure it does not trip 
at this layer.


Also, brick multiplexing is already in the code base, which is to deal 
with large(r) number of bricks per node. Which would be the default with 
brick splitting and hence would help.


Further, the direction with JBR, needed a leader per node for a brick 
(so that clients are utilizing all server connections than just the 
leader) and was possibly the birth place for brick splitting thought.


Also, the ideas behind larger bucket counts for DHT2 than real bricks 
was to deal with (b).


Why I put this story together is to state 2 things,
- We realize that we need this, and have been working on strategies 
towards achieving the same
- We need the bits chained right, so that we can make this work and 
there is substantial work to be done here


Shyam

[1] Moving a brick in pure dist/replica/ec setup to another within or 
across nodes thoughts (my first comment on this issue, github does not 
have a comment index for me to point to the exact comment): 
https://github.com/gluster/glusterfs/issues/170

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 20:00 GMT+02:00 Pranith Kumar Karampuri :
> Let's say we have 1 disk, we format it with say XFS and that becomes a brick
> at the moment. Just curious, what will be the relationship between brick to
> disk in this case(If we leave out LVM for this example)?

No relation. You have to add that brick to the volume.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 11:20 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-05-01 19:36 GMT+02:00 Pranith Kumar Karampuri :
> > To know GFID of file1 you must know where the file resides so that you
> can
> > do getxattr trusted.gfid on the file. So storing server/brick location on
> > gfid is not getting us much more information that what we already have.
>
> It was an example. You can use the same xattr solution based on a hash.
> A full-path for each volume is unique (obviously, you can't have two
> "/tmp/my/file" on the same volume), thus
> hashing that to something like SHA1("/tmp/my/file") will give you a
> unique name (50b73d9c5dfda264d3878860ed7b1295e104e8ae)
> You can use that unique file-name (stored somewhere like
> ".metedata/50b73d9c5dfda264d3878860ed7b1295e104e8ae" to store the
> xattr with proper file locations across the cluster.
>

Filename can be renamed and then we lost the link because hash will be
different. Anyways all these kinds of problems are already solved in
distribute layer.


>
> As long as you sync the ".metadata" directory across the trusted pool
> (or across all member regarding the affected volume),
> you should be able to get proper file location by looking for the xattr.
>
> This is just a very basic and stupid POC, i'm just trying to explain
> my reasoning.
>
I am sorry at the moment with the given information I am not able to wrap
my head around the solution you are trying to suggest :-(.

At the moment, brick-splitting, inversion of afr/dht has some merit in my
mind, with tilt towards any solution that avoids this inversion and still
get the desired benefits.

-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 11:20 PM, Shyam  wrote:

> On 05/01/2017 01:13 PM, Pranith Kumar Karampuri wrote:
>
>>
>>
>> On Mon, May 1, 2017 at 10:42 PM, Pranith Kumar Karampuri
>> > wrote:
>>
>>
>>
>> On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta
>> > > wrote:
>>
>> 2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri
>> >:
>>
>> > Yes this is precisely what all the other SDS with metadata
>> servers kind of
>> > do. They kind of keep a map of on what all servers a particular
>> file/blob is
>> > stored in a metadata server.
>>
>> Not exactly. Other SDS has some servers dedicated to metadata and,
>> personally, I don't like that approach.
>>
>> > GlusterFS doesn't do that. In GlusterFS what
>> > bricks need to be replicated is always given and distribute
>> layer on top of
>> > these replication layer will do the job of distributing and
>> fetching the
>> > data. Because replication happens at a brick level and not at a
>> file level
>> > and distribute happens on top of replication and not at file
>> level. There
>> > isn't too much metadata that needs to be stored per file. Hence
>> no need for
>> > separate metadata servers.
>>
>> And this is great, that's why i'm talking about embedding a sort
>> of database
>> to be stored on all nodes. no metadata servers, only a mapping
>> between files
>> and servers.
>>
>> > If you know path of the file, you can always know where the
>> file is stored
>> > using pathinfo:
>> > Method-2 in the following link:
>> > https://gluster.readthedocs.io/en/latest/Troubleshooting/gfi
>> d-to-path/
>> > id-to-path/>
>> >
>> > You don't need any db.
>>
>> For the current gluster yes.
>> I'm talking about a different thing.
>>
>> In a RAID, you have data stored somewhere on the array, with
>> metadata
>> defining how this data should
>> be wrote or read. obviously, raid metadata must be stored in a
>> fixed
>> position, or you won't be able to read
>> that.
>>
>> Something similiar could be added in gluster (i don't know if it
>> would
>> be hard): you store a file mapping in a fixed
>> position in gluster, then all gluster clients will be able to know
>> where a file is by looking at this "metadata" stored in
>> the fixed position.
>>
>> Like ".gluster" directory. Gluster is using some "internal"
>> directories for internal operations (".shards", ".gluster",
>> ".trash")
>> A ".metadata" with file mapping would be hard to add ?
>>
>> > Basically what you want, if I understood correctly is:
>> > If we add a 3rd node with just one disk, the data should
>> automatically
>> > arrange itself splitting itself to 3 categories(Assuming
>> replica-2)
>> > 1) Files that are present in Node1, Node2
>> > 2) Files that are present in Node2, Node3
>> > 3) Files that are present in Node1, Node3
>> >
>> > As you can see we arrived at a contradiction where all the
>> nodes should have
>> > at least 2 bricks but there is only 1 disk. Hence the
>> contradiction. We
>> > can't do what you are asking without brick splitting. i.e. we
>> need to split
>> > the disk into 2 bricks.
>>
>
> Splitting the bricks need not be a post factum decision, we can start with
> larger brick counts, on a given node/disk count, and hence spread these
> bricks to newer nodes/bricks as they are added.
>

Let's say we have 1 disk, we format it with say XFS and that becomes a
brick at the moment. Just curious, what will be the relationship between
brick to disk in this case(If we leave out LVM for this example)?


> If I understand the ceph PG count, it works on a similar notion, till the
> cluster grows beyond the initial PG count (set for the pool) at which point
> there is a lot more data movement (as the pg count has to be increased, and
> hence existing PGs need to be further partitioned) . (just using ceph as an
> example, a similar approach exists for openstack swift with their partition
> power settings).
>
>
>> I don't think so.
>> Let's assume a replica 2.
>>
>> S1B1 + S2B1
>>
>> 1TB each, thus 1TB available (2TB/2)
>>
>> Adding a third 1TB disks should increase available space to
>> 1.5TB (3TB/2)
>>
>>
>> I agree it should. Question is how? What will be the resulting
>> brick-map?
>>
>>
>> I don't see any solution that we can do without at least 2 bricks on

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 19:50 GMT+02:00 Shyam :
> Splitting the bricks need not be a post factum decision, we can start with
> larger brick counts, on a given node/disk count, and hence spread these
> bricks to newer nodes/bricks as they are added.
>
> If I understand the ceph PG count, it works on a similar notion, till the
> cluster grows beyond the initial PG count (set for the pool) at which point
> there is a lot more data movement (as the pg count has to be increased, and
> hence existing PGs need to be further partitioned)

Exactly.
Last time i've used ceph, the PGs worked in a similiar way.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Shyam

On 05/01/2017 01:13 PM, Pranith Kumar Karampuri wrote:



On Mon, May 1, 2017 at 10:42 PM, Pranith Kumar Karampuri
> wrote:



On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta
> wrote:

2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri
>:
> Yes this is precisely what all the other SDS with metadata servers 
kind of
> do. They kind of keep a map of on what all servers a particular 
file/blob is
> stored in a metadata server.

Not exactly. Other SDS has some servers dedicated to metadata and,
personally, I don't like that approach.

> GlusterFS doesn't do that. In GlusterFS what
> bricks need to be replicated is always given and distribute layer on 
top of
> these replication layer will do the job of distributing and fetching 
the
> data. Because replication happens at a brick level and not at a file 
level
> and distribute happens on top of replication and not at file level. 
There
> isn't too much metadata that needs to be stored per file. Hence no 
need for
> separate metadata servers.

And this is great, that's why i'm talking about embedding a sort
of database
to be stored on all nodes. no metadata servers, only a mapping
between files
and servers.

> If you know path of the file, you can always know where the file is 
stored
> using pathinfo:
> Method-2 in the following link:
> https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/

>
> You don't need any db.

For the current gluster yes.
I'm talking about a different thing.

In a RAID, you have data stored somewhere on the array, with
metadata
defining how this data should
be wrote or read. obviously, raid metadata must be stored in a fixed
position, or you won't be able to read
that.

Something similiar could be added in gluster (i don't know if it
would
be hard): you store a file mapping in a fixed
position in gluster, then all gluster clients will be able to know
where a file is by looking at this "metadata" stored in
the fixed position.

Like ".gluster" directory. Gluster is using some "internal"
directories for internal operations (".shards", ".gluster",
".trash")
A ".metadata" with file mapping would be hard to add ?

> Basically what you want, if I understood correctly is:
> If we add a 3rd node with just one disk, the data should automatically
> arrange itself splitting itself to 3 categories(Assuming replica-2)
> 1) Files that are present in Node1, Node2
> 2) Files that are present in Node2, Node3
> 3) Files that are present in Node1, Node3
>
> As you can see we arrived at a contradiction where all the nodes 
should have
> at least 2 bricks but there is only 1 disk. Hence the contradiction. 
We
> can't do what you are asking without brick splitting. i.e. we need to 
split
> the disk into 2 bricks.


Splitting the bricks need not be a post factum decision, we can start 
with larger brick counts, on a given node/disk count, and hence spread 
these bricks to newer nodes/bricks as they are added.


If I understand the ceph PG count, it works on a similar notion, till 
the cluster grows beyond the initial PG count (set for the pool) at 
which point there is a lot more data movement (as the pg count has to be 
increased, and hence existing PGs need to be further partitioned) . 
(just using ceph as an example, a similar approach exists for openstack 
swift with their partition power settings).




I don't think so.
Let's assume a replica 2.

S1B1 + S2B1

1TB each, thus 1TB available (2TB/2)

Adding a third 1TB disks should increase available space to
1.5TB (3TB/2)


I agree it should. Question is how? What will be the resulting
brick-map?


I don't see any solution that we can do without at least 2 bricks on
each of the 3 servers.




--
Pranith




--
Pranith


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 19:36 GMT+02:00 Pranith Kumar Karampuri :
> To know GFID of file1 you must know where the file resides so that you can
> do getxattr trusted.gfid on the file. So storing server/brick location on
> gfid is not getting us much more information that what we already have.

It was an example. You can use the same xattr solution based on a hash.
A full-path for each volume is unique (obviously, you can't have two
"/tmp/my/file" on the same volume), thus
hashing that to something like SHA1("/tmp/my/file") will give you a
unique name (50b73d9c5dfda264d3878860ed7b1295e104e8ae)
You can use that unique file-name (stored somewhere like
".metedata/50b73d9c5dfda264d3878860ed7b1295e104e8ae" to store the
xattr with proper file locations across the cluster.

As long as you sync the ".metadata" directory across the trusted pool
(or across all member regarding the affected volume),
you should be able to get proper file location by looking for the xattr.

This is just a very basic and stupid POC, i'm just trying to explain
my reasoning.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 10:57 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-05-01 19:12 GMT+02:00 Pranith Kumar Karampuri :
> > I agree it should. Question is how? What will be the resulting brick-map?
>
> This is why i'm suggesting to add a file mapping somewhere.
> You could also use xattr for this:
>
> "file1" is mapped to GFID, then, as xattr for that GFID, you could
> save the server/brick location, it this
> way you always know where a file is.
>

To know GFID of file1 you must know where the file resides so that you can
do getxattr trusted.gfid on the file. So storing server/brick location on
gfid is not getting us much more information that what we already have.


>
> To keep it simple for non-developers like me (this is wrong, it's a
> simplification):
> "/tmp/file1" hashes to 306040e474f199e7969ec266afd10d93
>
> hash starts with "3" thus is located on brick3
>
> You don't need any metadata for this, the hash algoritm is the only
> thing you need.
>
> But if you store the file location mapping somewhere (in example as
> xattr for the GFID file) you can look for the file without using the
> hash algoritm location.
>
> ORIG_FILE="/tmp/file1"
>


> GFID="306040e474f199e7969ec266afd10d93" << How did we get GFID?
>

May be I didn't understand your solution properly.


> FILE_LOCATION=$(getfattr -n "file_location" $GFID)
>
> if $FILE_LOCATION
>read from $FILE_LOCATION
> else
>read from original algoritm
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 10:42 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> 2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri :
>> > Yes this is precisely what all the other SDS with metadata servers kind
>> of
>> > do. They kind of keep a map of on what all servers a particular
>> file/blob is
>> > stored in a metadata server.
>>
>> Not exactly. Other SDS has some servers dedicated to metadata and,
>> personally, I don't like that approach.
>>
>> > GlusterFS doesn't do that. In GlusterFS what
>> > bricks need to be replicated is always given and distribute layer on
>> top of
>> > these replication layer will do the job of distributing and fetching the
>> > data. Because replication happens at a brick level and not at a file
>> level
>> > and distribute happens on top of replication and not at file level.
>> There
>> > isn't too much metadata that needs to be stored per file. Hence no need
>> for
>> > separate metadata servers.
>>
>> And this is great, that's why i'm talking about embedding a sort of
>> database
>> to be stored on all nodes. no metadata servers, only a mapping between
>> files
>> and servers.
>>
>> > If you know path of the file, you can always know where the file is
>> stored
>> > using pathinfo:
>> > Method-2 in the following link:
>> > https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/
>> >
>> > You don't need any db.
>>
>> For the current gluster yes.
>> I'm talking about a different thing.
>>
>> In a RAID, you have data stored somewhere on the array, with metadata
>> defining how this data should
>> be wrote or read. obviously, raid metadata must be stored in a fixed
>> position, or you won't be able to read
>> that.
>>
>> Something similiar could be added in gluster (i don't know if it would
>> be hard): you store a file mapping in a fixed
>> position in gluster, then all gluster clients will be able to know
>> where a file is by looking at this "metadata" stored in
>> the fixed position.
>>
>> Like ".gluster" directory. Gluster is using some "internal"
>> directories for internal operations (".shards", ".gluster", ".trash")
>> A ".metadata" with file mapping would be hard to add ?
>>
>> > Basically what you want, if I understood correctly is:
>> > If we add a 3rd node with just one disk, the data should automatically
>> > arrange itself splitting itself to 3 categories(Assuming replica-2)
>> > 1) Files that are present in Node1, Node2
>> > 2) Files that are present in Node2, Node3
>> > 3) Files that are present in Node1, Node3
>> >
>> > As you can see we arrived at a contradiction where all the nodes should
>> have
>> > at least 2 bricks but there is only 1 disk. Hence the contradiction. We
>> > can't do what you are asking without brick splitting. i.e. we need to
>> split
>> > the disk into 2 bricks.
>>
>> I don't think so.
>> Let's assume a replica 2.
>>
>> S1B1 + S2B1
>>
>> 1TB each, thus 1TB available (2TB/2)
>>
>> Adding a third 1TB disks should increase available space to 1.5TB (3TB/2)
>>
>
> I agree it should. Question is how? What will be the resulting brick-map?
>

I don't see any solution that we can do without at least 2 bricks on each
of the 3 servers.


>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 10:39 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri :
> > Yes this is precisely what all the other SDS with metadata servers kind
> of
> > do. They kind of keep a map of on what all servers a particular
> file/blob is
> > stored in a metadata server.
>
> Not exactly. Other SDS has some servers dedicated to metadata and,
> personally, I don't like that approach.
>
> > GlusterFS doesn't do that. In GlusterFS what
> > bricks need to be replicated is always given and distribute layer on top
> of
> > these replication layer will do the job of distributing and fetching the
> > data. Because replication happens at a brick level and not at a file
> level
> > and distribute happens on top of replication and not at file level. There
> > isn't too much metadata that needs to be stored per file. Hence no need
> for
> > separate metadata servers.
>
> And this is great, that's why i'm talking about embedding a sort of
> database
> to be stored on all nodes. no metadata servers, only a mapping between
> files
> and servers.
>
> > If you know path of the file, you can always know where the file is
> stored
> > using pathinfo:
> > Method-2 in the following link:
> > https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/
> >
> > You don't need any db.
>
> For the current gluster yes.
> I'm talking about a different thing.
>
> In a RAID, you have data stored somewhere on the array, with metadata
> defining how this data should
> be wrote or read. obviously, raid metadata must be stored in a fixed
> position, or you won't be able to read
> that.
>
> Something similiar could be added in gluster (i don't know if it would
> be hard): you store a file mapping in a fixed
> position in gluster, then all gluster clients will be able to know
> where a file is by looking at this "metadata" stored in
> the fixed position.
>
> Like ".gluster" directory. Gluster is using some "internal"
> directories for internal operations (".shards", ".gluster", ".trash")
> A ".metadata" with file mapping would be hard to add ?
>
> > Basically what you want, if I understood correctly is:
> > If we add a 3rd node with just one disk, the data should automatically
> > arrange itself splitting itself to 3 categories(Assuming replica-2)
> > 1) Files that are present in Node1, Node2
> > 2) Files that are present in Node2, Node3
> > 3) Files that are present in Node1, Node3
> >
> > As you can see we arrived at a contradiction where all the nodes should
> have
> > at least 2 bricks but there is only 1 disk. Hence the contradiction. We
> > can't do what you are asking without brick splitting. i.e. we need to
> split
> > the disk into 2 bricks.
>
> I don't think so.
> Let's assume a replica 2.
>
> S1B1 + S2B1
>
> 1TB each, thus 1TB available (2TB/2)
>
> Adding a third 1TB disks should increase available space to 1.5TB (3TB/2)
>

I agree it should. Question is how? What will be the resulting brick-map?


-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 18:57 GMT+02:00 Pranith Kumar Karampuri :
> Yes this is precisely what all the other SDS with metadata servers kind of
> do. They kind of keep a map of on what all servers a particular file/blob is
> stored in a metadata server.

Not exactly. Other SDS has some servers dedicated to metadata and,
personally, I don't like that approach.

> GlusterFS doesn't do that. In GlusterFS what
> bricks need to be replicated is always given and distribute layer on top of
> these replication layer will do the job of distributing and fetching the
> data. Because replication happens at a brick level and not at a file level
> and distribute happens on top of replication and not at file level. There
> isn't too much metadata that needs to be stored per file. Hence no need for
> separate metadata servers.

And this is great, that's why i'm talking about embedding a sort of database
to be stored on all nodes. no metadata servers, only a mapping between files
and servers.

> If you know path of the file, you can always know where the file is stored
> using pathinfo:
> Method-2 in the following link:
> https://gluster.readthedocs.io/en/latest/Troubleshooting/gfid-to-path/
>
> You don't need any db.

For the current gluster yes.
I'm talking about a different thing.

In a RAID, you have data stored somewhere on the array, with metadata
defining how this data should
be wrote or read. obviously, raid metadata must be stored in a fixed
position, or you won't be able to read
that.

Something similiar could be added in gluster (i don't know if it would
be hard): you store a file mapping in a fixed
position in gluster, then all gluster clients will be able to know
where a file is by looking at this "metadata" stored in
the fixed position.

Like ".gluster" directory. Gluster is using some "internal"
directories for internal operations (".shards", ".gluster", ".trash")
A ".metadata" with file mapping would be hard to add ?

> Basically what you want, if I understood correctly is:
> If we add a 3rd node with just one disk, the data should automatically
> arrange itself splitting itself to 3 categories(Assuming replica-2)
> 1) Files that are present in Node1, Node2
> 2) Files that are present in Node2, Node3
> 3) Files that are present in Node1, Node3
>
> As you can see we arrived at a contradiction where all the nodes should have
> at least 2 bricks but there is only 1 disk. Hence the contradiction. We
> can't do what you are asking without brick splitting. i.e. we need to split
> the disk into 2 bricks.

I don't think so.
Let's assume a replica 2.

S1B1 + S2B1

1TB each, thus 1TB available (2TB/2)

Adding a third 1TB disks should increase available space to 1.5TB (3TB/2)
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Sun, Apr 30, 2017 at 1:43 PM,  wrote:

> > So I was a little but luck. If I has all the hardware part, probably i
> > would be firesd after causing data loss by using a software marked as
> stable
>
> Yes, we lost our data last year to this bug, and it wasn't a test cluster.
> We still hear from it from our clients to this day.
>
> > Is known that this feature is causing data loss and there is no evidence
> or
> > no warning in official docs.
> >
>
> I was (I believe) the first one to run into the bug, it happens and I knew
> it
> was a risk when installing gluster.
> But since then I didn't see any warnings anywhere except here, I agree
> with you that it should be mentionned in big bold letters on the site.
>

After discussion with 3.10 release maintainer, this was added in
release-notes of 3.10.1:
https://github.com/gluster/glusterfs/blob/release-3.10/doc/release-notes/3.10.1.md

But you are right in the sense that just this much documentation doesn't do
enough justice.


>
> Might even be worth adding a warning directly on the cli when trying to
> add bricks if sharding is enabled, to make sure no-one will destroy a
> whole cluster for a known bug.
>

Want to raise a bug on 'distribute' component? If you don't have the time
let me know I will do the needful.


>
>
> > Il 30 apr 2017 12:14 AM,  ha scritto:
> >
> > > I have to agree though, you keep acting like a customer.
> > > If you don't like what the developers focus on, you are free to
> > > try and offer a bounty to motivate someone to look at what you want,
> > > or even better : go and buy a license for one of gluster's commercial
> > > alternatives.
> > >
> > >
> > > On Sat, Apr 29, 2017 at 11:43:54PM +0200, Gandalf Corvotempesta wrote:
> > > > I'm pretty sure that I'll be able to sleep well even after your
> block.
> > > >
> > > > Il 29 apr 2017 11:28 PM, "Joe Julian"  ha
> scritto:
> > > >
> > > > > No, you proposed a wish. A feature needs described behavior,
> certainly
> > > a
> > > > > lot more than "it should just know what I want it to do".
> > > > >
> > > > > I'm done. You can continue to feel entitled here on the mailing
> list.
> > > I'll
> > > > > just set my filters to bitbucket anything from you.
> > > > >
> > > > > On 04/29/2017 01:00 PM, Gandalf Corvotempesta wrote:
> > > > >
> > > > > I repeat: I've just proposed a feature
> > > > > I'm not a C developer and I don't know gluster internals, so I
> can't
> > > > > provide details
> > > > >
> > > > > I've just asked if simplifying the add brick process is something
> that
> > > > > developers are interested to add
> > > > >
> > > > > Il 29 apr 2017 9:34 PM, "Joe Julian"  ha
> > > scritto:
> > > > >
> > > > >> What I said publicly in another email ... but not to call out my
> > > > >> perception of your behavior publicly if also like to say:
> > > > >>
> > > > >> Acting adversarial doesn't make anybody want to help, especially
> not
> > > me
> > > > >> and I'm the user community's biggest proponent.
> > > > >>
> > > > >> On April 29, 2017 11:08:45 AM PDT, Gandalf Corvotempesta <
> > > > >> gandalf.corvotempe...@gmail.com> wrote:
> > > > >>>
> > > > >>> Mine was a suggestion
> > > > >>> Fell free to ignore was gluster users has to say and still keep
> going
> > > > >>> though your way
> > > > >>>
> > > > >>> Usually, open source project tends to follow users suggestions
> > > > >>>
> > > > >>> Il 29 apr 2017 5:32 PM, "Joe Julian"  ha
> > > scritto:
> > > > >>>
> > > >  Since this is an open source community project, not a company
> > > product,
> > > >  feature requests like these are welcome, but would be more
> welcome
> > > with
> > > >  either code or at least a well described method. Broad asks like
> > > these are
> > > >  of little value, imho.
> > > > 
> > > > 
> > > >  On 04/29/2017 07:12 AM, Gandalf Corvotempesta wrote:
> > > > 
> > > > > Anyway, the proposed workaround:
> > > > > https://joejulian.name/blog/how-to-expand-glusterfs-replicat
> > > > > ed-clusters-by-one-server/
> > > > > won't work with just a single volume made up of 2 replicated
> > > bricks.
> > > > > If I have a replica 2 volume with server1:brick1 and
> > > server2:brick1,
> > > > > how can I add server3:brick1 ?
> > > > > I don't have any bricks to "replace"
> > > > >
> > > > > This is something i would like to see implemented in gluster.
> > > > >
> > > > > 2017-04-29 16:08 GMT+02:00 Gandalf Corvotempesta
> > > > > :
> > > > >
> > > > >> 2017-04-24 10:21 GMT+02:00 Pranith Kumar Karampuri <
> > > > >> pkara...@redhat.com>:
> > > > >>
> > > > >>> Are you suggesting this process to be easier through
> commands,
> > > > >>> rather than
> > > > >>> for administrators to figure out how to place the data?
> > > > >>>
> > > > >>> [1] 

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 18:30 GMT+02:00 Gandalf Corvotempesta
:
> Maybe a simple DB (just as an idea: sqlite, berkeleydb, ...) stored in
> a fixed location on gluster itself, being replicated across nodes.

Even better, embedding RocksDB with it's data directory stored in Gluster
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Mon, May 1, 2017 at 9:53 PM, Pranith Kumar Karampuri  wrote:

>
>
> On Sun, Apr 30, 2017 at 2:04 PM, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> 2017-04-30 10:13 GMT+02:00  :
>> > I was (I believe) the first one to run into the bug, it happens and I
>> knew it
>> > was a risk when installing gluster.
>>
>> I know.
>>
>> > But since then I didn't see any warnings anywhere except here, I agree
>> > with you that it should be mentionned in big bold letters on the site.
>> >
>> > Might even be worth adding a warning directly on the cli when trying to
>> > add bricks if sharding is enabled, to make sure no-one will destroy a
>> > whole cluster for a known bug.
>>
>> Exactly. This is making me angry.
>>
>> Even $BigVendor usually release a security bulletin, in example:
>> https://support.citrix.com/article/CTX214305
>> https://support.citrix.com/article/CTX214768
>>
>> Immediatly after discovering that bug, a report was made available (on
>> official website, not on a mailinglist)
>> telling users which operations should be avoided until a fix is made.
>>
>> Gluster don't. There is a huge bug that isn't referenced in official docs.
>>
>> Is not acting like a customer, i'm just asking for some transparancy.
>>
>> Even if this is an open source project, nobody should play with user data.
>> This bug (or, better, these bugs) are know from time, an there is NO WORDS
>> in any official docs nor the web site.
>>
>> is not a rare bug, it *always* loose data when used with VMs and
>> sharding during a rebalance.
>> this feature should be disabled or users should be warned somewhere on
>> web site and not forcing
>> all of them to look through ML archives.
>>
>> Anyway, i've just asked for a feature like simplifying the add-brick
>> process. Gluster devs are free to ignore it
>> but if they are interest in something similiar, i'm willing to provide
>> more info (if I can, i'm not a developer)
>>
>> I really love gluster, lack of metadata server is awesome, files
>> stored "verbatim" with no alteration is amazing (almost all SDS alter
>> file when stored on disks)
>> but being forced to add bricks in a multiple of replica count is
>> making gluster very expesive (yes, there is a workaround with multiple
>> steps, but this is prone to
>> error, thus i'm asking to simplify this phase allowing users to add a
>> single brick to a replica X volume with automatic member replacement
>> and rebalance)
>>
>
> IMHO It is difficult to implement what you are asking for without metadata
> server which stores where each replica is stored.
>

Anther way is probably loading replica on top of distribute, but that is
architecture change and may need lot of testing and fixing corner cases. I
don't think it is easier to get this done.


>
>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
2017-05-01 18:23 GMT+02:00 Pranith Kumar Karampuri :
> IMHO It is difficult to implement what you are asking for without metadata
> server which stores where each replica is stored.

Can't you distribute a sort of file mapping to each node ?
AFAIK , gluster already has some metadata stored in the cluster, what
is missing is a mapping between each file/shard and brick.

Maybe a simple DB (just as an idea: sqlite, berkeleydb, ...) stored in
a fixed location on gluster itself, being replicated across nodes.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Pranith Kumar Karampuri
On Sun, Apr 30, 2017 at 2:04 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2017-04-30 10:13 GMT+02:00  :
> > I was (I believe) the first one to run into the bug, it happens and I
> knew it
> > was a risk when installing gluster.
>
> I know.
>
> > But since then I didn't see any warnings anywhere except here, I agree
> > with you that it should be mentionned in big bold letters on the site.
> >
> > Might even be worth adding a warning directly on the cli when trying to
> > add bricks if sharding is enabled, to make sure no-one will destroy a
> > whole cluster for a known bug.
>
> Exactly. This is making me angry.
>
> Even $BigVendor usually release a security bulletin, in example:
> https://support.citrix.com/article/CTX214305
> https://support.citrix.com/article/CTX214768
>
> Immediatly after discovering that bug, a report was made available (on
> official website, not on a mailinglist)
> telling users which operations should be avoided until a fix is made.
>
> Gluster don't. There is a huge bug that isn't referenced in official docs.
>
> Is not acting like a customer, i'm just asking for some transparancy.
>
> Even if this is an open source project, nobody should play with user data.
> This bug (or, better, these bugs) are know from time, an there is NO WORDS
> in any official docs nor the web site.
>
> is not a rare bug, it *always* loose data when used with VMs and
> sharding during a rebalance.
> this feature should be disabled or users should be warned somewhere on
> web site and not forcing
> all of them to look through ML archives.
>
> Anyway, i've just asked for a feature like simplifying the add-brick
> process. Gluster devs are free to ignore it
> but if they are interest in something similiar, i'm willing to provide
> more info (if I can, i'm not a developer)
>
> I really love gluster, lack of metadata server is awesome, files
> stored "verbatim" with no alteration is amazing (almost all SDS alter
> file when stored on disks)
> but being forced to add bricks in a multiple of replica count is
> making gluster very expesive (yes, there is a workaround with multiple
> steps, but this is prone to
> error, thus i'm asking to simplify this phase allowing users to add a
> single brick to a replica X volume with automatic member replacement
> and rebalance)
>

IMHO It is difficult to implement what you are asking for without metadata
server which stores where each replica is stored.


> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>



-- 
Pranith
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Don't allow data loss via add-brick (was Re: Add single server)

2017-05-01 Thread Joe Julian


On 04/30/2017 01:13 AM, lemonni...@ulrar.net wrote:

So I was a little but luck. If I has all the hardware part, probably i
would be firesd after causing data loss by using a software marked as stable

Yes, we lost our data last year to this bug, and it wasn't a test cluster.
We still hear from it from our clients to this day.


Is known that this feature is causing data loss and there is no evidence or
no warning in official docs.


I was (I believe) the first one to run into the bug, it happens and I knew it
was a risk when installing gluster.
But since then I didn't see any warnings anywhere except here, I agree
with you that it should be mentionned in big bold letters on the site.

Might even be worth adding a warning directly on the cli when trying to
add bricks if sharding is enabled, to make sure no-one will destroy a
whole cluster for a known bug.


I absolutely agree - or, just disable the ability to add-brick with 
sharding enabled. Losing data should never be allowed.

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Add single server

2017-05-01 Thread Gandalf Corvotempesta
Il 29 apr 2017 4:12 PM, "Gandalf Corvotempesta" <
gandalf.corvotempe...@gmail.com> ha scritto:

Anyway, the proposed workaround:
https://joejulian.name/blog/how-to-expand-glusterfs-
replicated-clusters-by-one-server/
won't work with just a single volume made up of 2 replicated bricks.
If I have a replica 2 volume with server1:brick1 and server2:brick1,
how can I add server3:brick1 ?
I don't have any bricks to "replace"


Can someone confirm this?
Is possible to use the method described by Joe even with only 3 bricks ?

What if I would like to add the fourth?

I'm really asking, not criticizing.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Release 3.10.2: Scheduled for the 30th of April

2017-05-01 Thread Raghavendra Talur
On Mon, May 1, 2017 at 3:35 PM, Raghavendra Talur  wrote:
> We seem to have merged all the intended patches for 3.10.2 except for
> one. While we wait for [1] to be merged, this is a last chance for
> others to point out if any patch is missing. I have verified that no
> backports are missing in 3.10 when compared to 3.8.
>
> Please visit 
> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-10-dashboard
> to check if any of your patches are pending merge.
>
> To make my work easier, if you could point out any major changes that
> happened for 3.10.2, please comment on
> https://review.gluster.org/#/c/17063/ .
>
> I am looking forward to make announcement on 3rd May and we should
> have builds/packages ready by then. We will be tagging by tomorrow and
> performing a final test with packages before making the announcement.

[1] https://review.gluster.org/#/c/17134/

>
> Thanks,
> Raghavendra Talur
>
> On Mon, Apr 17, 2017 at 10:31 AM, Raghavendra Talur  wrote:
>> Hi,
>>
>> It's time to prepare the 3.10.2 release, which falls on the 30th of
>> each month, and hence would be April-30th-2017 this time around.
>>
>> This mail is to call out the following,
>>
>> 1) Are there any pending *blocker* bugs that need to be tracked for
>> 3.10.2? If so mark them against the provided tracker [1] as blockers
>> for the release, or at the very least post them as a response to this
>> mail
>>
>> 2) Pending reviews in the 3.10 dashboard will be part of the release,
>> *iff* they pass regressions and have the review votes, so use the
>> dashboard [2] to check on the status of your patches to 3.10 and get
>> these going
>>
>> 3) I have made checks on what went into 3.8 post 3.10 release and if
>> these fixes are included in 3.10 branch, the status on this is *green*
>> as all fixes ported to 3.8, are ported to 3.10 as well
>>
>> 4) Empty release notes are posted here [3], if there are any specific
>> call outs for 3.10 beyond bugs, please update the review, or leave a
>> comment in the review, for me to pick it up
>>
>> Thanks,
>> Raghavendra Talur
>>
>> [1] Release bug tracker:
>> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.10.2
>>
>> [2] 3.10 review dashboard:
>> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-10-dashboard
>>
>> [3] Release notes WIP: https://review.gluster.org/#/c/17063/
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] Release 3.10.2: Scheduled for the 30th of April

2017-05-01 Thread Raghavendra Talur
We seem to have merged all the intended patches for 3.10.2 except for
one. While we wait for [1] to be merged, this is a last chance for
others to point out if any patch is missing. I have verified that no
backports are missing in 3.10 when compared to 3.8.

Please visit 
https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-10-dashboard
to check if any of your patches are pending merge.

To make my work easier, if you could point out any major changes that
happened for 3.10.2, please comment on
https://review.gluster.org/#/c/17063/ .

I am looking forward to make announcement on 3rd May and we should
have builds/packages ready by then. We will be tagging by tomorrow and
performing a final test with packages before making the announcement.

Thanks,
Raghavendra Talur

On Mon, Apr 17, 2017 at 10:31 AM, Raghavendra Talur  wrote:
> Hi,
>
> It's time to prepare the 3.10.2 release, which falls on the 30th of
> each month, and hence would be April-30th-2017 this time around.
>
> This mail is to call out the following,
>
> 1) Are there any pending *blocker* bugs that need to be tracked for
> 3.10.2? If so mark them against the provided tracker [1] as blockers
> for the release, or at the very least post them as a response to this
> mail
>
> 2) Pending reviews in the 3.10 dashboard will be part of the release,
> *iff* they pass regressions and have the review votes, so use the
> dashboard [2] to check on the status of your patches to 3.10 and get
> these going
>
> 3) I have made checks on what went into 3.8 post 3.10 release and if
> these fixes are included in 3.10 branch, the status on this is *green*
> as all fixes ported to 3.8, are ported to 3.10 as well
>
> 4) Empty release notes are posted here [3], if there are any specific
> call outs for 3.10 beyond bugs, please update the review, or leave a
> comment in the review, for me to pick it up
>
> Thanks,
> Raghavendra Talur
>
> [1] Release bug tracker:
> https://bugzilla.redhat.com/show_bug.cgi?id=glusterfs-3.10.2
>
> [2] 3.10 review dashboard:
> https://review.gluster.org/#/projects/glusterfs,dashboards/dashboard:3-10-dashboard
>
> [3] Release notes WIP: https://review.gluster.org/#/c/17063/
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users