Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Niels de Vos
On Thu, Jul 28, 2016 at 09:04:43PM +0530, Ravishankar N wrote:
> On 07/28/2016 04:43 PM, Niels de Vos wrote:
> > > posix_discard() in gluster seems to be using fallocate() with
> > > FALLOC_FL_PUNCH_HOLE flag. And posix_zerofill() can be made smarter to
> > > use FALLOC_FL_ZERO_RANGE and fall back to writing zeroes if ZERO_RANGE
> > > is not supported.
> > Oh, nice find! I was expecting that posix_zerofill() uses fallocate()
> > already... Definitely something that should be improved too. Care to file
> > a bug for that?
> Sent http://review.gluster.org/#/c/15037/ :-)

Or like that, thanks!

Niels



Re: [Gluster-devel] ./tests/basic/afr/granular-esh/add-brick.t spurious failure

2016-07-28 Thread Krutika Dhananjay
I looked at the logs, and the only thing that is known is that the heal hadn't
completed even after $HEAL_TIMEOUT seconds.
I also ran the test in a loop on my VM for more than a day without success
(meaning without a failure of the test :)).

Nigel,

Would it be possible for you to lend me a Jenkins slave for a day to try
and recreate the issue and then to debug further if it fails?

-Krutika

On Tue, Jul 26, 2016 at 3:39 PM, Krutika Dhananjay 
wrote:

> I'll take a look.
>
> -Krutika
>
> On Tue, Jul 26, 2016 at 3:36 PM, Kotresh Hiremath Ravishankar <
> khire...@redhat.com> wrote:
>
>> Hi,
>>
>> The above-mentioned AFR test has failed and is not related to the patch below.
>>
>>
>> https://build.gluster.org/job/rackspace-regression-2GB-triggered/22485/consoleFull
>>
>> Can someone from AFR team look into it?
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
>
>

Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Ravishankar N

On 07/28/2016 04:43 PM, Niels de Vos wrote:

> > posix_discard() in gluster seems to be using fallocate() with
> > FALLOC_FL_PUNCH_HOLE flag. And posix_zerofill() can be made smarter to
> > use FALLOC_FL_ZERO_RANGE and fall back to writing zeroes if ZERO_RANGE
> > is not supported.

> Oh, nice find! I was expecting that posix_zerofill() uses fallocate()
> already... Definitely something that should be improved too. Care to file
> a bug for that?

Sent http://review.gluster.org/#/c/15037/ :-)
-Ravi

Re: [Gluster-devel] [Gluster-users] Need a way to display and flush gluster cache ?

2016-07-28 Thread Mohammed Rafi K C


On 07/28/2016 07:56 PM, Niels de Vos wrote:
> On Thu, Jul 28, 2016 at 05:58:15PM +0530, Mohammed Rafi K C wrote:
>>
>> On 07/27/2016 04:33 PM, Raghavendra G wrote:
>>>
>>> On Wed, Jul 27, 2016 at 10:29 AM, Mohammed Rafi K C
>>> mailto:rkavu...@redhat.com>> wrote:
>>>
>>> Thanks for your feedback.
>>>
>>> In fact the meta xlator is loaded only on the fuse mount; is there any
>>> particular reason not to use the meta-autoload xlator for the NFS server
>>> and libgfapi?
>>>
>>>
>>> I think it's because of a lack of resources. I am not aware of any
>>> technical reason for not using it on the NFSv3 server and gfapi.
>> Cool. I will try to see how we can implement the meta-autoload feature for
>> nfs-server and libgfapi. Once we have the feature in place, I will
>> implement the cache memory display/flush feature using meta xlators.
> In case you plan to have this ready in a month (before the end of
> August), you should propose it as a 3.9 feature. Click the "Edit this
> page on GitHub" link at the bottom of
> https://www.gluster.org/community/roadmap/3.9/ :)
I will do an assessment and see if I can spend some time on this for the
3.9 release. If so, I will add it to the 3.9 feature page.

Regards
Rafi KC

>
> Thanks,
> Niels
>
>
>> Thanks for your valuable feedback.
>> Rafi KC
>>
>>>  
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> On 07/26/2016 04:05 PM, Niels de Vos wrote:
 On Tue, Jul 26, 2016 at 12:43:56PM +0530, Kaushal M wrote:
> On Tue, Jul 26, 2016 at 12:28 PM, Prashanth Pai  
>  wrote:
>> +1 to option (2), which is similar to echoing into
>> /proc/sys/vm/drop_caches
>>
>>  -Prashanth Pai
>>
>> - Original Message -
>>> From: "Mohammed Rafi K C"  
>>> 
>>> To: "gluster-users"  
>>> , "Gluster Devel" 
>>>  
>>> Sent: Tuesday, 26 July, 2016 10:44:15 AM
>>> Subject: [Gluster-devel] Need a way to display and flush gluster 
>>> cache ?
>>>
>>> Hi,
>>>
>>> The Gluster stack has its own caching mechanism, mostly on the client
>>> side. But there is no concrete method to see how much memory is being
>>> consumed by gluster for caching, and if needed there is no way to flush
>>> the cache memory.
>>>
>>> So my first question is: do we need to implement these two features
>>> for the gluster cache?
>>>
>>>
>>> If so I would like to discuss some of our thoughts towards it.
>>>
>>> (If you are not interested in implementation discussion, you can 
>>> skip
>>> this part :)
>>>
>>> 1) Implement a virtual xattr on root, and on doing setxattr, flush 
>>> all
>>> the cache, and for getxattr we can print the aggregated cache size.
>>>
>>> 2) Currently the gluster native client supports a .meta virtual
>>> directory to get metadata information, analogous to proc. We can
>>> implement a virtual file inside the .meta directory to read the cache
>>> size. We can also flush the cache using a special write into the file
>>> (similar to echoing into a proc file). This approach may be difficult
>>> to implement in other clients.
> +1 for making use of the meta-xlator. We should be making more use of 
> it.
 Indeed, this would be nice. Maybe this can also expose the memory
 allocations like /proc/slabinfo.

 The io-stats xlator can dump some statistics to
 /var/log/glusterfs/samples/ and /var/lib/glusterd/stats/ . That seems 
 to
 be acceptable too, and allows getting statistics from server-side
 processes without involving any clients.

 HTH,
 Niels


>>> 3) A CLI command to display and flush the data with IP and port as
>>> arguments. GlusterD would need to send the op to the client from the
>>> connected client list. But this approach would be difficult to implement
>>> for libgfapi-based clients. To me, it doesn't seem to be a good option.
>>>
>>> Your suggestions and comments are most welcome.
>>>
>>> Thanks to Talur and Poornima for their suggestions.
>>>
>>> Regards
>>>
>>> Rafi KC
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org 
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org 
>> http://www.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] [Gluster-users] Need a way to display and flush gluster cache ?

2016-07-28 Thread Niels de Vos
On Thu, Jul 28, 2016 at 05:58:15PM +0530, Mohammed Rafi K C wrote:
> 
> 
> On 07/27/2016 04:33 PM, Raghavendra G wrote:
> >
> >
> > On Wed, Jul 27, 2016 at 10:29 AM, Mohammed Rafi K C
> > mailto:rkavu...@redhat.com>> wrote:
> >
> > Thanks for your feedback.
> >
> > In fact the meta xlator is loaded only on the fuse mount; is there any
> > particular reason not to use the meta-autoload xlator for the NFS server
> > and libgfapi?
> >
> >
> > I think it's because of a lack of resources. I am not aware of any
> > technical reason for not using it on the NFSv3 server and gfapi.
> 
> Cool. I will try to see how we can implement the meta-autoload feature for
> nfs-server and libgfapi. Once we have the feature in place, I will
> implement the cache memory display/flush feature using meta xlators.

In case you plan to have this ready in a month (before the end of
August), you should propose it as a 3.9 feature. Click the "Edit this
page on GitHub" link at the bottom of
https://www.gluster.org/community/roadmap/3.9/ :)

Thanks,
Niels


> 
> Thanks for your valuable feedback.
> Rafi KC
> 
> >  
> >
> > Regards
> >
> > Rafi KC
> >
> > On 07/26/2016 04:05 PM, Niels de Vos wrote:
> >> On Tue, Jul 26, 2016 at 12:43:56PM +0530, Kaushal M wrote:
> >>> On Tue, Jul 26, 2016 at 12:28 PM, Prashanth Pai  
> >>>  wrote:
>  +1 to option (2), which is similar to echoing into
>  /proc/sys/vm/drop_caches
> 
>   -Prashanth Pai
> 
>  - Original Message -
> > From: "Mohammed Rafi K C"  
> > 
> > To: "gluster-users"  
> > , "Gluster Devel" 
> >  
> > Sent: Tuesday, 26 July, 2016 10:44:15 AM
> > Subject: [Gluster-devel] Need a way to display and flush gluster 
> > cache ?
> >
> > Hi,
> >
> > The Gluster stack has its own caching mechanism, mostly on the client
> > side. But there is no concrete method to see how much memory is being
> > consumed by gluster for caching, and if needed there is no way to flush
> > the cache memory.
> >
> > So my first question is: do we need to implement these two features
> > for the gluster cache?
> >
> >
> > If so I would like to discuss some of our thoughts towards it.
> >
> > (If you are not interested in implementation discussion, you can 
> > skip
> > this part :)
> >
> > 1) Implement a virtual xattr on root, and on doing setxattr, flush 
> > all
> > the cache, and for getxattr we can print the aggregated cache size.
> >
> > 2) Currently the gluster native client supports a .meta virtual
> > directory to get metadata information, analogous to proc. We can
> > implement a virtual file inside the .meta directory to read the cache
> > size. We can also flush the cache using a special write into the file
> > (similar to echoing into a proc file). This approach may be difficult
> > to implement in other clients.
> >>> +1 for making use of the meta-xlator. We should be making more use of 
> >>> it.
> >> Indeed, this would be nice. Maybe this can also expose the memory
> >> allocations like /proc/slabinfo.
> >>
> >> The io-stats xlator can dump some statistics to
> >> /var/log/glusterfs/samples/ and /var/lib/glusterd/stats/ . That seems 
> >> to
> >> be acceptable too, and allows getting statistics from server-side
> >> processes without involving any clients.
> >>
> >> HTH,
> >> Niels
> >>
> >>
> > 3) A CLI command to display and flush the data with IP and port as
> > arguments. GlusterD would need to send the op to the client from the
> > connected client list. But this approach would be difficult to implement
> > for libgfapi-based clients. To me, it doesn't seem to be a good option.
> >
> > Your suggestions and comments are most welcome.
> >
> > Thanks to Talur and Poornima for their suggestions.
> >
> > Regards
> >
> > Rafi KC
> >
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org 
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
>  ___
>  Gluster-devel mailing list
>  Gluster-devel@gluster.org 
>  http://www.gluster.org/mailman/listinfo/gluster-devel
> >>> ___
> >>> Gluster-users mailing list
> >>> gluster-us...@gluster.org 
> >>> http://www.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >

Re: [Gluster-devel] Support to reclaim locks (posix) provided lkowner & range matches

2016-07-28 Thread Soumya Koduri



On 07/27/2016 02:38 AM, Vijay Bellur wrote:


On 07/26/2016 05:56 AM, Soumya Koduri wrote:

Hi Vijay,

On 07/26/2016 12:13 AM, Vijay Bellur wrote:

On 07/22/2016 08:44 AM, Soumya Koduri wrote:

Hi,

In certain scenarios (especially in highly available environments), the
application may have to fail over / connect to a different glusterFS
client while the I/O is happening. In such cases, until there is a ping
timer expiry and the glusterFS server cleans up the locks held by the older
glusterFS client, the application will not be able to reclaim its lost
locks. To avoid that we need support in Gluster to let clients reclaim
the existing locks, provided the lkowner and the lock range match.



If the server detects a disconnection, it goes about cleaning up the
locks held by the disconnected client. The outlined scheme would work only
if the failover connection happens before this server-side cleanup. Since
there is no ping timer on the server, do you propose to have a grace
timer on the server?


But we are looking for a solution which can work in an active-active
configuration as well. We need to handle cases wherein the connection
between the server and the old client is still in use, which can happen
during load-balancing or failback.

Different cases which I can outline are:

Application Client - (AC)
Application/GlusterClient 1 - GC1
Application/GlusterClient 2 - GC2
Gluster Server (GS)

1) Active-Passive config  (service gone down)

AC > GC1  > GS (GC2 is not active)

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped and GC2 establishes
connection)

In this case, we can have a grace timer to allow reclaims only for a certain
time after the GC2 (or any) rpc connection is established.

2) Active-Active config  (service gone down)

AC > GC1  > GS
 ^
 |
 GC2  ---

| (failover)
v

AC > GC2  > GS (GC1 connection gets dropped)

The grace timer would then not get triggered in this case. But at least
the locks from GC1 get cleaned up once its connection is torn down.



grace timer is not required if lock reclamation can happen before the
old connection between GC1 & GS gets dropped. Is this guaranteed to
happen every time?


Not all the time, but most likely, since the failover time is usually less
than the ping timer / rpc connection expiry time.






3) Active-Active config  (both the services active/load-balancing)
This is the tricky one.

AC > GC1  > GS
 ^
 |
 GC2  ---

| (load-balancing/failback)
v

 GC1  > GS
 ^
 |
AC > GC2  ---

The locks taken by GC1 will end up staying on the server forever unless
we restart either GC1 or the server.



Yes, this is trickier. The behavior is dependent on how the application
performs a failback. How do we handle this with Ganesha today? Since the
connection between nfs client and Ganesha/GC1 is broken, would it not
send cleanup requests on locks it held on behalf of that client?

Yes. I checked within the NFS-Ganesha community too. There seems to be a
provision in NFS-Ganesha to trigger an event upon which it can flush the
locks associated with an IP. We could send this event to the active
servers (in this case GC1) while triggering fail-back. So from the
NFS-Ganesha perspective, this seems to be taken care of. Unless some other
application (SMB3 handles?) has this use-case, we can ignore it for now.





Considering the above cases, it looks like we may need to allow reclaim of
the locks all the time. Please point out any details I may have missed.



I agree that lock reclamation is needed. Grace timeout behavior does
need more thought for all these cases. Given the involved nature of this
problem, it might be better to write down a more detailed spec that
discusses all these cases for a more thorough review.


Sure. I will open up a spec.

Thanks,
Soumya







For client-side support, I am wondering if we can integrate with the new
lock API being introduced as part of mandatory lock support in gfapi
[2].



Is glfs_file_lock() planned to be used here? If so, how do we specify
that it is a reclaim lock in this api?


Yes. We have been discussing on that patch-set whether we can use the same
API. We should either have a separate field to pass the reclaim flag or, if
we choose not to change its definition, we can probably have additional
lock types -

GLFS_LK_ADVISORY
GLFS_LK_MANDATORY

New lock-types
GLFS_LK_RECLAIM_ADVISORY
GLFS_LK_RECLAIM_MANDATORY
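
For illustration only -- assuming glfs_file_lock() keeps the shape proposed
in [2] (glfd, cmd, struct flock, lock mode) and one of the reclaim types
above gets added (both are hypothetical at this point, and glfd is an
already-open glfs_fd_t) -- reclaiming a lost lock after fail-over could look
roughly like this from the application:

    struct flock lock = {
            .l_type   = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start  = 0,
            .l_len    = 4096,   /* same range as the lock being reclaimed */
    };
    int ret;

    /* the same lk-owner that held the lock before the fail-over must be in
       effect here, otherwise the server cannot match the existing lock;
       a negative return with errno set would mean the reclaim was refused */
    ret = glfs_file_lock (glfd, F_SETLK, &lock, GLFS_LK_RECLAIM_MANDATORY);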



Either approach seems reasonable to me.



We also would need to pass the reclaim_lock flag over rpc.


To avoid new fop/rpc changes, I was considering taking the xdata approach
(similar to the way the lock mode is passed in xdata for mandatory lock
support), since the processing of reclamation doesn't differ much from
the existing lk fop except for the conflicting-lock checks.



This looks ok to me.

Thanks,
Vijay





Re: [Gluster-devel] [Gluster-users] Need a way to display and flush gluster cache ?

2016-07-28 Thread Mohammed Rafi K C


On 07/27/2016 04:33 PM, Raghavendra G wrote:
>
>
> On Wed, Jul 27, 2016 at 10:29 AM, Mohammed Rafi K C
> mailto:rkavu...@redhat.com>> wrote:
>
> Thanks for your feedback.
>
> In fact the meta xlator is loaded only on the fuse mount; is there any
> particular reason not to use the meta-autoload xlator for the NFS server
> and libgfapi?
>
>
> I think it's because of a lack of resources. I am not aware of any
> technical reason for not using it on the NFSv3 server and gfapi.

Cool. I will try to see how we can implement the meta-autoload feature for
nfs-server and libgfapi. Once we have the feature in place, I will
implement the cache memory display/flush feature using meta xlators.
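
Just to sketch how that could eventually look from a libgfapi client once
the meta/virtual-xattr support exists (the xattr names below are made up
purely for illustration, and error handling is omitted; none of this exists
yet):

    /* assumes an initialised and connected glfs_t *fs */
    char size[64] = {0};

    /* getxattr on the root reports the aggregated cache size,
       setxattr flushes the caches */
    glfs_getxattr (fs, "/", "glusterfs.cache.size", size, sizeof (size) - 1);
    glfs_setxattr (fs, "/", "glusterfs.cache.flush", "1", 1, 0);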

Thanks for your valuable feedback.
Rafi KC

>  
>
> Regards
>
> Rafi KC
>
> On 07/26/2016 04:05 PM, Niels de Vos wrote:
>> On Tue, Jul 26, 2016 at 12:43:56PM +0530, Kaushal M wrote:
>>> On Tue, Jul 26, 2016 at 12:28 PM, Prashanth Pai  
>>>  wrote:
 +1 to option (2), which is similar to echoing into /proc/sys/vm/drop_caches

  -Prashanth Pai

 - Original Message -
> From: "Mohammed Rafi K C"  
> 
> To: "gluster-users"  
> , "Gluster Devel" 
>  
> Sent: Tuesday, 26 July, 2016 10:44:15 AM
> Subject: [Gluster-devel] Need a way to display and flush gluster 
> cache ?
>
> Hi,
>
> The Gluster stack has its own caching mechanism, mostly on the client
> side. But there is no concrete method to see how much memory is being
> consumed by gluster for caching, and if needed there is no way to flush
> the cache memory.
>
> So my first question is: do we need to implement these two features
> for the gluster cache?
>
>
> If so I would like to discuss some of our thoughts towards it.
>
> (If you are not interested in implementation discussion, you can skip
> this part :)
>
> 1) Implement a virtual xattr on root, and on doing setxattr, flush all
> the cache, and for getxattr we can print the aggregated cache size.
>
> 2) Currently the gluster native client supports a .meta virtual
> directory to get metadata information, analogous to proc. We can
> implement a virtual file inside the .meta directory to read the cache
> size. We can also flush the cache using a special write into the file
> (similar to echoing into a proc file). This approach may be difficult
> to implement in other clients.
>>> +1 for making use of the meta-xlator. We should be making more use of 
>>> it.
>> Indeed, this would be nice. Maybe this can also expose the memory
>> allocations like /proc/slabinfo.
>>
>> The io-stats xlator can dump some statistics to
>> /var/log/glusterfs/samples/ and /var/lib/glusterd/stats/ . That seems to
>> be acceptable too, and allows getting statistics from server-side
>> processes without involving any clients.
>>
>> HTH,
>> Niels
>>
>>
> 3) A CLI command to display and flush the data with IP and port as
> arguments. GlusterD would need to send the op to the client from the
> connected client list. But this approach would be difficult to implement
> for libgfapi-based clients. To me, it doesn't seem to be a good option.
>
> Your suggestions and comments are most welcome.
>
> Thanks to Talur and Poornima for their suggestions.
>
> Regards
>
> Rafi KC
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org 
 http://www.gluster.org/mailman/listinfo/gluster-devel
>>> ___
>>> Gluster-users mailing list
>>> gluster-us...@gluster.org 
>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>>> ___
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org 
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org 
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> -- 
> Raghavendra G


Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Niels de Vos
On Thu, Jul 28, 2016 at 04:19:42PM +0530, Ravishankar N wrote:
> On 07/28/2016 03:32 PM, Niels de Vos wrote:
> > There are some features in QEMU that we could implement with the
> > existing libgfapi functions. Kevin asked me about this a while back, and
> > I have finally (sorry for the delay Kevin!) taken the time to look into
> > it.
> > 
> > There are some optional operations that can be set in the BlockDriver
> > structure. The ones missing that we could have, or have useless
> > implementations are these:
> > 
> >.bdrv_get_info/.bdrv_refresh_limits:
> >  This seems to set values in a BlockDriverInfo and BlockLimits
> >  structure that is used by QEMU's block layer. By setting the right
> >  values, we can use glfs_discard() and glfs_zerofill() to reduce the
> >  writing of 0-bytes that QEMU falls back on at the moment.
> > 
> >.bdrv_has_zero_init / qemu_gluster_has_zero_init:
> >  Currently always returns 0. But if a file gets created on a Gluster
> >  volume, it should never have old contents in it. Rewriting it with
> >  0-bytes looks unneeded to me.
> 
> N00b question, what is the need for separate glfs_discard() and
> glfs_zerofill() functions? Can we not just use glfs_fallocate() with
> appropriate flags?

glfs_fallocate() does not have an argument for flags :-/ If we introduce
it now, we'll change the API and existing libgfapi applications using
the function will fail to compile. It can be done though, and involves
implementing a new glfs_fallocate() with an updated symbol version. But,
it'll be painful for the existing applications in any case.

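For readers not familiar with the mechanism: that is the usual ELF
symbol-versioning trick, sketched below. The existing prototype is quoted
from memory (check glfs.h), and the flags-taking variant plus the version
tags are made up -- old binaries would keep resolving the old symbol, but
source written against the old prototype would still need adapting.

    #include <glusterfs/api/glfs.h>   /* for glfs_fd_t; definitions omitted */

    /* old entry point, kept bound to the existing version tag so that
       already-built applications continue to work */
    int glfs_fallocate_old (glfs_fd_t *fd, int keep_size, off_t offset, size_t len);
    __asm__ (".symver glfs_fallocate_old, glfs_fallocate@GFAPI_OLD");

    /* hypothetical new entry point taking fallocate(2)-style flags,
       published as the default version for newly compiled code */
    int glfs_fallocate_new (glfs_fd_t *fd, int flags, off_t offset, size_t len);
    __asm__ (".symver glfs_fallocate_new, glfs_fallocate@@GFAPI_NEW");
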
> posix_discard() in gluster seems to be using fallocate() with
> FALLOC_FL_PUNCH_HOLE flag. And posix_zerofill() can be made smarter to use
> FALLOC_FL_ZERO_RANGE and fall back to writing zeroes if ZERO_RANGE is not
> supported.

Oh, nice find! I was expecting that posix_zerofill() uses fallocate()
already... Definitely something that should be improved too. Care to file
a bug for that?

Thanks,
Niels


> Regards,
> Ravi
> 
> > 
> > With these improvements the gluster:// URL usage with QEMU (and now also
> > the new JSON QAPI), certain operations are expected to be a little
> > faster. Anyone starting to work on this would want to trace the actual
> > operations (on a single-brick volume) with ltrace/wireshark on the
> > system where QEMU runs.
> > 
> > Who is interested to take this on?
> > Niels
> > 
> > 
> > ___
> > Gluster-devel mailing list
> > Gluster-devel@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel
> 
> 



Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Prasanna Kalever
On Thu, Jul 28, 2016 at 4:13 PM, Niels de Vos  wrote:
> On Thu, Jul 28, 2016 at 03:51:11PM +0530, Prasanna Kalever wrote:
>> On Thu, Jul 28, 2016 at 3:32 PM, Niels de Vos  wrote:
>> > There are some features in QEMU that we could implement with the
>> > existing libgfapi functions. Kevin asked me about this a while back, and
>> > I have finally (sorry for the delay Kevin!) taken the time to look into
>> > it.
>> >
>> > There are some optional operations that can be set in the BlockDriver
>> > structure. The ones missing that we could have, or have useless
>> > implementations are these:
>> >
>> >   .bdrv_get_info/.bdrv_refresh_limits:
>> > This seems to set values in a BlockDriverInfo and BlockLimits
>> > structure that is used by QEMUs block layer. By setting the right
>> > values, we can use glfs_discard() and glfs_zerofill() to reduce the
>> > writing of 0-bytes that QEMU falls back on at the moment.
>>
>> Hey Niels and Kevin,
>>
>> In one of our discussions Jeff showed his interest in knowing about
>> discard support in gluster upstream.
>> I think his intention was the same here.
>>
>> >
>> >   .bdrv_has_zero_init / qemu_gluster_has_zero_init:
>> > Currently always returns 0. But if a file gets created on a Gluster
>> > volume, it should never have old contents in it. Rewriting it with
>> > 0-bytes looks unneeded to me.
>>
>> I agree
>>
>> >
>> > With these improvements the gluster:// URL usage with QEMU (and now also
>> > the new JSON QAPI), certain operations are expected to be a little
>> > faster. Anyone starting to work on this would want to trace the actual
>> > operations (on a single-brick volume) with ltrace/wireshark on the
>> > system where QEMU runs.
>> >
>> > Who is interested to take this on?
>>
>> Of course I am very much interested in doing this work :)
>>
>> But please expect at least a week or two before I can start on this from
>> my side, as currently my plate is filled with block store tasks.
>>
>> Hopefully this is meant for 2.8 (as 2.7 is in hard freeze), so I think the
>> delay should be acceptable.
>
> Thanks! There are no strict timelines for any of the community work. It
> all depends on what your manager(s) want to see in future productized
> versions. At the moment, and for all I know, this is just an improvement
> that we should do at one point.

Yeah, right!

Since we are okay with the timelines, I shall get this done in time for
qemu-2.8.

Thanks for bringing this to notice :)

--
Prasanna

>
> Niels


Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Ravishankar N

On 07/28/2016 03:32 PM, Niels de Vos wrote:

There are some features in QEMU that we could implement with the
existing libgfapi functions. Kevin asked me about this a while back, and
I have finally (sorry for the delay Kevin!) taken the time to look into
it.

There are some optional operations that can be set in the BlockDriver
structure. The ones missing that we could have, or have useless
implementations are these:

   .bdrv_get_info/.bdrv_refresh_limits:
 This seems to set values in a BlockDriverInfo and BlockLimits
 structure that is used by QEMU's block layer. By setting the right
 values, we can use glfs_discard() and glfs_zerofill() to reduce the
 writing of 0-bytes that QEMU falls back on at the moment.

   .bdrv_has_zero_init / qemu_gluster_has_zero_init:
 Currently always returns 0. But if a file gets created on a Gluster
 volume, it should never have old contents in it. Rewriting it with
 0-bytes looks unneeded to me.


N00b question, what is the need for separate glfs_discard() and 
glfs_zerofill() functions? Can we not just use glfs_fallocate() with 
appropriate flags?
posix_discard() in gluster seems to be using fallocate() with 
FALLOC_FL_PUNCH_HOLE flag. And posix_zerofill() can be made smarter to 
use FALLOC_FL_ZERO_RANGE and fall back to writing zeroes if ZERO_RANGE is
not supported.
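
To make the suggestion concrete, here is a minimal sketch of that fallback
(illustrative only -- the real change would live inside posix_zerofill() in
the posix xlator, and FALLOC_FL_ZERO_RANGE additionally needs kernel and
filesystem support):

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <linux/falloc.h>

    static int
    zerofill_range (int fd, off_t offset, off_t len)
    {
    #ifdef FALLOC_FL_ZERO_RANGE
            /* cheap path: ask the filesystem to zero out the range */
            if (fallocate (fd, FALLOC_FL_ZERO_RANGE, offset, len) == 0)
                    return 0;
            if (errno != EOPNOTSUPP && errno != EINVAL)
                    return -1;
    #endif
            /* fallback: write zeroes the slow way */
            char buf[4096];
            memset (buf, 0, sizeof (buf));
            while (len > 0) {
                    size_t chunk = len > (off_t) sizeof (buf) ?
                                   sizeof (buf) : (size_t) len;
                    ssize_t n = pwrite (fd, buf, chunk, offset);
                    if (n < 0)
                            return -1;
                    offset += n;
                    len -= n;
            }
            return 0;
    }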


Regards,
Ravi



With these improvements the gluster:// URL usage with QEMU (and now also
the new JSON QAPI), certain operations are expected to be a little
faster. Anyone starting to work on this would want to trace the actual
operations (on a single-brick volume) with ltrace/wireshark on the
system where QEMU runs.

Who is interested to take this on?
Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Niels de Vos
On Thu, Jul 28, 2016 at 03:51:11PM +0530, Prasanna Kalever wrote:
> On Thu, Jul 28, 2016 at 3:32 PM, Niels de Vos  wrote:
> > There are some features in QEMU that we could implement with the
> > existing libgfapi functions. Kevin asked me about this a while back, and
> > I have finally (sorry for the delay Kevin!) taken the time to look into
> > it.
> >
> > There are some optional operations that can be set in the BlockDriver
> > structure. The ones missing that we could have, or have useless
> > implementations are these:
> >
> >   .bdrv_get_info/.bdrv_refresh_limits:
> > This seems to set values in a BlockDriverInfo and BlockLimits
> > structure that is used by QEMU's block layer. By setting the right
> > values, we can use glfs_discard() and glfs_zerofill() to reduce the
> > writing of 0-bytes that QEMU falls back on at the moment.
> 
> Hey Niels and Kevin,
> 
> In one of our discussions Jeff showed his interest in knowing about
> discard support in gluster upstream.
> I think his intention was the same here.
> 
> >
> >   .bdrv_has_zero_init / qemu_gluster_has_zero_init:
> > Currently always returns 0. But if a file gets created on a Gluster
> > volume, it should never have old contents in it. Rewriting it with
> > 0-bytes looks unneeded to me.
> 
> I agree
> 
> >
> > With these improvements the gluster:// URL usage with QEMU (and now also
> > the new JSON QAPI), certain operations are expected to be a little
> > faster. Anyone starting to work on this would want to trace the actual
> > operations (on a single-brick volume) with ltrace/wireshark on the
> > system where QEMU runs.
> >
> > Who is interested to take this on?
> 
> Of course I am very much interested in doing this work :)
> 
> But please expect at least a week or two before I can start on this from
> my side, as currently my plate is filled with block store tasks.
> 
> Hopefully this is meant for 2.8 (as 2.7 is in hard freeze), so I think the
> delay should be acceptable.

Thanks! There are no strict timelines for any of the community work. It
all depends on what your manager(s) want to see in future productized
versions. At the moment, and for all I know, this is just an improvement
that we should do at one point.

Niels



Re: [Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Prasanna Kalever
On Thu, Jul 28, 2016 at 3:32 PM, Niels de Vos  wrote:
> There are some features in QEMU that we could implement with the
> existing libgfapi functions. Kevin asked me about this a while back, and
> I have finally (sorry for the delay Kevin!) taken the time to look into
> it.
>
> There are some optional operations that can be set in the BlockDriver
> structure. The ones missing that we could have, or have useless
> implementations are these:
>
>   .bdrv_get_info/.bdrv_refresh_limits:
> This seems to set values in a BlockDriverInfo and BlockLimits
> structure that is used by QEMU's block layer. By setting the right
> values, we can use glfs_discard() and glfs_zerofill() to reduce the
> writing of 0-bytes that QEMU falls back on at the moment.

Hey Niels and Kevin,

In one of our discussions Jeff showed his interest in knowing about
discard support in gluster upstream.
I think his intention was the same here.

>
>   .bdrv_has_zero_init / qemu_gluster_has_zero_init:
> Currently always returns 0. But if a file gets created on a Gluster
> volume, it should never have old contents in it. Rewriting it with
> 0-bytes looks unneeded to me.

I agree

>
> With these improvements the gluster:// URL usage with QEMU (and now also
> the new JSON QAPI), certain operations are expected to be a little
> faster. Anyone starting to work on this would want to trace the actual
> operations (on a single-brick volume) with ltrace/wireshark on the
> system where QEMU runs.
>
> Who is interested to take this on?

Of course I am very much interested in doing this work :)

But please expect at least a week or two before I can start on this from
my side, as currently my plate is filled with block store tasks.

Hopefully this is meant for 2.8 (as 2.7 is in hard freeze), so I think the
delay should be acceptable.

Thanks,
--
Prasanna


> Niels
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Suggestions for improving the block/gluster driver in QEMU

2016-07-28 Thread Niels de Vos
There are some features in QEMU that we could implement with the
existing libgfapi functions. Kevin asked me about this a while back, and
I have finally (sorry for the delay Kevin!) taken the time to look into
it.

There are some optional operations that can be set in the BlockDriver
structure. The ones missing that we could have, or have useless
implementations are these:

  .bdrv_get_info/.bdrv_refresh_limits:
This seems to set values in a BlockDriverInfo and BlockLimits
structure that is used by QEMU's block layer. By setting the right
values, we can use glfs_discard() and glfs_zerofill() to reduce the
writing of 0-bytes that QEMU falls back on at the moment.

  .bdrv_has_zero_init / qemu_gluster_has_zero_init:
Currently always returns 0. But if a file gets created on a Gluster
volume, it should never have old contents in it. Rewriting it with
0-bytes looks unneeded to me.
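
For the second item the change itself would be tiny -- roughly the sketch
below against block/gluster.c (whether we can really promise zero
initialisation for every volume type is exactly what needs to be confirmed
first):

    static int qemu_gluster_has_zero_init(BlockDriverState *bs)
    {
        /* newly created files on a Gluster volume do not expose stale
           data, so the block layer does not need to pre-write zeroes */
        return 1;
    }

The .bdrv_refresh_limits side would presumably just fill in the discard and
write-zeroes related fields of bs->bl so that the block layer starts routing
those requests to glfs_discard()/glfs_zerofill().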

With these improvements the gluster:// URL usage with QEMU (and now also
the new JSON QAPI), certain operations are expected to be a little
faster. Anyone starting to work on this would want to trace the actual
operations (on a single-brick volume) with ltrace/wireshark on the
system where QEMU runs.

Who is interested to take this on?
Niels

