[OpenZFS Developer] ds_unique_bytes and dd_used_bytes significance with snapshots

2014-02-05 Thread Gaurav Mahajan
Hi,

I am trying to understand the space accounting done in dsl_dataset and
snapshots.
Usually, whenever a block is allocated for a dataset, we call
dsl_dataset_block_born(), and when we free the block we call
dsl_dataset_block_kill().

Both of these functions add the allocated space to, or subtract the freed
space from, the current dsl_dataset and dsl_dir.
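
To make the accounting described above concrete, here is a minimal toy
model in C. It is only a sketch under stated assumptions, not the OpenZFS
code: the toy_* types and functions are hypothetical, the two counters
merely stand in for ds_unique_bytes and dd_used_bytes, and the snapshot
behaviour (the head's unique count restarting at zero, and a block born
before the latest snapshot staying charged to the dir) is my reading of
the design and should be checked against the real source.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT the real dsl_dir_t / dsl_dataset_t. */
typedef struct toy_dir {
    uint64_t used_bytes;      /* models dd_used_bytes: head plus snapshots */
} toy_dir_t;

typedef struct toy_dataset {
    toy_dir_t *dir;           /* owning directory */
    uint64_t prev_snap_txg;   /* txg of the latest snapshot, 0 if none */
    uint64_t unique_bytes;    /* models ds_unique_bytes of the head */
} toy_dataset_t;

/* A block of `size` bytes is allocated for the dataset. */
static void
toy_block_born(toy_dataset_t *ds, uint64_t size)
{
    ds->unique_bytes += size;
    ds->dir->used_bytes += size;
}

/* Assumption: after a snapshot, every existing block is shared with it. */
static void
toy_snapshot(toy_dataset_t *ds, uint64_t txg)
{
    assert(txg > ds->prev_snap_txg);
    ds->prev_snap_txg = txg;
    ds->unique_bytes = 0;
}

/*
 * The dataset drops a block that was born in `birth_txg`.
 * Returns 1 if the space is really freed, 0 if a snapshot still holds it.
 */
static int
toy_block_kill(toy_dataset_t *ds, uint64_t birth_txg, uint64_t size)
{
    if (birth_txg > ds->prev_snap_txg) {
        /* Born after the latest snapshot: nobody else references it. */
        assert(ds->unique_bytes >= size);
        ds->unique_bytes -= size;
        ds->dir->used_bytes -= size;
        return (1);
    }
    /* Assumption: the snapshot keeps the block, so the dir total stays. */
    return (0);
}
```

In this model the two counters move together until a snapshot exists;
after that, the dir total can stay high while the head's unique count
drops, which is one way the two fields could diverge.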

So my question is: how are ds_unique_bytes and dd_used_bytes related?

What happens to these fields in the case of snapshots?
From my understanding, ds_unique_bytes represents the blocks
allocated (born) in the current dsl_dataset, while dsl_dir accounts for all
the blocks in the snapshots as well as the current dsl_dataset. Please
correct me if I am wrong.

Also, when we take a snapshot, what happens to ds_unique_bytes: is it
copied from the snapshot or initialized to 0?

Thanks in advance !!

Regards,
Gaurav.
___
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer


Re: [OpenZFS Developer] zio parent and zio children relation.

2013-12-17 Thread Gaurav Mahajan
Hi George,

Thanks for the response!

On Tue, Dec 17, 2013 at 7:27 PM, George Wilson wrote:

>
> On 12/17/13 5:21 AM, Gaurav Mahajan wrote:
>
> Hi all,
>
> I am trying to understand the relationship between a root ZIO and its
> child ZIOs.
>
> This is what I have understood. Please correct me if I'm wrong.
>
> Usually, whenever we want to do a series of I/O operations, as in the sync
> thread, we create a root ZIO with zio_root(). This root ZIO becomes the
> parent of every ZIO that we create while syncing the async data to disk
> (in dbuf_sync_leaf).
>
> All the child ZIOs are issued using zio_nowait().
> After issuing all the child ZIOs, we call zio_wait() on the root ZIO.
>
> So my question is: after zio_wait() on the root ZIO returns, are we
> guaranteed that all the child ZIOs are complete?
>
>
> Yes, the root zio cannot complete until all its children have completed.
>

Why do we need the convergence logic (multiple passes while syncing), then?
Why is dsl_pool_sync() called multiple times?

I'm asking because dsl_pool_sync() creates a root ZIO and, in turn, calls
dnode_sync().
For indirect blocks, the root ZIO becomes the parent of each indirect
block's ZIO, and each indirect block's ZIO becomes the parent of the
data/leaf block ZIOs.
So I would guess that a single pass writes all the dirty data for all the
dirty dnodes in an object set.
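
One possible way to picture why more than one pass could be needed, even
though each pass waits for its whole ZIO tree: writing blocks means
allocating space, and recording those allocations dirties further metadata
that was not dirty when the pass began. That reasoning is an assumption on
my part, not something established in this thread. The toy model below is
purely hypothetical (toy_sync_passes and its fan-in parameter are made up
for illustration) and only shows how such a loop converges.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: writing `dirty` blocks in one pass dirties
 * dirty / blocks_per_meta (rounded down) new metadata blocks, which then
 * have to be written in the next pass.
 * Returns the number of passes needed until nothing is dirty.
 */
static int
toy_sync_passes(uint64_t dirty, uint64_t blocks_per_meta)
{
    int passes = 0;

    assert(blocks_per_meta > 1);    /* otherwise the loop never shrinks */
    while (dirty > 0) {
        passes++;                           /* write everything dirty now */
        dirty = dirty / blocks_per_meta;    /* work created by this pass */
    }
    return (passes);
}
```

With 1000 dirty blocks and a fan-in of 100, the model needs two passes:
the first writes the data and leaves 10 dirty metadata blocks, and the
second writes those and leaves nothing behind.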

Please correct me if I'm wrong.

Thanks !!
Gaurav.

- George
>
>
> By complete I mean that block allocation and the data write are done and
> the io_done callbacks have run.
>
>  I may be wrong with my understanding. Please correct me.
>
>  Thanks !!!
> Gaurav.
>
>
>


[OpenZFS Developer] zio parent and zio children relation.

2013-12-17 Thread Gaurav Mahajan
Hi all,

I am trying to understand the relationship between a root ZIO and its
child ZIOs.

This is what I have understood. Please correct me if I'm wrong.

Usually, whenever we want to do a series of I/O operations, as in the sync
thread, we create a root ZIO with zio_root(). This root ZIO becomes the
parent of every ZIO that we create while syncing the async data to disk
(in dbuf_sync_leaf).

All the child ZIOs are issued using zio_nowait().
After issuing all the child ZIOs, we call zio_wait() on the root ZIO.

So my question is: after zio_wait() on the root ZIO returns, are we
guaranteed that all the child ZIOs are complete?

By complete I mean that block allocation and the data write are done and
the io_done callbacks have run.
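
The guarantee being asked about can be pictured with a toy model in which
each parent counts its outstanding children and cannot be marked done
until that count reaches zero. This is only a sketch: the toy_zio_* names
and fields are hypothetical, and nothing here is taken from the real zio
pipeline.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a ZIO; NOT the real zio_t. */
typedef struct toy_zio {
    struct toy_zio *parent;     /* NULL for the root */
    int pending_children;       /* children that are not done yet */
    int io_finished;            /* this ZIO's own work has finished */
    int done;                   /* the "done" step has run */
} toy_zio_t;

/* Create a ZIO and register it with its parent, if any. */
static void
toy_zio_init(toy_zio_t *zio, toy_zio_t *parent)
{
    zio->parent = parent;
    zio->pending_children = 0;
    zio->io_finished = 0;
    zio->done = 0;
    if (parent != NULL) {
        assert(!parent->done);  /* cannot attach to a completed parent */
        parent->pending_children++;
    }
}

/* Mark this ZIO's own work finished and propagate completion upwards. */
static void
toy_zio_finish(toy_zio_t *zio)
{
    zio->io_finished = 1;
    while (zio != NULL && zio->io_finished &&
        zio->pending_children == 0 && !zio->done) {
        zio->done = 1;
        if (zio->parent != NULL)
            zio->parent->pending_children--;
        zio = zio->parent;      /* the parent may now be able to finish */
    }
}
```

In this model a waiter on the root would only be released once every
descendant, including leaf ZIOs hanging off indirect-block ZIOs, has been
marked done.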

I may be wrong with my understanding. Please correct me.

Thanks !!!
Gaurav.


[OpenZFS Developer] ZFS Transaction groups data modification limit?

2013-10-31 Thread Gaurav Mahajan
Hi All,

I'm new to ZFS and trying to understand how txgs are used and
handled in ZFS.

I wanted to know if there is any limit on how much data we can modify in a
particular txg?

Also, how does ZFS do the accounting for sync writes, given that sync
writes force the ZIL to flush? How is the accounting done for available
space?

Is there any limit on metadata writes performed by the sync thread?

Thanks,
Gaurav


Re: [OpenZFS Developer] [zfs-devel] blk_phys_birth and blk_birth

2013-10-15 Thread Gaurav Mahajan
Hi Matt,

Thanks for the reply.
On Wed, Oct 16, 2013 at 1:24 AM, Matthew Ahrens  wrote:

>
>
>
> On Mon, Oct 14, 2013 at 11:26 PM, Gaurav Mahajan wrote:
>
>> Hi Matt,
>>
>> I want to delay the assignment of blocks to the dnode.
>>
>> Whenever I need a block I will call metaslab_alloc(), which is an
>> internal part of the zio pipeline.
>> metaslab_alloc() will return a blkptr with blk_birth set to, say, 123.
>>
>> I will issue the ZIO to write data to disk with this blkptr.
>>
>> Some time later I want to assign this block to a particular dnode, so I
>> will update the dnode's level-1 indirect block with the blkptr created
>> earlier.
>> The assignment will happen in a different txg, say 456.
>> So when I store the blkptr in the indirect block I need to update the
>> logical birth time of the block, i.e. set blk_birth to 456, so that the
>> DSL and DMU will treat the block as if it was created in txg 456. But
>> blk_phys_birth will remain 123, which is the actual birth time of the
>> block.
>>
>> In this case I don't have dedup enabled. I just wanted to confirm:
>> if dedup is disabled and a block has different blk_birth and
>> blk_phys_birth values, will this work, or will it cause any problems?
>>
>
> It might work.  There are several other concerns that would require an
> overall understanding of what you're doing.   Why do you want to do that?
>  What problem is this solving?  What happens to the BP in between when it's
> allocated and when its written?
>
I am planning to keep an intent log for it.

> What if the system crashes before you write the BP?  It seems like the
> space would be leaked.  What if a scrub or resilver starts after you have
> allocated the block but before the BP is written to disk?  Could it miss
> scrubbing that block?
>
I am not sure how scrub works. Is it based on object sets, i.e. does it
walk object set -> dnode -> blkptr?
If that is the case, then the block will be missed by the scrub.
Or is it done at the SPA level, i.e. does it ask the SPA which blocks it
has and check those?
In that case we might not miss the block.

Thanks Matt for your help.
--Gaurav

--matt
>
>
>
>>
>> Thanks,
>> Gaurav.
>>
>>
>>
>>
>>
>> On Tue, Oct 15, 2013 at 11:36 AM, Matthew Ahrens wrote:
>>
>>>
>>>
>>>
>>> On Mon, Oct 14, 2013 at 10:53 PM, Gaurav Mahajan wrote:
>>>
>>>> Hi Matt,
>>>>
>>>> Thanks for the reply.
>>>> Can it happen that a block gets physically allocated in one txg (say
>>>> txg=123) using metaslab_alloc(),
>>>> so that the blkptr has only blk_birth=123 and blk_phys_birth=0?
>>>>
>>>
>>>
>>>> Later on this block gets assigned to a dnode (let's say in txg 456)
>>>>
>>>
>>> What do you mean by that?  If it was written as part of that dnode then
>>> it would be "assigned" in the same txg, 123.  Or do you mean the same data
>>> is later written to a different file and you have dedup enabled?
>>>
>>>
>>>> then we can modify the blkptr as blk_birth=456 and blk_phys_birth=123.
>>>>
>>>
>>> blkptrs are not modified once they are on disk.
>>>
>>>
>>>>
>>>> Would this work? this would work only in case we have dedupe enabled.
>>>>
>>>
>>> If you had dedup enabled, and you write data that is identical to a
>>> block that already exists, it will have blk_phys_birth = whenever it was
>>> first written, some time in the past; and blk_birth = now, when the new
>>> reference is created to the existing physical block.
>>>
>>> --matt
>>>
>>>
>>>>
>>>>  Thanks,
>>>> Gaurav
>>>>
>>>>
>>>> On Tue, Oct 15, 2013 at 10:59 AM, Matthew Ahrens 
>>>> wrote:
>>>>
>>>>> On Wed, Oct 9, 2013 at 3:35 AM, Gaurav Mahajan wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am new to ZFS and trying to understand the code and workflow.
>>>>>>
>>>>>> In the blkptr structure, blk_phys_birth and blk_birth are
>>>>>> nothing but txgs.
>>>>>>
>>>>>> What is the difference between them?
>>>>>> How DDT uses these two fields?
>>>>>>
>>>>>> typedef struct blkptr {
>>>>>> 
>>

Re: [OpenZFS Developer] [zfs-devel] blk_phys_birth and blk_birth

2013-10-14 Thread Gaurav Mahajan
Hi Matt,

I want to delay the assignment of blocks to the dnode.

Whenever I need a block I will call metaslab_alloc(), which is an internal
part of the zio pipeline.
metaslab_alloc() will return a blkptr with blk_birth set to, say, 123.

I will issue the ZIO to write data to disk with this blkptr.

Some time later I want to assign this block to a particular dnode, so I
will update the dnode's level-1 indirect block with the blkptr created
earlier.
The assignment will happen in a different txg, say 456.
So when I store the blkptr in the indirect block I need to update the
logical birth time of the block, i.e. set blk_birth to 456, so that the DSL
and DMU will treat the block as if it was created in txg 456. But
blk_phys_birth will remain 123, which is the actual birth time of the
block.

In this case I don't have dedup enabled. I just wanted to confirm:
if dedup is disabled and a block has different blk_birth and
blk_phys_birth values, will this work, or will it cause any problems?
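
To make the scenario concrete, here is a toy model of how the two birth
times would be consulted, following the description quoted further down in
this message (blk_birth for the DMU/DSL, e.g. zfs send; the physical birth
for the SPA, e.g. resilvering). It is only a sketch: toy_bp_t and both
predicates are hypothetical names, and the real checks are more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the two birth fields of a block pointer. */
typedef struct toy_bp {
    uint64_t birth;         /* logical birth: when the reference was made */
    uint64_t phys_birth;    /* physical birth: when the block was allocated */
} toy_bp_t;

/*
 * Send-style question: was this reference created after the snapshot
 * taken at `from_txg`? Uses the logical birth.
 */
static int
toy_in_incremental(const toy_bp_t *bp, uint64_t from_txg)
{
    return (bp->birth > from_txg);
}

/*
 * Resilver-style question: was this block physically written while a
 * device was missing, i.e. within txgs [lo, hi]? Uses the physical birth.
 */
static int
toy_needs_resilver(const toy_bp_t *bp, uint64_t lo, uint64_t hi)
{
    assert(lo <= hi);
    return (bp->phys_birth >= lo && bp->phys_birth <= hi);
}
```

For the block above (birth 456, phys_birth 123), the model would include
it in an incremental stream taken from txg 200, and would resilver it
only for an outage that covered txg 123.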

Thanks,
Gaurav.





On Tue, Oct 15, 2013 at 11:36 AM, Matthew Ahrens wrote:

>
>
>
> On Mon, Oct 14, 2013 at 10:53 PM, Gaurav Mahajan wrote:
>
>> Hi Matt,
>>
>> Thanks for the reply.
>> Can it happen that a block gets physically allocated in one txg (say
>> txg=123) using metaslab_alloc(),
>> so that the blkptr has only blk_birth=123 and blk_phys_birth=0?
>>
>
>
>> Later on this block gets assigned to a dnode (let's say in txg 456)
>>
>
> What do you mean by that?  If it was written as part of that dnode then it
> would be "assigned" in the same txg, 123.  Or do you mean the same data is
> later written to a different file and you have dedup enabled?
>
>
>> then we can modify the blkptr as blk_birth=456 and blk_phys_birth=123.
>>
>
> blkptrs are not modified once they are on disk.
>
>
>>
>> Would this work? this would work only in case we have dedupe enabled.
>>
>
> If you had dedup enabled, and you write data that is identical to a block
> that already exists, it will have blk_phys_birth = whenever it was first
> written, some time in the past; and blk_birth = now, when the new reference
> is created to the existing physical block.
>
> --matt
>
>
>>
>> Thanks,
>> Gaurav
>>
>>
>> On Tue, Oct 15, 2013 at 10:59 AM, Matthew Ahrens wrote:
>>
>>> On Wed, Oct 9, 2013 at 3:35 AM, Gaurav Mahajan wrote:
>>>
>>>> Hi,
>>>>
>>>> I am new to ZFS and trying to understand the code and workflow.
>>>>
>>>> In the blkptr structure, blk_phys_birth and blk_birth are nothing
>>>> but txgs.
>>>>
>>>> What is the difference between them?
>>>> How DDT uses these two fields?
>>>>
>>>> typedef struct blkptr {
>>>> 
>>>> uint64_t blk_phys_birth;  /* txg when block was allocated */
>>>> uint64_t blk_birth;       /* transaction group at birth   */
>>>> .
>>>> } blkptr_t;
>>>>
>>>>
>>> blk_phys_birth is when the block was allocated.
>>> blk_birth is when this particular reference was created.
>>>
>>> If the block is dedup'ed, blk_phys_birth can be before blk_birth.  (The
>>> block was allocated at time A, then another reference was added to that
>>> block at a later time B.)
>>>
>>> If they are the same (e.g. as is the case for all non-dedup'ed blocks),
>>> we only store blk_birth.  (I believe this is for backwards compatibility
>>> with software that doesn't understand dedup (pool version 21)).
>>>
>>> blk_phys_birth is used by the SPA, for example when resilvering devices
>>> that were temporarily offline.
>>>
>>> blk_birth is used by the DMU and DSL, for example to determine when to
>>> free a block, or what blocks should be part of a zfs send stream.
>>>
>>> --matt
>>>
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to zfs-devel+unsubscr...@zfsonlinux.org.
>>>
>>


Re: [Developer] [zfs-devel] blk_phys_birth and blk_birth

2013-10-14 Thread Gaurav Mahajan
Hi Matt,

Thanks for the reply.
Can it happen that a block gets physically allocated in one txg (say
txg=123) using metaslab_alloc(), so that the blkptr has only blk_birth=123
and blk_phys_birth=0?
Later on this block gets assigned to a dnode (let's say in txg 456); then we
can modify the blkptr to blk_birth=456 and blk_phys_birth=123.

Would this work? This would work only in case we have dedupe enabled.

Thanks,
Gaurav


On Tue, Oct 15, 2013 at 10:59 AM, Matthew Ahrens wrote:

> On Wed, Oct 9, 2013 at 3:35 AM, Gaurav Mahajan wrote:
>
>> Hi,
>>
>> I am new to ZFS and trying to understand the code and workflow.
>>
>> In the blkptr structure, blk_phys_birth and blk_birth are nothing
>> but txgs.
>>
>> What is the difference between them?
>> How DDT uses these two fields?
>>
>> typedef struct blkptr {
>> 
>> uint64_t blk_phys_birth;  /* txg when block was allocated */
>> uint64_t blk_birth;       /* transaction group at birth   */
>> .
>> } blkptr_t;
>>
>>
> blk_phys_birth is when the block was allocated.
> blk_birth is when this particular reference was created.
>
> If the block is dedup'ed, blk_phys_birth can be before blk_birth.  (The
> block was allocated at time A, then another reference was added to that
> block at a later time B.)
>
> If they are the same (e.g. as is the case for all non-dedup'ed blocks), we
> only store blk_birth.  (I believe this is for backwards compatibility with
> software that doesn't understand dedup (pool version 21)).
>
> blk_phys_birth is used by the SPA, for example when resilvering devices
> that were temporarily offline.
>
> blk_birth is used by the DMU and DSL, for example to determine when to
> free a block, or what blocks should be part of a zfs send stream.
>
> --matt
>
>
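
The storage rule Matt describes in the quoted reply (when both births are
equal, only blk_birth is stored) can be sketched as a pair of helpers. If
I remember correctly, the ZFS headers have macros along these lines
(BP_SET_BIRTH and BP_PHYSICAL_BIRTH), but treat that as an assumption; the
toy_* names and the two-field struct below are purely illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in holding only the two birth fields of a block pointer. */
typedef struct toy_blkptr {
    uint64_t blk_phys_birth;    /* 0 means "same as blk_birth" */
    uint64_t blk_birth;         /* txg in which this reference was created */
} toy_blkptr_t;

/* Record both births, storing the physical one only when it differs. */
static void
toy_bp_set_birth(toy_blkptr_t *bp, uint64_t logical, uint64_t physical)
{
    assert(physical <= logical);    /* allocation cannot follow the reference */
    bp->blk_birth = logical;
    bp->blk_phys_birth = (logical == physical) ? 0 : physical;
}

/* Recover the physical birth, falling back to blk_birth when unset. */
static uint64_t
toy_bp_physical_birth(const toy_blkptr_t *bp)
{
    return (bp->blk_phys_birth != 0 ? bp->blk_phys_birth : bp->blk_birth);
}
```

A non-dedup'ed block written in txg 123 would end up with blk_birth=123
and blk_phys_birth=0, while a dedup'ed reference created in txg 456 to a
block first written in txg 123 would carry both values.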