> On Feb 5, 2021, at 3:06 AM, Nikolay Borisov <nbori...@suse.com> wrote:
> 
>> +
>> +/*
>> + * drop all the items for this inode with this key_type.  Before
>> + * doing a verity enable we cleanup any existing verity items.
>> + *
>> + * This is also used to clean up if a verity enable failed half way
>> + * through
>> + */
>> +static int drop_verity_items(struct btrfs_inode *inode, u8 key_type)
>> +{
> 
> You should ideally be using btrfs_truncate_inode_items as it also
> implements throttling policies and keeps everything in one place. If for
> any reason that interface is not sufficient I'd rather see it refactored
> and broken down in smaller pieces than just copying stuff around, this
> just increments the maintenance burden.
> 

btrfs_truncate_inode_items is already complex, and it’s designed for things 
that deal with changes to i_size.  It would have to be reworked to remove the 
assumption that we’re zapping unconditionally from high offsets to low.

Maybe once we’ve finalized everything else about fsverity, it makes sense to 
optimize drop_verity_items and fold it into the other truncate helper, but the 
function as it stands is easy to review and has no risks of adding regressions 
outside of fsverity.  The important part now is getting the disk format nailed 
down and basic functionality right.

>> +
>> +            /* desc = NULL to just sum all the item lengths */
> 
> nit: typo - dest instead of desc
> 

Whoops, that came from the original ext4 code and I didn’t swap the comment.

>> +
>> +/*
>> + * fsverity calls this to ask us to setup the inode for enabling.  We
>> + * drop any existing verity items and set the in progress bit
>> + */
>> +static int btrfs_begin_enable_verity(struct file *filp)
>> +{
>> +    struct inode *inode = file_inode(filp);
>> +    int ret;
>> +
>> +    if (test_bit(BTRFS_INODE_VERITY_IN_PROGRESS, 
>> &BTRFS_I(inode)->runtime_flags))
>> +            return -EBUSY;
>> +
>> +    /*
>> +     * ext4 adds the inode to the orphan list here, presumably because the
>> +     * truncate done at orphan processing time will delete partial
>> +     * measurements.  TODO: setup orphans
>> +     */
> 
> Any plans when orphan support is going to be implemented?
> 

I was on the fence about ignoring them completely.  The space isn’t leaked, and 
orphans are bad enough already.  It wouldn’t be hard to make the orphan code 
check for incomplete fsverity enables though.

>> +    set_bit(BTRFS_INODE_VERITY_IN_PROGRESS, &BTRFS_I(inode)->runtime_flags);
>> +    ret = drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_DESC_ITEM_KEY);
>> +    if (ret)
>> +            goto err;
>> +
>> +    ret = drop_verity_items(BTRFS_I(inode), BTRFS_VERITY_MERKLE_ITEM_KEY);
>> +    if (ret)
>> +            goto err;
> 
> A slightly higher-level question:
> 
> In drop_verity_items you are doing transaction-per-item, so what happens
> during a crash and only some of the items being deleted? Is this fine,
> presumably that'd make the file unreadable? I.e what should be the
> granule of atomicity when dealing with verity items - all or nothing or
> per-item is sufficient?

Just needs to rerun the verity enable after the crash.  The file won’t be half 
verity enabled because the bit isn’t set until after all of the verity items 
are inserted.

> 
>> +
>> +    return 0;
>> +
>> +err:
>> +    clear_bit(BTRFS_INODE_VERITY_IN_PROGRESS, 
>> &BTRFS_I(inode)->runtime_flags);
>> +    return ret;
>> +
>> +}
>> +
>> +/*
>> + * fsverity calls this when it's done with all of the pages in the file
>> + * and all of the merkle items have been inserted.  We write the
>> + * descriptor and update the inode in the btree to reflect it's new life
>> + * as a verity file
>> + */
>> +static int btrfs_end_enable_verity(struct file *filp, const void *desc,
>> +                              size_t desc_size, u64 merkle_tree_size)
>> +{
>> +    struct btrfs_trans_handle *trans;
>> +    struct inode *inode = file_inode(filp);
>> +    struct btrfs_root *root = BTRFS_I(inode)->root;
>> +    int ret;
>> +
>> +    if (desc != NULL) {
> 
> Redundant check as the descriptor can never be null as per enable_verity.
> 

Take a look at the rollback goto:

rollback:
        inode_lock(inode);
        (void)vops->end_enable_verity(filp, NULL, 0, params.tree_size);
        inode_unlock(inode);
        goto out

>> +            /* write out the descriptor */
>> +            ret = write_key_bytes(BTRFS_I(inode),
>> +                                  BTRFS_VERITY_DESC_ITEM_KEY, 0,
>> +                                  desc, desc_size);
>> +            if (ret)
>> +                    goto out;
>> +
>> +            /* update our inode flags to include fs verity */
>> +            trans = btrfs_start_transaction(root, 1);
>> +            if (IS_ERR(trans)) {
>> +                    ret = PTR_ERR(trans);
>> +                    goto out;
>> +            }
>> +            BTRFS_I(inode)->compat_flags |= BTRFS_INODE_VERITY;
>> +            btrfs_sync_inode_flags_to_i_flags(inode);
>> +            ret = btrfs_update_inode(trans, root, BTRFS_I(inode));
>> +            btrfs_end_transaction(trans);
>> +    }
>> +
>> +out:
>> +    if (desc == NULL || ret) {
> 
> Just checking for ret is sufficient.

See above, the desc check is required.

>> +static struct page *btrfs_read_merkle_tree_page(struct inode *inode,
>> +                                           pgoff_t index,
>> +                                           unsigned long num_ra_pages)
>> +{
>> +    struct page *p;
>> +    u64 start = index << PAGE_SHIFT;
>> +    unsigned long mapping_index;
>> +    ssize_t ret;
>> +    int err;
>> +
>> +    /*
>> +     * the file is readonly, so i_size can't change here.  We jump
>> +     * some pages past the last page to cache our merkles.  The goal
>> +     * is just to jump past any hugepages that might be mapped in.
>> +     */
>> +    mapping_index = (i_size_read(inode) >> PAGE_SHIFT) + 2048 + index;
> 
> So what does this magic number 2048 mean, how was it derived? Perhaps
> give it a symbolic name either via a local const var or via a define,
> something like
> 
> #define INODE_VERITY_EOF_OFFSET 2048 or something along those lines.
> 

No objections to the constant, but the offset is pretty arbitrary and not 
repeated anywhere else.  It makes more sense to just comment it where we use 
it, because INODE_VERITY_EOF_OFFSET looks like an important number and 2048 is 
obviously pulled out of the sky.

> If the file is RO then why do you need to add such a large offset, it's
> not clear what the hugepages issue is.


It’s possible for executable pages to be turned into hugepages by the VM, and 
there are various long term efforts to hugify all the things.  Jumping far 
enough away that we make sure we’re not overlapping one of those hugepages 
seems like a good idea.

-chris

Reply via email to