[ 
https://issues.apache.org/jira/browse/KUDU-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15357612#comment-15357612
 ] 

Todd Lipcon commented on KUDU-1508:
-----------------------------------

To summarize the bug:
- an ext4 file is made up of a set of extents
- the extents are stored in a b-tree with 4KB "pages". Apparently after 
accounting for headers, etc, the root page can hold 340 extent pointers.
- If you have more than 340 extents in a file, then the root page ends up 
holding 340 pointers to other interior nodes, each of which has 340 extent 
pointers (just like you'd expect with a btree). 
See https://digital-forensics.sans.org/blog/2011/03/28/digital-forensics-understanding-ext4-part-3-extent-trees
for a good reference.
- In our case of the log block manager, we can end up with a lot of extents in 
a file due to hole punching. Imagine a 1GB container file with 1000x1MB blocks. 
If every odd block is deleted, we'd need 500 extents after we've hole-punched 
the deleted blocks.
- This would normally be fine, except that, because of the referenced bug, ext4 
fails to update the interior node pointers, which leaves the extent tree 
inconsistent
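
To make the arithmetic above concrete, here is a rough sketch. The 12-byte 
header and 12-byte entry sizes are taken from the ext4 on-disk extent format 
rather than from anything measured here, and the model assumes the best case 
where every run of surviving blocks is laid out as a single contiguous extent 
and punched holes carry no extent at all. Real files may fragment further.

```python
# Sizes from the ext4 on-disk extent format (assumption: 4KB filesystem blocks).
NODE_BYTES = 4096    # one extent tree node, i.e. one filesystem block
HEADER_BYTES = 12    # extent header at the start of each node
ENTRY_BYTES = 12     # one extent (leaf) or index (interior) entry

entries_per_node = (NODE_BYTES - HEADER_BYTES) // ENTRY_BYTES


def count_extents(live_blocks):
    """Count maximal runs of consecutive live blocks.

    Assumption: each run maps to exactly one extent (no extra fragmentation).
    """
    extents = 0
    previous_live = False
    for is_live in live_blocks:
        if is_live and not previous_live:
            extents += 1
        previous_live = is_live
    return extents


# A 1GB container holding 1000 x 1MB blocks, with every odd block deleted
# and hole-punched.
live_blocks = [index % 2 == 0 for index in range(1000)]
extents_after_punch = count_extents(live_blocks)

# More extents than one node can hold means the tree needs another level.
needs_extra_level = extents_after_punch > entries_per_node

print(entries_per_node, extents_after_punch, needs_extra_level)
```

Under these assumptions the container ends up with more extents than fit in a 
single 4KB node, which is the situation where the bug can bite.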

It seems that fsck is able to repair the inconsistency, and we haven't seen 
any runtime issues due to this bug. It may be entirely harmless. That said, 
it's problematic because systems sometimes run fsck when they reboot, and it 
may need manual intervention to tell fsck to fix the issue.

I looked through the kernel changelog and unfortunately this isn't fixed in any 
version of el6. It is, however, fixed in el7 and probably any Ubuntu from the 
last several years (it was fixed upstream in Dec 2012).

So, it seems we have a few choices here regarding this issue:

a) *Do nothing* - if the problem is indeed a 'harmless' ext4 corruption fixable 
by fsck, then we can just document this as an el6 issue, ask Red Hat to backport 
this patch into the next maintenance kernel, and let users know that they may 
have to look out for this particular error if fsck runs.
b) *Try to avoid multi-level extent trees* - if we limit the number of blocks 
per container to a smaller number (say 300) then we're quite unlikely to hit 
this issue. It's not a sure thing (the system could have arbitrary amounts of 
fragmentation) but it is easy to implement and would probably make the issue 
rare enough to not be a problem.
c) *Recommend xfs on el6* - XFS has performed better in most of the tests I've 
run, and also doesn't exhibit this bug. However, it's a lot to ask of new 
users who are installing Kudu on existing clusters that are running ext4.
d) *Avoid hole punching* - we could spend the time to build a block manager 
implementation that doesn't rely on hole punching. This is likely a lot of work.
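
For option (a), users would need some way to recognize this particular error. 
A minimal sketch of scanning fsck output for it follows. The pattern is built 
only from the message quoted in the issue description, so treat it as an 
assumption that other e2fsprogs versions word and wrap the message the same 
way. The function and variable names are made up for illustration.

```python
import re

# Pattern derived from the fsck message quoted in the issue description.
# Assumption: the wording is stable; \s* tolerates the line break after "inode N:".
EXTENT_MISMATCH = re.compile(
    r"Interior extent node level (?P<level>\d+) of inode (?P<inode>\d+):\s*"
    r"Logical start (?P<start>\d+) does not match logical start "
    r"(?P<next_start>\d+) at next level"
)


def find_extent_mismatches(fsck_output):
    """Return one dict of integer fields per matching complaint in the text."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in EXTENT_MISMATCH.finditer(fsck_output)
    ]


# Sample text copied from the issue description.
SAMPLE = """\
data6 contains a file system with errors, check forced.
data6: Interior extent node level 0 of inode 5259348:
Logical start 154699 does not match logical start 2623046 at next level.
"""

hits = find_extent_mismatches(SAMPLE)
for hit in hits:
    print("inode", hit["inode"], "has mismatched extent index at level", hit["level"])
```

Something like this could be run over saved fsck logs to tell this known issue 
apart from other filesystem errors that deserve a closer look.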

> Log block manager triggers ext4 hole punching bug in el6
> --------------------------------------------------------
>
>                 Key: KUDU-1508
>                 URL: https://issues.apache.org/jira/browse/KUDU-1508
>             Project: Kudu
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 0.9.0
>            Reporter: Todd Lipcon
>            Priority: Blocker
>
> I've experienced many times that when I reboot an el6 node that was running 
> Kudu tservers, fsck reports issues like:
> data6 contains a file system with errors, check forced.
> data6: Interior extent node level 0 of inode 5259348:
> Logical start 154699 does not match logical start 2623046 at next level.  
> After some investigation, I've determined that this is due to an ext4 kernel 
> bug: https://patchwork.ozlabs.org/patch/206123/
> Details in a comment to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
