On Thu, Feb 24, 2011 at 5:14 PM, Mitch Harder
<mitch.har...@sabayonlinux.org> wrote:
> On Thu, Feb 24, 2011 at 10:32 AM, Mitch Harder
> <mitch.har...@sabayonlinux.org> wrote:
>> On Thu, Feb 24, 2011 at 10:19 AM, Chris Mason <chris.ma...@oracle.com> wrote:
>>> Excerpts from Mitch Harder's message of 2011-02-24 11:03:07 -0500:
>>>> On Thu, Feb 24, 2011 at 10:00 AM, Chris Mason <chris.ma...@oracle.com> 
>>>> wrote:
>>>> > Excerpts from Mitch Harder's message of 2011-02-24 10:55:15 -0500:
>>>> >> 2011/2/24 Maria Wikström <ma...@ponstudios.se>:
>>>> >> > On Mon 2011-02-21 at 09:51 +0800, Zhong, Xin wrote:
>>>> >> >> The backtrace in your attachment looks like a known bug in 2.6.37
>>>> >> >> which has already been fixed in 2.6.38. I have no idea why the latest
>>>> >> >> btrfs still hangs in your environment if there's no debug info...
>>>> >> >>
>>>> >> >
>>>> >> > Haha, yes that's very hard :)
>>>> >> >
>>>> >> > 2.6.38-rc6 and btrfs-unstable behave the same way. I can close the
>>>> >> > process with ctrl+c and it disappears a few seconds later. There is no
>>>> >> > CPU usage. Reading works, because I can start htop and watch "svn info"
>>>> >> > disappear, but everything writing to btrfs slows to a crawl. It
>>>> >> > takes about 1 minute to log in, so I had to put the logs on another
>>>> >> > partition using ext3 to get the output from sysrq+t.
>>>> >> >
>>>> >>
>>>> >> I believe I've been experiencing this issue too. However, my problem
>>>> >> usually results in a "No space left on device" error rather than a
>>>> >> lock-up or crash. But I've bisected my issue to this patch, and my
>>>> >> "btrfs fi show" and "btrfs fi df" output looks similar to that of others
>>>> >> who've posted to this thread, with all my space allocated but not used.
>>>> >>
>>>> >
>>>> > Sorry, which patch did you bisect the problem down to?
>>>> >
>>>>
>>>> The patch at the head of this thread:
>>>>
>>>> Btrfs: pwrite blocked when writing from the mmaped buffer of the same page
>>>
>>> Hmmm, that patch shouldn't be changing our performance under delalloc
>>> pressure, and it really shouldn't impact early enospc.
>>>
>>
>> I've bisected this issue around where this patch went into git, and
>> I've also constructed a testing patch that reverts this patch, and
>> placed it on top of the current Btrfs git sources (I understand this
>> patch addresses a real issue, this was just for testing).
>>
>> It could be that this patch just "uncovers" another problem, but all
>> my tests seem to point to this patch triggering this issue.
>>
>
> I don't believe the previous ftrace I supplied had a large enough scope
> to capture the issue.
>
> I've expanded my ftrace buffer, and filtered out everything but btrfs*
> function calls ("# echo btrfs* >
> /sys/kernel/debug/tracing/set_ftrace_filter").
>
> In this trace, I see btrfs spending a great deal of time in a while
> loop (while (iov_iter_count(&i) > 0) {) in the btrfs_file_aio_write()
> function in file.c without exiting the function.
>
> I'm going to try injecting some debugging trace_printk() statements to
> determine whether that portion of the code proceeds normally with my test case.
>
> I've put my expanded trace up on my local server, but my upload
> bandwidth is pretty sad, and it may take a few minutes to transfer
> even though it's only a 6MB file.
>
> http://dontpanic.dyndns.org/trace-openmotif-btrfs-v3.gz
>

Apologies for only hitting "Reply" instead of "Reply-All" on my last message.

I've inserted additional trace_printk() calls into the btrfs_file_aio_write()
and btrfs_copy_from_user() functions in file.c in order to characterize
the problem I've been encountering.

I can see btrfs getting stuck in a loop in the "while
(iov_iter_count(&i) > 0) {}" portion of the btrfs_file_aio_write()
function.

The loop is more-or-less following this process (from within the
"while (iov_iter_count(&i) > 0) {}" loop):

(1) Reserve some space with btrfs_delalloc_reserve_space()
(2) Prepare the reserved space with prepare_pages()
(3) Call btrfs_copy_from_user() to copy to the prepared space.
-------------> From btrfs_copy_from_user()
(4) ........Try to copy with copied = iov_iter_copy_from_user_atomic()
(5) ........The above operation results in copied == 0. Break and
return with a return value of 0 bytes copied.
(6) There is no special handling for copied == 0 in the "while
(iov_iter_count(&i) > 0) {}" loop, so it loops back around, reserves
some more space, and tries again.

If I look back at how the code was set up before the patch at the head
of this thread was applied (Btrfs: pwrite blocked when writing from
the mmaped buffer of the same page), the btrfs_copy_from_user()
function had some handling for "copied == 0" that would change the
scope of the amount to write, and loop back to try the write again.

I attempted to construct a patch that only restored the old handling for
"copied == 0" in btrfs_copy_from_user(). However, that just resulted
in my computer locking up at the point where it had previously begun
allocating disk space.

So, I apologize for not having a patch to address the issue I'm
seeing, but I hope I've added some insight.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
