Igor,

Your suggestion was certainly a good one; however, I took the path of
least effort and tested my workload on the latest illumos kernel:

panic[cpu3]/thread=ffffff000bc56c40: assertion failed:
ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
../../common/fs/zfs/bptree.c, line: 293

ffffff000bc56890 genunix:process_type+164b75 ()
ffffff000bc56a20 zfs:bptree_iterate+4bf ()
ffffff000bc56a90 zfs:dsl_scan_sync+17c ()
ffffff000bc56b50 zfs:spa_sync+2bb ()
ffffff000bc56c20 zfs:txg_sync_thread+260 ()
ffffff000bc56c30 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
dumping:  0:34 100% done
100% done: 339495 pages dumped, dump succeeded
rebooting...

So, if anyone is interested I can provide any requested information from
the crash dump or try your debugging suggestions.

On 22/06/2016 17:45, Igor Kozhukhov wrote:
> Based on your changeset number, it is an old update:
> https://github.com/illumos/illumos-gate/commit/26455f9efcf9b1e44937d4d86d1ce37b006f25a9
> 6052 decouple lzc_create() from the implementation details
> 
> We have a lot of other changes in the illumos tree, and I can say that I
> have no panics on my system with a gcc48 build; I have tested it with the
> zfs tests.
> 
> Maybe, as a solution, you can try to merge the latest changes and check
> it again?
> I had a panic with the gcc48 build, but Matt pointed to some Delphix update,
> and we have upstreamed it, and I have no panics any more with the full list
> of zfs tests that are available in the illumos tree.
> 
> best regards,
> -Igor
> 
> 
>> On Jun 22, 2016, at 5:17 PM, Andriy Gapon <a...@freebsd.org> wrote:
>>
>>
>> I am not yet convinced that the problem has anything to do with
>> miscompiled code.  I am using exactly the same optimizations and exactly
>> the same compiler as the official FreeBSD builds.
>>
>> On 22/06/2016 17:03, Igor Kozhukhov wrote:
>>> Hi Andriy,
>>>
>>> I have DilOS with gcc-4.8.5 (+ special patches) for illumos builds.
>>> I had some problems with zdb, found by the zfs tests.
>>>
>>> The problem was fixed by disabling an optimization:
>>> -fno-aggressive-loop-optimizations
>>>
>>> I have also added:
>>> -fno-ipa-sra
>>>
>>> but I do not remember why I added it ;)
>>> Probably it was added for another illumos component and the new gcc-4.8.
>>>
>>> As you know, illumos is still using gcc-4.4.4, and some newer compilers
>>> can expose new issues in older code :)
>>>
>>> I think you can try playing with your clang optimization flags too.
>>> I have no experience with clang.
>>>
>>> best regards,
>>> -Igor
>>>
>>>
>>>> On Jun 22, 2016, at 4:21 PM, Andriy Gapon <a...@freebsd.org> wrote:
>>>>
>>>>
>>>> I am getting the following panic using the latest FreeBSD head that is
>>>> synchronized with OpenZFS code as of
>>>> illumos/illumos-gate@26455f9efcf9b1e44937d4d86d1ce37b006f25a9.
>>>>
>>>> panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
>>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c,
>>>> line: 292
>>>> cpuid = 1
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>>> 0xfffffe004db9d310
>>>> vpanic() at vpanic+0x182/frame 0xfffffe004db9d390
>>>> panic() at panic+0x43/frame 0xfffffe004db9d3f0
>>>> assfail3() at assfail3+0x2c/frame 0xfffffe004db9d410
>>>> bptree_iterate() at bptree_iterate+0x35e/frame 0xfffffe004db9d540
>>>> dsl_scan_sync() at dsl_scan_sync+0x24f/frame 0xfffffe004db9d890
>>>> spa_sync() at spa_sync+0x897/frame 0xfffffe004db9dad0
>>>> txg_sync_thread() at txg_sync_thread+0x383/frame 0xfffffe004db9dbb0
>>>> fork_exit() at fork_exit+0x84/frame 0xfffffe004db9dbf0
>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db9dbf0
>>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>>>
>>>> I have a crash dump, but unfortunately it's hard to work with it,
>>>> because a lot of useful information got "optimized out" by clang.
>>>>
>>>> I can reproduce the panic using a synthetic workload, but I do not have
>>>> a concise reproduction scenario.  Every time the panic happens, bt_bytes
>>>> is 0x400; I haven't seen any other number there.
>>>>
>>>> Does anyone have an idea what could be causing this?
>>>> I can try any diagnostic code that might shed more light.
>>>> Thank you!
>>>>
>>>> -- 
>>>> Andriy Gapon
>> 
>> 
>> --
>> Andriy Gapon
>> 


-- 
Andriy Gapon


-------------------------------------------
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now
RSS Feed: https://www.listbox.com/member/archive/rss/274414/28015062-cce53afa
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=28015062&id_secret=28015062-f966d51c
Powered by Listbox: http://www.listbox.com
