Re: [developer] panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0)

2016-07-03 Thread George Wilson
Andriy,

Can you give me some details about how you're able to reproduce this panic?
I would like to help debug this. I'm also looking into the range_tree()
panic, so any details you can provide would be very helpful.

If you can publish the crash dumps, I can also download them and take a
look.

Thanks,
George

On Wed, Jun 22, 2016 at 4:53 PM, Andriy Gapon wrote:

>
> Igor,
>
> your suggestion was certainly a good one, however I took a path of a
> lesser effort and tested my workload on the latest illumos kernel:
>
> panic[cpu3]/thread=ff000bc56c40: assertion failed:
> ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
> ../../common/fs/zfs/bptree.c, line: 293
>
> ff000bc56890 genunix:process_type+164b75 ()
> ff000bc56a20 zfs:bptree_iterate+4bf ()
> ff000bc56a90 zfs:dsl_scan_sync+17c ()
> ff000bc56b50 zfs:spa_sync+2bb ()
> ff000bc56c20 zfs:txg_sync_thread+260 ()
> ff000bc56c30 unix:thread_start+8 ()
>
> syncing file systems... done
> dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
> dumping:  0:34 100% done
> 100% done: 339495 pages dumped, dump succeeded
> rebooting...
>
> So, if anyone is interested I can provide any requested information from
> the crash dump or try your debugging suggestions.
>
> On 22/06/2016 17:45, Igor Kozhukhov wrote:
> > Based on your changeset number, it is an old update:
> > https://github.com/illumos/illumos-gate/commit/26455f9efcf9b1e44937d4d86d1ce37b006f25a9
> > 6052 decouple lzc_create() from the implementation details
> >
> > We have a lot of other changes in the illumos tree, and I can say that I
> > have no panic on my system with a gcc48 build; I have tested it with the
> > zfs tests.
> >
> > Maybe, as a solution, you can try to merge the latest changes and
> > check it again?
> > I had a panic with the gcc48 build, but Matt pointed to some delphix
> > update, we upstreamed it, and I have no panics any more with the full
> > list of zfs tests available in the illumos tree.
> >
> > best regards,
> > -Igor
> >
> >
> >> On Jun 22, 2016, at 5:17 PM, Andriy Gapon wrote:
> >>
> >>
> >> I am not yet convinced that the problem has anything to do with
> >> miscompiled code.  I am using exactly the same optimizations and exactly
> >> the same compiler as the official FreeBSD builds.
> >>
> >> On 22/06/2016 17:03, Igor Kozhukhov wrote:
> >>> Hi Andriy,
> >>>
> >>> I have DilOS with gcc-4.8.5 (+ special patches) for illumos builds.
> >>> I had some problems with zdb, found by the zfs tests.
> >>>
> >>> The problem was fixed by disabling an optimization:
> >>> -fno-aggressive-loop-optimizations
> >>>
> >>> Also, I have added:
> >>> -fno-ipa-sra
> >>>
> >>> but I do not remember the story of why I added it ;)
> >>> Probably it was added for another illumos component and the new gcc-4.8.
> >>>
> >>> As you know, illumos is still using gcc-4.4.4, and some newer compilers
> >>> can produce new issues with older code :)
> >>>
> >>> I think you can try to play with your clang optimization flags too.
> >>> I have no experience with clang.
> >>>
> >>> best regards,
> >>> -Igor
> >>>
> >>>
> >>>> On Jun 22, 2016, at 4:21 PM, Andriy Gapon wrote:
> >>>>
> >>>>
> >>>> I am getting the following panic using the latest FreeBSD head that is
> >>>> synchronized with OpenZFS code as of
> >>>> illumos/illumos-gate@26455f9efcf9b1e44937d4d86d1ce37b006f25a9.
> >>>>
> >>>> panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
> >>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c,
> >>>> line: 292
> >>>> cpuid = 1
> >>>> KDB: stack backtrace:
> >>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> >>>> 0xfe004db9d310
> >>>> vpanic() at vpanic+0x182/frame 0xfe004db9d390
> >>>> panic() at panic+0x43/frame 0xfe004db9d3f0
> >>>> assfail3() at assfail3+0x2c/frame 0xfe004db9d410
> >>>> bptree_iterate() at bptree_iterate+0x35e/frame 0xfe004db9d540
> >>>> dsl_scan_sync() at dsl_scan_sync+0x24f/frame 0xfe004db9d890
> >>>> spa_sync() at spa_sync+0x897/frame 0xfe004db9dad0
> >>>> txg_sync_thread() at txg_sync_thread+0x383/frame 0xfe004db9dbb0
> >>>> fork_exit() at fork_exit+0x84/frame 0xfe004db9dbf0
> >>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe004db9dbf0
> >>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> >>>>
> >>>> I have a crash dump, but unfortunately it's hard to work with it,
> >>>> because a lot of useful information got "optimized out" by clang.
> >>>>
> >>>> I can reproduce the panic using a synthetic workload, but I do not have
> >>>> a concise reproduction scenario.  Every time the panic happens bt_bytes
> >>>> is 0x400, I haven't seen any other number there.
> >>>>
> >>>> Does anyone have an idea what could be causing this?
> >>>> I can try any diagnostic code that might shed more light.
> >>>> Thank you!
> >>>>
> >>>> --

Re: [developer] panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0)

2016-06-22 Thread Andriy Gapon

Igor,

your suggestion was certainly a good one, however I took a path of a
lesser effort and tested my workload on the latest illumos kernel:

panic[cpu3]/thread=ff000bc56c40: assertion failed:
ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
../../common/fs/zfs/bptree.c, line: 293

ff000bc56890 genunix:process_type+164b75 ()
ff000bc56a20 zfs:bptree_iterate+4bf ()
ff000bc56a90 zfs:dsl_scan_sync+17c ()
ff000bc56b50 zfs:spa_sync+2bb ()
ff000bc56c20 zfs:txg_sync_thread+260 ()
ff000bc56c30 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
dumping:  0:34 100% done
100% done: 339495 pages dumped, dump succeeded
rebooting...

So, if anyone is interested I can provide any requested information from
the crash dump or try your debugging suggestions.
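
For comparing the FreeBSD and illumos panic strings side by side, a minimal sketch that splits an assertion message of the form seen above into its fields. The regular expression, the function name parse_assert and the dictionary keys are assumptions made for illustration; only the two message strings are taken from this thread.

```python
import posixpath
import re

# Layout assumed from the two panics quoted in this thread:
#   "<prefix>: <expr> (0xLHS <op> 0xRHS), file: <path>, line: <n>"
ASSERT_RE = re.compile(
    r"(?:assertion failed|assert):\s*(?P<expr>.+?)\s+"
    r"\((?P<lhs>0x[0-9a-fA-F]+)\s+(?P<op>\S+)\s+(?P<rhs>0x[0-9a-fA-F]+)\),"
    r"\s*file:\s*(?P<file>[^,\s]+),\s*line:\s*(?P<line>\d+)"
)


def parse_assert(message):
    """Split a three-operand assertion message into its fields, or return None."""
    flat = " ".join(message.split())  # undo the line wrapping added by mail clients
    m = ASSERT_RE.search(flat)
    if m is None:
        return None
    return {
        "expr": m.group("expr"),
        "lhs": int(m.group("lhs"), 16),   # value observed on the left-hand side
        "op": m.group("op"),
        "rhs": int(m.group("rhs"), 16),   # value the assertion expected
        "file": posixpath.basename(m.group("file")),
        "line": int(m.group("line")),
    }


FREEBSD_MSG = """panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c,
line: 292"""

ILLUMOS_MSG = """panic[cpu3]/thread=ff000bc56c40: assertion failed:
ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
../../common/fs/zfs/bptree.c, line: 293"""

if __name__ == "__main__":
    for name, msg in (("FreeBSD", FREEBSD_MSG), ("illumos", ILLUMOS_MSG)):
        print(name, parse_assert(msg))
```

Both messages decode to the same expression and the same left-hand value, 0x400 (1024 decimal); only the reported line number differs (292 on FreeBSD, 293 on illumos).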

On 22/06/2016 17:45, Igor Kozhukhov wrote:
> Based on your changeset number, it is an old update:
> https://github.com/illumos/illumos-gate/commit/26455f9efcf9b1e44937d4d86d1ce37b006f25a9
> 6052 decouple lzc_create() from the implementation details
>
> We have a lot of other changes in the illumos tree, and I can say that I have
> no panic on my system with a gcc48 build; I have tested it with the zfs tests.
>
> Maybe, as a solution, you can try to merge the latest changes and
> check it again?
> I had a panic with the gcc48 build, but Matt pointed to some delphix update,
> we upstreamed it, and I have no panics any more with the full list
> of zfs tests available in the illumos tree.
> 
> best regards,
> -Igor
> 
> 
>> On Jun 22, 2016, at 5:17 PM, Andriy Gapon wrote:
>>
>>
>> I am not yet convinced that the problem has anything to do with
>> miscompiled code.  I am using exactly the same optimizations and exactly
>> the same compiler as the official FreeBSD builds.
>>
>> On 22/06/2016 17:03, Igor Kozhukhov wrote:
>>> Hi Andriy,
>>>
>>> I have DilOS with gcc-4.8.5 (+ special patches) for illumos builds.
>>> I had some problems with zdb, found by the zfs tests.
>>>
>>> The problem was fixed by disabling an optimization:
>>> -fno-aggressive-loop-optimizations
>>>
>>> Also, I have added:
>>> -fno-ipa-sra
>>>
>>> but I do not remember the story of why I added it ;)
>>> Probably it was added for another illumos component and the new gcc-4.8.
>>>
>>> As you know, illumos is still using gcc-4.4.4, and some newer compilers
>>> can produce new issues with older code :)
>>>
>>> I think you can try to play with your clang optimization flags too.
>>> I have no experience with clang.
>>>
>>> best regards,
>>> -Igor
>>>
>>>
>>>> On Jun 22, 2016, at 4:21 PM, Andriy Gapon wrote:
>>>>
>>>>
>>>> I am getting the following panic using the latest FreeBSD head that is
>>>> synchronized with OpenZFS code as of
>>>> illumos/illumos-gate@26455f9efcf9b1e44937d4d86d1ce37b006f25a9.
>>>>
>>>> panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
>>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c,
>>>> line: 292
>>>> cpuid = 1
>>>> KDB: stack backtrace:
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>>> 0xfe004db9d310
>>>> vpanic() at vpanic+0x182/frame 0xfe004db9d390
>>>> panic() at panic+0x43/frame 0xfe004db9d3f0
>>>> assfail3() at assfail3+0x2c/frame 0xfe004db9d410
>>>> bptree_iterate() at bptree_iterate+0x35e/frame 0xfe004db9d540
>>>> dsl_scan_sync() at dsl_scan_sync+0x24f/frame 0xfe004db9d890
>>>> spa_sync() at spa_sync+0x897/frame 0xfe004db9dad0
>>>> txg_sync_thread() at txg_sync_thread+0x383/frame 0xfe004db9dbb0
>>>> fork_exit() at fork_exit+0x84/frame 0xfe004db9dbf0
>>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe004db9dbf0
>>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>>>
>>>> I have a crash dump, but unfortunately it's hard to work with it,
>>>> because a lot of useful information got "optimized out" by clang.
>>>>
>>>> I can reproduce the panic using a synthetic workload, but I do not have
>>>> a concise reproduction scenario.  Every time the panic happens bt_bytes
>>>> is 0x400, I haven't seen any other number there.
>>>>
>>>> Does anyone have an idea what could be causing this?
>>>> I can try any diagnostic code that might shed more light.
>>>> Thank you!
>>>>
>>>> --
>>>> Andriy Gapon


>> 
>> 
>> --
>> Andriy Gapon
>> 
> 

Re: [developer] panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0)

2016-06-22 Thread Igor Kozhukhov
Based on your changeset number, it is an old update:
https://github.com/illumos/illumos-gate/commit/26455f9efcf9b1e44937d4d86d1ce37b006f25a9
6052 decouple lzc_create() from the implementation details

We have a lot of other changes in the illumos tree, and I can say that I have
no panic on my system with a gcc48 build; I have tested it with the zfs tests.

Maybe, as a solution, you can try to merge the latest changes and
check it again?
I had a panic with the gcc48 build, but Matt pointed to some delphix update,
we upstreamed it, and I have no panics any more with the full list
of zfs tests available in the illumos tree.

best regards,
-Igor


> On Jun 22, 2016, at 5:17 PM, Andriy Gapon wrote:
> 
> 
> I am not yet convinced that the problem has anything to do with
> miscompiled code.  I am using exactly the same optimizations and exactly
> the same compiler as the official FreeBSD builds.
> 
> On 22/06/2016 17:03, Igor Kozhukhov wrote:
>> Hi Andriy,
>>
>> I have DilOS with gcc-4.8.5 (+ special patches) for illumos builds.
>> I had some problems with zdb, found by the zfs tests.
>>
>> The problem was fixed by disabling an optimization:
>> -fno-aggressive-loop-optimizations
>>
>> Also, I have added:
>> -fno-ipa-sra
>>
>> but I do not remember the story of why I added it ;)
>> Probably it was added for another illumos component and the new gcc-4.8.
>>
>> As you know, illumos is still using gcc-4.4.4, and some newer compilers
>> can produce new issues with older code :)
>>
>> I think you can try to play with your clang optimization flags too.
>> I have no experience with clang.
>> 
>> best regards,
>> -Igor
>> 
>> 
>>> On Jun 22, 2016, at 4:21 PM, Andriy Gapon wrote:
>>> 
>>> 
>>> I am getting the following panic using the latest FreeBSD head that is
>>> synchronized with OpenZFS code as of
>>> illumos/illumos-gate@26455f9efcf9b1e44937d4d86d1ce37b006f25a9.
>>> 
>>> panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
>>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c,
>>> line: 292
>>> cpuid = 1
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>> 0xfe004db9d310
>>> vpanic() at vpanic+0x182/frame 0xfe004db9d390
>>> panic() at panic+0x43/frame 0xfe004db9d3f0
>>> assfail3() at assfail3+0x2c/frame 0xfe004db9d410
>>> bptree_iterate() at bptree_iterate+0x35e/frame 0xfe004db9d540
>>> dsl_scan_sync() at dsl_scan_sync+0x24f/frame 0xfe004db9d890
>>> spa_sync() at spa_sync+0x897/frame 0xfe004db9dad0
>>> txg_sync_thread() at txg_sync_thread+0x383/frame 0xfe004db9dbb0
>>> fork_exit() at fork_exit+0x84/frame 0xfe004db9dbf0
>>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe004db9dbf0
>>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>> 
>>> I have a crash dump, but unfortunately it's hard to work with it,
>>> because a lot of useful information got "optimized out" by clang.
>>> 
>>> I can reproduce the panic using a synthetic workload, but I do not have
>>> a concise reproduction scenario.  Every time the panic happens bt_bytes
>>> is 0x400, I haven't seen any other number there.
>>> 
>>> Does anyone have an idea what could be causing this?
>>> I can try any diagnostic code that might shed more light.
>>> Thank you!
>>> 
>>> -- 
>>> Andriy Gapon
>>> 
>>> 
>> 
> 
> 
> --
> Andriy Gapon
> 



---
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now
RSS Feed: https://www.listbox.com/member/archive/rss/274414/28015062-cce53afa
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=28015062_secret=28015062-f966d51c
Powered by Listbox: http://www.listbox.com


Re: [developer] panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0)

2016-06-22 Thread Andriy Gapon

I am not yet convinced that the problem has anything to do with
miscompiled code.  I am using exactly the same optimizations and exactly
the same compiler as the official FreeBSD builds.

On 22/06/2016 17:03, Igor Kozhukhov wrote:
> Hi Andriy,
>
> I have DilOS with gcc-4.8.5 (+ special patches) for illumos builds.
> I had some problems with zdb, found by the zfs tests.
>
> The problem was fixed by disabling an optimization:
> -fno-aggressive-loop-optimizations
>
> Also, I have added:
> -fno-ipa-sra
>
> but I do not remember the story of why I added it ;)
> Probably it was added for another illumos component and the new gcc-4.8.
>
> As you know, illumos is still using gcc-4.4.4, and some newer compilers
> can produce new issues with older code :)
>
> I think you can try to play with your clang optimization flags too.
> I have no experience with clang.
> 
> best regards,
> -Igor
> 
> 
>> On Jun 22, 2016, at 4:21 PM, Andriy Gapon wrote:
>>
>>
>> I am getting the following panic using the latest FreeBSD head that is
>> synchronized with OpenZFS code as of
>> illumos/illumos-gate@26455f9efcf9b1e44937d4d86d1ce37b006f25a9.
>>
>> panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
>> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c,
>> line: 292
>> cpuid = 1
>> KDB: stack backtrace:
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> 0xfe004db9d310
>> vpanic() at vpanic+0x182/frame 0xfe004db9d390
>> panic() at panic+0x43/frame 0xfe004db9d3f0
>> assfail3() at assfail3+0x2c/frame 0xfe004db9d410
>> bptree_iterate() at bptree_iterate+0x35e/frame 0xfe004db9d540
>> dsl_scan_sync() at dsl_scan_sync+0x24f/frame 0xfe004db9d890
>> spa_sync() at spa_sync+0x897/frame 0xfe004db9dad0
>> txg_sync_thread() at txg_sync_thread+0x383/frame 0xfe004db9dbb0
>> fork_exit() at fork_exit+0x84/frame 0xfe004db9dbf0
>> fork_trampoline() at fork_trampoline+0xe/frame 0xfe004db9dbf0
>> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>>
>> I have a crash dump, but unfortunately it's hard to work with it,
>> because a lot of useful information got "optimized out" by clang.
>>
>> I can reproduce the panic using a synthetic workload, but I do not have
>> a concise reproduction scenario.  Every time the panic happens bt_bytes
>> is 0x400, I haven't seen any other number there.
>>
>> Does anyone have an idea what could be causing this?
>> I can try any diagnostic code that might shed more light.
>> Thank you!
>>
>> -- 
>> Andriy Gapon
>>
>>
> 


-- 
Andriy Gapon




Re: [developer] panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0)

2016-06-22 Thread Igor Kozhukhov
Hi Andriy,

I have DilOS with gcc-4.8.5 (+ special patches) for illumos builds.
I had some problems with zdb, found by the zfs tests.

The problem was fixed by disabling an optimization:
-fno-aggressive-loop-optimizations

Also, I have added:
-fno-ipa-sra

but I do not remember the story of why I added it ;)
Probably it was added for another illumos component and the new gcc-4.8.

As you know, illumos is still using gcc-4.4.4, and some newer compilers
can produce new issues with older code :)

I think you can try to play with your clang optimization flags too.
I have no experience with clang.

best regards,
-Igor


> On Jun 22, 2016, at 4:21 PM, Andriy Gapon wrote:
> 
> I am getting the following panic using the latest FreeBSD head that is
> synchronized with OpenZFS code as of
> illumos/illumos-gate@26455f9efcf9b1e44937d4d86d1ce37b006f25a9.
> 
> panic: solaris assert: ba.ba_phys->bt_bytes == 0 (0x400 == 0x0), file:
> /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bptree.c, line: 292
> cpuid = 1
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfe004db9d310
> vpanic() at vpanic+0x182/frame 0xfe004db9d390
> panic() at panic+0x43/frame 0xfe004db9d3f0
> assfail3() at assfail3+0x2c/frame 0xfe004db9d410
> bptree_iterate() at bptree_iterate+0x35e/frame 0xfe004db9d540
> dsl_scan_sync() at dsl_scan_sync+0x24f/frame 0xfe004db9d890
> spa_sync() at spa_sync+0x897/frame 0xfe004db9dad0
> txg_sync_thread() at txg_sync_thread+0x383/frame 0xfe004db9dbb0
> fork_exit() at fork_exit+0x84/frame 0xfe004db9dbf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfe004db9dbf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> 
> I have a crash dump, but unfortunately it's hard to work with it,
> because a lot of useful information got "optimized out" by clang.
> 
> I can reproduce the panic using a synthetic workload, but I do not have
> a concise reproduction scenario.  Every time the panic happens bt_bytes
> is 0x400, I haven't seen any other number there.
> 
> Does anyone have an idea what could be causing this?
> I can try any diagnostic code that might shed more light.
> Thank you!
> 
> --
> Andriy Gapon
> 


