It sounds like you're saying that you hit the problem when sending from
new -> old, but not when sending the same filesystems from old -> old?

Another clue here could be the output of:

    zfs send ... | zstreamdump -v | gzip >file.gz

Though that may be redundant with the dtrace output I mentioned. But if
you could get the zstreamdump from both the new and old systems, we could
compare them to determine what's happening differently.
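For example, with the test dataset from your earlier mail (the filenames
here are just placeholders):

    # on the new system:
    zfs send dpool01/test@now | zstreamdump -v | gzip >new-stream.txt.gz

    # on one of the old systems, same filesystem contents:
    zfs send dpool01/test@now | zstreamdump -v | gzip >old-stream.txt.gz

and then something like:

    gzcat new-stream.txt.gz >new-stream.txt
    gzcat old-stream.txt.gz >old-stream.txt
    diff new-stream.txt old-stream.txt

should show whether (and where) the two streams diverge.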
--matt

On Tue, Nov 8, 2016 at 4:22 PM, Matthew Ahrens <mahr...@delphix.com> wrote:
>
> On Thu, Nov 3, 2016 at 6:18 AM, Hetrick, Joseph P <joseph-hetr...@uiowa.edu> wrote:
>
>> Per Alex's suggestion to see where ZFS is at during the hang period:
>>
>> THREAD           STATE    SOBJ                COUNT
>> ffffff007a8b3c40 SLEEP    CV                      3
>>                  swtch+0x145
>>                  cv_timedwait_hires+0xe0
>>                  cv_timedwait+0x5a
>>                  txg_thread_wait+0x7c
>>                  txg_sync_thread+0x118
>>                  thread_start+8
>>
>> ffffff007a292c40 SLEEP    CV                      3
>>                  swtch+0x145
>>                  cv_wait+0x61
>>                  spa_thread+0x225
>>                  thread_start+8
>>
>> ffffff007a8aac40 SLEEP    CV                      3
>>                  swtch+0x145
>>                  cv_wait+0x61
>>                  txg_thread_wait+0x5f
>>                  txg_quiesce_thread+0x94
>>                  thread_start+8
>>
>> ffffff007a1bbc40 SLEEP    CV                      1
>>                  swtch+0x145
>>                  cv_timedwait_hires+0xe0
>>                  cv_timedwait+0x5a
>>                  arc_reclaim_thread+0x13d
>>                  thread_start+8
>>
>> ffffff007a1c1c40 SLEEP    CV                      1
>>                  swtch+0x145
>>                  cv_timedwait_hires+0xe0
>>                  cv_timedwait+0x5a
>>                  l2arc_feed_thread+0xa1
>>                  thread_start+8
>>
>> ffffff11bde0f4a0 ONPROC   <NONE>                  1
>>                  mutex_exit
>>                  dbuf_hold_impl+0x81
>>                  dnode_next_offset_level+0xee
>>                  dnode_next_offset+0xa2
>>                  dmu_object_next+0x54
>>                  restore_freeobjects+0x7e
>>                  dmu_recv_stream+0x7f1
>>                  zfs_ioc_recv+0x416
>>                  zfsdev_ioctl+0x347
>>                  cdev_ioctl+0x45
>>                  spec_ioctl+0x5a
>>                  fop_ioctl+0x7b
>>                  ioctl+0x18e
>>                  _sys_sysenter_post_swapgs+0x149
>>
>> echo "::stacks -m zfs" | mdb -k
>>
>> THREAD           STATE    SOBJ                COUNT
>> ffffff007a8b3c40 SLEEP    CV                      3
>>                  swtch+0x145
>>                  cv_timedwait_hires+0xe0
>>                  cv_timedwait+0x5a
>>                  txg_thread_wait+0x7c
>>                  txg_sync_thread+0x118
>>                  thread_start+8
>>
>> ffffff007a292c40 SLEEP    CV                      3
>>                  swtch+0x145
>>                  cv_wait+0x61
>>                  spa_thread+0x225
>>                  thread_start+8
>>
>> ffffff007a8aac40 SLEEP    CV                      3
>>                  swtch+0x145
>>                  cv_wait+0x61
>>                  txg_thread_wait+0x5f
>>                  txg_quiesce_thread+0x94
>>                  thread_start+8
>>
>> ffffff007a1bbc40 SLEEP    CV                      1
>>                  swtch+0x145
>>                  cv_timedwait_hires+0xe0
>>                  cv_timedwait+0x5a
>>                  arc_reclaim_thread+0x13d
>>                  thread_start+8
>>
>> ffffff007a1c1c40 SLEEP    CV                      1
>>                  swtch+0x145
>>                  cv_timedwait_hires+0xe0
>>                  cv_timedwait+0x5a
>>                  l2arc_feed_thread+0xa1
>>                  thread_start+8
>>
>> ffffff11bde0f4a0 ONPROC   <NONE>                  1
>>                  dbuf_hash+0xdc
>>                  0xffffff11ca05c460
>>                  dbuf_hold_impl+0x59
>>                  dnode_next_offset_level+0xee
>>                  dnode_next_offset+0xa2
>>                  dmu_object_next+0x54
>>                  restore_freeobjects+0x7e
>>                  dmu_recv_stream+0x7f1
>>                  zfs_ioc_recv+0x416
>>                  zfsdev_ioctl+0x347
>>                  cdev_ioctl+0x45
>>                  spec_ioctl+0x5a
>>                  fop_ioctl+0x7b
>>                  ioctl+0x18e
>>                  _sys_sysenter_post_swapgs+0x149
>
> Are you sure that it's hung? This stack trace seems to indicate that the
> receive is running, and processing a FREEOBJECTS record. It's possible
> that this is for a huge number of objects, which could take a long time
> (perhaps more than it should).
>
> If you can reproduce this, can you capture the record we are processing,
> e.g. with dtrace:
>
>     dtrace -n 'restore_freeobjects:entry{print(*args[1])}'
>
> The last thing printed should be the one that we "hang" on.
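>
> FWIW, the FREEOBJECTS record itself is small -- see struct
> drr_freeobjects in zfs_ioctl.h -- and its drr_numobjs field is what
> would tell us whether it covers a huge object range.
>
> To tell whether the receive is making forward progress rather than truly
> hung, you could also count how fast it is iterating over objects. A
> sketch (I'm taking the function name from your stack trace above, so
> adjust if dtrace complains about the probe):
>
>     dtrace -n 'fbt::dmu_object_next:entry { @c = count(); }
>         tick-10s { printa("dmu_object_next calls in last 10s: %@d\n", @c); clear(@c); }'
>
> If the count stays nonzero, it is still walking objects and presumably
> still working through that FREEOBJECTS record.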
> FYI - you must be running bits that do not include this commit, which
> renamed restore_freeobjects():
>
>     commit a2cdcdd260232b58202b11a9bfc0103c9449ed52
>     Author: Paul Dagnelie <p...@delphix.com>
>     Date:   Fri Jul 17 14:51:38 2015 -0700
>
>         5960 zfs recv should prefetch indirect blocks
>         5925 zfs receive -o origin=
>         Reviewed by: Prakash Surya <prakash.su...@delphix.com>
>         Reviewed by: Matthew Ahrens <mahr...@delphix.com>
>
> --matt
>
>> Where the action was:
>>
>>     zfs recv -v dpool01/wtf <test-15-out
>>     receiving full stream of dpool01/test@now into dpool01/wtf@now
>>
>> test-15-out is "zfs send dpool01/test@now >test-15-out" and was then
>> sent to the node.
>>
>> It's only about 48k in size, with no filesystem data (though the
>> problem also exists when I have a filesystem with data).
>>
>> I've created a few identical filesystems on a few nodes and done some
>> hex compares with them, but nothing extensive beyond "I differences".
>>
>> Thanks Alex,
>>
>> Joe
>>
>> On 11/2/16, 11:15 AM, "Hetrick, Joseph P" <joseph-hetr...@uiowa.edu> wrote:
>>
>> Hi folks,
>>
>> We've run into an odd issue that seems concerning.
>>
>> Our shop runs OpenIndiana and we've got several versions in play.
>> Recently, while testing a new system which is much more recent (a
>> bleeding-edge OI Hipster release), we discovered that zfs sends to
>> older systems caused hangs. By older, we're talking the same zfs/zpool
>> versions (5/28) and no visible property differences.
>>
>> We can provide more info if told what is useful, but the gist is that:
>>
>> A zfs send of a vanilla dataset (no properties defined other than
>> defaults) to any "older" system causes the recv to hang; eventually
>> the host will crash. Truss'ing the receiving process doesn't seem to
>> give a lot of info as to the cause. The filesystem snapshot is
>> received, and then that's it.
>>
>> No fancy send or recv args are in play (zfs send of the dataset via
>> netcat or mbuffer or ssh to a "zfs recv -v <dest>").
>>
>> A close comparison of zfs and pool properties shows no differences. On
>> a whim we even created pools and datasets that were downversioned
>> below the senders'.
>>
>> We've seen this on hosts a bit later than illumos-a7317ce, but not
>> before (and certainly a bit later); and where we are now:
>> illumos-2816291.
>>
>> Oddly, illumos-a7317ce systems appear to be able to receive these
>> datasets just fine... and we've had no problems with systems of that
>> vintage sending to older systems.
>>
>> Any ideas and instruction are most welcome,
>>
>> Joe