Who is user <1-GN0KC>?
On 10/14/09, bugmail-sender at sun.com <bugmail-sender at sun.com> wrote:
> *Synopsis*: ZFS Send/ZFS Receive: Receive Side of Pipe is not created -
> Causes process hang - ksh93
>
> Due to a change requested by <User 1-5Q-9460>,
> <User 1-GN0KC> is now the responsible engineer for:
>
> CR 6876768/solaris_nevada changed on Oct 14 2009 by <User 1-5Q-9460>
>
> === Field ============ === New Value ============= === Old Value =============
>
> Responsible Engineer   <User 1-GN0KC>              <User 1-5Q-9460>
> ====================== =========================== ===========================
>
>
> *Change Request ID*: 6876768/solaris_nevada
>
> *Synopsis*: ZFS Send/ZFS Receive: Receive Side of Pipe is not created -
> Causes process hang - ksh93
>
> Product: solaris
> Category: shell
> Subcategory: korn93
> Type: Defect
> Subtype:
> Status: 2-Incomplete
> Substatus: Need More Info
> Priority: 2-High
> Introduced In Release:
> Introduced In Build:
> Responsible Engineer: <User 1-GN0KC>
> Keywords:
>
> === *Description*
> ============================================================
> zfs send/zfs receive operations experience hang under ksh:
>
> The problem is that the right-hand side of the pipe ("receive") is not being
> created. The included truss shows that the zfs receive is never started: in
> the failure mode, truss does not even create its output file.
>
> There MAY also be ioctl involvement as indicated by the last line of the
> included truss:
>
> 1955: ioctl(3, ZFS_IOC_SEND, 0x08044A00) (sleeping...)
>
> It may also be that when the receive side of an anonymous pipe dies, the
> corresponding send side of the pipe is not killed off.
>
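
For comparison, here is a minimal sketch of what normally happens to the
writing side of an anonymous pipe when the reader goes away. It is only an
illustration under stated assumptions: the function name
writer_status_after_reader_exit is made up, and `yes` and `head` stand in for
the send and receive sides. It does not reproduce the reported hang, where
the sender is sleeping inside the ZFS_IOC_SEND ioctl.

```shell
#!/bin/sh
# Hypothetical helper (not from the CR): print the exit status of a pipe
# writer whose reader exits early.
writer_status_after_reader_exit() {
    # fd 3 is pointed at the command substitution's capture pipe, so the
    # left-hand side can report its status even though its stdout feeds head.
    # The substitution only returns once the left-hand side has exited.
    status=$( { { yes 2>/dev/null; echo "$?" >&3; } | head -n 1 >/dev/null; } 3>&1 )
    echo "$status"
}
```

A writer that keeps calling write() after the reader has exited normally
receives SIGPIPE (or an EPIPE error if the signal is ignored), so the status
printed here should be non-zero. A writer blocked somewhere other than an
ordinary write() may never notice that the reader is gone.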
> Potentially related CRs:
>
> 6859444
> 6782948
> 4657448 - Cu reports that the test included in CR4657448 does not fail on
> his host and he feels that the failure apparently depends on having multiple
> cores and the number of processes in the pipeline.
>
> In addition, included as an attachment are the cu scripts. You will find the
> following:
>
> 1) diff showing change that eliminated the hang (w/o changing shell)
> 2) r1.2 of the script that includes the hang
> 3) current version of script that includes various changes and does not
> (yet) cause hang (but multiple copies, run in parallel, can induce the zfs
> 3-way deadlock described in SR#71405530)
> 4) script used to create test zpool
> 5) script to create 100 test zfs [operations]
>
> *** (#1 of 1): 2009-08-28 00:23:49 GMT+00:00 <User 1-90DA97>
> *** Last Edit: 2009-09-23 15:13:21 GMT+00:00 <User 1-90DA97>
>
>
> === *Public Comments*
> ========================================================
> I don't see any indication that this is a ZFS bug. Please provide a crash
> dump from when the system is "hung".
>
> This bug will be closed or recategorized as a ksh bug on September 10 if
> more information is not provided.
>
> *** (#1 of 8): 2009-08-28 04:38:45 GMT+00:00 <User 1-5Q-9707>
>
> In examining the different trusses provided by the customer, we see that the
> action the customer wants to perform is
>
> zfs send | zfs receive
>
> We trussed both sides of the pipe with:
> truss -o sendside zfs send | truss -o recvside zfs receive
>
> This process was run in a loop to transfer multiple snapshots. The actual
> truss file names are unique across the entire loop. At some point in the
> process the left-hand side hangs. No truss file was created for the
> receive side.
>
> This strongly suggests that the receive side of the pipe was not created at
> all, and that "zfs receive" itself was not a part of the issue.
>
> This issue was reproducible on the CU system. It might take an hour
> or so, but the issue would happen. Since the CU is using these scripts to
> make backups, they are run at regular intervals and the chances of failure
> are very high.
>
> In an attempt to isolate the issue, we changed the send/receive pair from
> "zfs" to "dd". While we moved the same amount of data, in over 300
> iterations we did not get a single hang.
>
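
The dd substitution described above could be sketched roughly as follows.
This is an assumption-laden illustration, not the customer's script: the
function name run_dd_pipeline, the 1 KiB block size, and the use of
/dev/zero and /dev/null as endpoints are all invented for the example.

```shell
#!/bin/sh
# Hypothetical helper (not from the CR): run a local "dd | dd" pipeline
# repeatedly and print how many iterations completed successfully.
run_dd_pipeline() {
    iterations=$1      # how many times to run the pipeline
    blocks=$2          # number of 1 KiB blocks to move per iteration
    completed=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # The pipeline's status is that of the reading dd on the right.
        if dd if=/dev/zero bs=1024 count="$blocks" 2>/dev/null |
            dd of=/dev/null bs=1024 2>/dev/null; then
            completed=$((completed + 1))
        fi
        i=$((i + 1))
    done
    echo "$completed"
}
```

Something like `run_dd_pipeline 300 1024` would correspond to the 300
iterations mentioned in the comment. Note that this sketch does not detect a
hang; if the pipeline stalled, the loop would simply stop making progress.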
> The analysis suggests that this MIGHT be an interaction between KSH and
> Kernel/ZFS. The "zfs send" makes a single ioctl() call which generates large
> amounts of data. That data is moved directly from zfs to a file, bypassing
> user space.
>
> Additional point: This lock up is NOT seen if the user pipes across the
> network. I.e. zfs send | ssh remote.system.com zfs receive
>
> *** (#2 of 8): 2009-08-28 12:45:35 GMT+00:00 <User 1-92VH66>
>
> Based on the additional information that Chris provided, is a crash dump
> still required?
>
> *** (#3 of 8): 2009-08-28 15:03:01 GMT+00:00 <User 1-90DA97>
>
> From the latest update it looks like this is a Solaris ksh93 bug.
>
> *** (#4 of 8): 2009-09-23 15:43:34 GMT+00:00 <User 1-1SURPB>
>
> Transferring to shell/korn93 based on last comment for further investigation.
>
> *** (#5 of 8): 2009-10-01 17:13:43 GMT+00:00 <User 1-3GMVGZ>
>
> Roland Mainz, the OpenSolaris ksh93 integration project lead, asks:
> Does the problem go away if you apply the ksh93-integration update2
> tarballs from
>
> http://www.opensolaris.org/os/project/ksh93-integration/downloads/2009-09-22/
>
> You can find his e-mail address and the project mailing list/web forum on
> the ksh93 project pages at:
> http://opensolaris.org/os/project/ksh93-integration/
>
> *** (#6 of 8): 2009-10-01 20:24:07 GMT+00:00 <User 1-5Q-1267>
>
> Queried customer and asked that he download and apply the eval/test tarballs
> to determine if they resolve the issue. Will update with results.
>
> *** (#7 of 8): 2009-10-09 17:04:42 GMT+00:00 <User 1-90DA97>
>
> Please see attachment 6876768emailupdate.txt for update queries related to
> customer response... Thanks.
>
> *** (#8 of 8): 2009-10-13 15:52:13 GMT+00:00 <User 1-90DA97>
>
>
> === *Workaround*
> =============================================================
>
> === *Additional Details*
> =====================================================
> Targeted Release: solaris_nevada
> Commit To Fix In Build:
> Fixed In Build:
> Integrated In Build:
> Verified In Build:
> See Also: 4657448, 6782948, 6859444
> Duplicate of:
> Hooks:
> Hook1:
> Hook2:
> Hook3:
> Hook4:
> Hook5:
> Hook6:
> Program Management:
> Root Cause:
> Fix Affects Documentation: No
> Fix Affects Localization: No
>
> === *History*
> ================================================================
> Date Submitted: 2009-08-28 00:23:46 GMT+00:00
> Submitted By: <User 1-90DA97>
>
> Status Changed   Date Updated                    Updated By
> 2-Incomplete     2009-08-28 04:38:45 GMT+00:00   <User 1-5Q-9707>
>
>
> === *Service Request*
> ========================================================
> Impact: Significant
> Functionality: Secondary
> Severity: 3
> Product Name: solaris
> Product Release: osol_2009.06
> Product Build: osol_2009.06
> Operating System: osol_2009.06
> Hardware:
> Submitted Date: 2009-08-28 00:23:49 GMT+00:00
>
>
> === *Multiple Release (MR) Cluster* - 6876768
> ================================
> ID: +6876768/solaris_nevada
> SubCR Number: 6876768
> Targeted Release: solaris_nevada
>
> === *SubCR*
> ==================================================================
> ID: 6876768/osol_2009.06u6
> Status: 1-Dispatched
> Substatus:
> Priority: 3-Medium
> Responsible Engineer: <User 1-5Q-9460>
> SubCR Number: 2182414
> Targeted Release: osol_2009.06u6
> Commit To Fix In Build:
> Fixed In Build:
> Integrated In Build:
> Verified In Build:
> Hook1:
> Hook2:
> Program Management:
>
> _______________________________________________
> ksh93-integration-discuss mailing list
> ksh93-integration-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/ksh93-integration-discuss
>
--
, _ _ ,
{ \/`o;====- Olga Kryzhanovska -====;o`\/ }
.----'-/`-/ olga.kryzhanovska at gmail.com \-`\-'----.
`'-..-| / Solaris/BSD//C/C++ programmer \ |-..-'`
/\/\ /\/\
`--` `--`