Internationalization issue - string processing

2017-09-29 Ernie Coskrey
We have a Java program that launches Cygwin bash processes which in turn
run a script.  The LC_ALL variable is set to "ja_JP".  The script will
execute processes using Unicode strings that are specified like this:

"\u3053"

(for the Hiragana letter Ko).

For some reason, when bash calls another program and passes it the string
above, the string arrives as the two bytes 0x3f 0x3f ("??").

The script that is being run contains the following command:

perl dump.pl "\u3053"

The perl script just prints out the hex values of its arguments, and it
displays:

??
3f 3f
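
To make the "3f 3f" output easier to interpret, here is a small Python sketch of a dump.pl-style hex dump (dump.pl itself was not posted, so `hex_bytes` is a hypothetical stand-in). It shows the bytes an intact Hiragana Ko (U+3053) would carry if the argument were passed as UTF-8, and what a lossy conversion into a character set lacking that letter produces. UTF-8 is an assumption here: with a ja_JP locale that uses a different codeset, the expected bytes would differ. In either case 0x3f is ASCII "?", a common substitution character, which suggests the letter was lost in a character-set conversion somewhere along the way rather than passed through unchanged.

```python
import os
import sys


def hex_bytes(data: bytes) -> str:
    """Format raw bytes as space-separated, two-digit lowercase hex values."""
    return " ".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    ko = "\u3053"  # HIRAGANA LETTER KO

    # Bytes an intact argument would carry (assumes a UTF-8 codeset).
    print(hex_bytes(ko.encode("utf-8")))                    # e3 81 93

    # Converting into a charset that lacks the letter substitutes "?" (0x3f).
    print(hex_bytes(ko.encode("ascii", errors="replace")))  # 3f

    # Like dump.pl: show the raw bytes of each command-line argument.
    for arg in sys.argv[1:]:
        print(hex_bytes(os.fsencode(arg)))
```

The sketch substitutes one "?" for the single character; it does not explain why the report shows two.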


The behavior is not reproducible if we run bash from a CMD prompt.  I know
this is pretty open-ended but are there any ideas as to what might be
causing this sort of localization issue?

Ernie Coskrey
SIOS Technology Corp.

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-08-09 Ernie Coskrey
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Ernie Coskrey
> Sent: Wednesday, August 08, 2007 2:11 PM
> To: cygwin@cygwin.com
> Subject: RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> 
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Ernie Coskrey
> > Sent: Tuesday, July 31, 2007 3:40 PM
> > To: cygwin@cygwin.com
> > Subject: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> > 
> >  
> > I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14.  We've
> > got a pdksh.exe process that is spinning, using all the CPU.
> >
> > This scenario is very hard to reproduce, but has happened on our test
> > systems occasionally.  It occurred recently, and I currently have gdb
> > attached to the process and have the symbols loaded.  I see that pdksh
> > is continually calling "sigsuspend()", which is immediately returning
> > from cancelable_wait due to the fact that the signal_arrived event is
> > set.  I also see that pdksh is waiting for a subprocess to complete,
> > and has a handle to the PID of that process - however the process has
> > long since terminated.
> >
> > It appears that something went wrong during delivery of SIGCHLD.
> >
> > I've got two questions related to this:
> >
> > - have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
> > snapshot, that might have fixed this issue?  We've done some limited
> > testing with 1.5.24-2 and haven't seen this happen yet, but as I said
> > it only happens rarely.
> > - is there anything I can look at in gdb to help identify what the
> > issue is?
> >  
> > Any suggestions would be appreciated!
> >  
> > -
> > Ernie Coskrey
> 
> I've discovered an interesting piece of information that I 
> think is related to this.  I'm hoping this might ring a bell 
> with someone on the list.
> 
> Looking at _main_tls->stack[], when I've set a breakpoint in 
> handle_sigsuspend just after the cancelable_wait() call, I 
> see the following entries:
> 
> 0x6109186f  0x4132ac
> 
> 0x6109186f is "sigdelayed()", which is the routine that 
> should have been called to deliver the signal and reset the 
> signal_arrived event.
> 0x4132ac is j_waitj (in pdksh).
> 
> So, somehow, when this problem occurs, "sigdelayed" gets 
> pushed onto the stack *before* j_waitj does.  So, _sigbe 
> never calls sigdelayed.
> 
> I don't think there's ever a case where sigdelayed should be 
> at _main_tls->stack[0].  However this happened is, I believe, 
> the cause of this problem.
> 
> Ernie Coskrey
> 

Well, I think that I may have found the cause of this issue, and I
believe that the problem exists in 1.5.24-2.  Please take a look at what
I think is the solution, and let me know if I'm mistaken.

I believe that the problem is in _sigbe, at the very end of the
assembler code.  _sigbe decrements the lock *before* it decrements
incyg.  This leaves a very small window where another thread - possibly
the sig thread that's doing setup_handler() - can acquire the lock, see
that incyg is still set to 1, and act accordingly.  In setup_handler,
this will cause the thread to go into _cygtls::interrupt_setup, which
pushes sigdelayed onto the tls stack.  But since we're not really in
Cygwin code when this happens, sigdelayed() never gets executed and you
end up spinning as we're seeing.
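
The ordering problem described above can be illustrated with a small, deterministic Python model. It is a sketch under stated assumptions, not Cygwin's assembler: the lock and the incyg flag are reduced to two fields of a hypothetical `Tls` object, the tail of _sigbe is reduced to two steps, and the hypothetical `signal_thread_tries` stands in for setup_handler, acting only if it can take the lock and pushing sigdelayed only if it sees incyg still set. Enumerating every point at which the signal thread could run shows one window with the reported order and none with the reversed order.

```python
from dataclasses import dataclass


@dataclass
class Tls:
    """Hypothetical stand-in for the per-thread state _sigbe touches."""
    lock_held: bool = True    # _sigbe still holds the lock
    incyg: int = 1            # thread still marked as "inside Cygwin"
    stranded_sigdelayed: bool = False


def release_lock(t: Tls) -> None:
    t.lock_held = False


def clear_incyg(t: Tls) -> None:
    t.incyg = 0


# Order of the last two steps of _sigbe, as described in the report...
BUGGY = (release_lock, clear_incyg)
# ...and the proposed order.
FIXED = (clear_incyg, release_lock)


def signal_thread_tries(t: Tls) -> None:
    """Model of setup_handler: only acts if it can take the lock."""
    if t.lock_held:
        return  # lock busy; the real code would try again later
    if t.incyg == 1:
        # interrupt_setup path: push sigdelayed and expect _sigbe to run it,
        # but this thread is already past that point, so nothing will.
        t.stranded_sigdelayed = True
    # incyg == 0: the real code takes a different delivery path (not modeled).


def bad_interleavings(steps) -> list:
    """Return each point k (0 = before the first step) at which a
    signal-thread attempt leaves sigdelayed stranded."""
    bad = []
    for k in range(len(steps) + 1):
        t = Tls()
        for i, step in enumerate(steps):
            if i == k:
                signal_thread_tries(t)
            step(t)
        if k == len(steps):
            signal_thread_tries(t)
        if t.stranded_sigdelayed:
            bad.append(k)
    return bad


print(bad_interleavings(BUGGY))  # [1]: lock released, incyg still 1
print(bad_interleavings(FIXED))  # []
```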

I'll post a patch to cygwin-patches.

Ernie Coskrey




RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-08-08 Ernie Coskrey
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Ernie Coskrey
> Sent: Tuesday, July 31, 2007 3:40 PM
> To: cygwin@cygwin.com
> Subject: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> 
>  
> I've run into a problem with cygwin 1.5.20-1 and pdksh 
> 5.2.14.  We've got a pdksh.exe process that is spinning, 
> using all the CPU.
>  
> This scenario is very hard to reproduce, but has happened on 
> our test systems occasionally.  It occurred recently, and I 
> currently have gdb attached to the process and have the 
> symbols loaded.  I see that pdksh is continually calling 
> "sigsuspend()", which is immediately returning from 
> cancelable_wait due to the fact that the signal_arrived event 
> is set.  I also see that pdksh is waiting for a subprocess to 
> complete, and has a handle to the PID of that process - 
> however the process has long since terminated.
>  
> It appears that something went wrong during delivery of SIGCHLD.
>  
> I've got two questions related to this:
>  
> - have there been changes between 1.5.20-1 and 1.5.24-2, or 
> the latest snapshot, that might have fixed this issue?  We've 
> done some limited testing with 1.5.24-2 and haven't seen this 
> happen yet, but as I said it only happens rarely.
> - is there anything I can look at in gdb to help identify 
> what the issue is?
>  
> Any suggestions would be appreciated!
>  
> -
> Ernie Coskrey 

I've discovered an interesting piece of information that I think is
related to this.  I'm hoping this might ring a bell with someone on the
list.

Looking at _main_tls->stack[], when I've set a breakpoint in
handle_sigsuspend just after the cancelable_wait() call, I see the
following entries:

0x6109186f  0x4132ac

0x6109186f is "sigdelayed()", which is the routine that should have been
called to deliver the signal and reset the signal_arrived event.
0x4132ac is j_waitj (in pdksh).

So, somehow, when this problem occurs, "sigdelayed" gets pushed onto the
stack *before* j_waitj does.  So, _sigbe never calls sigdelayed.

I don't think there's ever a case where sigdelayed should be at
_main_tls->stack[0].  However this happened is, I believe, the cause of
this problem.
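
The effect of sigdelayed sitting at stack[0] can be seen in a toy LIFO model of the tls return-address stack. This is a hypothetical sketch: `sigfe` and `sigbe` only push and pop addresses (the real _sigfe/_sigbe are assembler with locking), and the two constants are simply the addresses from the report. Each wrapped call made from j_waitj pushes and pops its own entry, so an entry left underneath it is never popped.

```python
SIGDELAYED = 0x6109186F  # address reported for sigdelayed()
J_WAITJ = 0x4132AC       # return address reported inside pdksh's j_waitj


def sigfe(stack: list, ret_addr: int) -> None:
    """Toy _sigfe: save the caller's return address on the tls stack."""
    stack.append(ret_addr)


def sigbe(stack: list) -> int:
    """Toy _sigbe: pop the top entry; that is where execution continues."""
    return stack.pop()


def run_wrapped_calls(stack: list, calls: int) -> list:
    """Make `calls` wrapped calls (e.g. sigsuspend) from j_waitj and
    record where each one returns to."""
    targets = []
    for _ in range(calls):
        sigfe(stack, J_WAITJ)
        targets.append(sigbe(stack))
    return targets


# Normal delivery: interrupt_setup pushes sigdelayed while the call is in
# progress, i.e. above j_waitj's entry, so _sigbe continues into sigdelayed.
normal = []
sigfe(normal, J_WAITJ)
normal.append(SIGDELAYED)  # interrupt_setup
assert sigbe(normal) == SIGDELAYED

# Reported state: sigdelayed already at stack[0], below j_waitj's entry.
stuck = [SIGDELAYED]
print([hex(a) for a in run_wrapped_calls(stuck, 3)])  # ['0x4132ac', '0x4132ac', '0x4132ac']
print([hex(a) for a in stuck])                        # ['0x6109186f']
```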

Ernie Coskrey




RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-08-07 Ernie Coskrey
> -Original Message-
> From: Igor Peshansky [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 06, 2007 5:59 PM
> To: Ernie Coskrey
> Cc: cygwin@cygwin.com
> Subject: RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU
> 
> On Mon, 6 Aug 2007, Ernie Coskrey wrote:
> 
> > > Quite possibly.  There were changes to signal handling since 1.5.20,
> > > IIRC. Unless I'm mistaken, there's even a patch for a race condition
> > > in process handling code (though it's not in 1.5.24, I think).
> >
> > I just want to make sure I understand this - are you talking about a
> > change that has been made since 1.5.24-2 was released, which is in the
> > snapshot view now?  Or did you mean a fix that was made sometime between
> > 1.5.20-1 and 1.5.24-2?
> 
> I meant the former, but I don't know if these changes have actually
> fixed your problem.

I'll download the latest snapshot and look at what's changed.  Do you
remember where the changes might be located - I'd guess somewhere in
sigproc.cc, exceptions.cc, and/or wait.cc.  Or if you remember the date
and/or subject of the email discussion that I could look at, that would
be very helpful as well.

> 
> Any particulars about the machines on which this happens?  Are they
> multi-core?  I don't recall seeing a cygcheck output from an affected
> machine...
>   Igor

This happens on a variety of hardware - single-CPU is where it's the
biggest problem since the system becomes nearly unusable.  But we've
seen it on multi-core and multi-physical-CPU systems as well.

Here's cygcheck from one of the systems where it's happened a few times:

Cygwin Configuration Diagnostics
Current System Time: Tue Aug 07 09:01:03 2007

Windows 2003 Server Ver 5.2 Build 3790 Service Pack 2

Running in Terminal Service session

Path:   c:\WINDOWS\system32
c:\WINDOWS
c:\WINDOWS\System32\Wbem
c:\Program Files\SUperior SU
c:\Program Files\Microsoft SQL Server\80\Tools\BINN
C:\LK\bin
c:\SDR
c:\SDR\support
c:\Program files\Debugging Tools for Windows

SysDir: C:\WINDOWS\system32
WinDir: C:\WINDOWS

HOME = '/home/Administrator'

Use '-r' to scan registry

a:  fd             N/A    N/A
c:  hd  NTFS    8662Mb  84% CP CS UN PA FC
d:  net NTFS   17351Mb  90% CP CS UN PA FC BUILD
e:  cd             N/A    N/A
h:  hd  NTFS    4337Mb   1% CP CS UN PA FC Shared_H
i:  hd             N/A    N/A
j:  hd  NTFS   17367Mb   1% CP CS UN PA FC Shared_J
k:  hd  NTFS   17367Mb   1% CP CS UN PA FC Shared_K
l:  hd  NTFS   17343Mb   1% CP CS UN PA FC Shared_L
n:  hd  NTFS   17476Mb   1% CP CS UN PA FC Shared_N
o:  hd  NTFS    1027Mb   1% CP CS UN PA FC Shared_O
p:  hd             N/A    N/A
r:  hd             N/A    N/A
s:  hd  NTFS   69954Mb   1% CP CS UN PA FC iSCSI_S
t:  hd  NTFS   69954Mb   1% CP CS UN PA FC ISCSI_T
v:  net NTFS    8096Mb  73% CP CS UN PA FC
w:  net NTFS 1402454Mb  34% CP CS    PA    coskrey
x:  net NTFS   17355Mb  26% CP CS UN PA FC Dev_Y
y:  hd  NTFS    8665Mb   7% CP CS UN PA FC Vol_Y
z:  hd             N/A    N/A


Found: C:\LK\bin\awk.exe
Found: C:\LK\bin\bash.exe
Found: C:\LK\bin\cat.exe
Found: C:\LK\bin\cp.exe
Not Found: cpp (good!)
Not Found: crontab
Found: C:\LK\bin\find.exe
Not Found: gcc
Found: C:\LK\bin\gdb.exe
Found: C:\LK\bin\grep.exe
Found: C:\LK\bin\kill.exe
Found: c:\Program files\Debugging Tools for Windows\kill.exe
Not Found: ld
Found: C:\LK\bin\ls.exe
Not Found: make
Found: C:\LK\bin\mv.exe
Not Found: patch
Found: C:\LK\bin\perl.exe
Found: C:\LK\bin\rm.exe
Found: C:\LK\bin\sed.exe
Not Found: ssh
Found: C:\LK\bin\sh.exe
Found: C:\LK\bin\tar.exe
Found: C:\LK\bin\test.exe
Found: C:\LK\bin\vi.exe
Found: C:\LK\bin\vim.exe

   56k 2007/07/14 C:\LK\bin\cygbz2-1.dll
7k 2007/07/14 C:\LK\bin\cygcharset-1.dll
7k 2007/07/14 C:\LK\bin\cygcrypt-0.dll
   40k 2007/07/14 C:\LK\bin\cygform-8.dll
   45k 2007/07/14 C:\LK\bin\cygform5.dll
   35k 2007/07/14 C:\LK\bin\cygform6.dll
   48k 2007/07/14 C:\LK\bin\cygform7.dll
   28k 2007/07/14 C:\LK\bin\cyggdbm-3.dll
   30k 2007/07/14 C:\LK\bin\cyggdbm-4.dll
   19k 2007/07/14 C:\LK\bin\cyggdbm.dll
   15k 2007/07/14 C:\LK\bin\cyggdbm_compat-3.dll
   15k 2007/07/14 C:\LK\bin\cyggdbm_compat-4.dll
   17k 2007/07/14 C:\LK\bin\cyghistory4.dll
   29k 2007/07/14 C:\LK\bin\cyghistory5.dll
   24k 2007/07/14 C:\LK\bin\cyghistory6.dll
  947k 2007/07/14 C:\LK\bin\cygiconv-2.dll
   22k 2007/07/14 C:\LK\bin\cygintl-1.dll
   37k 2007/07/14 C:\LK\bin\cygintl-2.dll
   31k 2007/07/14 C:\LK\bin\cygintl-3.dll
   21k 2007/07/14 C:\LK\bin\cygintl.dll
   21k 2007/07/14 C:\LK\bin\cygmenu-8.dll
   26k 2007/07/14 C:\LK\bin\cygmenu5.dll
   20k 2007/07/14 C:\LK\bin\cygmenu6.dll
   29k 2007/07/14 C:\LK\bin\cygmenu7.dll
   67k 2

RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-08-06 Ernie Coskrey
 
> Quite possibly.  There were changes to signal handling since 1.5.20, 
> IIRC.
> Unless I'm mistaken, there's even a patch for a race condition in 
> process handling code (though it's not in 1.5.24, I think).
> 

I just want to make sure I understand this - are you talking about a
change that has been made since 1.5.24-2 was released, which is in the
snapshot view now?  Or did you mean a fix that was made sometime between
1.5.20-1 and 1.5.24-2?

> > >
> > > Any suggestions would be appreciated!
> > 
> > Posting a sequence of steps that reliably reproduces the problem for
> > you would be great (but not necessarily easy).
> 

We've seen the issue happen with the following scripts.  Run a few
instances of "tst.sh".  Occasionally, one will hang; if you then
terminate the other tst.sh instances with Ctrl-C, you'll see that there's
a subtest.sh shell that is using up all the CPU.

First - generate "tstfile" by running
ls -l /bin > tstfile

tst.sh
==
while true
do
for ltr in a b c d e f g
do
out=`./subtest.sh $ltr`
echo Found $out
date
done
done

subtest.sh
==
for i in `seq 1 100`
do
f=`awk '{if(NR == i)print}' i=$i tstfile`
m=`/bin/echo $f | grep $1`
if [ ! -z "$m" ]
then
echo $i: $m
fi
done


-
Ernie Coskrey
SteelEye Technology, Inc.




RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-08-01 Ernie Coskrey
0
0x006874b8
0x22c558:   0x0022c588  0x610917b8  0x0042db80  0x0068b3f0
0x22c568:   0x0022c588  0x600301dc  0x006854d8  0x0003
0x22c578:   0x0022c588  0x006874b8  0x006854d8  0x0003
0x22c588:   0x0022c5a8  0x004126e0  0x006842a0  0x
0x22c598:   0x0042972b  0x006874b8  0x  0x006874b8
0x22c5a8:   0x0022c698  0x0040b160  0x0068b3f0  0x
0x22c5b8:   0x0068a614  0x0001  0x0022c680  0x0019
0x22c5c8:   0x0068bbe8  0x  0x61171d44  0x0068
0x22c5d8:   0x  0x  0x61171dd4  0x0001
0x22c5e8:   0x  0x  0x0001  0x
0x22c5f8:   0x0022c640  0x  0x  0x
0x22c608:   0x00687518  0x  0x0004  0x00685470
0x22c618:   0x0068ad98  0x  0x0001  0x61104ab4
0x22c628:   0x0003  0x0001  0x0668  0x0068a614
0x22c638:   0x00685478  0x610564f7  0x0068ad98  0x006854bc
0x22c648:   0x0001  0x0068ad60  0x0068ad60  0x
0x22c658:   0x00685ae0  0x0001  0x  0x0068b3f0
0x22c668:   0x0022c698  0x0041  0x00685530  0x006854b0
0x22c678:   0x0080  0x0068a614  0x0001  0x006854bc
0x22c688:   0x00cb  0x006874b8  0x  0x00687350
0x22c698:   0x0022c6c8  0x0040a654  0x006874b8  0x0022c6b0
0x22c6a8:   0x0020  0x6105642c  0x00685498  0x00685498
0x22c6b8:   0x0068549c  0x001d  0x  0x
0x22c6c8:   0x0022c718  0x0040d80a  0x006874b8  0x0020
0x22c6d8:   0x0068a610  0x  0x001d  0x

Ernie Coskrey
SteelEye Technology, Inc.




RE: cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-08-01 Ernie Coskrey
> -Original Message-
> From: Igor Peshansky
> 
> On Tue, 31 Jul 2007, Ernie Coskrey wrote:
> 
> > I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14.  We've
> > got a pdksh.exe process that is spinning, using all the CPU.
> >
> > This scenario is very hard to reproduce, but has happened on our test
> > systems occasionally.  It occurred recently, and I currently have gdb
> > attached to the process and have the symbols loaded.
> 
> I assume you've rebuilt pdksh from source, since the packaged binary is
> stripped...  Do you also have the symbols for the Cygwin DLL?

Yes, I've built both pdksh and cygwin1.dll from source and have the
symbols.

> 
> > I see that pdksh is continually calling "sigsuspend()", which is
> > immediately returning from cancelable_wait due to the fact that the
> > signal_arrived event is set.
> 
> Do you mean the sigpause() call?  Can you see which signal it attempts
> to suspend?  Can you email me (privately, if you wish) the stack dump
> from gdb?
> 

It's sigsuspend() in j_waitj - line 1191 in jobs.c.  It calls
sigsuspend(&sm_default), and sm_default is 0 (no signals are blocked).

This immediately returns, and I see that j->state is still PRUNNING
every time.
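
The resulting spin in j_waitj can be modeled the same way. This is a simplified sketch, not pdksh's code: `Job`, `SignalState`, and the `max_iterations` cap are hypothetical, `PEXITED` is an assumed name for whatever state j_sigchld would record for the finished child, and the real loop in jobs.c checks more than this. If sigsuspend() keeps returning because signal_arrived is set but the SIGCHLD handler never runs, j->state never leaves PRUNNING and the loop never sleeps.

```python
from dataclasses import dataclass

PRUNNING = "PRUNNING"  # job still running (name as in pdksh)
PEXITED = "PEXITED"    # assumed name for the state j_sigchld would record


@dataclass
class Job:
    state: str = PRUNNING


@dataclass
class SignalState:
    signal_arrived: bool = True  # the child has exited and signalled
    handler_runs: bool = True    # False models the stranded-sigdelayed case


def sigsuspend(job: Job, sig: SignalState) -> None:
    """Toy sigsuspend() with an empty mask."""
    if not sig.signal_arrived:
        # The real call would block here; this toy model cannot block.
        raise RuntimeError("no signal pending: real sigsuspend() would sleep")
    if sig.handler_runs:
        job.state = PEXITED         # j_sigchld reaps the child
        sig.signal_arrived = False  # delivering the signal resets the event
    # Either way, return immediately, as cancelable_wait does when the
    # signal_arrived event is already set.


def j_waitj(job: Job, sig: SignalState, max_iterations: int = 1000) -> int:
    """Toy wait loop; returns the number of sigsuspend() calls made."""
    calls = 0
    while job.state == PRUNNING and calls < max_iterations:
        sigsuspend(job, sig)
        calls += 1
    return calls


print(j_waitj(Job(), SignalState()))                    # 1
print(j_waitj(Job(), SignalState(handler_runs=False)))  # 1000 (capped spin)
```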

> > I also see that pdksh is waiting for a subprocess to complete, and has a
> > handle to the PID of that process - however the process has long since
> > terminated.
> 
> That's normal (I think).  Cygwin may not deliver SIGCHLD immediately after
> process termination.  Until pdksh gets SIGCHLD, it'll keep the process
> handle.
> 
> > It appears that something went wrong during delivery of SIGCHLD.
> 
> Does this happen before or after j_sigchld() gets invoked?
> 

I suspect that j_sigchld never got invoked, or didn't run properly, but
can't definitively prove that.

> > I've got two questions related to this:
> >
> > - have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
> > snapshot, that might have fixed this issue?  We've done some limited
> > testing with 1.5.24-2 and haven't seen this happen yet, but as I said
> > it only happens rarely.
> 
> Quite possibly.  There were changes to signal handling since 1.5.20,
> IIRC.  Unless I'm mistaken, there's even a patch for a race condition in
> process handling code (though it's not in 1.5.24, I think).
> 
> > - is there anything I can look at in gdb to help identify what the
> > issue is?
> >
> > Any suggestions would be appreciated!
> 
> Posting a sequence of steps that reliably reproduces the problem for
> you would be great (but not necessarily easy).

I wish I could supply this, but the problem happens very rarely.  I've
run many thousands of test shell iterations and haven't seen it recur
yet.

> 
> As I said above, a stack dump (with full pdksh symbols) would help...
> That might mean that you'd need to build an unstripped pdksh and
> attempt to reproduce the problem again.
>   Igor
> --

Here's a stack trace of the thread where the spin is occurring.  The
other threads in the process are quiet - the signal thread is in
ReadFile as expected, and the other threads are all in stub routines
doing WaitForSingleObject.

(gdb) bt
#0  handle_sigsuspend (tempmask=0)
at ../../../../src/winsup/cygwin/exceptions.cc:694
#1  0x61094b93 in sigsuspend (set=0x42db80)
at ../../../../src/winsup/cygwin/signal.cc:477
#2  0x610917b8 in _sigfe () at
../../../../src/winsup/cygwin/cygserver.h:82
#3  0x0022c588 in ?? ()
#4  0x600301dc in ?? ()
#5  0x006854d8 in ?? ()
#6  0x0003 in ?? ()
#7  0x0022c588 in ?? ()
#8  0x006874b8 in ?? ()
#9  0x006854d8 in ?? ()
#10 0x0003 in ?? ()
#11 0x0022c5a8 in ?? ()
#12 0x004126e0 in waitlast () at ../src/jobs.c:729
#13 0x004126e0 in waitlast () at ../src/jobs.c:729
#14 0x0040b160 in expand (
cp=0x6874b8
"\001R\001M\001T\001I\001N\001S\001R\001E\001A\001S\001O\001N\001=\003$L
KBIN/ins_list -d \"$EQVRMTSYS\" -t \"$EQVRMTTAG\" 2>NUL: | cut -d\001
-f8", wp=0x22c6b0, f=32) at ../src/eval.c:533
#15 0x0040a654 in evalstr (
cp=0x6874b8
"\001R\001M\001T\001I\001N\001S\001R\001E\001A\001S\001O\001N\001=\003$L
KBIN/ins_list -d \"$EQVRMTSYS\" -t \"$EQVRMTTAG\" 2>NUL: | cut -d\001
-f8", f=32) at ../src/eval.c:113
#16 0x0040d80a in comexec (t=0x6871e0, tp=0x0, ap=0x687350, flags=0)
at ../src/exec.c:555
#17 0x0040cc7d in execute (t=0x6871e0, flags=0) at ../src/exec.c:155
#18 0x0040ce39 in execute (t=0x687778, flags=0) at ../src/exec.c:192
#19 0x0040d311 in execute (t=0x686620, flags=1) at ../src/exec.c:367
#20 0x004124c1 in ex

cygwin 1.5.20-1, spinning pdksh, 100% CPU

2007-07-31 Ernie Coskrey
 
I've run into a problem with cygwin 1.5.20-1 and pdksh 5.2.14.  We've
got a pdksh.exe process that is spinning, using all the CPU.
 
This scenario is very hard to reproduce, but has happened on our test
systems occasionally.  It occurred recently, and I currently have gdb
attached to the process and have the symbols loaded.  I see that pdksh
is continually calling "sigsuspend()", which is immediately returning
from cancelable_wait due to the fact that the signal_arrived event is
set.  I also see that pdksh is waiting for a subprocess to complete, and
has a handle to the PID of that process - however the process has long
since terminated.
 
It appears that something went wrong during delivery of SIGCHLD.
 
I've got two questions related to this:
 
- have there been changes between 1.5.20-1 and 1.5.24-2, or the latest
snapshot, that might have fixed this issue?  We've done some limited
testing with 1.5.24-2 and haven't seen this happen yet, but as I said
it only happens rarely.
- is there anything I can look at in gdb to help identify what the issue
is?
 
Any suggestions would be appreciated!
 
-
Ernie Coskrey
SteelEye Technology, Inc.  803-808-4275




RE: Cygwin build error

2006-05-31 Ernie Coskrey
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf
> Of Corinna Vinschen
> Sent: Friday, April 28, 2006 4:28 AM
> To: [EMAIL PROTECTED]
> Cc: cygwin@cygwin.com
> Subject: Re: Cygwin build error
> 
> 
> This is a newlib problem.  I've redirected this mail to the appropriate
> list newlib AT sourceware DOT org.
> 
> On Apr 27 15:14, Ernie Coskrey wrote:
> > I ran into the following problem building the latest cygwin snapshot:
> > 
> > configure: loading cache .././config.cache
> > configure: error: `CFLAGS' has changed since the previous run:
> > configure:   former value:  -O2 -g -O2  
> > configure:   current value: -O2 -g -O2 
> > configure: error: changes in the environment can compromise the build
> > configure: error: run `make distclean' and/or `rm .././config.cache' and start over
> > configure: error: /bin/sh '../../../../src/newlib/libc/configure' failed for libc
> > 
> > By piping the output to a file, I saw that the former value of CFLAGS
> > is "-O2 -g -O2  " (two spaces), while the current value is
> > "-O2 -g -O2 " (one space).  This causes the comparison in
> > libc/configure to fail.
> > 
> > The way I've resolved this is to replace the following line:
> > 
> >   if test "x$ac_old_val" != "x$ac_new_val"; then
> > 
> > with
> > 
> >   if test "`echo $ac_old_val`" != "`echo $ac_new_val`"; then
> > 
> > wherever it appears in any "configure" script (there are 75 configure
> > scripts that contain this test, BTW).  There may be a more elegant way
> > around this, but I haven't found it.  Running "make distclean" or
> > removing config.cache doesn't resolve the problem.
> > 
> > -
> > Ernie Coskrey   SteelEye Technology, Inc.   803-461-3875
> 
> 
> Corinna
> 

This problem isn't limited to newlib: the same fix must be applied to a number 
of non-newlib configure scripts.

However, I have found a simpler solution than patching all 70-plus configure
scripts.  The root of the problem is that the variable "CFLAGS_FOR_TARGET"
gets defined in the top-level Makefile as follows:

CFLAGS_FOR_TARGET = -O2 $(CFLAGS) $(SYSROOT_CFLAGS_FOR_TARGET)

Since SYSROOT_CFLAGS_FOR_TARGET is usually empty, you end up with an extra 
space at the end of CFLAGS_FOR_TARGET (in my case, anyway).

The following patch will resolve the problem without requiring any changes in
the underlying configure scripts.  This patch is for "src/Makefile.in" - the
top-level Makefile.in.  It uses GNU make's $(strip ...) function to remove the
extra whitespace from CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET.

--- Makefile.in.ORIG 2006-05-31 08:49:14.16650 -0400
+++ Makefile.in 2006-05-31 11:08:25.150875000 -0400
@@ -383,7 +383,7 @@
 # CFLAGS will be just -g.  We want to ensure that TARGET libraries
 # (which we know are built with gcc) are built with optimizations so
 # prepend -O2 when setting CFLAGS_FOR_TARGET.
-CFLAGS_FOR_TARGET = -O2 $(CFLAGS) $(SYSROOT_CFLAGS_FOR_TARGET)
+CFLAGS_FOR_TARGET = $(strip -O2 $(CFLAGS) $(SYSROOT_CFLAGS_FOR_TARGET))
 SYSROOT_CFLAGS_FOR_TARGET = @SYSROOT_CFLAGS_FOR_TARGET@

 # If GCC_FOR_TARGET is not overriden on the command line, then this
@@ -423,7 +423,7 @@
 fi; \
   fi`

-CXXFLAGS_FOR_TARGET = $(CXXFLAGS) $(SYSROOT_CFLAGS_FOR_TARGET)
+CXXFLAGS_FOR_TARGET = $(strip $(CXXFLAGS) $(SYSROOT_CFLAGS_FOR_TARGET))
 LIBCXXFLAGS_FOR_TARGET = $(CXXFLAGS_FOR_TARGET) -fno-implicit-templates

 GCJ_FOR_TARGET=$(STAGE_CC_WRAPPER) @GCJ_FOR_TARGET@ $(FLAGS_FOR_TARGET)
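
The effect of the Makefile.in change can be modeled in a few lines of Python. `make_strip` approximates GNU make's $(strip ...) function, which removes leading and trailing whitespace and collapses each internal run of whitespace to a single space; `cflags_for_target` is a hypothetical model of the CFLAGS_FOR_TARGET expansion, assuming CFLAGS is "-g -O2" and SYSROOT_CFLAGS_FOR_TARGET is empty, as in the report.

```python
def make_strip(s: str) -> str:
    """Approximates GNU make's $(strip ...): trim both ends and collapse
    each internal run of whitespace to a single space."""
    return " ".join(s.split())


def cflags_for_target(cflags: str, sysroot_cflags: str = "", strip: bool = False) -> str:
    """Model of:
         CFLAGS_FOR_TARGET = -O2 $(CFLAGS) $(SYSROOT_CFLAGS_FOR_TARGET)
    optionally wrapped in $(strip ...) as in the patch."""
    value = f"-O2 {cflags} {sysroot_cflags}"
    return make_strip(value) if strip else value


print(repr(cflags_for_target("-g -O2")))              # '-O2 -g -O2 '
print(repr(cflags_for_target("-g -O2", strip=True)))  # '-O2 -g -O2'
```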

-
Ernie Coskrey




Cygwin build error

2006-04-27 Ernie Coskrey
I ran into the following problem building the latest cygwin snapshot:

configure: loading cache .././config.cache
configure: error: `CFLAGS' has changed since the previous run:
configure:   former value:  -O2 -g -O2  
configure:   current value: -O2 -g -O2 
configure: error: changes in the environment can compromise the build
configure: error: run `make distclean' and/or `rm .././config.cache' and start over
configure: error: /bin/sh '../../../../src/newlib/libc/configure' failed for libc

By piping the output to a file, I saw that the former value of CFLAGS is
"-O2 -g -O2  " (two spaces), while the current value is "-O2 -g -O2 " (one
space).  This causes the comparison in libc/configure to fail.

The way I've resolved this is to replace the following line:

  if test "x$ac_old_val" != "x$ac_new_val"; then

with

  if test "`echo $ac_old_val`" != "`echo $ac_new_val`"; then

wherever it appears in any "configure" script (there are 75 configure scripts 
that contain this test, BTW).  There may be a more elegant way around this, but 
I haven't found it.  Running "make distclean" or removing config.cache doesn't 
resolve the problem.
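
Why the `echo` form of the test tolerates the mismatch can be sketched in Python. The two functions are hypothetical models of the shell tests, not autoconf code: the first compares the strings exactly, as `"x$ac_old_val" != "x$ac_new_val"` does; the second mimics an unquoted `echo $var`, where word splitting drops leading, trailing, and repeated blanks and echo rejoins the words with single spaces.

```python
def strict_equal(old: str, new: str) -> bool:
    """Model of: test "x$ac_old_val" != "x$ac_new_val" (exact comparison)."""
    return "x" + old == "x" + new


def echo_equal(old: str, new: str) -> bool:
    """Model of comparing `echo $ac_old_val` with `echo $ac_new_val`.

    Unquoted expansion splits the value into words on blanks, and echo
    rejoins them with single spaces, so leading, trailing, and repeated
    blanks no longer matter.  Globbing and echo's own option handling
    (e.g. a value starting with -n) are not modeled.
    """
    return " ".join(old.split()) == " ".join(new.split())


former = "-O2 -g -O2  "  # two trailing spaces (value from config.cache)
current = "-O2 -g -O2 "  # one trailing space (current environment)
print(strict_equal(former, current))  # False: configure reports CFLAGS changed
print(echo_equal(former, current))    # True: the patched test passes
```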

-
Ernie Coskrey   SteelEye Technology, Inc.   803-461-3875




RE: Call for testing Cygwin snapshot

2006-04-25 Ernie Coskrey
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf
> Of Christopher Faylor
> Sent: Tuesday, April 25, 2006 4:39 PM
> To: cygwin@cygwin.com
> Subject: Re: Call for testing Cygwin snapshot
> 
> 
> On Tue, Apr 25, 2006 at 04:33:37PM -0400, Ernie Coskrey wrote:
> >Well, what I got from your message was that you were pretty sure that
> >your fix may have addressed the problem, but not 100% sure.  That's why
> >I posted this follow-up; it's possible that Jerry has found a scenario
> >that causes this problem to occur.  Maybe not, but if he can reproduce
> >it, it would be worth checking.
> 
> There are all sorts of "cygwin hang" bug reports out there.  Since this
> is a problem that showed up in a particular snapshot and the problem
> that you are talking about was something that supposedly happened for
> any version of cygwin from 2003 to (at least) February 2006, I don't see
> any reason to think that this is an issue since it would also show up in
> a pre 2006-03-13 version of cygwin -- unless you have some insight into
> the problem that I'm missing.
> 
> cgf
> 

Nope, no additional insight, just a hunch -- that apparently has turned out to 
be wrong. :-)

BTW, we're not seeing ANY hangs in the 1.5.20 snapshots; we'd reported a few in 
1.5.19-4 and those all have been addressed.

Ernie




RE: Call for testing Cygwin snapshot

2006-04-25 Ernie Coskrey
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf
> Of Christopher Faylor
> Sent: Tuesday, April 25, 2006 4:26 PM
> To: cygwin@cygwin.com
> Subject: Re: Call for testing Cygwin snapshot
> 
> 
> On Tue, Apr 25, 2006 at 03:37:46PM -0400, Ernie Coskrey wrote:
> >> -Original Message-
> >> From: cygwin-owner
> >> [mailto:cygwin-owner]...
> 
> Btw, to the OP: *please* don't quote raw email addresses, especially
> when it's the cygwin or cygwin-owner email address.  Adding this is just
> noise and helps increase the already incredible spam burden presented to
> the cygwin and (especially) postmaster mailing lists.
> 
> >> Of Jerry D. Hedden
> >> Sent: Tuesday, April 25, 2006 9:27 AM
> >> To: cygwin
>^^
> >> Subject: RE: Call for testing Cygwin snapshot
> >> 
> >> 
> >>As I said, these sort of problems started after the 2006-03-09
> >>snapshot.  I double checked, and the problem does occur with the
> >>2006-03-13 snapshot.
> >
> >I wonder if this might be related to the following:
> >
> >http://cygwin.com/ml/cygwin/2006-02/msg01062.html
> >
> >The fix suggested in the original message -
> >http://www.cygwin.com/ml/cygwin-patches/2003-q2/msg4.html - might
> >help.
> 
> You've pointed to my message which indicates that I've fixed this in
> another way.  And, the OP indicates that this hang was introduced in a
> specific snapshot so I don't see why this would be an issue in that
> snapshot.
> 
> Nevertheless, the patch in the message that you are referring to is
> still a band-aid and still will not be applied.
> 
> cgf
> 

Well, what I got from your message was that you were pretty sure that your fix
may have addressed the problem, but not 100% sure.  That's why I posted this
follow-up; it's possible that Jerry has found a scenario that causes this
problem to occur.  Maybe not, but if he can reproduce it, it would be worth
checking.

I agree that the original patch is a band-aid and shouldn't be applied.  There 
were some follow-ups to that message that talked about different ways to 
address the problem, if it turns out that Jerry's problem is the same.

Ernie




RE: Call for testing Cygwin snapshot

2006-04-25 Ernie Coskrey
> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] Behalf
> Of Jerry D. Hedden
> Sent: Tuesday, April 25, 2006 9:27 AM
> To: cygwin@cygwin.com
> Subject: RE: Call for testing Cygwin snapshot
> 
> 
> Jerry D. Hedden wrote:
> >I have a cron job (a bash script)
> >that runs every 6 minutes, polling and downloading info off the web. 
> >
> >The problem is the script hangs at various places and the stuck
> >processes keep building up.
> >
> >Further, I have to kill these processes using the task monitor:  kill
> >reports 'No such process'.
> 
> Christopher Faylor replied:
> > As mentioned above, a test case showing the problem sure would be nifty.
> 
> I agree and would have provided one if I could.  However, I have no idea
> what is causing this, nor how to write a test case for it.
> 
> As I said, it's a cron job running a bash script - nothing fancy.  The
> hang does not happen on every invocation of the script, but it does
> occur frequently.  Where in the script it gets stuck seems to be
> random: wget, mkdir, mv, date, diff, etc.
> 
> > Also, knowing the first snapshot which shows the problem would be helpful.
> 
> As I said, these sort of problems started after the 2006-03-09 snapshot.
> I double checked, and the problem does occur with the 2006-03-13
> snapshot.
> 
> 

I wonder if this might be related to the following:

http://cygwin.com/ml/cygwin/2006-02/msg01062.html

The fix suggested in the original message - 
http://www.cygwin.com/ml/cygwin-patches/2003-q2/msg4.html - might help.

Ernie Coskrey
SteelEye Technology, Inc.




Re: Shells hang during script execution

2006-03-16 Ernie Coskrey
>On Wed, Mar 01, 2006 at 01:01:46PM -0500, Ernie Coskrey wrote:
>>>>Here's a description of a second hang condition we were encountering, along 
>>>>with a patch for it.
>>>>
>>>>
>>>>The application (pdksh in this case) does a read on a pipe, which eventually
>>>>calls pipe.cc fhandler_pipe::read in Thread 1.  This creates a new cygthread
>>>>with "read_pipe()" as the function.  Then it calls th->detach(read_state).
>>>>
>>>>When the hang occurs, the new thread gets terminated early, before
>>>>cygthread::stub() can call "callfunc()".  You see the error message
>>>>"erroneous thread activation".  I'm not sure what's causing the thread
>>>>to fail activation, but the result is, the read_state semaphore never
>>>>gets signalled.
>>>
>>>Sorry but this is another band-aid around a problem.  The real problem
>>>is that the code shouldn't get into the state that you are describing.
>>>That's why cygwin prints an error message - it is a serious problem.
>>>Making the code deal gracefully with a problem like this isn't going
>>>to solve the underlying issue.
>>>
>>>If you can figure out what's causing the erroneous thread activation
>>>then that will be the real culprit.
>>>
>>>cgf
>>>
>>
>>OK, I believe I've tracked this down.
>>
>>The problem occurs when we get into a read_pipe cygthread constructor
>>(cygthread::cygthread()) with a NULL h and an ev that is signalled.
>>When this condition exists, a hang can occur as follows:
>>
>>1) Creator thread calls detach().  This waits for pipe_state to be released twice.
>>2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice.
>>3) Creator thread goes to WaitForSingleObject (*this, INFINITE), which returns
>>immediately because ev was already signalled when the thread was created.
>>4) Creator thread initiates another read_pipe cygthread to read more pipe data.
>>
>>At this point, there's a race: if the Creator thread gets past the
>>initialization part of the constructor, which sets __name(name), BEFORE
>>the original read_pipe thread gets to the part of cygthread::stub()
>>that sets info->__name = NULL, then you'll see the hang.  The new
>>read_pipe thread will give the "erroneous thread activation" message, and
>>the parent will be stuck waiting for data that will never arrive.
>>
>>The only path that leaves an unused thread structure in a state where
>>h==NULL and ev is signalled is cygthread::release().  So the fix is
>>simple:
>>
>>$ cat cygthread.cc.udiff
>>--- cygthread.cc.ORIG   2006-02-22 10:57:42.123931300 -0500
>>+++ cygthread.cc    2006-03-01 12:59:23.255023000 -0500
>>@@ -268,7 +268,12 @@
>> cygthread::release (bool nuke_h)
>> {
>>   if (nuke_h)
>>+{
>> h = NULL;
>>+
>>+if (ev)
>>+  ResetEvent (ev);
>>+}
>> #ifdef DEBUGGING
>>   __oldname = __name;
>>   debug_printf ("released thread '%s'", __oldname);
>
>Nice analysis.  Thank you.  I think it's easier to fix this by just
>making the ev event auto-reset; then this condition would be caught in
>terminate thread, as it was meant to be.
>
>cgf

Here's a patch that makes ev an auto-reset event, as you suggested.  It works
with the latest snapshot.

-
Ernie Coskrey   SteelEye Technology, Inc.



--- cygthread.cc.ORIG   2006-03-01 17:40:44.0 -0500
+++ cygthread.cc    2006-03-16 14:54:04.148312500 -0500
@@ -78,7 +78,7 @@
   debug_printf ("thread '%s', id %p, stack_ptr %p", info->name (), info->id, info->stack_ptr);
   if (!info->ev)
{
- info->ev = CreateEvent (&sec_none_nih, TRUE, FALSE, NULL);
+ info->ev = CreateEvent (&sec_none_nih, FALSE, FALSE, NULL);
  info->thread_sync = CreateEvent (&sec_none_nih, FALSE, FALSE, NULL);
}
 }
@@ -197,8 +197,6 @@
   HANDLE htobe;
   if (h)
 {
-  if (ev)
-   ResetEvent (ev);
   while (!thread_sync)
low_priority_sleep (0);
   SetEvent (thread_sync);
@@ -223,7 +221,6 @@
   while (!ev)
low_priority_sleep (0);
   WaitForSingleObject (ev, INFINITE);
-  ResetEvent (ev);
 }
   h = htobe;
 }
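The suggestion and the patch above hinge on the bManualReset argument of CreateEvent (its second argument, switched from TRUE to FALSE). Under Win32 semantics a manual-reset event stays signalled until ResetEvent is called, whereas an auto-reset event is cleared when it releases a single waiting thread, which is why the patch can drop the explicit ResetEvent calls. The following is a minimal, portable C++ model of the two behaviours, assuming std::mutex and std::condition_variable as stand-ins; the Event class and its method names are hypothetical and are not part of Cygwin or Win32.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Portable stand-in for a Win32 event object; a model built on
// std::condition_variable, not the Win32 API itself.
//   manual_reset == true  ~ CreateEvent (..., TRUE, ...):  stays set until reset()
//   manual_reset == false ~ CreateEvent (..., FALSE, ...): a successful wait clears it
class Event {
 public:
  explicit Event(bool manual_reset, bool initially_set = false)
      : manual_reset_(manual_reset), signalled_(initially_set) {}

  // Analogue of SetEvent.
  void set() {
    std::lock_guard<std::mutex> lock(m_);
    signalled_ = true;
    if (manual_reset_)
      cv_.notify_all();  // manual-reset: every waiter may proceed
    else
      cv_.notify_one();  // auto-reset: at most one waiter consumes the signal
  }

  // Analogue of ResetEvent.
  void reset() {
    std::lock_guard<std::mutex> lock(m_);
    signalled_ = false;
  }

  // Analogue of WaitForSingleObject with a timeout; true means "signalled".
  bool wait_for(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(m_);
    if (!cv_.wait_for(lock, timeout, [this] { return signalled_; }))
      return false;        // timed out
    if (!manual_reset_)
      signalled_ = false;  // auto-reset: releasing a waiter clears the event
    return true;
  }

 private:
  const bool manual_reset_;
  bool signalled_;
  std::mutex m_;
  std::condition_variable cv_;
};

// A leftover signal is seen again through a manual-reset event,
// but not through an auto-reset one.
void check_event_semantics() {
  using std::chrono::milliseconds;
  Event manual(true);
  manual.set();
  assert(manual.wait_for(milliseconds(0)));
  assert(manual.wait_for(milliseconds(0)));      // still set: stale for the next waiter

  Event automatic(false);
  automatic.set();
  assert(automatic.wait_for(milliseconds(0)));
  assert(!automatic.wait_for(milliseconds(0)));  // the first wait consumed it
}
```

A stale signal of this kind is what let the creator thread's wait return immediately in the race analysis quoted above; with auto-reset semantics the first successful wait consumes it.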





RE: Shells hang during script execution

2006-03-01 Thread Ernie Coskrey
>>Here's a description of a second hang condition we were encountering, along 
>>with a patch for it.
>>
>>
>>The application (pdksh in this case) does a read on a pipe, which eventually 
>>calls pipe.cc fhandler_pipe::read in Thread 1.  This creates a new cygthread 
>>with "read_pipe()" as the function.  Then it calls th->detach(read_state).
>>
>>When the hang occurs, the new thread gets terminated early, before
>>cygthread::stub() can call "callfunc()".  You see the error message
>>"erroneous thread activation".  I'm not sure what's causing the thread
>>to fail activation, but the result is, the read_state semaphore never
>>gets signalled.
>
>Sorry but this is another band-aid around a problem.  The real problem
>is that the code shouldn't get into the state that you are describing.
>That's why cygwin prints an error message - it is a serious problem.
>Making the code deal gracefully with a problem like this isn't going
>to solve the underlying issue.
>
>If you can figure out what's causing the erroneous thread activation
>then that will be the real culprit.
>
>cgf
>

OK, I believe I've tracked this down.

The problem occurs when we get into a read_pipe cygthread constructor 
(cygthread::cygthread()) with a NULL h and an ev that is signalled.  When this 
condition exists, a hang can occur as follows:

1) Creator thread calls detach().  This waits for pipe_state to be released twice.
2) read_pipe thread calls read_pipe, reads data, and releases the semaphore twice.
3) Creator thread goes to WaitForSingleObject (*this, INFINITE), which returns 
immediately because ev was already signalled when the thread was created.
4) Creator thread initiates another read_pipe cygthread to read more pipe data.

At this point, there's a race: if the Creator thread gets past the 
initialization part of the constructor, which sets __name(name), BEFORE the 
original read_pipe thread gets to the part of cygthread::stub() that sets 
info->__name = NULL, then you'll see the hang.  The new read_pipe thread will 
give the "erroneous thread activation" message, and the parent will be stuck 
waiting for data that will never arrive.

The only path that leaves an unused thread structure in a state where h==NULL 
and ev is signalled is cygthread::release().  So the fix is simple:

$ cat cygthread.cc.udiff
--- cygthread.cc.ORIG   2006-02-22 10:57:42.123931300 -0500
+++ cygthread.cc    2006-03-01 12:59:23.255023000 -0500
@@ -268,7 +268,12 @@
 cygthread::release (bool nuke_h)
 {
   if (nuke_h)
+{
 h = NULL;
+
+if (ev)
+  ResetEvent (ev);
+}
 #ifdef DEBUGGING
   __oldname = __name;
   debug_printf ("released thread '%s'", __oldname);
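The state singled out above, an idle thread slot with h == NULL but ev still signalled, can be illustrated with a deliberately simplified, single-threaded model. The Slot struct, its fields, and stale_signal_after_reuse are hypothetical stand-ins, not Cygwin's cygthread class; a plain bool plays the role of the manual-reset event. The sketch shows that, without the ResetEvent added to release(), a reused slot starts out already "signalled", so the creator's next wait would be satisfied before the new read_pipe thread has done anything.

```cpp
#include <cassert>

// Hypothetical, single-threaded model of a reusable thread slot (not
// cygwin's cygthread). 'ev_signalled' stands in for a manual-reset event:
// once set, it stays set until something explicitly clears it.
struct Slot {
  bool h_valid = false;       // stands in for h != NULL
  bool ev_signalled = false;  // stands in for the state of ev

  // The worker finishes its job and signals ev.
  void thread_finished() { ev_signalled = true; }

  // Model of release (nuke_h); 'reset_ev' selects the patched behaviour.
  void release(bool nuke_h, bool reset_ev) {
    if (nuke_h) {
      h_valid = false;
      if (reset_ev)
        ev_signalled = false;  // the fix: don't leave a stale signal behind
    }
  }

  // Would the creator's wait on this slot be satisfied right away?
  bool wait_returns_immediately() const { return ev_signalled; }
};

// Run one job on a slot, release it, hand it out again, and report whether
// the next wait would be satisfied by the previous job's leftover signal.
bool stale_signal_after_reuse(bool reset_ev) {
  Slot s;
  s.h_valid = true;           // first read_pipe job starts
  s.thread_finished();        // ...and completes, signalling ev
  s.release(true, reset_ev);  // slot goes idle with h == NULL
  s.h_valid = true;           // slot handed out for the next read_pipe job
  return s.wait_returns_immediately();
}

void check_release_fix() {
  assert(stale_signal_after_reuse(false));  // unpatched: stale signal survives reuse
  assert(!stale_signal_after_reuse(true));  // patched: reused slot starts clean
}
```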





RE: Shells hang during script execution

2006-02-23 Thread Ernie Coskrey
Here's a description of a second hang condition we were encountering, along 
with a patch for it.


The application (pdksh in this case) does a read on a pipe, which eventually 
calls pipe.cc fhandler_pipe::read in Thread 1.  This creates a new cygthread 
with "read_pipe()" as the function.  Then it calls th->detach(read_state).

When the hang occurs, the new thread gets terminated early, before 
cygthread::stub() can call "callfunc()".  You see the error message "erroneous 
thread activation".  I'm not sure what's causing the thread to fail activation, 
but the result is, the read_state semaphore never gets signalled.

So Thread 1 goes into cygthread::detach(read_state).  The first thing that 
happens is signal_arrived is set.  The old code would then set n=1, but leave 
howlong=INFINITE.  My change sets howlong=100 in this case.  Then, when TIMEOUT 
occurs, we look to see if __name is not NULL.  Since the thread was terminated, 
its name is now NULL, so it doesn't decrement i, and eventually you break out 
of the loop and clean up as expected.



--- cygthread.cc.ORIG   2006-02-22 10:57:42.123931300 -0500
+++ cygthread.cc    2006-02-23 15:50:23.894461500 -0500
@@ -374,10 +374,12 @@
break;
  case WAIT_OBJECT_0 + 1:
n = 1;
-   if (i--)
- howlong = 50;
+   i--;
+   howlong = 100;
break;
  case WAIT_TIMEOUT:
+   if(!i && __name)
+   i--;
break;
  default:
        if (!exiting)
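The change above replaces an unbounded wait with a short one (100 ms) and, on each WAIT_TIMEOUT, looks at whether the thread still has a name before deciding how to proceed. The general pattern, waiting in bounded slices and re-checking that the worker still exists, can be sketched in portable C++ as below. This only illustrates the idea, assuming std::condition_variable in place of Win32 waits; WorkerState, done, worker_alive and wait_for_worker are hypothetical names, and the real cygthread::detach() loop also tracks signal_arrived and its own counters (n, i, howlong).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical state shared by a creator and a worker thread.
struct WorkerState {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;         // worker finished (cf. read_state being released)
  bool worker_alive = true;  // worker still exists (cf. __name != NULL)
};

enum class WaitResult { Completed, WorkerVanished };

// Wait in bounded slices instead of forever; after each timeout, check that
// the worker still exists, so one that dies without signalling cannot hang
// the creator.
WaitResult wait_for_worker(WorkerState& st,
                           std::chrono::milliseconds slice = std::chrono::milliseconds(100)) {
  std::unique_lock<std::mutex> lock(st.m);
  for (;;) {
    if (st.cv.wait_for(lock, slice, [&st] { return st.done; }))
      return WaitResult::Completed;
    if (!st.worker_alive)
      return WaitResult::WorkerVanished;  // give up instead of hanging
  }
}

void check_wait_for_worker() {
  using std::chrono::milliseconds;
  WorkerState finished;
  finished.done = true;
  assert(wait_for_worker(finished, milliseconds(10)) == WaitResult::Completed);

  WorkerState vanished;
  vanished.worker_alive = false;  // worker terminated without signalling
  assert(wait_for_worker(vanished, milliseconds(10)) == WaitResult::WorkerVanished);
}
```

The slice length trades latency against wakeups: a shorter slice notices a vanished worker sooner at the cost of more timed-out waits.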





RE: Shells hang during script execution

2006-02-23 Thread Ernie Coskrey
There are two hang conditions that we've identified and have developed fixes 
for.  This is a description of the first of the two along with a patch; I'll 
follow up with a description and patch for the second.


If a signal can't be handled because it is blocked, it gets queued (on 
the process's "sigq") to be handled later. Now, whenever the process's 
signal mask changes (e.g., the signal in question gets unblocked), an 
attempt is made to handle all the queued signals (i.e., a signal flush 
occurs). However, if the blocked signal is queued right after the signal 
mask change, i.e., after that flush has already run, then we miss the 
signal: it is on the queue, but the process doesn't know to check for it, 
so the process just hangs until another signal gets sent to it.

The workaround is basically to force the signal queue to be rescanned 
(flushed) whenever we add something to it, so a queued signal is never 
missed.


--- sigproc.cc.ORIG 2006-02-16 14:02:42.81432 -0500
+++ sigproc.cc  2006-02-22 10:55:20.327209900 -0500
@@ -1130,6 +1130,7 @@
case __SIGNOHOLD:
case __SIGFLUSH:
case __SIGFLUSHFAST:
+flush:
  sigq.reset ();
  while ((q = sigq.next ()))
{
@@ -1150,6 +1151,8 @@
  else
{
  int sig = pack.si.si_signo;
+ if (sig == SIGCHLD)
+   clearwait = true;
  // FIXME: REALLY not right when taking threads into consideration.
  // We need a per-thread queue since each thread can have its own
  // list of blocked signals.  CGF 2005-08-24
@@ -1165,10 +1168,11 @@
system_printf ("Failed to arm signal %d from pid %d", 
pack.sig, pack.pid);
 #endif
  sigq.add (pack);  // FIXME: Shouldn't add this in !sh condition
+ goto flush; // signal may have become unblocked while
+ // we were processing it (before we added
+ // it to the sigq) -- flush sigq to be sure   
}
}
- if (sig == SIGCHLD)
-   clearwait = true;
}
  break;
    }
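The race described above is a classic lost wakeup, and the effect of the "goto flush" rescan can be shown by replaying the bad interleaving one step at a time in a single thread. In this sketch SigModel, signal_delivered and the signal number are hypothetical; it models only a queue plus a blocked flag, not Cygwin's sigproc.cc or its per-thread masks.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical single-threaded model of a pending-signal queue; it is not
// cygwin's sigproc.cc and ignores per-thread masks.
struct SigModel {
  bool blocked = true;         // is the signal currently blocked?
  std::deque<int> sigq;        // queued (pending) signals
  std::vector<int> delivered;  // signals that were actually handled

  // Deliver every queued signal that is no longer blocked; keep the rest.
  void flush() {
    std::deque<int> still_pending;
    for (int sig : sigq) {
      if (blocked)
        still_pending.push_back(sig);
      else
        delivered.push_back(sig);
    }
    sigq.swap(still_pending);
  }

  // Changing the mask (here: unblocking) triggers a flush.
  void unblock() {
    blocked = false;
    flush();
  }
};

// Replay the bad interleaving:
//   1. the sender finds the signal blocked and decides to queue it,
//   2. the receiver unblocks it, and the resulting flush sees an empty queue,
//   3. the sender finally adds the signal to the queue.
// 'rescan_after_add' mimics the patch's "goto flush" after sigq.add().
bool signal_delivered(bool rescan_after_add) {
  SigModel m;
  const int sig = 1;                   // arbitrary signal number for the model
  const bool was_blocked = m.blocked;  // step 1
  m.unblock();                         // step 2
  if (was_blocked) {
    m.sigq.push_back(sig);             // step 3: queued after the flush
    if (rescan_after_add)
      m.flush();                       // fix: rescan so it is not stranded
  }
  return !m.delivered.empty();
}

void check_signal_rescan() {
  assert(!signal_delivered(false));  // unpatched: signal stranded on the queue
  assert(signal_delivered(true));    // patched: the rescan delivers it
}
```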





RE: Shells hang during script execution

2006-02-10 Thread Ernie Coskrey
We've been able to narrow this down some more.  The shell gets hung in 
sigsuspend(), waiting for SIGCHLD.  We've verified that the process that's 
executed as part of the command substitution does complete, and returns EOF, 
and the shell (we're testing with pdksh) goes into sigsuspend and never comes 
out.

If we execute "kill -CHLD <pid>" on the hung shell, the shell resumes its processing.

I'm going to continue to look into this - if anybody has any insight into how 
SIGCHLD might be getting lost, please let me know.  Thanks!
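For context on where the shell is sitting: a POSIX shell commonly waits for a child by keeping SIGCHLD blocked around fork() and then calling sigsuspend() with SIGCHLD unblocked, so the notification can't slip in between the check and the wait. If that SIGCHLD is lost, sigsuspend() never returns, which matches the symptom, and a manually sent SIGCHLD releases it. The following is a minimal POSIX C++ sketch of that pattern; it is not pdksh's actual code, and run_child_and_wait is a hypothetical name.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace {
volatile sig_atomic_t got_sigchld = 0;

void on_sigchld(int) { got_sigchld = 1; }
}  // namespace

// Fork a child that exits with 'code' and wait for it shell-style: SIGCHLD
// stays blocked except while suspended, so it cannot slip in between the
// flag check and the wait. Returns the child's exit status, or -1 on error.
int run_child_and_wait(int code) {
  struct sigaction sa {};
  sa.sa_handler = on_sigchld;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGCHLD, &sa, nullptr) != 0)
    return -1;

  sigset_t block, old_mask;
  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
    return -1;

  got_sigchld = 0;
  pid_t pid = fork();
  if (pid < 0) {
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    return -1;
  }
  if (pid == 0)
    _exit(code);  // child: exit at once

  sigset_t wait_mask = old_mask;
  sigdelset(&wait_mask, SIGCHLD);  // deliverable only inside sigsuspend()
  while (!got_sigchld)
    sigsuspend(&wait_mask);        // a lost SIGCHLD would leave us here forever

  int status = 0;
  pid_t reaped = waitpid(pid, &status, 0);
  sigprocmask(SIG_SETMASK, &old_mask, nullptr);
  if (reaped != pid || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}

void check_run_child_and_wait() {
  assert(run_child_and_wait(0) == 0);
  assert(run_child_and_wait(3) == 3);
}
```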

Ernie Coskrey


-----Original Message-
From: Ernie Coskrey
Sent: Wed 2/1/2006 3:27 PM
To: 'cygwin@cygwin.com'
Subject: Shells hang during script execution
 
I've run into problems with shell scripts hanging during execution for no 
apparent reason.  I've narrowed down my test case to two simple shell scripts.  
To reproduce the problem, I ran three instances of the "top.sh" script included 
here, and after a bit (30 minutes to an hour or so) I'll see that one or two of 
the shells have just stopped in their tracks.

Here are the scripts.

top.sh:

dir=$1
loops=$2

for loop in `seq 1 $loops`
do
    x=`./subtest.sh $dir`
    date
    echo loop $loop
done


subtest.sh:

for j in `ls $1`
do
    if [ `echo $j | egrep -i "A|B" | wc -l` -ne 0 ]
    then
        echo $j
    fi
done
echo subtest1 done >&2



I then ran three bash shells.  The commands I ran, simultaneously, were:

1) ./top.sh C:/ 600
2) ./top.sh C:/windows 300
3) ./top.sh C:/windows/system32 100

These ran for about 45 minutes, and then I noticed that two of them (1 and 2 
above) had stopped printing any output.  The third was still moving along.  The 
third completed, but the first two never progressed any further.  I used 
Process Explorer from ntinternals.com, and saw that the two hung shells were 
not using any CPU, and did not have any child processes created; they were 
simply stopped.  If a process dump would be helpful, I can generate one with 
Windbg or gdb.


Here's my cygcheck output:

Cygwin Configuration Diagnostics
Current System Time: Wed Feb 01 15:07:43 2006

Windows 2003 Server Ver 5.2 Build 3790 Service Pack 1

Path:   C:\WINDOWS\system32
C:\WINDOWS
C:\WINDOWS\System32\Wbem
C:\Program Files\Microsoft SQL Server\80\Tools\BINN
C:\Program Files\SUperior SU

Output from C:\cygwin\bin\id.exe (nontsec)
UID: 500(Administrator) GID: 513(None)
0(root) 513(None)   544(Administrators)
545(Users)

Output from C:\cygwin\bin\id.exe (ntsec)
UID: 500(Administrator) GID: 513(None)
0(root) 513(None)   544(Administrators)
545(Users)

SysDir: C:\WINDOWS\system32
WinDir: C:\Documents and Settings\Administrator\WINDOWS

Here's some environment variables that may affect cygwin:
PWD = '/usr/bin'
HOME = '/home/Administrator'

Here's the rest of your environment variables:
HOMEPATH = '\Documents and Settings\Administrator'
APPDATA = 'C:\Documents and Settings\Administrator\Application Data'
TERM = 'cygwin'
PROCESSOR_IDENTIFIER = 'x86 Family 15 Model 2 Stepping 7, GenuineIntel'
WINDIR = 'C:\WINDOWS'
TMPDIR = '/cygdrive/c/Documents and Settings/Administrator/Local Settings/Temp'
USERDOMAIN = 'EAGLE'
OS = 'Windows_NT'
ALLUSERSPROFILE = 'C:\Documents and Settings\All Users'
TEMP = '/cygdrive/c/DOCUME~1/ADMINI~1/LOCALS~1/Temp'
COMMONPROGRAMFILES = 'C:\Program Files\Common Files'
USERNAME = 'Administrator'
CLUSTERLOG = 'C:\WINDOWS\Cluster\cluster.log'
PROCESSOR_LEVEL = '15'
FP_NO_HOST_CHECK = 'NO'
SYSTEMDRIVE = 'C:'
USERPROFILE = 'C:\Documents and Settings\Administrator'
LOGONSERVER = '\\EAGLE'
PROCESSOR_ARCHITECTURE = 'x86'
!C: = 'C:\cygwin\bin'
EXTMIRRBASE = 'C:\LKDR'
SHLVL = '1'
PATHEXT = '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH'
HOMEDRIVE = 'C:'
PROMPT = '$P$G'
COMSPEC = 'C:\WINDOWS\system32\cmd.exe'
TMP = '/cygdrive/c/DOCUME~1/ADMINI~1/LOCALS~1/Temp'
SYSTEMROOT = 'C:\WINDOWS'
PROCESSOR_REVISION = '0207'
PROGRAMFILES = 'C:\Program Files'
NUMBER_OF_PROCESSORS = '2'
SESSIONNAME = 'Console'
COMPUTERNAME = 'EAGLE'
!EXITCODE = '0001'
_ = './cygcheck'
POSIXLY_CORRECT = '1'

Scanning registry for keys with 'Cygnus' in them...
HKEY_CURRENT_USER\Software\Cygnus Solutions
HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin
HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin\mounts v2
HKEY_CURRENT_USER\Software\Cygnus Solutions\Cygwin\Program Options
HKEY_CURRENT_USER\Software\SteelEye\LifeKeeper\Cygnus Solutions
HKEY_CURRENT_USER\Softwa

Shells hang during script execution

2006-02-01 Thread Ernie Coskrey
Cygwin Package Information
Last downloaded files to: C:\cygwinpkg
Last downloaded files from: ftp://ftp.cise.ufl.edu/pub/mirrors/cygwin

Package          Version
_update-info-dir 00352-1
alternatives     1.3.20a-2
ash              20040127-3
base-files       3.7-1
base-passwd      2.2-1
bash             3.0-14
bzip2            1.0.3-1
coreutils        5.93-3
crypt            1.1-1
cygutils         1.2.9-1
cygwin           1.5.19-4
cygwin-doc       1.4-3
diffutils        2.8.7-1
editrights       1.01-1
findutils        4.2.27-1
gawk             3.1.5-2
gdb              20041228-3
gdbm             1.8.3-7
grep             2.5.1a-2
groff            1.18.1-2
gzip             1.3.5-1
less             381-1
libbz2_1         1.0.3-1
libcharset1      1.9.2-2
libgdbm          1.8.0-5
libgdbm-devel    1.8.3-7
libgdbm3         1.8.3-3
libgdbm4         1.8.3-7
libiconv         1.9.2-2
libiconv2        1.9.2-2
libintl          0.10.38-3
libintl1         0.10.40-1
libintl2         0.12.1-3
libintl3         0.14.5-1
libncurses5      5.2-1
libncurses6      5.2-8
libncurses7      5.3-4
libncurses8      5.4-4
libpcre0         6.3-1
libpopt0         1.6.4-4
libreadline4     4.1-2
libreadline5     4.3-5
libreadline6     5.1-2
login            1.9-7
man              1.5p-1
mktemp           1.5-3
ncurses          5.4-4
pdksh            5.2.14-3
run              1.1.6-1
sed              4.1.4-1
tar              1.15.1-3
tcltk            20030901-1
termcap          20050421-1
terminfo         5.4_20041009-1
texinfo          4.8-1
vim              6.4-4
which            1.7-1
zlib             1.2.3-1

Thanks for any help you can provide on this!

-
Ernie Coskrey   SteelEye Technology, Inc.   803-461-3875
