Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-19 Thread Christopher Faylor
On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
>Testing my favorite testcase takes a while, especially if I'm still
>not able to reproduce on one of my systems. This is for the third
>version of yesterday's snapshot (20051018).
>
>Basically we still see the same problem as in
> .
>
>The build hangs in a tcsh like this:
>  PID PPID PGID WINPID  TTY   UID    STIME COMMAND
>  220    1  220    220  con 11290 08:29:27 /usr/bin/bash
>  752  220  752    268  con 11290 08:29:32 /usr/bin/tcsh
> 1236  752 1236   2040  con 11290 08:29:52 /usr/bin/perl
> 2228 1236 1236   3740  con 11290 12:05:48 /cygdrive/e/work/OOo/SRC680/solenv/wntmsci10/bin/dmake
> 3676 2228 1236   2828  con 11290 12:05:48 /usr/bin/tcsh
> 3980 3676 1236   3980  con 11290 12:05:48 /usr/bin/tcsh
> 1696    1 1696   1696  con 11290 12:30:42 /usr/bin/bash
>I2396    1 2396   2396  con 11290 12:30:45 /usr/bin/bash
> 3560 1696 3560   2668  con 11290 12:30:54 /usr/bin/tcsh
> 3748 3560 3748   2292  con 11290 12:31:30 /usr/bin/ps
>
>And by attaching strace to the hung pid and using "ls /proc/*/fd"
>the attached (behind the cygcheck.log) strace was created.

Given the number of changes that have been made to cygwin, particularly
in /proc handling, it's very difficult for me to believe that you are
not seeing *any* differences in behavior and I'm wondering if you're
actually seeing what you think you're seeing, i.e., I'm wondering if the
process is just timing out and you are attributing it coming "unstuck"
to the fact that you're doing "ls /proc/*/fd".  I can't see any reason
why inspecting /proc should cause any kind of special behavior in the
latest snapshots since /proc handling now occurs in its own thread.

I could almost convince myself that there was a race in /proc handling
before but I could never convince myself that doing something like
"ls /proc/*/fd" would have any effect on it.  Nevertheless, I did make
some changes to eliminate the potential source of hangs in this code.
So, I can't understand why you wouldn't see something different.

cgf

--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-19 Thread Volker Quetschke
Christopher Faylor wrote:
> On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
> (snip)
>>Basically we still see the same problem as in
>>.
>>
>>The build hangs in a tcsh like this:
>>(snip)
>>And by attaching strace to the hung pid and using "ls /proc/*/fd"
>>the attached (behind the cygcheck.log) strace was created.
> 
> Given the number of changes that have been made to cygwin, particularly
> in /proc handling, it's very difficult for me to believe that you are
> not seeing *any* differences in behavior and I'm wondering if you're
> actually seeing what you think you're seeing, i.e., I'm wondering if the
> process is just timing out and you are attributing it coming "unstuck"
> to the fact that you're doing "ls /proc/*/fd".  I can't see any reason
> why inspecting /proc should cause any kind of special behavior in the
> latest snapshots since /proc handling now occurs in its own thread.
I can completely understand your worries. My problem is that I cannot
reproduce the problem myself and all I can do is ask the people who
have this problem to try to get some debug information.

I just asked for a confirmation that it really is the "ls /proc/*/fd"
that "unstucks" the process. I don't believe that "/usr/bin/tcsh -fc pwd"
needs a long time to finish so that we're getting a coincidence there.

Having said that, and I never realized this before, maybe the problem really
lies in this particular command. Due to some historic quirks, every
makefile in the OOo tree has a line that sets a macro to the current path
using that command, but there are still lots of other commands (also executed
in a tcsh shell) in these makefiles that I have never heard of hanging.
(I'll also verify that what I just said is really true; it's just an idea.)

> I could almost convince myself that there was a race in /proc handling
> before but I could never convince myself that doing something like
> "ls /proc/*/fd" would have any effect on it.  Nevertheless, I did make
> some changes to eliminate the potential source of hangs in this code.
> So, I can't understand why you wouldn't see something different.

I have no clue either, especially as I also cannot reproduce and therefore
cannot pinpoint the problem. :(

Anyway, thanks for all your efforts!

   Volker

-- 
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-19 Thread Volker Quetschke
Volker Quetschke wrote:
> Having said that, I never realized that before, maybe the problem really
> lies in this special command. I mean due to some historic quirks every
> makefile in the OOo tree has a line that sets a macro to the current path
> using that command, but there are still lots of other commands (also executed
> in a tcsh shell) in these makefiles that I never heard of to hang.
> (I'll also verify that what I just said is really true, it's just an idea.)
Nice idea, but a look into the strace showed me that it was hanging before
executing the pwd command :(

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-20 Thread Volker Quetschke

Volker Quetschke wrote:
> Christopher Faylor wrote:
>> On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
>> (snip)
>> Given the number of changes that have been made to cygwin, particularly
>> in /proc handling, it's very difficult for me to believe that you are
>> not seeing *any* differences in behavior and

Well, there are differences in the frequency of occurrence of the hangs.

>> I'm wondering if you're
>> actually seeing what you think you're seeing, i.e., I'm wondering if the
>> process is just timing out and you are attributing it coming "unstuck"
>> to the fact that you're doing "ls /proc/*/fd".  I can't see any reason
>> why inspecting /proc should cause any kind of special behavior in the
>> latest snapshots since /proc handling now occurs in its own thread.
>
> I can completely understand your worries. My problem is that I cannot
> reproduce the problem myself and all I can do is ask the people who
> have this problem to try get some debug information.
>
> I just asked for a confirmation that it really is the "ls /proc/*/fd"
> that "unstucks" the process. I don't believe that "/usr/bin/tcsh -fc pwd"
> needs a long time to finish so that we're getting a coincidence there.

I got some information back:
It is done like this, the build is running/hanging in one shell (1).

When it hangs, start a new tcsh shell (2) and get the ps and cygcheck
information. Then open a new bash (3) and start "strace -p "
Now in (2) start
   while 1
   ls /proc//fd
   end
until the strace is ready.

Some details: The build is running on a local NTFS drive. It's a dedicated
machine, not much is running beside the build.

He wrote that 20051019 also produced a hang and that I'll get the next ;)
strace.

Clueless

 Volker

> Having said that, I never realized that before, maybe the problem really
> lies in this special command. I mean due to some historic quirks every
> makefile in the OOo tree has a line that sets a macro to the current path
> using that command, but there are still lots of other commands (also executed
> in a tcsh shell) in these makefiles that I never heard of to hang.
> (I'll also verify that what I just said is really true, it's just an idea.)
>
>> I could almost convince myself that there was a race in /proc handling
>> before but I could never convince myself that doing something like
>> "ls /proc/*/fd" would have any effect on it.  Nevertheless, I did make
>> some changes to eliminate the potential source of hangs in this code.
>> So, I can't understand why you wouldn't see something different.
>
> I have no clue either, especially as I also cannot reproduce and therefore
> cannot pinpoint the problem. :(
>
> Anyway, thanks for all your efforts!
>
>    Volker






Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-20 Thread Christopher Faylor
On Thu, Oct 20, 2005 at 08:48:05AM -0400, Volker Quetschke wrote:
>Volker Quetschke wrote:
>>Christopher Faylor wrote:
>>>On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
>>>(snip)
>>>Given the number of changes that have been made to cygwin, particularly
>>>in /proc handling, it's very difficult for me to believe that you are
>>>not seeing *any* differences in behavior and
>Well, there are differences in the frequency of occurrence of the hangs.
>
>>>I'm wondering if you're
>>>actually seeing what you think you're seeing, i.e., I'm wondering if the
>>>process is just timing out and you are attributing it coming "unstuck"
>>>to the fact that you're doing "ls /proc/*/fd".  I can't see any reason
>>>why inspecting /proc should cause any kind of special behavior in the
>>>latest snapshots since /proc handling now occurs in its own thread.
>>
>>I can completely understand your worries. My problem is that I cannot
>>reproduce the problem myself and all I can do is ask the people who
>>have this problem to try get some debug information.
>>
>>I just asked for a confirmation that it really is the "ls /proc/*/fd"
>>that "unstucks" the process. I don't believe that "/usr/bin/tcsh -fc pwd"
>>needs a long time to finish so that we're getting a coincidence there.
>I got some information back:
>It is done like this, the build is running/hanging in one shell (1).
>
>When it hangs, start a new tcsh shell (2) and get the ps and cygcheck
>information. Then open a new bash (3) and start "strace -p "
>Now in (2) start
>   while 1
>   ls /proc//fd
>   end
>until the strace is ready.

I wonder what would happen if the strace was just allowed to sit.  I
don't see anything in the strace that would indicate that the process is
stalled and that looking at /proc/...  is fixing it.

>He wrote that 20051019 also produced a hang and that I'll get the next ;)
>strace.

I wouldn't expect 20051019 to be any different.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-23 Thread Christopher Faylor
On Thu, Oct 20, 2005 at 08:48:05AM -0400, Volker Quetschke wrote:
>Volker Quetschke wrote:
>>Christopher Faylor wrote:
>>>On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
>>>(snip)
>>>Given the number of changes that have been made to cygwin, particularly
>>>in /proc handling, it's very difficult for me to believe that you are
>>>not seeing *any* differences in behavior and
>Well, there are differences in the frequency of occurrence of the hangs.

I missed this the first time.  Are you saying that hangs are more likely
with recent snapshots?

In any event, could you try the 2005-10-22 snapshot?  It doesn't fix anything
but I moved some of the strace printfs around in a probably vain attempt to
see what was hanging.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-25 Thread Christopher Faylor
On Tue, Oct 25, 2005 at 12:07:21PM -0400, Volker Quetschke wrote:
>Christopher Faylor wrote:
>>>Volker Quetschke wrote:
>>>>Christopher Faylor wrote:
>>>>>On Wed, Oct 19, 2005 at 03:45:30PM -0400, Volker Quetschke wrote:
>>>>>(snip)
>>>>>Given the number of changes that have been made to cygwin, particularly
>>>>>in /proc handling, it's very difficult for me to believe that you are
>>>>>not seeing *any* differences in behavior and
>>>
>>>Well, there are differences in the frequency of occurrence of the hangs.
>> 
>> I missed this the first time.  Are you saying that hangs are more likely
>> with recent snapshots?
>> 
>> In any event, could you try the 2005-10-22 snapshot?  It doesn't fix anything
>> but I moved some of the strace printfs around in a probably vain attempt to
>> see what was hanging.
>
>We tried the 20051023 and 20051024 snapshots. The 20051024 hangs significantly
>faster than the 20051023 and also at different commands (Not only the standard
>"tcsh -fc pwd") even though the example here hangs again at that place.
>
>As a sidenote, these last two snapshots are also easier to "unhang",
>one "ls /proc//fd" is enough.
>
>I only paste/attach the 20051024 info, if there is interest I can also send
>the 20051023 info.
>
>  PID PPID PGID WINPID  TTY   UID    STIME COMMAND
>  540    1  540    540  con 11290 16:37:13 /usr/bin/bash
> 1452  540 1452   3836  con 11290 16:37:18 /usr/bin/tcsh
> 3960 1452 3960   2508  con 11290 17:35:23 /usr/bin/perl
> 3180    1 3180   3180  con 11290 17:37:19 /usr/bin/bash
> 3384 3960 3960   3416  con 11290 17:37:23 /cygdrive/e/work/OOo/SRC680/solenv/wntmsci10/bin/dmake
> 2624 3384 3960   2912  con 11290 17:37:23 /usr/bin/tcsh
> 3596 2624 3960   3596  con 11290 17:37:23 /usr/bin/tcsh
> 2100 3180 2100   1000  con 11290 17:37:49 /usr/bin/tcsh
> 4008 2100 4008   2520  con 11290 17:38:15 /usr/bin/ps

I would like to see the old strace and any other straces you have to see
if there's any pattern to something I'm noticing.

I don't see any large times being reported at the beginning of the strace.
I'd expect to see them if you notice the hang, attach to the process, and
then do the "ls /proc//fd".  Can you give me a feel for times of:

- noticed the problem

- attached to process with strace

- performed ls

?

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-27 Thread Volker Quetschke
(BIG-SNIP)
>>I only paste/attach the 20051024 info, if there is interest I can also send
>>the 20051023 info.
>>(snip)
> 
> I would like to see the old strace and any other straces you have to see
> if there's any pattern to something I'm noticing.
I got a few more, but before spamming this list with straces I have
some news. "We" managed to reproduce the hangs on that particular
machine more easily now, but I didn't have the time yet to try
to reproduce it on my machines. I hope tonight ...

But I can relay the answers to the following questions:
> I don't see any large times being reported at the beginning of the strace.
> I'd expect that if you notice the hang, attach to the process, and then
> do the "ls /proc//fd".  Can you give me a feel for times of:

First, reproducibility: initially the 20051024 snapshot hung about every
10 minutes, but "now" (at the time I got the email) it has been running for
more than 15 hours.
> 
> - noticed the problem
1 min - several hours, then doing ps and cygcheck.

> - attached to process with strace
5-10 minutes after noticing (max)

> - performed ls
 < 1 min (right after...)


Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-27 Thread Christopher Faylor
On Thu, Oct 27, 2005 at 01:04:25PM -0400, Volker Quetschke wrote:
>(BIG-SNIP)
>>>I only paste/attach the 20051024 info, if there is interest I can also send
>>>the 20051023 info.
>>>(snip)
>> 
>> I would like to see the old strace and any other straces you have to see
>> if there's any pattern to something I'm noticing.
>I got a few more, but before spamming this list with straces I have
>some news. "We" managed to reproduce the hangs on that particular
>machine more easily now, but I didn't have the time yet to try
>to reproduce it on my machines. I hope tonight ...
>
>But I can relay the answers to the following questions:
>> I don't see any large times being reported at the beginning of the strace.
>> I'd expect that if you notice the hang, attach to the process, and then
>> do the "ls /proc//fd".  Can you give me a feel for times of:
>
>First reproducibility: Initially the 20051024 hung every ~ 10 minutes,
>but "now" (At the time I got the email) it is running for more than
>15 hours.
>> 
>> - noticed the problem
>1 min - several hours, then doing ps and cygcheck.
>
>> - attached to process with strace
>5-10 minutes after noticing (max)
>
>> - performed ls
> < 1 min (right after...)

Are you sure that attaching to the process with strace isn't what actually
caused the process to start up?  I don't see any 1 minute delays in
the strace log.

However, if you could wait for a couple minutes between attaching via
strace and doing the "ls" that might be instructive.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-10-28 Thread Volker Quetschke

Hi!


Are you sure that attaching to the process with strace isn't what actually
what caused the process to start up?  I don't see any 1 minute delays in
the strace log.


We got a strace file that includes the gap. See:
 and search for:
110111469 110230380 [sig] tcsh

This is a shortened trace file. I'll try to get the full version and the
corresponding ps output but maybe this helps already.
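
A gap like this can also be located mechanically instead of by searching
for a known line. What follows is only a minimal sketch, assuming that each
strace line starts with two integer columns (the microsecond delta since the
previous line, then the running timestamp), as in the line quoted above. The
name find_gaps and the 60 second threshold are made up for illustration.

```python
import re

# Assumed line shape, based on the line quoted above:
#   <delta_usecs> <timestamp_usecs> [thread] program ...
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+\[(\w+)\]\s+(\S+)")


def find_gaps(lines, min_gap_usecs=60_000_000):
    """Return (line_number, delta_usecs, program) for every line whose
    first column is at least min_gap_usecs (hypothetical helper)."""
    gaps = []
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip wrapped or non-trace lines
        delta = int(m.group(1))
        if delta >= min_gap_usecs:
            gaps.append((lineno, delta, m.group(4)))
    return gaps


if __name__ == "__main__":
    # The only sample line is the one quoted in this message.
    sample = ["110111469 110230380 [sig] tcsh"]
    for lineno, delta, prog in find_gaps(sample):
        print(f"line {lineno}: {prog} stalled for {delta / 1e6:.1f} s")
```

Run against a full log (one list entry per line), this lists every place
where more than a minute passed between two consecutive trace lines.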

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-01 Thread Volker Quetschke
Volker Quetschke wrote:
>> Are you sure that attaching to the process with strace isn't what
>> actually
>> what caused the process to start up?  I don't see any 1 minute delays in
>> the strace log.
> 
> We got a strace file that includes the gap. See:
>  and search for:
> 110111469 110230380 [sig] tcsh
> 
> This is a shorted trace file. I'll try to get the full version and the
> corresponding ps output but maybe this helps already.
OK, here is the full version:
 

It contains the usual cygcheck and ps output plus:

  strace.log_while  - the strace file until it hangs
  strace.log        - the complete strace including the previous part.

It hangs/waits in "dmake 2436 talktome: pid 632 wants some information"

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-01 Thread Volker Quetschke
Volker Quetschke wrote:
> OK, here is the full version:
>  
> 
> It contains the usual cygcheck and ps output plus:
> 
>   strace.log_while  - the strace file until it hangs
>   strace.log- the complete strace including the previous part.
> 
> It hangs/waits in "dmake 2436 talktome: pid 632 wants some information"

I found something funny in the strace. dmake *and* tcsh are waiting
on something:

186919618 187255700 [sig] dmake 2436 talktome: pid 632 wants some information

and later in the file:

186869833 187019671 [sig] tcsh 2724 talktome: pid 632 wants some information

Plus some more "talktome: pid 632 wants some information" lines later.
Not really knowing what's going on here, I would speculate that they are
coming from some
  wait(&status);
command. (At least that's most probably the place where it's "hanging" in
dmake).
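
To see which processes received these requests, and from which pid, the
talktome lines can be tallied. This is a rough sketch only, assuming the
line layout of the two excerpts above (delta, timestamp, thread, program,
pid, message). tally_talktome is a hypothetical helper, not part of strace.

```python
import re
from collections import Counter

# Matches e.g. "[sig] dmake 2436 talktome: pid 632 wants some information"
TALKTOME_RE = re.compile(
    r"\[(\w+)\]\s+(\S+)\s+(\d+)\s+talktome: pid (\d+) wants some information"
)


def tally_talktome(lines):
    """Count (program, pid, requesting_pid) triples across talktome lines."""
    counts = Counter()
    for line in lines:
        m = TALKTOME_RE.search(line)
        if m:
            key = (m.group(2), int(m.group(3)), int(m.group(4)))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Sample lines are the two excerpts quoted in this message.
    sample = [
        "186919618 187255700 [sig] dmake 2436 talktome: pid 632 wants some information",
        "186869833 187019671 [sig] tcsh 2724 talktome: pid 632 wants some information",
    ]
    for (prog, pid, requester), n in tally_talktome(sample).items():
        print(f"{prog} {pid}: {n} request(s) from pid {requester}")
```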

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-01 Thread Christopher Faylor
On Tue, Nov 01, 2005 at 05:15:31PM -0500, Volker Quetschke wrote:
>Volker Quetschke wrote:
>> OK, here is the full version:
>>  
>> 
>> It contains the usual cygcheck and ps output plus:
>> 
>>   strace.log_while  - the strace file until it hangs
>>   strace.log- the complete strace including the previous part.
>> 
>> It hangs/waits in "dmake 2436 talktome: pid 632 wants some information"
>
>I found something funny in the strace. dmake *and* tcsh are waiting
>on something:
>
>186919618 187255700 [sig] dmake 2436 talktome: pid 632 wants some information
>
>and later in the file:
>
>186869833 187019671 [sig] tcsh 2724 talktome: pid 632 wants some information
>
>Plus some more "talktome: pid 632 wants some information" lines later.
>Not really knowing what's going on here, I would speculate that they are
>coming from some
>  wait(&status);
>command. (At least that's most probably the place where it's "hanging" in
>dmake).

No, talktome has nothing to do with wait.  That's the interface which is
called when (among other things) you look at things in /proc.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-02 Thread Christopher Faylor
On Tue, Nov 01, 2005 at 05:40:30PM -0500, Christopher Faylor wrote:
>On Tue, Nov 01, 2005 at 05:15:31PM -0500, Volker Quetschke wrote:
>>Volker Quetschke wrote:
>>> OK, here is the full version:
>>>  
>>> 
>>> It contains the usual cygcheck and ps output plus:
>>> 
>>>   strace.log_while  - the strace file until it hangs
>>>   strace.log- the complete strace including the previous part.
>>> 
>>> It hangs/waits in "dmake 2436 talktome: pid 632 wants some information"
>>
>>I found something funny in the strace. dmake *and* tcsh are waiting
>>on something:
>>
>>186919618 187255700 [sig] dmake 2436 talktome: pid 632 wants some information
>>
>>and later in the file:
>>
>>186869833 187019671 [sig] tcsh 2724 talktome: pid 632 wants some information
>>
>>Plus some more "talktome: pid 632 wants some information" lines later.
>>Not really knowing what's going on here, I would speculate that they are
>>coming from some
>>  wait(&status);
>>command. (At least that's most probably the place where it's "hanging" in
>>dmake).
>
>No, talktome has nothing to do with wait.  That's the interface which is
>called when (among other things) you look at things in /proc.

Could you try today's snapshot (when it shows up)?  It has more debugging
which might help show where the hang is occurring.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-02 Thread Volker Quetschke
Hi!

(snip)
>>No, talktome has nothing to do with wait.  That's the interface which is
>>called when (among other things) you look at things in /proc.
> 
> Could you try today's snapshot (when it shows up)?  It has more debugging
> which might help show where the hang is occurring.

Here it is: 

It contains the usual information

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-02 Thread Christopher Faylor
On Wed, Nov 02, 2005 at 02:49:50PM -0500, Volker Quetschke wrote:
>Hi!
>
>(snip)
>>>No, talktome has nothing to do with wait.  That's the interface which is
>>>called when (among other things) you look at things in /proc.
>> 
>> Could you try today's snapshot (when it shows up)?  It has more debugging
>> which might help show where the hang is occurring.
>
>Here it is: 
>
>It contains the usual information

It wasn't quite the usual information.  It narrows down where the problem
is occurring and it is still quite puzzling.

There is a new snapshot with even more information.  Could you try getting
another strace from this one?  Nothing has changed but the addition of
more strace output.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-03 Thread Christopher Faylor
On Thu, Nov 03, 2005 at 12:00:00AM -0500, Christopher Faylor wrote:
>On Wed, Nov 02, 2005 at 02:49:50PM -0500, Volker Quetschke wrote:
>>Hi!
>>
>>(snip)
>>>>No, talktome has nothing to do with wait.  That's the interface which is
>>>>called when (among other things) you look at things in /proc.
>>> 
>>> Could you try today's snapshot (when it shows up)?  It has more debugging
>>> which might help show where the hang is occurring.
>>
>>Here it is: 
>>
>>It contains the usual information
>
>It wasn't quite the usual information.  It narrows down where the problem
>is occurring and it is still quite puzzling.
>
>There is a new snapshot with even more information.  Could you try getting
>another strace from this one?  Nothing has changed but the addition of
>more strace output.

Corinna has informed me that I added the debugging output to the wrong place
so I'm generating a new snapshot with the right debugging.  Please use the
November 3 snapshot in your tests.

Sorry for the confusion.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-03 Thread Volker Quetschke
(snip)
> Corinna has informed me that I added the debugging output to the wrong place
> so I'm generating a new snapshot with the right debugging.  Please use the
> November 3 snapshot in your tests.

Just FYI, I tried the 20051102 snapshot and strace doesn't seem to
work at all in that version. For example:

$ strace ls

doesn't produce *any* output.

I'll try the 03 snap when it arrives.

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-03 Thread Volker Quetschke
Volker Quetschke wrote:
> (snip)
> 
>>Corinna has informed me that I added the debugging output to the wrong place
>>so I'm generating a new snapshot with the right debugging.  Please use the
>>November 3 snapshot in your tests.
> 
> Just FYI, I tried the 20051102 snapshot and strace doesn't seem to
> work at all in that version. For example:
> 
> $ strace ls
> 
> doesn't produce *any* output.
> 
> I'll try the 03 snap when it arrives.

Still nothing. strace even eats the regular output:

[EMAIL PROTECTED] /tmp/nix
$ ls
emptydir.txt

[EMAIL PROTECTED] /tmp/nix
$ strace ls

[EMAIL PROTECTED] /tmp/nix
$ uname -a
CYGWIN_NT-5.1 Macros 1.5.19s(0.141/4/2) 20051103 10:52:21 i686 unknown unknown Cygwin

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-03 Thread Christopher Faylor
On Thu, Nov 03, 2005 at 06:46:31PM -0500, Volker Quetschke wrote:
>Volker Quetschke wrote:
>> (snip)
>> 
>>>Corinna has informed me that I added the debugging output to the wrong place
>>>so I'm generating a new snapshot with the right debugging.  Please use the
>>>November 3 snapshot in your tests.
>> 
>> Just FYI, I tried the 20051102 snapshot and strace doesn't seem to
>> work at all in that version. For example:
>> 
>> $ strace ls
>> 
>> doesn't produce *any* output.
>> 
>> I'll try the 03 snap when it arrives.
>
>Still nothing. strace even eats the regular output:
>
>[EMAIL PROTECTED] /tmp/nix
>$ ls
>emptydir.txt
>
>[EMAIL PROTECTED] /tmp/nix
>$ strace ls
>
>[EMAIL PROTECTED] /tmp/nix
>$ uname -a
>CYGWIN_NT-5.1 Macros 1.5.19s(0.141/4/2) 20051103 10:52:21 i686 unknown unknown Cygwin

GAH! What a stupid mistake.  I made a change and didn't check it.  There
was a reason why there were no debugging statements in the code that I
modified.

The latest snapshot seems to work.  I even (gasp!) tested it.

Apologies for wasting your time.  I'd appreciate it if you'd try this
again.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-04 Thread Volker Quetschke
(snip)
>>>>Corinna has informed me that I added the debugging output to the wrong place
>>>>so I'm generating a new snapshot with the right debugging.  Please use the
>>>>November 3 snapshot in your tests.
(snip)
> GAH! What a stupid mistake.  I made a change and didn't check it.  There
> was a reason why there was no debugging statements in the code that I
> modified.
> 
> The latest snapshot seems to work.  I even (gasp!) tested it.
:) Yes, it works.

Here is the latest strace: 

Even though it is the correct snapshot I didn't see many of your
debug statements in the strace.

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-04 Thread Christopher Faylor
On Fri, Nov 04, 2005 at 10:44:56AM -0500, Volker Quetschke wrote:
>(snip)
>>>>>Corinna has informed me that I added the debugging output to the wrong place
>>>>>so I'm generating a new snapshot with the right debugging.  Please use the
>>>>>November 3 snapshot in your tests.
>(snip)
>> GAH! What a stupid mistake.  I made a change and didn't check it.  There
>> was a reason why there was no debugging statements in the code that I
>> modified.
>> 
>> The latest snapshot seems to work.  I even (gasp!) tested it.
>:) Yes, it works.
>
>Here is the latest strace: 
>
>Even though it is the correct snapshot I didn't see many of your
>debug statements in the strace.

Yes.  So much for that theory.

There's a new snapshot with even more debugging available now.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-04 Thread Volker Quetschke
(snip)
>>Even though it is the correct snapshot I didn't see many of your
>>debug statements in the strace.
> 
> Yes.  So much for that theory.
> 
> There's a new snapshot with even more debugging available now.

Here is the new strace: 

Volker



Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-04 Thread Christopher Faylor
On Fri, Nov 04, 2005 at 02:15:10PM -0500, Volker Quetschke wrote:
>(snip)
>>>Even though it is the correct snapshot I didn't see many of your
>>>debug statements in the strace.
>> 
>> Yes.  So much for that theory.
>> 
>> There's a new snapshot with even more debugging available now.
>
>Here is the new strace: 

That one showed that cygwin was hanging in a windows function that
shouldn't really hang.  I can't explain why but the new snapshot avoids
calling that function so much.

Please give it a try.  If it still hangs an strace will, as always, be
interesting.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-04 Thread Christopher Faylor
On Sat, Nov 05, 2005 at 01:05:36AM -0500, Christopher Faylor wrote:
>On Fri, Nov 04, 2005 at 02:15:10PM -0500, Volker Quetschke wrote:
>>>>Even though it is the correct snapshot I didn't see many of your
>>>>debug statements in the strace.
>>>
>>>Yes.  So much for that theory.
>>>
>>>There's a new snapshot with even more debugging available now.
>>
>>Here is the new strace: 
>
>That one showed that cygwin was hanging in a windows function that
>shouldn't really hang.  I can't explain why but the new snapshot avoids
>calling that function so much.

Actually, to clarify, I can explain why the new snapshot avoids calling
the function because that's what I did to "fix" things.  I can't explain
why the windows function "timeGetDevCaps" would hang.
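
For readers following along: the thread does not show the actual change, so
the following is only a minimal sketch of one common way to "avoid calling
that function so much", namely running an expensive query once and reusing
the cached answer.  Every name in it (timer_caps, caps_cache, fake_query) is
hypothetical, fake_query merely stands in for a call such as timeGetDevCaps,
and the sketch is deliberately not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability record; stands in for whatever the real query fills in. */
struct timer_caps {
    unsigned min_period_ms;
    unsigned max_period_ms;
};

/* Signature of the expensive (possibly blocking) query; returns true on success. */
typedef bool (*caps_query_fn)(struct timer_caps *out);

struct caps_cache {
    caps_query_fn query;     /* underlying query to run at most once on success */
    struct timer_caps caps;  /* cached result, meaningful only when valid is true */
    bool valid;
    unsigned calls;          /* how many times the underlying query actually ran */
};

void caps_cache_init(struct caps_cache *c, caps_query_fn query)
{
    assert(c != NULL && query != NULL);
    c->query = query;
    c->caps.min_period_ms = 0;
    c->caps.max_period_ms = 0;
    c->valid = false;
    c->calls = 0;
}

/* Copy the cached capabilities into *out, running the query only if needed. */
bool caps_cache_get(struct caps_cache *c, struct timer_caps *out)
{
    if (!c->valid) {
        c->calls++;
        if (!c->query(&c->caps))
            return false;    /* cache stays invalid so a later call can retry */
        c->valid = true;
    }
    *out = c->caps;
    return true;
}

/* Stub standing in for the real query; the values are illustrative only. */
bool fake_query(struct timer_caps *out)
{
    out->min_period_ms = 1;
    out->max_period_ms = 1000;
    return true;
}
```

After caps_cache_init(&cache, fake_query), repeated calls to caps_cache_get
run the underlying query only the first time.  Note that caching only reduces
how often a misbehaving call is reached; it does not explain why that call
would hang.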

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-05 Thread Shaddy Baddah
Hi,

Christopher Faylor wrote:
[snip]
> Actually, to clarify, I can explain why the new snapshot avoids calling
[more snip]

I just want to give you a bit of encouragement: at least I am following
this (rather long, necessarily so) thread closely.

I really hope that it gets solved, just as a fillip for the project (I
doubt that Open Office compilation in itself is enough to hold back
release of the new cygwin dll).

Regards,
Shaddy

PS: I have had one or two, but I think I am OK enough to mean what I say.




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-07 Thread Volker Quetschke
(snip)
>>>There's a new snapshot with even more debugging available now.
>>
>>Here is the new strace: 
> 
> That one showed that cygwin was hanging in a windows function that
> shouldn't really hang.  I can't explain why but the new snapshot avoids
> calling that function so much.
> 
> Please give it a try.  If it still hangs an strace will, as always, be
> interesting.
Unfortunately it still does hang. See here:
 for details.

Volker


-- 
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-10 Thread Christopher Faylor
On Mon, Nov 07, 2005 at 11:18:24AM -0500, Volker Quetschke wrote:
>(snip)
>>>>There's a new snapshot with even more debugging available now.
>>>
>>>Here is the new strace: 
>> 
>> That one showed that cygwin was hanging in a windows function that
>> shouldn't really hang.  I can't explain why but the new snapshot avoids
>> calling that function so much.
>> 
>> Please give it a try.  If it still hangs an strace will, as always, be
>> interesting.
>Unfortunately it still does hang. See here:
> for details.

It's still hanging in a multimedia timer call, which is "interesting".

The latest snapshot comments out the part of the code which sets the
timer resolution, on the off chance that setting it to 1ms is what is
causing the problem.

It's a long shot but please try out the latest snapshot.

Have you ever mentioned what kind of system this is, btw?  Is it hyperthreaded,
SMP, what clock speed, how much memory...?

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-11 Thread Corinna Vinschen
On Nov  7 11:18, Volker Quetschke wrote:
> (snip)
> >>>There's a new snapshot with even more debugging available now.
> >>
> >>Here is the new strace: 
> > 
> > That one showed that cygwin was hanging in a windows function that
> > shouldn't really hang.  I can't explain why but the new snapshot avoids
> > calling that function so much.
> > 
> > Please give it a try.  If it still hangs an strace will, as always, be
> > interesting.
> Unfortunately it still does hang. See here:
>  for details.

I don't know how long every try takes, but would you be able to
repeat it a couple of times so that we can see if it always hangs
in the same spot?

You can reproduce this on more than one machine, right?  Otherwise
there would be a chance of a corrupted system ...


Corinna

-- 
Corinna Vinschen  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader  cygwin AT cygwin DOT com
Red Hat, Inc.




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-11 Thread Volker Quetschke

> (snip)
>>Unfortunately it still does hang. See here:
>> for details.
> 
> It's still hanging in a multimedia timer call, which is "interesting".
> 
> The latest snapshot comments out the part of the code which sets the
> timer resolution, on the off chance that setting it to 1ms is what is
> causing the problem.
> 
> It's a long shot but please try out the latest snapshot.
Testing will happen soon ...

> Have you ever mentioned what kind of system this is, btw?  Is it
> hyperthreaded, SMP, what clock speed, how much memory...?
It is a 1.8 GHz system with no hyperthreading, no SMP, and 512 MB of memory.

The only abnormal thing we could find is that it is running in
a Terminal Service session.

On *that* system it is pretty easy to reproduce the hang. It hangs
when you move another window over the one that is currently building
OOo. The "build window" is a cygwin bash prompt started with cygwin.bat.

I just asked in the OOo development ml who else can reproduce the problem.

Volker

-- 
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-15 Thread Volker Quetschke

(snip)
>> It's still hanging in a multimedia timer call, which is "interesting".
>>
>> The latest snapshot comments out the part of the code which sets the
>> timer resolution, on the off chance that setting it to 1ms is what is
>> causing the problem.
>>
>> It's a long shot but please try out the latest snapshot.
>
> Testing will happen soon ...

Took a little longer, not to reproduce but to send, this time:
  

>> Have you ever mentioned what kind of system this is, btw?  Is it
>> hyperthreaded, SMP, what clock speed, how much memory...?
>
> It is a system with 1.8GHz, no hyperthreading, no smp, 512Mb.
>
> The only abnormal thing we could find is that it is running in
> a Terminal Service session.
>
> On *that* system it is pretty easy to reproduce the hang. It hangs
> when you move another window over the one that is currently building
> OOo. The "build window" is a cygwin bash prompt started with cygwin.bat.
>
> I just asked in the OOo development ml who else can reproduce the problem.


Volker

--
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-16 Thread Christopher Faylor
On Tue, Nov 15, 2005 at 09:12:42AM -0500, Volker Quetschke wrote:
>(snip)
>>>It's still hanging in a multimedia timer call, which is "interesting".
>>>
>>>The latest snapshot comments out the part of the code which sets the
>>>timer resolution, on the off chance that setting it to 1ms is what is
>>>causing the problem.
>>>
>>>It's a long shot but please try out the latest snapshot.
>>
>>Testing will happen soon ...
>
>Took a little longer, not to reproduce but to send, this time:
>  

Ok.  Still hanging in the same place.

I've taken one more shot in the dark in the latest snapshot but I don't
have much confidence that it will make any difference.

I don't remember.  Did you ever confirm/deny that this problem happens
with other systems?

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-21 Thread Volker Quetschke
Hi!

Christopher Faylor wrote:
> On Tue, Nov 15, 2005 at 09:12:42AM -0500, Volker Quetschke wrote:
(snips)
>>Took a little longer, not to reproduce but to send, this time:
>> 
> 
> Ok.  Still hanging in the same place.
> 
> I've taken one more shot in the dark in the latest snapshot but I don't
> have much confidence that it will make any difference.
> 
> I don't remember.  Did you ever confirm/deny that this problem happens
> with other systems?
Yes and no. The problem is reproducible on two systems that I know
of, and both have identical hardware and MS Terminal Services
installed. (Not many people are building OOo for Windows :( and Sun
uses a different build environment, without cygwin tcsh.)

But with the 20051114 snapshot the problem escalated on that system(s).

tcsh segfaults. To reproduce (on that system):
open a bash window, start tcsh, run "ls" -> "Segmentation fault (core dumped)"

I asked him to catch the dump, here is what I got:
  

I'll ask for tests with more current snapshots now.

Volker

-- 
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-21 Thread Christopher Faylor
On Mon, Nov 21, 2005 at 01:08:25PM -0500, Volker Quetschke wrote:
>Christopher Faylor wrote:
>> On Tue, Nov 15, 2005 at 09:12:42AM -0500, Volker Quetschke wrote:
>(snips)
>>>Took a little longer, not to reproduce but to send, this time:
>>> 
>> 
>> Ok.  Still hanging in the same place.
>> 
>> I've taken one more shot in the dark in the latest snapshot but I don't
>> have much confidence that it will make any difference.
>> 
>> I don't remember.  Did you ever confirm/deny that this problem happens
>> with other systems?
>Yes and no. The problem is reproducible on two systems that I know
>of (Not many people are building OOo for windows :( And Sun uses
>a different, without cygwin tcsh, build environment.)
>and both systems have identical hardware and MS terminal service
>installed.
>
>But with the 20051114 snapshot the problem escalated on that system(s).

I sent the above message on 20051116.  There was a brief problem in the
20051114 snapshot which was fixed.  I don't know if that is what you saw
or not but I'm really only interested in the most recent snapshot.

cgf




Re: Hang with 20051018 (3rd version) snapshot while building OOo

2005-11-21 Thread Volker Quetschke
Christopher Faylor wrote:
> On Mon, Nov 21, 2005 at 01:08:25PM -0500, Volker Quetschke wrote:
(snip)
> I sent the above message on 20051116.  There was a brief problem in the
> 20051114 snapshot which was fixed.  I don't know if that is what you saw
> or not but I'm really only interested in the most recent snapshot.

Ok, same as before, now 20051117:


Volker

-- 
PGP/GPG key  (ID: 0x9F8A785D)  available  from  wwwkeys.de.pgp.net
key-fingerprint 550D F17E B082 A3E9 F913  9E53 3D35 C9BA 9F8A 785D

