Re: Druid test hangs?

2022-02-09 Thread Stamatis Zampetakis
I bumped into the following thread about dumping stack traces with Gradle
[1] and thought it may be worth sharing in case someone decides to
implement something along these lines for Calcite.

Best,
Stamatis

[1] https://discuss.gradle.org/t/dump-stack-trace-for-tests/33524


Re: Druid test hangs?

2021-12-13 Thread Jacques Nadeau
I wonder if we could create a simple shell script that runs jstack once an
hour (starting after the first hour) and launch it with
https://github.com/psxpaul/gradle-execfork-plugin. Since none of our jobs
run for an hour, most of the time it wouldn't do anything. In the cases
where the job hangs, we'd hopefully get a jstack.
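A minimal sketch of the same idea, written in Java instead of shell so it can run anywhere a JDK is available. It assumes the PIDs of the test JVMs are passed on the command line (for example, found with `jps -l`); the class name `JstackWatchdog` is hypothetical, and the one-hour delay and period are the numbers suggested above. Wiring it into the build through gradle-execfork-plugin is not shown.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical watchdog: after an initial delay, runs jstack against each given PID once per period. */
final class JstackWatchdog {
  // Healthy jobs finish within an hour, so normally no dump is ever taken.
  static final long INITIAL_DELAY_MINUTES = 60;
  static final long PERIOD_MINUTES = 60;

  /** The command used for one dump; jstack ships with the JDK, not with a bare JRE. */
  static List<String> command(long pid) {
    return List.of("jstack", Long.toString(pid));
  }

  static void dump(long pid) {
    try {
      // inheritIO() sends the thread dump to this process's stdout, i.e. the CI log.
      Process p = new ProcessBuilder(command(pid)).inheritIO().start();
      if (!p.waitFor(5, TimeUnit.MINUTES)) {
        p.destroyForcibly(); // do not let a stuck jstack pile up
      }
    } catch (IOException e) {
      System.err.println("jstack failed for PID " + pid + ": " + e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  // Usage (assumed): java JstackWatchdog PID [PID ...]
  public static void main(String[] args) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    for (String arg : args) {
      long pid = Long.parseLong(arg);
      scheduler.scheduleAtFixedRate(
          () -> dump(pid), INITIAL_DELAY_MINUTES, PERIOD_MINUTES, TimeUnit.MINUTES);
    }
  }
}
```

The scheduler thread is deliberately non-daemon, so the watchdog keeps running until whatever launched it stops it at the end of the build.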


Re: Druid test hangs?

2021-12-13 Thread Stamatis Zampetakis
If there is a systematic way to do it, I would be interested to know.

In the past, when I encountered similar hangs in CI, what I ended up doing
was adding debugging commits to the PR that start a thread printing the
stack traces of the other threads at regular intervals.

Best,
Stamatis
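A minimal sketch of the in-process approach described above: a daemon thread that periodically prints every live thread's stack. `StackDumper` is a hypothetical name, and calling `start(...)` from the suspect test class is an assumption about where one would hook it in. Since a concurrency bug was suspected, each snapshot also reports any deadlock found by `ThreadMXBean.findDeadlockedThreads()`.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical debugging helper: periodically prints the stack trace of every live thread. */
final class StackDumper {
  private StackDumper() {}

  /** Formats one snapshot of all live threads, plus any deadlock the JVM can detect. */
  static String dumpAll() {
    StringBuilder sb = new StringBuilder();
    // Returns null when no monitor or ownable-synchronizer deadlock is found.
    long[] deadlocked = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    sb.append("deadlocked thread ids: ")
        .append(deadlocked == null ? "none" : Arrays.toString(deadlocked))
        .append("\n\n");
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  /** Starts a daemon thread that prints a snapshot to stderr every periodSeconds. */
  static Thread start(long periodSeconds) {
    Thread dumper = new Thread(() -> {
      try {
        while (true) {
          TimeUnit.SECONDS.sleep(periodSeconds);
          System.err.println(dumpAll());
        }
      } catch (InterruptedException e) {
        // Interrupting the dumper thread stops it.
      }
    }, "stack-dumper");
    // A daemon thread does not keep the test JVM alive once the tests finish.
    dumper.setDaemon(true);
    dumper.start();
    return dumper;
  }
}
```

For example, `StackDumper.start(600)` in a static initializer of a suspect test class would print a dump every ten minutes while that JVM is alive.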


Re: Druid test hangs?

2021-12-12 Thread Jacques Nadeau
It could be an infrastructure problem, but I'm wondering if it is some kind
of concurrency bug.

Does anyone know a straightforward way to add a secondary process to a
GitHub workflow that takes a jstack after an hour or so (if the tests run
that long)? Trying to jump on an instance when this happens and do it
manually sounds like an exercise in frustration.

I guess another option would be to modify the Druid job to report which
tests are running, so that we can see whether it always hangs on the same
test.
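A minimal sketch of the second option: a registry of the tests currently running, so that a hung job shows which test it is stuck in. `InFlightTests` and its methods are hypothetical; hooking them into JUnit (for instance from a JUnit Platform `TestExecutionListener`'s `executionStarted`/`executionFinished` callbacks) is an assumption and is not shown. A build-only alternative is Gradle's `testLogging`, which can log the `started` event for each test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

/** Hypothetical registry of the tests that are currently executing. */
final class InFlightTests {
  // Test name -> System.nanoTime() at which it started.
  private static final Map<String, Long> RUNNING = new ConcurrentHashMap<>();

  private InFlightTests() {}

  /** Call when a test starts; the log line shows the last test entered before a hang. */
  static void started(String testName) {
    RUNNING.put(testName, System.nanoTime());
    System.out.println("STARTED  " + testName);
  }

  /** Call when a test finishes, whether it passed or failed. */
  static void finished(String testName) {
    RUNNING.remove(testName);
    System.out.println("FINISHED " + testName);
  }

  /** Returns, sorted by name, the tests that have been running for at least minSeconds. */
  static List<String> runningFor(long minSeconds) {
    long now = System.nanoTime();
    long minNanos = TimeUnit.SECONDS.toNanos(minSeconds);
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Long> e : RUNNING.entrySet()) {
      if (now - e.getValue() >= minNanos) {
        result.add(e.getKey());
      }
    }
    Collections.sort(result);
    return result;
  }
}
```

A periodic job (such as a dump thread) could print `runningFor(3600)` to name the tests that have been stuck for an hour.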


Re: Druid test hangs?

2021-12-11 Thread Alessandro Solimando
I started noticing that intermittently around a month ago. I had a quick
look back then but could not pinpoint the root cause.

I don't think it is expected, and I guess it comes from the test
infrastructure setup rather than the Calcite code itself.


Druid test hangs?

2021-12-11 Thread Jacques Nadeau
I see a couple of recent builds with Druid tests hanging. Is that normal,
or something that has started recently?

Examples:
https://github.com/apache/calcite/runs/4487013505?check_suite_focus=true
https://github.com/jacques-n/calcite/runs/4494836558?check_suite_focus=true