Re: Druid test hangs?
I bumped into the following thread about dumping stack traces with Gradle [1] and thought it may be worth sharing in case someone decides to implement something along these lines for Calcite.

Best,
Stamatis

[1] https://discuss.gradle.org/t/dump-stack-trace-for-tests/33524

On Mon, Dec 13, 2021 at 6:26 PM Jacques Nadeau wrote:
> I wonder if we can create a simple shell script that runs a jstack once an
> hour (starting after one hour) and then run it using
> https://github.com/psxpaul/gradle-execfork-plugin? Since none of our jobs
> run an hour, most of the time it wouldn't do anything. In the cases where
> the job hung, we'd hopefully get a jstack.
Re: Druid test hangs?
I wonder if we can create a simple shell script that runs jstack once an hour (starting after one hour) and then run it using https://github.com/psxpaul/gradle-execfork-plugin? Since none of our jobs run for an hour, most of the time it wouldn't do anything. In the cases where the job hangs, we'd hopefully get a jstack dump.

On Mon, Dec 13, 2021 at 12:17 AM Stamatis Zampetakis wrote:
> If there is a systematic way to do it I would be interested to know.
>
> In the past, when I encountered similar hangs in CI what I ended-up doing
> is adding debugging commits in the PR with a thread printing stack traces
> of other threads at some intervals.
>
> Best,
> Stamatis
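As a rough illustration of the "first dump after one hour, then once an hour" schedule, here is a minimal Java sketch. Note that it swaps in a different technique from the one proposed: instead of an external shell script calling jstack, it takes the dump from inside the JVM through the standard `ThreadMXBean` API. The class name `HourlyThreadDump` and its methods are hypothetical and are not part of Calcite or of the gradle-execfork-plugin.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: in-JVM stand-in for an hourly jstack run. */
final class HourlyThreadDump {
  private HourlyThreadDump() {}

  /** Formats a jstack-like dump of all live threads plus a deadlock count. */
  static String dump() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    // Ask for lock details only where this JVM supports collecting them.
    boolean monitors = bean.isObjectMonitorUsageSupported();
    boolean synchronizers = bean.isSynchronizerUsageSupported();
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : bean.dumpAllThreads(monitors, synchronizers)) {
      sb.append('"').append(info.getThreadName()).append("\" ")
          .append(info.getThreadState());
      if (info.getLockName() != null) {
        sb.append(" waiting on ").append(info.getLockName());
      }
      sb.append('\n');
      for (StackTraceElement frame : info.getStackTrace()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    // Both calls return null when no deadlock is detected.
    long[] deadlocked = synchronizers
        ? bean.findDeadlockedThreads()
        : bean.findMonitorDeadlockedThreads();
    sb.append("deadlocked threads: ")
        .append(deadlocked == null ? 0 : deadlocked.length)
        .append('\n');
    return sb.toString();
  }

  /** Schedules dumps to stderr on a daemon thread: first after initialDelay, then every period. */
  static ScheduledExecutorService schedule(long initialDelay, long period, TimeUnit unit) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "hourly-thread-dump");
      t.setDaemon(true); // must not keep a finished build alive
      return t;
    });
    scheduler.scheduleAtFixedRate(
        () -> System.err.print(dump()), initialDelay, period, unit);
    return scheduler;
  }

  public static void main(String[] args) {
    // A job that finishes within the hour never triggers a scheduled dump.
    schedule(1, 1, TimeUnit.HOURS);
    // One immediate dump so the demo prints something.
    System.err.print(dump());
  }
}
```

Because the scheduler thread is a daemon, a build that finishes normally exits without printing anything, which matches the "most of the time it wouldn't do anything" expectation. Unlike an external jstack, this only helps if the hung JVM can still run the scheduler thread.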
Re: Druid test hangs?
If there is a systematic way to do it I would be interested to know.

In the past, when I encountered similar hangs in CI, what I ended up doing was adding debugging commits to the PR with a thread that prints the stack traces of the other threads at intervals.

Best,
Stamatis

On Sun, Dec 12, 2021 at 7:00 PM Jacques Nadeau wrote:
> It could be infra but I'm wondering if it is some kind of concurrency bug.
>
> Anyone know if there is a straightforward way to add a secondary process in
> a github workflow that takes a jstack after an hour or something (if the
> tests run that long). Trying to jump on an instance when this happens and
> do this manually sounds like an effort in frustration.
>
> I guess another option would be to modify the druid job to provide info on
> tests that are running so that we can see if it always locks on the same
> test.
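The debugging approach described in this message (a thread that prints the stack traces of the other threads at intervals) might look roughly like the sketch below. It is only an illustration under assumptions: the class name `StackTraceWatchdog`, the method names, and the interval are hypothetical, and this is not the code that was actually committed to any PR.

```java
import java.io.PrintStream;
import java.util.Map;

/** Hypothetical helper: periodically prints the stack traces of all live threads. */
final class StackTraceWatchdog {
  private StackTraceWatchdog() {}

  /** Formats the current stack trace of every live thread as text. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" state=")
          .append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  /** Starts a daemon thread that prints a dump to out every intervalMillis. */
  static Thread start(long intervalMillis, PrintStream out) {
    Thread watchdog = new Thread(() -> {
      try {
        while (true) {
          Thread.sleep(intervalMillis);
          out.println("=== thread dump ===");
          out.print(dumpAllThreads());
          out.flush();
        }
      } catch (InterruptedException e) {
        // Interrupted: stop dumping and let the thread exit.
      }
    }, "stack-trace-watchdog");
    watchdog.setDaemon(true); // never keeps the JVM alive on its own
    watchdog.start();
    return watchdog;
  }

  public static void main(String[] args) throws InterruptedException {
    Thread watchdog = start(200, System.err);
    Thread.sleep(500); // stand-in for a long-running test
    watchdog.interrupt();
  }
}
```

In a real test run the interval would be minutes rather than milliseconds, and the watchdog would be started once before the suspect tests, for example from a static initializer in a test base class.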
Re: Druid test hangs?
It could be infra, but I'm wondering if it is some kind of concurrency bug.

Does anyone know if there is a straightforward way to add a secondary process in a GitHub workflow that takes a jstack after an hour or so (if the tests run that long)? Trying to jump on an instance when this happens and do this manually sounds like an exercise in frustration.

I guess another option would be to modify the Druid job to report which tests are running, so that we can see whether it always locks up on the same test.

On Sat, Dec 11, 2021 at 11:39 PM Alessandro Solimando <alessandro.solima...@gmail.com> wrote:
> I started noticing that intermittently around a month ago, I had a quick
> look back then but I could not pinpoint the root cause.
>
> I don't think it is expected, and I guess it comes from test infra setup
> rather than the Calcite code itself.
Re: Druid test hangs?
I started noticing that intermittently around a month ago. I had a quick look back then, but I could not pinpoint the root cause.

I don't think it is expected, and I guess it comes from the test infra setup rather than the Calcite code itself.

On Sun, Dec 12, 2021, 05:43 Jacques Nadeau wrote:
> I see a couple of recent builds with Druid tests hanging. Is that a normal
> thing or something that has started recently.
>
> Examples:
> https://github.com/apache/calcite/runs/4487013505?check_suite_focus=true
> https://github.com/jacques-n/calcite/runs/4494836558?check_suite_focus=true
Druid test hangs?
I see a couple of recent builds with Druid tests hanging. Is that a normal thing, or something that has started recently?

Examples:
https://github.com/apache/calcite/runs/4487013505?check_suite_focus=true
https://github.com/jacques-n/calcite/runs/4494836558?check_suite_focus=true