[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101202#comment-17101202 ]

Laurent Goujon commented on CALCITE-3965:
-----------------------------------------

Sounds reasonable.

> Excessive time waiting on DiffRepository lock
> ---------------------------------------------
>
>                 Key: CALCITE-3965
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3965
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Laurent Goujon
>            Assignee: Laurent Goujon
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> When running the whole test suite from the command line, tests are parallelized
> and Gradle/JUnit tries to use as many cores as possible (16 on my machine).
> But the tests take a very long time, approximately 90 minutes on my machine,
> and several of them failed because they took too long to complete.
> Using jstack to look at the thread states while the tests are running shows that
> most of them are waiting on {{DiffRepository}} methods
> ({{DiffRepository#expand}} in most cases) while one of the threads has obtained
> the lock (and is usually flushing data to disk).

--
This message was sent by Atlassian Jira (v8.3.4#803005)
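The jstack observation in the issue description can also be approximated from inside the JVM. The following is a minimal sketch, not part of Calcite or of the attached PR: the class `BlockedThreads` and its method `blockedIn` are hypothetical names, and the check relies only on the standard `Thread.getAllStackTraces()` and `Thread.getState()` calls. It lists threads that are in the BLOCKED state with a frame from a given class on their stack, which is roughly what one looks for by eye in a thread dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: a rough, in-process analogue of scanning a jstack dump. */
public class BlockedThreads {
  /**
   * Returns the names of live threads that are BLOCKED (waiting to enter a
   * monitor) and have at least one stack frame belonging to the given class.
   */
  public static List<String> blockedIn(String className) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<Thread, StackTraceElement[]> entry
        : Thread.getAllStackTraces().entrySet()) {
      Thread thread = entry.getKey();
      // The state is sampled slightly after the stack trace, so this is a
      // best-effort snapshot, not an exact one.
      if (thread.getState() != Thread.State.BLOCKED) {
        continue;
      }
      for (StackTraceElement frame : entry.getValue()) {
        if (frame.getClassName().equals(className)) {
          names.add(thread.getName());
          break;
        }
      }
    }
    return names;
  }
}
```

Calling `BlockedThreads.blockedIn("org.apache.calcite.test.DiffRepository")` from a monitoring thread during a test run would, under these assumptions, return the worker threads queued on the repository's lock; the fully qualified class name here is an assumption and should be checked against the code base.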
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101195#comment-17101195 ]

Julian Hyde commented on CALCITE-3965:
--------------------------------------

Thanks for fixing both this and CALCITE-3517. Both PRs look good. (I haven't run them.)

How about merging this PR ({{synchronized}}) first and letting it run in CI for 2 or 3 days before merging the other PR, just in case it causes some instability?
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101161#comment-17101161 ]

Laurent Goujon commented on CALCITE-3965:
-----------------------------------------

I guess after all this time, I still don't know how the JIRA/GitHub integration works: https://github.com/apache/calcite/pull/1954
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097079#comment-17097079 ]

Laurent Goujon commented on CALCITE-3965:
-----------------------------------------

I agree about the unnecessary XML writing, but adding a {{synchronized}} keyword where it is not necessary (that is my analysis; I may be wrong here, so I would appreciate a second pair of eyes) is also a cause of lock contention.
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097077#comment-17097077 ]

Julian Hyde commented on CALCITE-3965:
--------------------------------------

I agree, very likely a duplicate of 3517. Please fix 3517, CPU use will go way down, and lock contention will reduce. Lock contention is just a symptom, not the cause.
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097072#comment-17097072 ]

Haisheng Yuan commented on CALCITE-3965:
----------------------------------------

Is it a duplicate of CALCITE-3517?
[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock
[ https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097071#comment-17097071 ]

Laurent Goujon commented on CALCITE-3965:
-----------------------------------------

This is somewhat related to CALCITE-3517, but after a look at the code, {{DiffRepository#expand}} does not write to disk, and the main issue is really contention around the instance lock. The method is synchronized, but most operations look thread-safe. There are some calls to get and set (which are also synchronized), but it doesn't look like they need to be done "atomically".

Removing {{synchronized}} on the expand() method results in the build completing in 2min30s with no test failures.
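To illustrate the locking argument in the comment above, here is a minimal sketch of the two shapes being compared. It is not Calcite's `DiffRepository`: the class `RepoSketch`, its fields, and the bodies of `get`, `set`, and the two `expand` variants are hypothetical stand-ins, written only to show how a `synchronized` outer method holds the instance lock for its whole duration, whereas an unsynchronized outer method takes the lock only inside the nested `get` and `set` calls.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a repository guarded by its instance lock. */
public class RepoSketch {
  private final Map<String, String> values = new HashMap<>();

  // Short critical section: the instance lock is held only for the lookup.
  public synchronized String get(String key) {
    return values.get(key);
  }

  // Short critical section: the instance lock is held only for the update.
  public synchronized void set(String key, String value) {
    values.put(key, value);
  }

  // Coarse variant: the whole method, including the string handling,
  // runs while holding the instance lock, so concurrent callers queue up.
  public synchronized String expandCoarse(String key, String text) {
    return doExpand(key, text);
  }

  // Narrow variant: no lock on the outer method; only the nested
  // get/set calls synchronize. This assumes, as the comment argues,
  // that the get and set do not need to happen atomically together.
  public String expandNarrow(String key, String text) {
    return doExpand(key, text);
  }

  // Resolves "${name}" references against stored values; otherwise
  // records the literal text under the given key.
  private String doExpand(String key, String text) {
    if (text.startsWith("${") && text.endsWith("}")) {
      String name = text.substring(2, text.length() - 1);
      String stored = get(name);
      return stored != null ? stored : text;
    }
    set(key, text);
    return text;
  }
}
```

Both variants return the same results for a single caller; the difference only shows up under contention, where the narrow variant lets threads overlap everything except the map access itself. Whether dropping the outer lock is safe in the real class depends on invariants this sketch does not model.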