[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-05-06 Thread Laurent Goujon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101202#comment-17101202
 ] 

Laurent Goujon commented on CALCITE-3965:
-

Sounds reasonable

> Excessive time waiting on DiffRepository lock
> -
>
> Key: CALCITE-3965
> URL: https://issues.apache.org/jira/browse/CALCITE-3965
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Laurent Goujon
> Assignee: Laurent Goujon
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> When running the whole test suite from the command line, tests are
> parallelized and Gradle/JUnit tries to use as many cores as possible (16 on
> my machine). But the tests take a very long time, approximately 90 minutes on
> my machine, and several of them failed because they took too long to complete.
> Using jstack to look at the thread states while tests are running shows that
> most of them are waiting on {{DiffRepository}} methods
> ({{DiffRepository#expand}} in most cases) while one of the threads holds
> the lock (and is usually flushing data to disk).
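
For readers who want to reproduce the kind of observation described in the issue without attaching jstack, here is a minimal sketch that lists threads blocked on a monitor from inside the JVM. It uses the standard java.lang.management API; the class name BlockedThreadReport and its method are hypothetical and are not part of Calcite or of the reporter's workflow.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Calcite): an in-process look at blocked threads. */
final class BlockedThreadReport {
  private BlockedThreadReport() {}

  /** Returns one line per thread that is currently BLOCKED waiting for a monitor. */
  static List<String> blockedThreads() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    List<String> lines = new ArrayList<>();
    // No need to collect locked monitors or synchronizers for this summary.
    for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
      if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
        lines.add(info.getThreadName()
            + " waits for " + info.getLockName()
            + " held by " + info.getLockOwnerName());
      }
    }
    return lines;
  }
}
```

If many lines name the same lock and the same owner, that matches the pattern reported here: many test threads queued behind a single holder of one instance lock.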



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-05-06 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101195#comment-17101195
 ] 

Julian Hyde commented on CALCITE-3965:
--

Thanks for fixing both this and CALCITE-3517. Both PRs look good. (I haven't 
run them.)

How about merging this PR ({{synchronized}}) first and letting it run in CI
for 2 or 3 days before merging the other PR, just in case it causes some
instability?



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-05-06 Thread Laurent Goujon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101161#comment-17101161
 ] 

Laurent Goujon commented on CALCITE-3965:
-

I guess that, after all this time, I still don't know how the Jira/GitHub
integration works: https://github.com/apache/calcite/pull/1954



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Laurent Goujon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097079#comment-17097079
 ] 

Laurent Goujon commented on CALCITE-3965:
-

I agree about the unnecessary XML writing, but adding a synchronized keyword
when it is not necessary (which is my analysis, and maybe I am wrong here, so I
would appreciate a second pair of eyes) is also a cause of lock contention.



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097077#comment-17097077
 ] 

Julian Hyde commented on CALCITE-3965:
--

I agree, this is very likely a duplicate of CALCITE-3517. Please fix
CALCITE-3517: CPU use will go way down, and lock contention will drop. Lock
contention is just a symptom, not the cause.



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Haisheng Yuan (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097072#comment-17097072
 ] 

Haisheng Yuan commented on CALCITE-3965:


Is it a duplicate of CALCITE-3517?



[jira] [Commented] (CALCITE-3965) Excessive time waiting on DiffRepository lock

2020-04-30 Thread Laurent Goujon (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097071#comment-17097071
 ] 

Laurent Goujon commented on CALCITE-3965:
-

This is somewhat related to CALCITE-3517, but after a look at the code,
{{DiffRepository#expand}} does not write to disk, and the main issue is really
contention around the instance lock. The method is synchronized, but most of
its operations look thread-safe. There are some calls to get and set (which are
also synchronized), but it doesn't look like they need to be done "atomically".

Removing {{synchronized}} on the expand() method results in the build
completing in 2min30s with no test failures.
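
To illustrate the idea (this is a simplified sketch under stated assumptions, not the actual DiffRepository code or the change in the PR): the hypothetical class below keeps get and set synchronized, so each map access is still guarded, while expand itself holds no lock during its string handling. The class name, the "${token}" convention, and the map-backed storage are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a repository of expected values; not Calcite's DiffRepository. */
final class SketchRepository {
  private final Map<String, String> values = new HashMap<>();

  /** Short critical section: only the map lookup holds the instance lock. */
  synchronized String get(String key) {
    return values.get(key);
  }

  /** Short critical section: only the map update holds the instance lock. */
  synchronized void set(String key, String value) {
    values.put(key, value);
  }

  /**
   * Deliberately not synchronized: the string handling touches no shared
   * state, so concurrent callers contend only inside get() and set().
   */
  String expand(String key, String text) {
    if (text == null || !text.startsWith("${") || !text.endsWith("}")) {
      // A literal value: record it under the key and return it unchanged.
      set(key, text);
      return text;
    }
    String token = text.substring(2, text.length() - 1);
    String expanded = get(token);
    // Fall back to the original text when the token is unknown.
    return expanded != null ? expanded : text;
  }
}
```

The trade-off is that a get followed by a set is no longer one atomic step, which is only acceptable if, as the analysis above suggests, callers do not rely on that atomicity.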
