[ 
https://issues.apache.org/jira/browse/SLING-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Mueller updated SLING-4477:
----------------------------------
    Description: 
There JcrInstaller calls Thread.interrupt() where it's dangerous and not 
necessary. Thread.interrupt is dangerous because it closes files (when using 
the FileChannel API), including Lucene files, and the Oak persistent cache 
files. All further I/O operations with that file, including I/O operations on 
other threads, will then fail (see ClosedByInterruptException for details). 
OAK-2571 protects against closing persistent cache files, by reopening the 
files. But it results in slower performance and ugly log messages. 

Thread.interrupt is also dangerous because it does not work as expected if some 
code catches InterruptedException and does not re-throw it. See 
http://stackoverflow.com/questions/2020992/is-thread-interrupt-evil
Thread.interrupt is not necessary in most cases. Instead, a simple "volatile 
boolean" flag is sufficient, and much safer.

The JCR installer uses a boolean flag (active), but also Thread.interrupt, to 
stop the sleep period.

Just before the problem occurs, I see the following messages in the log file:

{noformat}
05.03.2015 15:22:26.642 *INFO* [CM Event Dispatcher (Fire ConfigurationEvent: 
pid=org.apache.sling.installer.provider.jcr.impl.JcrInstaller)] 
org.apache.sling.installer.provider.jcr.impl.JcrInstaller Deactivating Apache 
Sling JCR Installer
{noformat}

The very last message is from the 
org.apache.sling.installer.provider.jcr.impl.JcrInstaller, who calls:

{noformat}
backgroundThread.interrupt();
{noformat}

One possible solution is: in JcrInstaller, instead of:
{noformat}
        try {
            Thread.sleep(RUN_LOOP_DELAY_MSEC);
        } catch (final InterruptedException ignore) {
            // ignore
        }
{noformat}

use:
{noformat}
        synchronized (this) {
            if (active) {
                try {
                    wait(RUN_LOOP_DELAY_MSEC);
                } catch (final InterruptedException ignore) {
                    // ignore
                }
            }
        } 
{noformat}

and instead of:
{noformat}
        backgroundThread.active = false;
        logger.debug("Waiting for " + backgroundThread.getName() + " Thread to 
end...");
        backgroundThread.interrupt();
{noformat}

use:
{noformat}
        synchronized (backgroundThread) {
            backgroundThread.active = false;
            backgroundThread.notifyAll();
        } 
        logger.debug("Waiting for " + backgroundThread.getName() + " Thread to 
end...");
{noformat}

That's also better than what we have now, because right now, _any_ code within 
"runOneCycle" (including library and Oak code) that is doing "catch 
(InterruptedException x) { }" will let the "Thread.sleep(RUN_LOOP_DELAY_MSEC)" 
sleep one second too long.
And people catch and ignore InterruptedException a _lot_. Using wait and 
notifyAll does not suffer from this, and has no risk of trouble.

An alternative is to simply remove backgroundThread.interrupt(), and change the 
loop to:

{noformat}
        for (int i = 0; i < 100 && active; i++) {
            try {
                Thread.sleep(RUN_LOOP_DELAY_MSEC / 100);
            } catch (final InterruptedException ignore) {
                // ignore
            }
        }
{noformat}

This will let it sleep 10 ms too long at most. It requires that 
RUN_LOOP_DELAY_MSEC is >= 100, which is ugly.

In this case, the field "active" needs to be volatile. But is needed even for 
the current code (it's better to fix that in all cases).


  was:
There JcrInstaller calls Thread.interrupt() where it's dangerous and not 
necessary. Thread.interrupt is dangerous because it closes files (when using 
the FileChannel API), including Lucene files, and the Oak persistent cache 
files. All further I/O operations with that file, including I/O operations on 
other threads, will then fail (see ClosedByInterruptException for details). 
OAK-2571 protects against closing persistent cache files, by reopening the 
files. But it results in slower performance and ugly log messages. 

Thread.interrupt is also dangerous because it does not work as expected if some 
code catches InterruptedException and does not re-throw it. See 
http://stackoverflow.com/questions/2020992/is-thread-interrupt-evil
Thread.interrupt is not necessary in most cases. Instead, a simple "volatile 
boolean" flag is sufficient, and much safer.

The JCR installer uses a boolean flag (active), but also Thread.interrupt, to 
stop the sleep period.

Just before the problem occurs, I see the following messages in the log file:

{noformat}
05.03.2015 15:22:26.642 *INFO* [CM Event Dispatcher (Fire ConfigurationEvent: 
pid=org.apache.sling.installer.provider.jcr.impl.JcrInstaller)] 
org.apache.sling.installer.provider.jcr.impl.JcrInstaller Deactivating Apache 
Sling JCR Installer
{noformat}

The very last message is from the 
org.apache.sling.installer.provider.jcr.impl.JcrInstaller, who calls:

{noformat}
backgroundThread.interrupt();
{noformat}

One possible solution is: in JcrInstaller, instead of:
{noformat}
        try {
            Thread.sleep(RUN_LOOP_DELAY_MSEC);
        } catch (final InterruptedException ignore) {
            // ignore
        }
{noformat}

use:
{noformat}
        synchronized (this) {
            if (active) {
                try {
                    wait(RUN_LOOP_DELAY_MSEC);
                } catch (final InterruptedException ignore) {
                    // ignore
                }
            }
        } 
{noformat}

and instead of:
{noformat}
        backgroundThread.interrupt();
{noformat}

use:
{noformat}
        synchronized (backgroundThread) {
            backgroundThread.notifyAll();
        } 
{noformat}

That's also better than what we have now, because right now, _any_ code within 
"runOneCycle" (including library and Oak code) that is doing "catch 
(InterruptedException x) { }" will let the "Thread.sleep(RUN_LOOP_DELAY_MSEC)" 
sleep one second too long.
And people catch and ignore InterruptedException a _lot_. Using wait and 
notifyAll does not suffer from this, and has no risk of trouble.

An alternative is to simply remove backgroundThread.interrupt(), and change the 
loop to:

{noformat}
        for (int i = 0; i < 100 && active; i++) {
            try {
                Thread.sleep(RUN_LOOP_DELAY_MSEC / 100);
            } catch (final InterruptedException ignore) {
                // ignore
            }
        }
{noformat}

This will let it sleep 10 ms too long at most. In this case, the field "active" 
needs to be volatile. But is needed even for the current code.



> JcrInstaller should not call Thread.interrupt()
> -----------------------------------------------
>
>                 Key: SLING-4477
>                 URL: https://issues.apache.org/jira/browse/SLING-4477
>             Project: Sling
>          Issue Type: Improvement
>          Components: Installer, JCR
>            Reporter: Thomas Mueller
>
> There JcrInstaller calls Thread.interrupt() where it's dangerous and not 
> necessary. Thread.interrupt is dangerous because it closes files (when using 
> the FileChannel API), including Lucene files, and the Oak persistent cache 
> files. All further I/O operations with that file, including I/O operations on 
> other threads, will then fail (see ClosedByInterruptException for details). 
> OAK-2571 protects against closing persistent cache files, by reopening the 
> files. But it results in slower performance and ugly log messages. 
> Thread.interrupt is also dangerous because it does not work as expected if 
> some code catches InterruptedException and does not re-throw it. See 
> http://stackoverflow.com/questions/2020992/is-thread-interrupt-evil
> Thread.interrupt is not necessary in most cases. Instead, a simple "volatile 
> boolean" flag is sufficient, and much safer.
> The JCR installer uses a boolean flag (active), but also Thread.interrupt, to 
> stop the sleep period.
> Just before the problem occurs, I see the following messages in the log file:
> {noformat}
> 05.03.2015 15:22:26.642 *INFO* [CM Event Dispatcher (Fire ConfigurationEvent: 
> pid=org.apache.sling.installer.provider.jcr.impl.JcrInstaller)] 
> org.apache.sling.installer.provider.jcr.impl.JcrInstaller Deactivating Apache 
> Sling JCR Installer
> {noformat}
> The very last message is from the 
> org.apache.sling.installer.provider.jcr.impl.JcrInstaller, who calls:
> {noformat}
> backgroundThread.interrupt();
> {noformat}
> One possible solution is: in JcrInstaller, instead of:
> {noformat}
>         try {
>             Thread.sleep(RUN_LOOP_DELAY_MSEC);
>         } catch (final InterruptedException ignore) {
>             // ignore
>         }
> {noformat}
> use:
> {noformat}
>         synchronized (this) {
>             if (active) {
>                 try {
>                     wait(RUN_LOOP_DELAY_MSEC);
>                 } catch (final InterruptedException ignore) {
>                     // ignore
>                 }
>             }
>         } 
> {noformat}
> and instead of:
> {noformat}
>         backgroundThread.active = false;
>         logger.debug("Waiting for " + backgroundThread.getName() + " Thread 
> to end...");
>         backgroundThread.interrupt();
> {noformat}
> use:
> {noformat}
>         synchronized (backgroundThread) {
>             backgroundThread.active = false;
>             backgroundThread.notifyAll();
>         } 
>         logger.debug("Waiting for " + backgroundThread.getName() + " Thread 
> to end...");
> {noformat}
> That's also better than what we have now, because right now, _any_ code 
> within "runOneCycle" (including library and Oak code) that is doing "catch 
> (InterruptedException x) { }" will let the 
> "Thread.sleep(RUN_LOOP_DELAY_MSEC)" sleep one second too long.
> And people catch and ignore InterruptedException a _lot_. Using wait and 
> notifyAll does not suffer from this, and has no risk of trouble.
> An alternative is to simply remove backgroundThread.interrupt(), and change 
> the loop to:
> {noformat}
>         for (int i = 0; i < 100 && active; i++) {
>             try {
>                 Thread.sleep(RUN_LOOP_DELAY_MSEC / 100);
>             } catch (final InterruptedException ignore) {
>                 // ignore
>             }
>         }
> {noformat}
> This will let it sleep 10 ms too long at most. It requires that 
> RUN_LOOP_DELAY_MSEC is >= 100, which is ugly.
> In this case, the field "active" needs to be volatile. But is needed even for 
> the current code (it's better to fix that in all cases).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to