[jira] [Commented] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

2021-09-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421666#comment-17421666
 ] 

Neal Richardson commented on ARROW-8379:


I created ARROW-14159 to explore more mulithreading controls after 6.0.0.

> [R] Investigate/fix thread safety issues (esp. Windows)
> ---
>
> Key: ARROW-8379
> URL: https://issues.apache.org/jira/browse/ARROW-8379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 6.0.0
>
>
> There have been a number of issues where the R bindings' multithreading has 
> been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 
> I disabled {{use_threads}} in the Windows tests, and it appeared that the 
> mysterious Windows segfaults stopped. We should fix whatever the underlying 
> issues are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

2021-09-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421635#comment-17421635
 ] 

Neal Richardson commented on ARROW-8379:


> Dataset scanning, etc. has been running in a multi threaded fashion on CI 
> rather reliably on RTools >= 4.

Actually it hasn't been--we set use_threads = FALSE on Windows in the test 
setup. I'm essentially proposing to move this to the package .onLoad hook, 
which means that users will experience the (relative) stability we see on CI.

How about we do that for 6.0, and then after 6.0 we explore more controls for 
multithreading. That way, we'll have time to see how it behaves on CI for a 
couple of months before we release next.

> [R] Investigate/fix thread safety issues (esp. Windows)
> ---
>
> Key: ARROW-8379
> URL: https://issues.apache.org/jira/browse/ARROW-8379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
>
> There have been a number of issues where the R bindings' multithreading has 
> been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 
> I disabled {{use_threads}} in the Windows tests, and it appeared that the 
> mysterious Windows segfaults stopped. We should fix whatever the underlying 
> issues are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

2021-09-28 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421615#comment-17421615
 ] 

Weston Pace commented on ARROW-8379:


> On 32-bit Rtools 3.5, async multithreading just does not work. This means we 
> should disable the dataset features entirely on this build.

+1.  When I dug into this in detail I believe I found that mingw on 32-bit 
windows is using a rather custom backport / bespoke implementation of pthreads 
and there were other issues where it does not properly implement the 
std::thread expectations.  I'd really rather not dig into this.

> Multithreaded conversion from Arrow to R is prone to issues on Windows across 
> the board, regardless of rtools version or 32/64bits. We should set 
> option.use_threads = FALSE on Windows in .onLoad.
> This will have the side effect of disabling multithreading in some other 
> parts of C++ code where use_threads is an option. It is not clear that that 
> is strictly required, but it will be a side effect unless we distinguish the 
> use_threads controls we expose.

+1 on additional controls.  I don't think we should abandon threading on 
Windows entirely.  Dataset scanning, etc. has been running in a multi threaded 
fashion on CI rather reliably on RTools >= 4.

> There may be more work to be done with CPU and IO threadpools, which get used 
> internally in Arrow C++, but I think it might be best to release with these 
> fixes and see if we still get error reports.

+1 to fix the issues as they come.  Again, I don't want to abandon threading on 
Windows entirely.

> Relatedly, an alternative to setting use_threads = FALSE globally would be to 
> leave multithreading on but reduce the size of the CPU and IO threadpools; 
> some reports suggest that setting them to less than the number of CPUs allow 
> them to work. It's not clear though whether this fixes the problem or just 
> decreases the frequency of deadlock.

We currently set the I/O thread pool to 8 and the CPU thread pool to "# of 
cores".  There are API methods to set the size of each pool.  Threads are 
created lazily so as long as you call these methods early enough you won't have 
too many threads.  My hunch would be that we are in the "decreases the 
frequency of deadlock" camp myself though.

> [R] Investigate/fix thread safety issues (esp. Windows)
> ---
>
> Key: ARROW-8379
> URL: https://issues.apache.org/jira/browse/ARROW-8379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
>
> There have been a number of issues where the R bindings' multithreading has 
> been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 
> I disabled {{use_threads}} in the Windows tests, and it appeared that the 
> mysterious Windows segfaults stopped. We should fix whatever the underlying 
> issues are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

2021-09-28 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421606#comment-17421606
 ] 

Jonathan Keane commented on ARROW-8379:
---

Overall this sounds good and in line with what I've been experiencing. A few 
comments on specific points (in no particular order):

> just decreases the frequency of deadlock.

I suspect this is true — with my work on DuckDB I had a similar experience 
where fewer threads meant fewer opportunities to deadlock (and not that it was 
fixed by some number greater than 1). 

> On 32-bit Rtools 3.5 disabling datasets

I don't have a better solution, but part of me wonders if there's much of a 
point to install Arrow if one doesn't have datasets enabled (this would include 
basically any dplyr on tables as well, right?) Maybe it's time to end support 
for this (though that's earlier than I think is typical / we strive for!). At 
least, maybe we should include a message that says something like "32bit 
windows 3.5 has substantially reduced functionality, please consider upgrading 
your R to take full advantage of..."

> [R] Investigate/fix thread safety issues (esp. Windows)
> ---
>
> Key: ARROW-8379
> URL: https://issues.apache.org/jira/browse/ARROW-8379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
>
> There have been a number of issues where the R bindings' multithreading has 
> been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 
> I disabled {{use_threads}} in the Windows tests, and it appeared that the 
> mysterious Windows segfaults stopped. We should fix whatever the underlying 
> issues are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

2021-09-28 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421595#comment-17421595
 ] 

Neal Richardson commented on ARROW-8379:


Surveying the open issues (now linked here), it looks like:

* On 32-bit Rtools 3.5, async multithreading just does not work. This means we 
should disable the dataset features entirely on this build. It does not appear 
that we can conditionally disable ARROW_DATASET only on this build, based on 
how configure.win works, so we should check this in the R code. Make 
arrow_with_dataset() check os, R version, and arch, which will get tests and 
examples to skip appropriately, and then we could check that inside 
Dataset$create() and error informatively to prevent users on this setup from 
trying. 
* Multithreaded conversion from Arrow to R is prone to issues on Windows across 
the board, regardless of rtools version or 32/64bits. We should set 
option.use_threads = FALSE on Windows in .onLoad. 
* This will have the side effect of disabling multithreading in some other 
parts of C++ code where use_threads is an option. It is not clear that that is 
strictly required, but it will be a side effect unless we distinguish the 
use_threads controls we expose.
* There may be more work to be done with CPU and IO threadpools, which get used 
internally in Arrow C++, but I think it might be best to release with these 
fixes and see if we still get error reports. 
* Relatedly, an alternative to setting use_threads = FALSE globally would 
be to leave multithreading on but reduce the size of the CPU and IO 
threadpools; some reports suggest that setting them to less than the number of 
CPUs allow them to work. It's not clear though whether this fixes the problem 
or just decreases the frequency of deadlock. 

> [R] Investigate/fix thread safety issues (esp. Windows)
> ---
>
> Key: ARROW-8379
> URL: https://issues.apache.org/jira/browse/ARROW-8379
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
>
> There have been a number of issues where the R bindings' multithreading has 
> been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 
> I disabled {{use_threads}} in the Windows tests, and it appeared that the 
> mysterious Windows segfaults stopped. We should fix whatever the underlying 
> issues are.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)