Panagiotis Nezis created ARROW-7451:
---------------------------------------

             Summary: pyarrow.hdfs.connect crashes when executed asynchronously in processes
                 Key: ARROW-7451
                 URL: https://issues.apache.org/jira/browse/ARROW-7451
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.15.1
            Reporter: Panagiotis Nezis


When connecting to {{hdfs}} from a {{ProcessPoolExecutor}}, the first call raises an exception and the function never returns (potential deadlock?). With a {{ThreadPoolExecutor}}, on the other hand, it works as expected. Sample code that reproduces the problem follows:

{code:python}
import pyarrow as pa
from concurrent.futures import (
    ThreadPoolExecutor, ProcessPoolExecutor, wait, ALL_COMPLETED)


def ls():
    fs = pa.hdfs.connect('hdfs://host')
    print(fs.ls('/'))


# This works as expected
ls()

# Running in parallel
thread_pool = ThreadPoolExecutor(max_workers=4)
process_pool = ProcessPoolExecutor(max_workers=4)


def run(pool):
    futures = [pool.submit(ls) for _ in range(5)]
    wait(futures, return_when=ALL_COMPLETED)


# The thread_pool works as expected
run(thread_pool)

# The process_pool raises an exception
run(process_pool)
{code}

The following exception is raised:

{noformat}
java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/xml/sax/helpers/LocatorImpl
    at java.lang.ClassLoader.findBootstrapClass(Native Method)
    at java.lang.ClassLoader.findBootstrapClassOrNull(ClassLoader.java:1015)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:413)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:411)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:150)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2684)
    at org.apache.hadoop.conf.Configuration.parse(Configuration.java:2672)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2746)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2696)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2579)
    at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
    at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:404)
{noformat}
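
A possible way to narrow this down (an untested assumption, not a confirmed cause or fix): in the reproduction, {{ls()}} runs once in the parent before the process pool is used, so with the default libhdfs driver the JVM is presumably already loaded when the workers are forked. The sketch below builds the pool with the "spawn" start method so that each worker is a fresh interpreter and connects on its own. The helper names {{spawn_context}} and {{make_spawn_pool}} and the {{uri}} parameter are hypothetical, and {{mp_context}} requires Python 3.7 or later.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def ls(uri='hdfs://host'):
    # Import and connect inside the worker so that each process sets up
    # its own connection instead of inheriting state from the parent.
    import pyarrow as pa
    fs = pa.hdfs.connect(uri)
    return fs.ls('/')


def spawn_context():
    # Hypothetical helper: "spawn" starts a fresh interpreter per worker
    # rather than forking the (possibly JVM-hosting) parent process.
    return mp.get_context('spawn')


def make_spawn_pool(max_workers=4):
    # Hypothetical helper: a process pool whose workers are spawned.
    return ProcessPoolExecutor(max_workers=max_workers,
                               mp_context=spawn_context())
```

To try it, run {{futures = [pool.submit(ls) for _ in range(5)]}} against {{pool = make_spawn_pool()}} from under an {{if __name__ == '__main__':}} guard, which the "spawn" start method needs. If the error disappears, that would point at forking after the JVM has started rather than at {{ProcessPoolExecutor}} itself.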