Tiger068 created ARROW-5049:
-------------------------------

             Summary: [Python] org/apache/hadoop/fs/FileSystem class not found when pyarrow FileSystem used in spark
                 Key: ARROW-5049
                 URL: https://issues.apache.org/jira/browse/ARROW-5049
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.12.1, 0.12.0
            Reporter: Tiger068
            Assignee: Tiger068
             Fix For: 0.13.0

When I initialize a pyarrow FileSystem to connect to an HDFS cluster from Spark, libhdfs throws this error:
{code:java}
org/apache/hadoop/fs/FileSystem class not found
{code}
I printed the CLASSPATH; its value is in wildcard mode:
{code:java}
../share/hadoop/hdfs;spark/spark-2.0.2-bin-hadoop2.7/jars...
{code}
That value is set by Spark, but libhdfs must load classes from jar files.

Root cause: we only check whether the string 'hadoop' appears in CLASSPATH, not whether it actually lists jar files, so the function returns early and leaves CLASSPATH unchanged:
{code:java}
def _maybe_set_hadoop_classpath():
    if 'hadoop' in os.environ.get('CLASSPATH', ''):
        return
{code}
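
One possible direction, as a minimal sketch rather than the actual patch: look for an explicitly named Hadoop jar in CLASSPATH instead of the bare substring 'hadoop'. The helper names `classpath_has_hadoop_jar` and `needs_hadoop_classpath`, and the choice of `hadoop-common` as the jar to look for, are assumptions made for illustration; they are not existing pyarrow functions.

```python
import os
import re

# Assumption: seeing a hadoop-common jar named explicitly is taken as evidence
# that the classpath was already expanded to jar files.
_HADOOP_COMMON_JAR = re.compile(r"hadoop-common[^/\\:;]*\.jar")


def classpath_has_hadoop_jar(classpath):
    """Return True only if the classpath names a hadoop-common jar file."""
    return _HADOOP_COMMON_JAR.search(classpath) is not None


def needs_hadoop_classpath(environ=None):
    """Return True when CLASSPATH does not yet list a hadoop-common jar."""
    environ = os.environ if environ is None else environ
    return not classpath_has_hadoop_jar(environ.get("CLASSPATH", ""))
```

With the CLASSPATH value shown above, the substring test passes because a directory name contains 'hadoop', while `classpath_has_hadoop_jar` returns False, so the caller would go on to set a jar-based classpath.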