[ https://issues.apache.org/jira/browse/ARROW-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062082#comment-17062082 ]
Wes McKinney commented on ARROW-8154:
-------------------------------------
I think this is a dup of ARROW-7841, a regression that has been fixed since 0.16.0 was released

> [Python] HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-8154
>                 URL: https://issues.apache.org/jira/browse/ARROW-8154
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Eric Henry
>            Priority: Major
>
> In pyarrow 0.15.x, the HDFS filesystem works as follows: if you set the
> HADOOP_HOME env var, it looks for libhdfs.so in $HADOOP_HOME/lib/native.
> In pyarrow 0.16.x, if you set HADOOP_HOME, it looks for libhdfs.so in
> $HADOOP_HOME itself, which is incorrect behaviour on all systems I am using.
> Also, CLASSPATH no longer gets set automatically, which is very inconvenient.
> The issue here is that I need to set HADOOP_HOME correctly to be able to use
> other libraries, but have to reset it to use Apache Arrow, e.g.:
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> ...do stuff here...
> ...then connect to Arrow:
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop/lib/native"
> hdfs = pyarrow.hdfs.connect(host, port)
> ...then reset my hadoop home:
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> etc.
>
> Example:
> >>> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> >>> hdfs = pyarrow.hdfs.connect(host, port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 215, in connect
>     extra_conf=extra_conf)
>   File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 40, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> OSError: Unable to load libhdfs: /usr/lib/hadoop/libhdfs.so: cannot open
> shared object file: No such file or directory

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
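The set/connect/restore dance the reporter describes can be wrapped in a context manager so HADOOP_HOME is always restored, even if `connect` raises. This is only a sketch of that workaround, not part of the report: `temp_env` is a hypothetical helper name, and the paths/host/port are the placeholder values from the issue.

```python
import os
from contextlib import contextmanager

@contextmanager
def temp_env(name, value):
    """Temporarily override one environment variable, restoring it on exit."""
    old = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = old

# Hypothetical usage, mirroring the workaround in the report:
#
# os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"      # what other libraries need
# with temp_env("HADOOP_HOME", "/usr/lib/hadoop/lib/native"):
#     hdfs = pyarrow.hdfs.connect(host, port)         # what pyarrow 0.16.0 expects
# # HADOOP_HOME is back to /usr/lib/hadoop here
```

The `finally` block is what makes this safer than manually resetting the variable: the original value comes back on any exit path, including the `OSError` shown in the traceback.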