[ https://issues.apache.org/jira/browse/ARROW-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062082#comment-17062082 ]

Wes McKinney commented on ARROW-8154:
-------------------------------------

I think this is a dup of ARROW-7841, a regression that has been fixed since
0.16.0 was released.

> [Python] HDFS Filesystem does not set environment variables in pyarrow
> 0.16.0 release
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-8154
>                 URL: https://issues.apache.org/jira/browse/ARROW-8154
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.16.0
>            Reporter: Eric Henry
>            Priority: Major
>
> In pyarrow 0.15.x, the HDFS filesystem works as follows:
> if you set the HADOOP_HOME env var, it looks for libhdfs.so in
> $HADOOP_HOME/lib/native.
> In pyarrow 0.16.x, if you set HADOOP_HOME, it looks for libhdfs.so directly
> in $HADOOP_HOME, which is incorrect behaviour on all systems I am using.
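> A minimal sketch of the difference (illustrative only; this is not
> pyarrow's actual lookup code, and the paths assume a typical Hadoop layout):
>
>     import os
>
>     hadoop_home = os.environ["HADOOP_HOME"]  # e.g. "/usr/lib/hadoop"
>
>     # Where pyarrow 0.15.x effectively looked for the library:
>     path_015 = os.path.join(hadoop_home, "lib", "native", "libhdfs.so")
>
>     # Where pyarrow 0.16.0 looks instead, per this report:
>     path_016 = os.path.join(hadoop_home, "libhdfs.so")
>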
> Also, CLASSPATH no longer gets set automatically, which is very
> inconvenient. The issue here is that I need to set HADOOP_HOME correctly to
> be able to use other libraries, but have to repoint it to use Apache Arrow,
> e.g.:
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> # ...do stuff with other libraries here...
> # ...then point HADOOP_HOME at the native libs so Arrow can find libhdfs.so...
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop/lib/native"
> hdfs = pyarrow.hdfs.connect(host, port)
> # ...then reset HADOOP_HOME for everything else...
> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> etc.
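> A small context-manager helper keeps that juggling contained (a sketch
> only, assuming the paths above; the helper name is hypothetical):
>
>     import os
>     from contextlib import contextmanager
>
>     import pyarrow.hdfs
>
>     @contextmanager
>     def native_hadoop_home(native_dir="/usr/lib/hadoop/lib/native"):
>         # Temporarily point HADOOP_HOME at the dir where 0.16.0 expects
>         # libhdfs.so, restoring the original value afterwards.
>         saved = os.environ.get("HADOOP_HOME")
>         os.environ["HADOOP_HOME"] = native_dir
>         try:
>             yield
>         finally:
>             if saved is None:
>                 del os.environ["HADOOP_HOME"]
>             else:
>                 os.environ["HADOOP_HOME"] = saved
>
>     with native_hadoop_home():
>         hdfs = pyarrow.hdfs.connect(host, port)  # host/port as above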
>  
> Example:
> >>> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
> >>> hdfs = pyarrow.hdfs.connect(host, port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 215, in connect
>     extra_conf=extra_conf)
>   File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 40, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
>   File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
> OSError: Unable to load libhdfs: /usr/lib/hadoop/libhdfs.so: cannot open shared object file: No such file or directory
>  


