Jim Fulton created ARROW-3957:
---------------------------------
Summary: pyarrow.hdfs.connect fails silently
Key: ARROW-3957
URL: https://issues.apache.org/jira/browse/ARROW-3957
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.11.1
Environment: centos 7
Reporter: Jim Fulton
I'm trying to connect to HDFS using libhdfs and Kerberos.
I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets
CLASSPATH correctly.
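The preconditions above can be checked up front before connecting. This is a minimal sketch, not part of pyarrow: `check_hdfs_env` is a hypothetical helper, and it only inspects the two variables named in this report plus an optional ticket cache path.

```python
import os


def check_hdfs_env(ticket_path=None, environ=None):
    """Return a list of human-readable problems (empty if nothing obvious is wrong).

    Hypothetical helper for illustration; it does not contact HDFS or Kerberos.
    """
    env = os.environ if environ is None else environ
    problems = []
    # Both variables are mentioned in the report as being set.
    for name in ("JAVA_HOME", "HADOOP_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name} points to a missing directory: {value}")
    # Only checks that the ticket cache file exists, not that the ticket is valid.
    if ticket_path is not None and not os.path.isfile(ticket_path):
        problems.append(f"Kerberos ticket cache not found: {ticket_path}")
    return problems
```

An empty result only rules out the most basic misconfiguration; an expired ticket or a wrong port would still pass this check.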
My connect call looks like:
{{import pyarrow.hdfs}}
{{c = pyarrow.hdfs.connect(host='MYHOST', port=42424, user='ME', kerb_ticket="/tmp/krb5cc_498970")}}
This doesn't raise an error, but the resulting connection can't do anything.
Operations on it either fail like this:
{{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}
Or swallow errors (e.g. {{exists}} returning {{False}}).
Note that {{connect}} raises an error if the host is wrong but not if the
port, user, or kerb_ticket is wrong. I have no idea how to debug this, because
no useful errors are reported.
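Until {{connect}} itself reports such failures, one workaround is to force a round trip immediately after connecting so that a dead connection fails loudly. The following is a minimal sketch under that assumption: `probe_connection` is a hypothetical helper, it relies only on the connection object having an `ls` method (as the `HDFS list directory failed` error above suggests), and it does not reveal the root cause.

```python
def probe_connection(fs, path="/"):
    """List `path` once so an unusable connection raises right away.

    Hypothetical helper: `fs` is any object with an `ls(path)` method,
    such as the object returned by pyarrow.hdfs.connect.
    """
    try:
        entries = fs.ls(path)
    except Exception as exc:
        # Caught broadly on purpose: the report shows ArrowIOError here,
        # but other failure types are not ruled out.
        raise RuntimeError(
            f"HDFS connection is not usable: listing {path!r} failed: {exc}"
        ) from exc
    return entries


# Usage against a real cluster, with the values from this report:
#   import pyarrow.hdfs
#   c = pyarrow.hdfs.connect(host='MYHOST', port=42424, user='ME',
#                            kerb_ticket="/tmp/krb5cc_498970")
#   probe_connection(c)
```

Listing is used rather than {{exists}} because, as noted above, {{exists}} returns {{False}} instead of raising.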
Note that I _can_ connect using the hdfs Python package. (Of course, that
doesn't provide the API I need to read Parquet files.)
Any help would be appreciated greatly.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)