[
https://issues.apache.org/jira/browse/ARROW-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17661965#comment-17661965
]
Rok Mihevc commented on ARROW-4943:
-----------------------------------
This issue has been migrated to [issue
#21450|https://github.com/apache/arrow/issues/21450] on GitHub. Please see the
[migration documentation|https://github.com/apache/arrow/issues/14542] for
further details.
> pyarrow.lib.HadoopFileSystem._connect failed due to TypeError
> -------------------------------------------------------------
>
> Key: ARROW-4943
> URL: https://issues.apache.org/jira/browse/ARROW-4943
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.12.1
> Environment: Kernel: 4.4.95.x86_64
> Python: 2.7.5
> Reporter: vanderliang
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
>
> Running the pytorch_hello_world.py script from petastorm
> [https://github.com/uber/petastorm.git] fails with the TypeError shown in
> the traceback below.
> It seems that pyarrow.lib.HadoopFileSystem._connect requires unicode
> arguments, but under Python 2 the arguments passed in are always of type
> str. The proposed fix is to convert them with unicode() so that _connect
> always receives unicode values.
> Traceback (most recent call last):
>   File "pytorch_hello_world.py", line 31, in <module>
>     pytorch_hello_world()
>   File "pytorch_hello_world.py", line 25, in pytorch_hello_world
>     with DataLoader(make_reader(dataset_url)) as train_loader:
>   File "/usr/lib/python2.7/site-packages/petastorm/reader.py", line 132, in make_reader
>     resolver = FilesystemResolver(dataset_url, hdfs_driver=hdfs_driver)
>   File "/usr/lib/python2.7/site-packages/petastorm/fs_utils.py", line 83, in __init__
>     self._filesystem = connector.connect_to_either_namenode(namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 266, in connect_to_either_namenode
>     return HAHdfsClient(cls, list_of_namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 224, in __init__
>     self._do_connect()
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 233, in _do_connect
>     self._connector_cls._try_next_namenode(self._index_of_nn, self._list_of_namenodes)
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 289, in _try_next_namenode
>     cls.hdfs_connect_namenode(urlparse('hdfs://' + str(host or 'default')))
>   File "/usr/lib/python2.7/site-packages/petastorm/hdfs/namenode.py", line 250, in hdfs_connect_namenode
>     return pyarrow.hdfs.connect(url.hostname or 'default', url.port or 8020, driver=driver)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 209, in connect
>     extra_conf=extra_conf)
>   File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 39, in __init__
>     self._connect(host, port, user, kerb_ticket, driver, extra_conf)
>   File "pyarrow/io-hdfs.pxi", line 97, in pyarrow.lib.HadoopFileSystem._connect
> TypeError: Expected unicode, got str
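
The conversion proposed in the description could look roughly like the sketch below. It is only an illustration of coercing a value to a text (unicode) string before it reaches a function that insists on unicode. The helper name `to_text` and the UTF-8 default are assumptions made for this example, and it is not the patch that was actually applied to pyarrow.

```python
import sys

# Pick the text type for the running interpreter: `unicode` on Python 2,
# `str` on Python 3. The Python 2 branch is never evaluated on Python 3.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (name exists only on Python 2)


def to_text(value, encoding="utf-8"):
    """Return `value` as a text string (hypothetical helper).

    Byte strings (plain `str` on Python 2, `bytes` on Python 3) are decoded
    with `encoding`; values that are already text are returned unchanged.
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Fall back to the text constructor for other types, e.g. integers.
    return text_type(value)
```

With a helper like this, a caller could pass `to_text(url.hostname or 'default')` as the host to `pyarrow.hdfs.connect`. Whether the real fix converts at the call site or inside `_connect` is not stated in this message.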
--
This message was sent by Atlassian Jira
(v8.20.10#820010)