[Numpy-discussion] Using numpy on hadoop streaming: ImportError: cannot import name multiarray

Kartik Kumar Perisetla Tue, 10 Feb 2015 18:40:01 -0800

Hi all,

for one of my projects I am using NLTK for POS tagging, which internally
loads an 'english.pickle' file. I managed to package the NLTK library
together with these pickle files and make them available to the mapper and
reducer of a Hadoop streaming job using the -file option.


However, when the NLTK library tries to load that pickle file, it fails
with a numpy error, since the cluster I am running this job on does not
have numpy installed. I also don't have root access, so I can't install
numpy or any other package on the cluster. The only way, then, is to
package the Python modules and make them available to the mapper and
reducer, which I managed to do. But now the problem is that when numpy is
imported, it imports multiarray by default (as seen in __init__.py), and
this is where I am getting the error:

  File "/usr/lib64/python2.6/pickle.py", line 1370, in load
    return Unpickler(file).load()
  File "/usr/lib64/python2.6/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/lib64/python2.6/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/lib64/python2.6/pickle.py", line 1124, in find_class
    __import__(module)
  File "numpy.mod/numpy/__init__.py", line 170, in <module>
  File "numpy.mod/numpy/add_newdocs.py", line 13, in <module>
  File "numpy.mod/numpy/lib/__init__.py", line 8, in <module>
  File "numpy.mod/numpy/lib/type_check.py", line 11, in <module>
  File "numpy.mod/numpy/core/__init__.py", line 6, in <module>
ImportError: cannot import name multiarray
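
A note on the traceback above: the "numpy.mod/numpy/..." paths suggest that
numpy may be imported from a zip archive named numpy.mod (an assumption; the
post does not say how the package was bundled). If that is the case, Python's
zipimport cannot load compiled extension modules such as multiarray from
inside an archive, so they would have to be unpacked to a real directory
first. A minimal Python sketch for checking this; the function name and the
archive path are hypothetical:

```python
import zipfile

# Suffixes of compiled extension modules (.so on Linux/macOS, .pyd on Windows).
EXTENSION_SUFFIXES = (".so", ".pyd")


def extensions_in_archive(archive_path):
    """Return the compiled extension members of a zip archive.

    Returns an empty list if the file is not a zip archive or if it
    contains no compiled extension modules.
    """
    if not zipfile.is_zipfile(archive_path):
        return []
    archive = zipfile.ZipFile(archive_path)
    try:
        names = archive.namelist()
    finally:
        # Explicit close keeps this usable on old interpreters such as 2.6.
        archive.close()
    return [name for name in names
            if name.lower().endswith(EXTENSION_SUFFIXES)]
```

If extensions_in_archive("numpy.mod") returned a non-empty list on a cluster
node, that would point towards the archive, rather than numpy itself, as the
source of the ImportError.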

I tried copying the numpy directory from my local machine, which contains
multiarray.pyd, to the cluster to make it available to the mapper and
reducer, but this didn't help.
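
One thing that may be worth checking here: .pyd is the extension-module
suffix used by Python on Windows, while the /usr/lib64/python2.6 paths in the
traceback suggest that the cluster nodes run Linux, where CPython looks for
.so files. A binary built for one platform will not load on the other. A
minimal sketch, with hypothetical function names, that lists the extension
files in a package directory whose suffix does not match a given platform:

```python
import os
import sys


def expected_suffix(platform=None):
    """Return the extension-module suffix for a sys.platform style string."""
    if platform is None:
        platform = sys.platform
    return ".pyd" if platform.startswith("win") else ".so"


def mismatched_extensions(package_dir, platform=None):
    """Return paths of compiled extension files with the wrong suffix.

    Only .pyd and .so files are considered; pure-Python files are ignored.
    """
    wanted = expected_suffix(platform)
    mismatched = []
    for root, _dirs, files in os.walk(package_dir):
        for name in sorted(files):
            suffix = os.path.splitext(name)[1].lower()
            if suffix in (".pyd", ".so") and suffix != wanted:
                mismatched.append(os.path.join(root, name))
    return mismatched
```

Run on a cluster node against the shipped numpy directory, a result that
includes numpy/core/multiarray.pyd would indicate that the copied package
was built for a different platform than the one the job runs on.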

Any input on how to resolve this (keeping the constraint that I cannot
install anything on the cluster machines)?

Thanks!

-- 
Regards,

Kartik Perisetla

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
