Christopher Brooks created ARROW-2966:
-----------------------------------------
Summary: Data type conversion error
Key: ARROW-2966
URL: https://issues.apache.org/jira/browse/ARROW-2966
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Environment: linux
Reporter: Christopher Brooks
I have a large pandas DataFrame. When I try to convert it to a pyarrow Table, the
conversion fails with an error. I am not sure whether this is a bug or expected behavior.
I realize the traceback below is not very useful on its own. *What can I
do to help identify the cause in my pandas DataFrame?*
Here's the error:
{code:java}
In [17]: pa.Table.from_pandas(df)
---------------------------------------------------------------------------
ArrowInvalid Traceback (most recent call last)
<ipython-input-17-6eac5d0eec08> in <module>()
----> 1 pa.Table.from_pandas(df)
table.pxi in pyarrow.lib.Table.from_pandas()
~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py
in dataframe_to_arrays(df, schema, preserve_index, nthreads)
375 arrays = list(executor.map(convert_column,
376 columns_to_convert,
--> 377 convert_types))
378
379 types = [x.type for x in arrays]
~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.time())
~/anaconda3/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
423 raise CancelledError()
424 elif self._state == FINISHED:
--> 425 return self.__get_result()
426
427 self._condition.wait(timeout)
~/anaconda3/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result
~/anaconda3/lib/python3.6/concurrent/futures/thread.py in run(self)
54
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)
~/.local/share/virtualenvs/iq-si-grade-prediction-zHTZ6n2S/lib/python3.6/site-packages/pyarrow/pandas_compat.py
in convert_column(col, ty)
364
365 def convert_column(col, ty):
--> 366 return pa.array(col, from_pandas=True, type=ty)
367
368 if nthreads == 1:
array.pxi in pyarrow.lib.array()
error.pxi in pyarrow.lib.check_status()
error.pxi in pyarrow.lib.check_status()
ArrowInvalid: Error converting from Python objects to Double: Got Python object
of type str but can only handle these types: float
In [18]: pa.__version__
Out[18]: '0.9.0'
In [19]: pd.__version__
Out[19]: '0.23.3'
{code}
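The error message suggests that a column inferred as Double also contains str values. A minimal sketch of one way to look for such a column follows, assuming the cause is a column that mixes Python types. The helper name `mixed_type_columns` and the sample data are hypothetical; neither comes from pyarrow or pandas. The helper only counts the Python types found in each column and reports columns that hold more than one.

```python
from collections import Counter


def mixed_type_columns(columns):
    """Return {column name: Counter of type names} for columns with more than one Python type.

    `columns` is any iterable of (name, values) pairs, for example `df.items()`.
    This helper is a hypothetical diagnostic, not a pyarrow or pandas API.
    """
    report = {}
    for name, values in columns:
        # Count values by the name of their Python type (NaN counts as float).
        counts = Counter(type(value).__name__ for value in values)
        if len(counts) > 1:
            report[name] = counts
    return report


# Plain-Python stand-in for a DataFrame: "grade" mixes float and str values.
sample = {
    "student": ["a", "b", "c"],
    "grade": [3.5, float("nan"), "n/a"],
}
report = mixed_type_columns(sample.items())
print(report)
```

With a real DataFrame, the same helper could be called as `mixed_type_columns(df.select_dtypes(include=["object"]).items())`, which limits the scan to object-dtype columns. Any column it reports is a candidate for the str values that the Double conversion rejects.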