Andrew Redd created ARROW-8731:
----------------------------------
Summary: Error when using toPandas with PyArrow
Key: ARROW-8731
URL: https://issues.apache.org/jira/browse/ARROW-8731
Project: Apache Arrow
Issue Type: Bug
Environment: Python environment on the worker and driver:
- jupyter==1.0.0
- pandas==1.0.3
- pyarrow==0.14.0
- pyspark==2.4.0
- py4j==0.10.7

Reporter: Andrew Redd

I'm getting the following error when calling toPandas on a Spark DataFrame.
* This is a blocker for our use of PyArrow on a project.

{code:python}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-e2ed63d96b43> in <module>
----> 1 s.load_table_to_df('csn_customer.tblcustomerpro').limit(100).toPandas()

/venv/lib/python3.6/site-packages/pyspark/sql/dataframe.py in toPandas(self)
   2119                     _check_dataframe_localize_timestamps
   2120                 import pyarrow
-> 2121                 batches = self._collectAsArrow()
   2122                 if len(batches) > 0:
   2123                     table = pyarrow.Table.from_batches(batches)

/venv/lib/python3.6/site-packages/pyspark/sql/dataframe.py in _collectAsArrow(self)
   2177         with SCCallSiteSync(self._sc) as css:
   2178             sock_info = self._jdf.collectAsArrowToPython()
-> 2179             return list(_load_from_socket(sock_info, ArrowStreamSerializer()))
   2180 
   2181     ##########################################################################################

/venv/lib/python3.6/site-packages/pyspark/rdd.py in _load_from_socket(sock_info, serializer)
    142 
    143 def _load_from_socket(sock_info, serializer):
--> 144     (sockfile, sock) = local_connect_and_auth(*sock_info)
    145     # The RDD materialization time is unpredicable, if we set a timeout for socket reading
    146     # operation, it will very possibly fail. See SPARK-18281.

TypeError: local_connect_and_auth() takes 2 positional arguments but 3 were given
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
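The final TypeError in the traceback comes from argument unpacking: `_load_from_socket` calls `local_connect_and_auth(*sock_info)`, so a `sock_info` tuple holding three values cannot be passed to a function that accepts two. The minimal plain-Python sketch below reproduces only that failure mode. `connect_and_auth` and the sample `sock_info` values are hypothetical stand-ins, not PySpark's actual code, and the sketch makes no claim about why the tuple has three elements in this environment.

```python
import inspect


def connect_and_auth(port, auth_secret):
    """Hypothetical helper that accepts exactly two positional arguments,
    mirroring the signature reported in the traceback."""
    return port, auth_secret


# Hypothetical connection info carrying a third, unexpected element.
sock_info = (51234, "example-secret", "example-extra-element")

# Compare how many parameters the helper declares with how many values are unpacked.
expected = len(inspect.signature(connect_and_auth).parameters)
received = len(sock_info)
print(f"helper accepts {expected} arguments, sock_info supplies {received}")

message = None
try:
    # *sock_info expands the tuple into three positional arguments.
    connect_and_auth(*sock_info)
except TypeError as exc:
    message = str(exc)
    print(f"TypeError: {message}")
```

Comparing the declared parameter count with the length of the unpacked tuple, as `inspect.signature` does here, is one way to see this kind of mismatch before the call itself fails.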