[ https://issues.apache.org/jira/browse/SPARK-44670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17750959#comment-17750959 ]
Madhukar commented on SPARK-44670:
----------------------------------

Raised a PR for using openpyxl instead of xlrd - [https://github.com/apache/spark/pull/42339]

> Fix the `test_to_excel` tests for python3.7
> -------------------------------------------
>
>                 Key: SPARK-44670
>                 URL: https://issues.apache.org/jira/browse/SPARK-44670
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.4.1
>            Reporter: Madhukar
>            Priority: Minor
>
> With Python 3.7 and openpyxl installed, I got this error:
> ======================================================================
> ERROR: test_to_excel (pyspark.pandas.tests.test_dataframe_conversion.DataFrameConversionTest)
> Traceback (most recent call last):
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 102, in test_to_excel
>     dataframes = self.get_excel_dfs(pandas_on_spark_location, pandas_location)
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 89, in get_excel_dfs
>     "got": pd.read_excel(pandas_on_spark_location, index_col=0),
>   File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296, in wrapper
>     return func(*args, **kwargs)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
>     io = ExcelFile(io, engine=engine)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 867, in __init__
>     self._reader = self._engines[engine](self._io)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 21, in __init__
>     import_optional_dependency("xlrd", extra=err_msg)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/compat/_optional.py", line 110, in import_optional_dependency
>     raise ImportError(msg) from None
> ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
> ----------------------------------------------------------------------
>
> But with xlrd 2.0.1 installed, I get this error instead:
> ======================================================================
> ERROR: test_to_excel (pyspark.pandas.tests.test_dataframe_conversion.DataFrameConversionTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 102, in test_to_excel
>     dataframes = self.get_excel_dfs(pandas_on_spark_location, pandas_location)
>   File "/workspace/apache-spark/python/pyspark/pandas/tests/test_dataframe_conversion.py", line 89, in get_excel_dfs
>     "got": pd.read_excel(pandas_on_spark_location, index_col=0),
>   File "/opt/conda/lib/python3.7/site-packages/pandas/util/_decorators.py", line 296, in wrapper
>     return func(*args, **kwargs)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 304, in read_excel
>     io = ExcelFile(io, engine=engine)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 867, in __init__
>     self._reader = self._engines[engine](self._io)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 22, in __init__
>     super().__init__(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_base.py", line 353, in __init__
>     self.book = self.load_workbook(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/pandas/io/excel/_xlrd.py", line 37, in load_workbook
>     return open_workbook(filepath_or_buffer)
>   File "/opt/conda/lib/python3.7/site-packages/xlrd/__init__.py", line 170, in open_workbook
>     raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
> xlrd.biffh.XLRDError: Excel xlsx file; not supported
> ----------------------------------------------------------------------

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
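Both tracebacks end in pandas' xlrd reader (`_xlrd.py`). xlrd 2.0 removed support for every format except legacy `.xls`, so once xlrd 2.0.1 is installed it rejects the `.xlsx` file the test writes ("Excel xlsx file; not supported"); `.xlsx` has to be read with openpyxl. The sketch below, using only the standard library, shows one way to pick the reader by sniffing the file header. It is an illustration under that assumption, not the change made in the linked PR: `pick_excel_engine` and `engine_is_installed` are hypothetical helpers, not part of pandas or PySpark.

```python
import importlib.util

# Leading bytes ("magic numbers") of the two Excel container formats.
XLSX_MAGIC = b"PK\x03\x04"  # .xlsx is a ZIP archive of XML parts
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls is an OLE2 compound file


def pick_excel_engine(path):
    """Return the pandas read_excel engine name suited to the file at `path`.

    Hypothetical helper: xlrd >= 2.0 only reads legacy .xls files,
    so .xlsx files are routed to openpyxl instead.
    """
    with open(path, "rb") as f:
        header = f.read(len(XLS_MAGIC))
    if header.startswith(XLSX_MAGIC):
        return "openpyxl"
    if header.startswith(XLS_MAGIC):
        return "xlrd"
    raise ValueError(f"{path}: not a recognised Excel file")


def engine_is_installed(engine):
    """True if the engine's top-level module can be found, without importing it."""
    return importlib.util.find_spec(engine) is not None


if __name__ == "__main__":
    import sys

    # Report which reader each file on the command line needs, and whether it is available.
    for p in sys.argv[1:]:
        eng = pick_excel_engine(p)
        print(p, eng, "installed" if engine_is_installed(eng) else "missing")
```

In a test, the result could be passed through the `engine` parameter of `pd.read_excel`, for example `pd.read_excel(path, index_col=0, engine=pick_excel_engine(path))`. When the file is always `.xlsx`, simply pinning `engine="openpyxl"` avoids the sniffing step altogether.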