[ https://issues.apache.org/jira/browse/SPARK-46143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matheus Pavanetti updated SPARK-46143:
--------------------------------------
    Environment:
pyspark 3.4.1.5.3 build 20230713.
Running on Microsoft Fabric workspace.

  was:
Apache spark 3.4.1.5.3 build 20230713.
Running on Microsoft Fabric workspace.

> pyspark.pandas read_excel implementation at version 3.4.1
> ---------------------------------------------------------
>
>                 Key: SPARK-46143
>                 URL: https://issues.apache.org/jira/browse/SPARK-46143
>             Project: Spark
>          Issue Type: Bug
>          Components: Build
>    Affects Versions: 3.4.1
>         Environment: pyspark 3.4.1.5.3 build 20230713.
>                      Running on Microsoft Fabric workspace.
>            Reporter: Matheus Pavanetti
>            Priority: Major
>         Attachments: MicrosoftTeams-image.png,
>                      image-2023-11-28-13-20-40-275.png,
>                      image-2023-11-28-13-20-51-291.png
>
>
> Hello,
> I would like to report an issue with the pyspark.pandas implementation of
> the read_excel function.
> The Microsoft Fabric Spark runtime 1.2 uses pyspark 3.4.1, whose
> pyspark.pandas implementation appears to target an older version of pandas.
> The pandas read_excel function does not expect a parameter called
> "squeeze"; however, pyspark.pandas implements it and passes "squeeze"
> through to the pandas function.
>
> !image-2023-11-28-13-20-40-275.png!
>
> I dug further into the pyspark 3.4.1 documentation:
> https://spark.apache.org/docs/3.4.1/api/python/_modules/pyspark/pandas/namespace.html#read_excel
>
> This is where I found that the "squeeze" parameter is passed to the pandas
> read_excel function, which does not expect it.
> It seems the parameter was deprecated as of pyspark 3.4.0 but is still used
> in the implementation.
>
> !image-2023-11-28-13-20-51-291.png!
>
> I believe this is an issue with the pyspark 3.4.1 implementation, not
> necessarily with Fabric. However, Fabric uses this version in its 1.2 build.
>
> For now I can work around this by downloading the Excel file from OneLake
> to the Spark driver, loading it into memory with pandas, and then converting
> it to a Spark DataFrame.
> Alternatively, I made it work by modifying the build: I downloaded pyspark
> build 20230713 to my local machine, made the changes, re-compiled it, and it
> worked locally.
> So the problem is in the implementation. Either it gets fixed, or I would
> have to downgrade to an older version such as 3.3.3 or try the latest 3.5.0,
> neither of which is an option on Fabric.
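
The failure mode reported above, a wrapper forwarding a keyword that the wrapped function does not accept, can be reproduced without Spark or pandas. The following is only an illustrative sketch: `read_excel_without_squeeze` is a hypothetical stand-in for a pandas `read_excel` that has no `squeeze` parameter, and `call_with_supported_kwargs` is a hypothetical guard. Neither is taken from pyspark, and this is not necessarily how the issue would be fixed upstream.

```python
import inspect


def read_excel_without_squeeze(io, sheet_name=0, header=0):
    """Hypothetical stand-in for a reader that has no `squeeze` parameter."""
    return {"io": io, "sheet_name": sheet_name, "header": header}


def forward_squeeze_blindly(io):
    """Mimics a wrapper that always forwards squeeze=...; raises TypeError."""
    return read_excel_without_squeeze(io, squeeze=False)


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments its signature does not accept."""
    params = inspect.signature(func).parameters
    accepts_any_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any_keyword:
        # Keep only the keywords that appear in the target's signature.
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)
```

Calling `forward_squeeze_blindly("book.xlsx")` raises `TypeError` because of the unexpected keyword, while `call_with_supported_kwargs(read_excel_without_squeeze, "book.xlsx", squeeze=False)` drops `squeeze` and succeeds.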
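
The first workaround in the report (read the file on the driver with plain pandas, then convert to a Spark DataFrame) might look roughly like the sketch below. The function name `excel_to_spark` and its `read_excel` override are hypothetical, introduced here only for illustration. The step that downloads the file from OneLake is not shown because the report does not say how it was done. The sketch assumes the file is already on the driver's local disk and fits in driver memory.

```python
def excel_to_spark(spark, local_path, sheet_name=0, read_excel=None):
    """Read an Excel file on the driver with plain pandas, then hand it to Spark.

    `spark` is an existing SparkSession. `read_excel` is an optional
    (hypothetical) override of the reader; by default pandas.read_excel is used.
    """
    if read_excel is None:
        # Imported lazily so this module loads even where pandas is absent.
        import pandas as pd

        read_excel = pd.read_excel
    # Plain pandas call: the pyspark.pandas wrapper is not involved, so no
    # `squeeze` argument is forwarded.
    pdf = read_excel(local_path, sheet_name=sheet_name)
    # SparkSession.createDataFrame accepts a pandas DataFrame.
    return spark.createDataFrame(pdf)
```

A call would then be `sdf = excel_to_spark(spark, "/tmp/report.xlsx")`, where the path is a made-up example. pandas needs an Excel engine such as openpyxl installed to read `.xlsx` files.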