[ https://issues.apache.org/jira/browse/SPARK-46143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matheus Pavanetti updated SPARK-46143:
--------------------------------------
Description:
Hello,

I would like to report an issue with the pyspark.pandas implementation of the read_excel function.

The Microsoft Fabric Spark environment 1.2 (runtime) uses PySpark 3.4.1, whose pyspark.pandas implementation appears to target an older pandas API. The pandas read_excel function does not accept a parameter called "squeeze"; however, pyspark.pandas still implements it and passes "squeeze" through to the pandas function.

!image-2023-11-28-13-20-40-275.png!

While investigating further, I went through the PySpark 3.4.1 source: [https://spark.apache.org/docs/3.4.1/api/python/_modules/pyspark/pandas/namespace.html#read_excel]

This is where I found that the "squeeze" parameter is passed to the pandas read_excel function, which does not expect it. It seems the parameter was deprecated in PySpark 3.4.0 but is still used in the implementation.

!image-2023-11-28-13-20-51-291.png!

I believe this is an issue with the PySpark 3.4.1 implementation, not necessarily with Fabric. However, Fabric uses this version in its 1.2 runtime.

For now I can work around it in two ways: by downloading the Excel file from OneLake to the Spark driver, loading it into memory with pandas, and then converting it to a Spark DataFrame; or by patching the build myself: I downloaded PySpark build 20230713 locally, made the changes, and recompiled it, which worked locally.
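A minimal sketch of the first workaround, assuming the Excel file has already been copied from OneLake to a path on the driver's local disk, and that pandas plus an Excel engine such as openpyxl are installed on the driver. The function name `excel_to_spark` and its `reader` parameter are hypothetical; `reader` only exists so the pandas call can be swapped out.

```python
from typing import Any, Callable, Optional


def excel_to_spark(spark: Any, local_path: str, sheet_name: Any = 0,
                   reader: Optional[Callable[..., Any]] = None) -> Any:
    """Hypothetical helper: read Excel with plain pandas on the driver, then hand it to Spark.

    This bypasses pyspark.pandas.read_excel entirely, so no `squeeze`
    keyword is forwarded to pandas.
    """
    if reader is None:
        # Imported lazily: pandas only needs to exist on the driver.
        import pandas as pd
        reader = pd.read_excel
    # sheet_name must select a single sheet; pandas returns a dict of
    # DataFrames when sheet_name is None, which createDataFrame cannot take.
    # The whole sheet is loaded into driver memory, so this suits small files.
    pdf = reader(local_path, sheet_name=sheet_name)
    # SparkSession.createDataFrame accepts a pandas DataFrame directly.
    return spark.createDataFrame(pdf)
```

In a notebook where `spark` is already defined, usage would look like `excel_to_spark(spark, "/tmp/example.xlsx")`, where the path is a placeholder for wherever the file was copied on the driver.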
Since the patched build works, the problem lies in the implementation: either it gets fixed upstream, or I have to downgrade to an older version such as 3.3.0 or move to the latest 3.5.0, which is not an option on Fabric.

> pyspark.pandas read_excel implementation at version 3.4.1
> ---------------------------------------------------------
>
> Key: SPARK-46143
> URL: https://issues.apache.org/jira/browse/SPARK-46143
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.1
> Environment: Apache Spark 3.4.1.5.3 build 20230713. Running on Microsoft Fabric workspace.
> Reporter: Matheus Pavanetti
> Priority: Major
> Attachments: MicrosoftTeams-image.png, image-2023-11-28-13-20-40-275.png, image-2023-11-28-13-20-51-291.png
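To illustrate the failure mode in this report: pandas deprecated the `squeeze` keyword of `read_excel` in 1.4 and removed it in 2.0, so a wrapper that forwards it unconditionally raises a `TypeError` against a pandas whose `read_excel` no longer declares it. The sketch below uses a hypothetical stand-in, `read_excel_pandas2`, instead of pandas itself, plus hypothetical helpers that filter keywords against the callee's signature. It shows one defensive pattern, not the fix applied in PySpark.

```python
import inspect
from typing import Any, Callable, Dict


def read_excel_pandas2(io: str, sheet_name: Any = 0, header: Any = 0) -> str:
    """Hypothetical stand-in for a read_excel signature without `squeeze`."""
    return f"read {io} sheet={sheet_name} header={header}"


def drop_unsupported_kwargs(func: Callable[..., Any],
                            kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the keyword arguments that `func` declares by name."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing needs filtering.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


def forward_naively(func: Callable[..., Any], io: str, **kwargs: Any) -> Any:
    """Hypothetical wrapper that always forwards `squeeze`, like the reported code path."""
    return func(io, squeeze=False, **kwargs)


def forward_guarded(func: Callable[..., Any], io: str, **kwargs: Any) -> Any:
    """Same wrapper, but keywords the target does not declare are dropped first."""
    candidate = {"squeeze": False, **kwargs}
    return func(io, **drop_unsupported_kwargs(func, candidate))
```

Silently dropping a keyword can hide a caller's intent (a `squeeze=True` request would simply be ignored), so where the squeezing behavior matters, pandas' own deprecation guidance of calling `.squeeze("columns")` on the returned DataFrame preserves it.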