Good day. May I know what the difference is between pyspark.sql.dataframe.DataFrame and pyspark.pandas.frame.DataFrame? Are both stored in Spark's distributed DataFrame format? I'm looking for a way to load a huge Excel file (4-10 GB), and I wonder whether I should use the third-party library spark-excel or the native pyspark.pandas. I'd prefer a Spark DataFrame so that the load is parallelized across the executors instead of running on the driver.
Can anyone advise? Details:

```python
df = spark.read \
    .format("com.crealytics.spark.excel") \
    .option("header", "true") \
    .load("/path/big_excel.xls")

print(type(df))  # pyspark.sql.dataframe.DataFrame
```

```python
import pyspark.pandas as ps

path = "/path/big-excel.xls"
df = ps.read_excel(path)

print(type(df))  # pyspark.pandas.frame.DataFrame
```

Thank you.