PySpark row_number Question

2017-04-14 Thread infa elance
Hi All,

I am trying to understand how row_number is applied. In the code below, does Spark store the data in a DataFrame and then apply the row_number function, or is row_number applied while reading from Hive?

from pyspark.sql import HiveContext
hiveContext = HiveContext(sc)
hiveContext.sql(" ( SELECT colunm1
