[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164086#comment-17164086 ]

Apache Spark commented on SPARK-31525:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/29214

> Inconsistent result of df.head(1) and df.head()
> ---
>
> Key: SPARK-31525
> URL: https://issues.apache.org/jira/browse/SPARK-31525
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 2.4.6, 3.0.0
> Reporter: Joshua Hendinata
> Priority: Minor
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> In this line
> [https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1339],
> if you call `df.head()` on an empty DataFrame, it returns *None*,
> but if you call `df.head(1)` on an empty DataFrame, it returns an
> *empty list* instead.
> This behaviour is inconsistent and can cause confusion, especially
> when calling `len(df.head())`, which raises a `TypeError` for an
> empty DataFrame.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
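The reported inconsistency can be reproduced without a Spark cluster using a small pure-Python model. This sketch assumes that the linked line in `dataframe.py` follows the pattern "fetch `head(1)`, return its first element or `None`" for the no-argument case and delegates to `take(n)` otherwise; `ToyDataFrame` is a hypothetical stand-in, not a PySpark class.

```python
from typing import Any, List, Optional


class ToyDataFrame:
    """Hypothetical stand-in for a PySpark DataFrame, holding rows in a list."""

    def __init__(self, rows: List[Any]) -> None:
        self._rows = rows

    def take(self, n: int) -> List[Any]:
        # Always returns a list (possibly empty), like DataFrame.take(n).
        return self._rows[:n]

    def head(self, n: Optional[int] = None):
        # Assumed to mirror the no-argument branch in dataframe.py:
        # fetch at most one row and unwrap it, falling back to None.
        if n is None:
            rs = self.head(1)
            return rs[0] if rs else None
        return self.take(n)


empty = ToyDataFrame([])
print(empty.head())   # None
print(empty.head(1))  # []

try:
    len(empty.head())
except TypeError as exc:
    # len(None) fails, whereas len(empty.head(1)) is simply 0.
    print("TypeError:", exc)
```

The two call forms share one code path but unwrap the result only when `n` is omitted, which is why the empty case diverges.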
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164084#comment-17164084 ]

Apache Spark commented on SPARK-31525:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/29214
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146030#comment-17146030 ]

Hyukjin Kwon commented on SPARK-31525:
--

[~joshuahendinata] are you interested in submitting a PR?
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146029#comment-17146029 ]

Hyukjin Kwon commented on SPARK-31525:
--

Yeah, I think it should return an empty list instead of None.
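A minimal sketch of the suggestion above, assuming it means that the no-argument `head()` should fall back to `[]` instead of `None` on an empty DataFrame. `head_proposed` is a hypothetical function operating on a plain list of tuples (standing in for `Row` objects); it is not the change made in the linked pull request.

```python
from typing import Any, List, Optional, Tuple, Union

Row = Tuple[Any, ...]  # hypothetical: model a Row as a plain tuple


def head_proposed(rows: List[Row], n: Optional[int] = None) -> Union[Row, List[Row]]:
    """Hypothetical head(): the no-argument form returns [] when empty."""
    if n is None:
        rs = rows[:1]
        # Empty input now yields [] rather than None, so len() no longer fails.
        return rs[0] if rs else []
    return rows[:n]


print(len(head_proposed([])))        # 0
print(len(head_proposed([(1, 2)])))  # 2: the fields of the single row
```

One caveat this makes visible: because a PySpark `Row` is a tuple subclass, `len(df.head())` on a non-empty DataFrame would still count the row's fields rather than rows, so the return type of the no-argument form stays mixed.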
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146026#comment-17146026 ]

Xiao Li commented on SPARK-31525:
-

cc [~hyukjin.kwon] and [~ueshin]
[jira] [Commented] (SPARK-31525) Inconsistent result of df.head(1) and df.head()
[ https://issues.apache.org/jira/browse/SPARK-31525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094028#comment-17094028 ]

Holden Karau commented on SPARK-31525:
--

I agree it's inconsistent; the docs are also a little misleading. I think the root cause is that we're using head as both `peek` and `take`, which is why we've got mixed metaphors. cc [~davies], who worked on this code most recently (2015), for his thoughts.
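The `peek` versus `take` distinction can be sketched as two separately named operations, each with a single return shape. `peek`, `take`, and `is_empty` below are hypothetical helpers over a plain list of tuples, not PySpark APIs; they only illustrate the split described in the comment.

```python
from typing import Any, List, Optional, Sequence, Tuple

Row = Tuple[Any, ...]  # hypothetical: model a Row as a plain tuple


def peek(rows: Sequence[Row]) -> Optional[Row]:
    """Hypothetical 'peek': at most one row, None when there are none."""
    return rows[0] if rows else None


def take(rows: Sequence[Row], n: int) -> List[Row]:
    """Hypothetical 'take': always a list of at most n rows."""
    return list(rows[:n])


def is_empty(rows: Sequence[Row]) -> bool:
    # Emptiness checks use the list-returning form, so len() is always safe.
    return len(take(rows, 1)) == 0


print(peek([]))              # None
print(take([], 1))           # []
print(is_empty([(1, "a")]))  # False
```

With one name per return shape, a caller never has to guess whether it received a row or a list, which is the ambiguity that a single `head(n=None)` introduces.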