[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991520#comment-16991520 ]

Hyukjin Kwon commented on SPARK-26433:
--------------------------------------

Looking back at this JIRA, I realised that I underestimated it, given the multiple requests and the support in other systems. I re-created a JIRA and made a PR.

> Tail method for spark DataFrame
> -------------------------------
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
> Issue Type: New Feature
> Components: PySpark
> Affects Versions: 2.4.0
> Reporter: Jan Gorecki
> Priority: Major
>
> There is a head method for Spark DataFrames which works fine, but there doesn't
> seem to be a tail method.
> ```
> >>> ans
> >>>
> DataFrame[v1: bigint]
>
> >>> ans.head(3)
> >>>
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
> File "", line 1, in
> File
> "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 1300, in __getattr__
> "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to request a tail method for Spark DataFrames.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732895#comment-16732895 ]

Hyukjin Kwon commented on SPARK-26433:
--------------------------------------

There are a few potential workarounds. For instance, http://www.swi.com/spark-rdd-getting-bottom-records/ or {{df.sort($"ColumnName".desc).show()}}.

BTW, tail and head are usually used in Scala as below (IMHO):

{code}
scala> Seq(1, 2, 3).tail
res10: Seq[Int] = List(2, 3)

scala> Seq(1, 2, 3).head
res11: Int = 1
{code}
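The Scala snippet in the comment above points at a naming mismatch: Scala's `Seq.tail` returns everything except the first element, while the requested `ans.tail(3)` means the last three rows. A minimal plain-Python sketch of the two meanings follows. The function names `scala_style_tail` and `last_n` are hypothetical and exist only for illustration; they are not part of Spark or PySpark.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def scala_style_tail(items: Sequence[T]) -> List[T]:
    """Everything except the first element, like Scala's Seq.tail."""
    if not items:
        # Scala's tail also fails on an empty collection.
        raise ValueError("tail of empty sequence")
    return list(items[1:])


def last_n(items: Sequence[T], n: int) -> List[T]:
    """The last n elements, which is what the request means by tail(n)."""
    if n <= 0:
        # Guard needed because items[-0:] would return the whole sequence.
        return []
    return list(items[-n:])


if __name__ == "__main__":
    print(scala_style_tail([1, 2, 3]))
    print(last_n([1, 2, 3], 2))
```

For a three-element sequence and `n = 2` the two functions happen to return the same result, which is part of why the naming is easy to confuse; they diverge as soon as the sequence is longer than `n + 1`.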
[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730887#comment-16730887 ]

Jan Gorecki commented on SPARK-26433:
-------------------------------------

[~hyukjin.kwon] Thank you for your comment, but I am not sure I understood it correctly. Do you mean I should first collect the data to the client and then extract the last few rows of the dataframe? If so, that doesn't seem to be a feasible solution, as data in Spark is likely not to fit on the client machine. `Tail` is exactly the operation that one would want to perform BEFORE collecting data to the client. Could you confirm?
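To illustrate the point about taking the tail before collecting, here is a rough plain-Python sketch in which a list of lists stands in for the partitions of a distributed dataset. It walks the partitions from the end and stops once it has n rows, so only those rows would ever need to reach the client. `tail_from_partitions` is a hypothetical helper, not a Spark API, and the sketch makes no claim about how Spark does or would implement such a method.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def tail_from_partitions(partitions: Sequence[Sequence[T]], n: int) -> List[T]:
    """Return the last n rows without materialising every partition.

    Assumption: `partitions` is an ordered stand-in for the partitions of a
    distributed dataset; row order is partition order, then order within
    each partition.
    """
    if n <= 0:
        return []
    collected: List[T] = []
    # Start from the last partition and move backwards.
    for part in reversed(partitions):
        needed = n - len(collected)
        if needed <= 0:
            break  # Enough rows gathered; earlier partitions are never read.
        # Take at most `needed` rows from the end of this partition and
        # put them in front of what was gathered from later partitions.
        collected = list(part[-needed:]) + collected
    return collected


if __name__ == "__main__":
    parts = [[1, 2, 3], [4], [], [5, 6]]
    print(tail_from_partitions(parts, 3))
```

By contrast, collecting first and slicing afterwards (`rows[-n:]` on the full result) moves every row to the client, which is the cost the comment above objects to.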
[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame
[ https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730180#comment-16730180 ]

Hyukjin Kwon commented on SPARK-26433:
--------------------------------------

You can simply do it after {{collect()}}. Let's avoid adding APIs when workarounds are easy. Spark already has a lot of APIs, and I think we should focus on deprecating and reducing them, and only add APIs when they're absolutely worth it.