[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame

2019-12-09 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991520#comment-16991520
 ] 

Hyukjin Kwon commented on SPARK-26433:
--

Looking back this JIRA, I realised that I underestimated it given multiple 
requests and other systems. I re-created a JIRA and made a PR.

> Tail method for spark DataFrame
> ---
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Jan Gorecki
>Priority: Major
>
> There is a head method for spark dataframes which work fine but there doesn't 
> seems to be tail method.
> ```
> >>> ans   
> >>>   
> DataFrame[v1: bigint] 
>   
> >>> ans.head(3)   
> >>>  
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/py
> spark/sql/dataframe.py", line 1300, in __getattr__
> "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to feature request Tail method for spark dataframe



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame

2019-01-03 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732895#comment-16732895
 ] 

Hyukjin Kwon commented on SPARK-26433:
--

There are few potential workarounds. For instnace, 
http://www.swi.com/spark-rdd-getting-bottom-records/ or 
{{df.sort($"ColumnName".desc).show()}}.
BTW, usually tail or head are used in Scala as below (IMHO):

{code}
scala> Seq(1, 2, 3).tail
res10: Seq[Int] = List(2, 3)

scala> Seq(1, 2, 3).head
res11: Int = 1
{code}



> Tail method for spark DataFrame
> ---
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Jan Gorecki
>Priority: Major
>
> There is a head method for spark dataframes which work fine but there doesn't 
> seems to be tail method.
> ```
> >>> ans   
> >>>   
> DataFrame[v1: bigint] 
>   
> >>> ans.head(3)   
> >>>  
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/py
> spark/sql/dataframe.py", line 1300, in __getattr__
> "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to feature request Tail method for spark dataframe



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame

2018-12-29 Thread Jan Gorecki (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730887#comment-16730887
 ] 

Jan Gorecki commented on SPARK-26433:
-

[~hyukjin.kwon] Thank you for your comment but not sure if I understood 
correctly. You mean I should first collect data to client and then extract last 
few rows of dataframe? If so it doesn't seems to be a feasible solution, as 
data in spark are likely to not fit into client machine. `Tail` is exactly the 
operation that one would want to perform BEFORE collecting data to client. 
Could you confirm?

> Tail method for spark DataFrame
> ---
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Jan Gorecki
>Priority: Major
>
> There is a head method for spark dataframes which work fine but there doesn't 
> seems to be tail method.
> ```
> >>> ans   
> >>>   
> DataFrame[v1: bigint] 
>   
> >>> ans.head(3)   
> >>>  
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/py
> spark/sql/dataframe.py", line 1300, in __getattr__
> "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to feature request Tail method for spark dataframe



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame

2018-12-28 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730180#comment-16730180
 ] 

Hyukjin Kwon commented on SPARK-26433:
--

You can simply do it after {{collect()}}. Let's avoid to add APIs when 
workarounds are easy. Spark already has a lot of APIs and I think we should 
focus on deprecading and reducing it, and only add APIs when they're absolutely 
worth.

> Tail method for spark DataFrame
> ---
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Jan Gorecki
>Priority: Major
>
> There is a head method for spark dataframes which work fine but there doesn't 
> seems to be tail method.
> ```
> >>> ans   
> >>>   
> DataFrame[v1: bigint] 
>   
> >>> ans.head(3)   
> >>>  
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/py
> spark/sql/dataframe.py", line 1300, in __getattr__
> "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to feature request Tail method for spark dataframe



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org