[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

Akhil Thatipamula (JIRA) Wed, 03 Jun 2015 04:49:45 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570704#comment-14570704
 ]


Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

[~rxin] 
I have come up with two methods:
a) Get the runtime class of each data cell in a row using 'getClass', check 
whether it belongs to 'scala.collection' (which basically covers all the 
'container data types'), and format the value accordingly.
b) Check whether the column's data type is primitive [StringType, FloatType, 
IntegerType, ByteType, ShortType, DoubleType, LongType, BinaryType, 
BooleanType, DateType, DecimalType, TimestampType] according to the 
classification in DataType.scala, and format the value accordingly.
I have implemented both.
Which method is better?
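To make the comparison concrete, here is a minimal Scala sketch of both approaches as plain functions, outside Spark. It makes two assumptions. First, method (a) is shown with a pattern match on `scala.collection.Seq` rather than by inspecting the package name returned by `getClass`. Second, the `DataType`, `IntegerType`, `StringType` and `ArrayType` used for method (b) are hypothetical, simplified stand-ins for the real classes in Spark's `org.apache.spark.sql.types` package. `CellFormatting`, `formatByValue` and `formatByType` are illustrative names only.

```scala
// Sketch of the two cell-formatting strategies, outside Spark.
object CellFormatting {

  // Method (a): decide per value, from its runtime type.
  // A pattern match on scala.collection.Seq is used here instead of
  // comparing the package name returned by getClass.
  def formatByValue(value: Any): String = value match {
    case null                         => "null"
    case seq: scala.collection.Seq[_] => seq.map(v => formatByValue(v)).mkString("[", ", ", "]")
    case other                        => other.toString
  }

  // Hypothetical, simplified stand-ins for a few of Spark's DataType classes.
  sealed trait DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Method (b): decide from the column's declared type; for arrays the
  // element type tells us how to format each nested value.
  def formatByType(value: Any, dataType: DataType): String = (value, dataType) match {
    case (null, _) => "null"
    case (seq: scala.collection.Seq[_], ArrayType(elemType)) =>
      seq.map(v => formatByType(v, elemType)).mkString("[", ", ", "]")
    case (other, _) => other.toString // primitive types: plain toString
  }
}
```

Either way, an `ArrayBuffer(11, 1)` cell renders as `[11, 1]`. One point worth weighing: method (a) needs no schema and relies only on what each value is at runtime, while method (b) follows the declared column type, which also tells it how to format nested elements; it assumes the values actually match the schema.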

> Improve DataFrame.show() output
> -------------------------------
>
>                 Key: SPARK-7993
>                 URL: https://issues.apache.org/jira/browse/SPARK-7993
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Blocker
>              Labels: starter
>
> 1. Each column should be at least 3 characters wide. Right now, if the 
> widest value is 1 character long, the column is only 1 character wide, which 
> looks ugly. Example below:
> 2. If a DataFrame has more than N rows (N = 20 by default for show), at the 
> end we should display a message like "only showing the top 20 rows".
> {code}
> +--+--+-+
> | a| b|c|
> +--+--+-+
> | 1| 2|3|
> | 1| 2|1|
> | 1| 2|3|
> | 3| 6|3|
> | 1| 2|3|
> | 5|10|1|
> | 1| 2|3|
> | 7|14|3|
> | 1| 2|3|
> | 9|18|1|
> | 1| 2|3|
> |11|22|3|
> | 1| 2|3|
> |13|26|1|
> | 1| 2|3|
> |15|30|3|
> | 1| 2|3|
> |17|34|1|
> | 1| 2|3|
> |19|38|3|
> +--+--+-+
> only showing top 20 rows   <---- add this at the end
> {code}
> 3. For array values, instead of printing "ArrayBuffer", we should just print 
> square brackets:
> {code}
> +------------------+------------------+-----------------+
> |       a_freqItems|       b_freqItems|      c_freqItems|
> +------------------+------------------+-----------------+
> |ArrayBuffer(11, 1)|ArrayBuffer(2, 22)|ArrayBuffer(1, 3)|
> +------------------+------------------+-----------------+
> {code}
> should be
> {code}
> +-----------+-----------+-----------+
> |a_freqItems|b_freqItems|c_freqItems|
> +-----------+-----------+-----------+
> |    [11, 1]|    [2, 22]|     [1, 3]|
> +-----------+-----------+-----------+
> {code}
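The three requested changes can be sketched together as a small standalone renderer. This is a minimal sketch under stated assumptions, not Spark's implementation: rows are plain `Seq[Any]` values instead of Spark `Row`s, `ShowSketch`, `render` and `minColWidth` are hypothetical names, and the caller is assumed to pass in more than `numRows` rows (for example by fetching `numRows + 1`) when the truncation message should appear. Cells are right-aligned, as in the tables above.

```scala
// Hypothetical show()-style renderer; not Spark's API.
object ShowSketch {
  val minColWidth = 3 // item 1: every column at least 3 characters wide

  // Item 3: print sequences with square brackets instead of ArrayBuffer(...).
  def formatCell(value: Any): String = value match {
    case null                         => "null"
    case seq: scala.collection.Seq[_] => seq.map(v => formatCell(v)).mkString("[", ", ", "]")
    case other                        => other.toString
  }

  def render(header: Seq[String], rows: Seq[Seq[Any]], numRows: Int = 20): String = {
    val shown = rows.take(numRows)
    val cells: Seq[Seq[String]] = header +: shown.map(_.map(formatCell))

    // Column width = widest cell in the column (header included),
    // but never narrower than minColWidth.
    val widths = header.indices.map { i =>
      math.max(minColWidth, cells.map(_(i).length).max)
    }
    val sep = widths.map("-" * _).mkString("+", "+", "+")

    // Right-align each cell to its column width.
    def line(row: Seq[String]): String =
      row.zip(widths).map { case (cell, w) => " " * (w - cell.length) + cell }.mkString("|", "|", "|")

    val sb = new StringBuilder
    sb.append(sep).append('\n')
    sb.append(line(cells.head)).append('\n')
    sb.append(sep).append('\n')
    cells.tail.foreach(r => sb.append(line(r)).append('\n'))
    sb.append(sep).append('\n')

    // Item 2: say so when rows were cut off.
    if (rows.length > numRows) {
      val noun = if (numRows == 1) "row" else "rows"
      sb.append(s"only showing top $numRows $noun\n")
    }
    sb.toString
  }
}
```

Given the header `a_freqItems`, `b_freqItems`, `c_freqItems` and the single row `Seq(Seq(11, 1), Seq(2, 22), Seq(1, 3))`, `render` produces the expected table shown above, and a one-character column such as `a` is padded to `|  a|`.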



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
