[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-12 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583218#comment-14583218
 ] 

Apache Spark commented on SPARK-7993:
-------------------------------------

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/6784

> Improve DataFrame.show() output
> -------------------------------
>
>                 Key: SPARK-7993
>                 URL: https://issues.apache.org/jira/browse/SPARK-7993
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Blocker
>              Labels: starter
>
> 1. Each column should be at least 3 characters wide. Right now, if the 
> widest value is 1 character, the column is just 1 char wide, which looks ugly 
> (see column c in the example below).
> 2. If a DataFrame has more than N rows (N = 20 by default for 
> show), at the end we should display a message like "only showing the top 20 
> rows".
> {code}
> +--+--+-+
> | a| b|c|
> +--+--+-+
> | 1| 2|3|
> | 1| 2|1|
> | 1| 2|3|
> | 3| 6|3|
> | 1| 2|3|
> | 5|10|1|
> | 1| 2|3|
> | 7|14|3|
> | 1| 2|3|
> | 9|18|1|
> | 1| 2|3|
> |11|22|3|
> | 1| 2|3|
> |13|26|1|
> | 1| 2|3|
> |15|30|3|
> | 1| 2|3|
> |17|34|1|
> | 1| 2|3|
> |19|38|3|
> +--+--+-+
> only showing top 20 rows   <-- add this at the end
> {code}
> 3. For array values, instead of printing "ArrayBuffer", we should just print 
> square brackets:
> {code}
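> +------------------+------------------+-----------------+
> |       a_freqItems|       b_freqItems|      c_freqItems|
> +------------------+------------------+-----------------+
> |ArrayBuffer(11, 1)|ArrayBuffer(2, 22)|ArrayBuffer(1, 3)|
> +------------------+------------------+-----------------+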
> {code}
> should be
> {code}
> +-----------+-----------+-----------+
> |a_freqItems|b_freqItems|c_freqItems|
> +-----------+-----------+-----------+
> |    [11, 1]|    [2, 22]|     [1, 3]|
> +-----------+-----------+-----------+
> {code}
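
The three formatting rules in the description can be sketched in plain Python as follows. This is a minimal standalone illustration, not Spark's implementation: `show_string`, `format_cell`, and the `min_width` parameter are hypothetical names chosen for the example, and rows are assumed to be plain Python lists.

```python
from typing import Any, List, Sequence


def format_cell(value: Any) -> str:
    """Render lists and tuples with square brackets; everything else via str()."""
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(format_cell(v) for v in value) + "]"
    return str(value)


def show_string(columns: Sequence[str], rows: Sequence[Sequence[Any]],
                num_rows: int = 20, min_width: int = 3) -> str:
    """Build a right-aligned text table from the first num_rows rows."""
    shown: List[List[str]] = [[format_cell(v) for v in row]
                              for row in rows[:num_rows]]

    # Rule 1: a column is as wide as its widest cell or header,
    # but never narrower than min_width.
    widths = [max(min_width, len(name)) for name in columns]
    for row in shown:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    border = "+" + "+".join("-" * w for w in widths) + "+"

    def line(cells: Sequence[str]) -> str:
        return "|" + "|".join(c.rjust(w) for c, w in zip(cells, widths)) + "|"

    out = [border, line(columns), border]
    out.extend(line(row) for row in shown)
    out.append(border)

    # Rule 2: add the footer only when rows were actually cut off.
    if len(rows) > num_rows:
        out.append(f"only showing top {num_rows} rows")
    return "\n".join(out)


if __name__ == "__main__":
    print(show_string(["a", "b", "c"], [[1, 2, 3], [11, 22, 3]]))
    print(show_string(["a_freqItems"], [[[11, 1]]]))
```

Computing the widths before rendering keeps the border, header, and data lines consistent, and the footer check compares against the full row count so it never appears for short tables.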



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-04 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572334#comment-14572334
 ] 

Apache Spark commented on SPARK-7993:
-------------------------------------

User 'akhilthatipamula' has created a pull request for this issue:
https://github.com/apache/spark/pull/6633




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-03 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571424#comment-14571424
 ] 

Reynold Xin commented on SPARK-7993:
------------------------------------

I think the 2nd way is better, since it is slightly less coupled to the 
internal types.





[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-03 Thread Akhil Thatipamula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570704#comment-14570704
 ] 

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

[~rxin] 
I have come up with two methods:
a) Get the class of a particular data cell of a row using 'getClass', check 
whether it falls under 'scala.collection' (which is basically the set of all 
'container data types'), and act accordingly.
b) Check whether the data type for a given column is primitive [StringType, 
FloatType, IntegerType, ByteType, ShortType, DoubleType, LongType, BinaryType, 
BooleanType, DateType, DecimalType, TimestampType] according to the 
classification given in DataType.scala, and act accordingly.
I have implemented both.
Which method is better?
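
The difference between the two methods can be illustrated with a small Python analogy. This is only a sketch under stated assumptions: the function names are hypothetical, Python's built-in containers stand in for 'scala.collection', and the schema is represented as plain type-name strings rather than Spark's DataType objects.

```python
from typing import Any

# Type names copied from the list in the comment above. Representing a
# column type as a string is an assumption made for this sketch.
PRIMITIVE_TYPES = {
    "StringType", "FloatType", "IntegerType", "ByteType", "ShortType",
    "DoubleType", "LongType", "BinaryType", "BooleanType", "DateType",
    "DecimalType", "TimestampType",
}


def is_container_by_value(cell: Any) -> bool:
    """Method (a): inspect the runtime class of the cell itself."""
    return isinstance(cell, (list, tuple, set, dict))


def is_container_by_schema(column_type: str) -> bool:
    """Method (b): consult the column's declared type, ignoring the value."""
    return column_type not in PRIMITIVE_TYPES


if __name__ == "__main__":
    schema = {"id": "IntegerType", "modules": "ArrayType"}
    row = {"id": 1, "modules": ["mllib", "sql"]}
    for name, value in row.items():
        print(name,
              is_container_by_value(value),
              is_container_by_schema(schema[name]))
```

Method (a) depends on whichever concrete class happens to hold the value at runtime, while method (b) depends only on the declared column type.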




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-02 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570358#comment-14570358
 ] 

Reynold Xin commented on SPARK-7993:
------------------------------------

that sounds good.




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-02 Thread Akhil Thatipamula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568717#comment-14568717
 ] 

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

I am planning to check whether the data type for a given column is primitive. 
If it turns out to be non-primitive, I will modify the string value produced 
for the cell ('cell.toString'). 

Is that legitimate?




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567887#comment-14567887
 ] 

Reynold Xin commented on SPARK-7993:
------------------------------------

Yes, it would be great to handle those as well. Maybe we can just handle Seq, 
which is a common base data type.
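
The idea of matching on a common base type rather than on each concrete collection can be sketched in Python, where `collections.abc.Sequence` plays a role loosely analogous to Scala's Seq. This is an illustrative assumption, not the Spark code; `format_value` is a hypothetical name.

```python
from collections.abc import Sequence
from typing import Any


def format_value(value: Any) -> str:
    """Print any non-string sequence with square brackets."""
    # str and bytes are Sequences too, so they are excluded explicitly.
    if isinstance(value, Sequence) and not isinstance(value, (str, bytes)):
        return "[" + ", ".join(format_value(v) for v in value) + "]"
    return str(value)


if __name__ == "__main__":
    print(format_value(["mllib", "sql"]))  # a list
    print(format_value((2, 22)))           # a tuple
    print(format_value("sql"))             # a plain string stays as-is
```

A single check against the base type covers lists, tuples, and ranges alike, so no per-class branches are needed.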





[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-01 Thread Akhil Thatipamula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567135#comment-14567135
 ] 

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

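[~rxin] Does the 3rd modification affect 'List' as well?
For instance,
+--------------------+
|             modules|
+--------------------+
|List(mllib, sql, ...|
+--------------------+
should it be
+--------------------+
|             modules|
+--------------------+
|    [mllib, sql, ...|
+--------------------+
?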




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-01 Thread Akhil Thatipamula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567040#comment-14567040
 ] 

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

Thanks for mentioning it. I will take care of that.




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567036#comment-14567036
 ] 

Reynold Xin commented on SPARK-7993:
------------------------------------

Please cc me on your pull request (my GitHub id is @rxin).




[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-01 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567020#comment-14567020
 ] 

Reynold Xin commented on SPARK-7993:
------------------------------------

Thanks. Note that once you change the show output, you might need to update 
some Python unit tests since some of the functions use show's output.





[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output

2015-06-01 Thread Akhil Thatipamula (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567017#comment-14567017
 ] 

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

[~rxin] I will work on this.
