[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583218#comment-14583218 ]

Apache Spark commented on SPARK-7993:
-------------------------------------

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/6784

> Improve DataFrame.show() output
> -------------------------------
>
>                 Key: SPARK-7993
>                 URL: https://issues.apache.org/jira/browse/SPARK-7993
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Blocker
>              Labels: starter
>
> 1. Each column should be at the minimum 3 characters wide. Right now if the
> widest value is 1 character, the column is just 1 char wide, which looks ugly.
> Example below:
> 2. If a DataFrame has more than N rows (N = 20 by default for
> show), at the end we should display a message like "only showing the top 20
> rows".
> {code}
> +--+--+-+
> | a| b|c|
> +--+--+-+
> | 1| 2|3|
> | 1| 2|1|
> | 1| 2|3|
> | 3| 6|3|
> | 1| 2|3|
> | 5|10|1|
> | 1| 2|3|
> | 7|14|3|
> | 1| 2|3|
> | 9|18|1|
> | 1| 2|3|
> |11|22|3|
> | 1| 2|3|
> |13|26|1|
> | 1| 2|3|
> |15|30|3|
> | 1| 2|3|
> |17|34|1|
> | 1| 2|3|
> |19|38|3|
> +--+--+-+
> only showing top 20 rows <---- add this at the end
> {code}
> 3. For array values, instead of printing "ArrayBuffer", we should just print
> square brackets:
> {code}
> +------------------+------------------+-----------------+
> |       a_freqItems|       b_freqItems|      c_freqItems|
> +------------------+------------------+-----------------+
> |ArrayBuffer(11, 1)|ArrayBuffer(2, 22)|ArrayBuffer(1, 3)|
> +------------------+------------------+-----------------+
> {code}
> should be
> {code}
> +-----------+-----------+-----------+
> |a_freqItems|b_freqItems|c_freqItems|
> +-----------+-----------+-----------+
> |    [11, 1]|    [2, 22]|     [1, 3]|
> +-----------+-----------+-----------+
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
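The three requests in the issue description can be sketched in a few lines of plain Python. This is only an illustration of the requested output format, not Spark's implementation: the names `show_string`, `format_cell` and `MIN_COL_WIDTH` are hypothetical, and rows are assumed to be ordinary Python lists.

```python
from typing import Any, Sequence

MIN_COL_WIDTH = 3  # item 1: no column narrower than 3 characters


def format_cell(value: Any) -> str:
    """Render one cell; sequences print as [a, b] instead of a class name (item 3)."""
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(format_cell(v) for v in value) + "]"
    return str(value)


def show_string(columns: Sequence[str], rows: Sequence[Sequence[Any]], n: int = 20) -> str:
    """Build the text table for the first n rows, plus a footer if rows were cut off."""
    shown = [[format_cell(v) for v in row] for row in rows[:n]]

    # Each column is as wide as its widest entry, but never below the minimum.
    widths = [max(MIN_COL_WIDTH, len(name)) for name in columns]
    for row in shown:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))

    border = "+" + "+".join("-" * w for w in widths) + "+"

    def line(cells: Sequence[str]) -> str:
        # Cells are right-aligned, as in the tables quoted in the issue.
        return "|" + "|".join(c.rjust(w) for c, w in zip(cells, widths)) + "|"

    out = [border, line(columns), border]
    out.extend(line(row) for row in shown)
    out.append(border)
    if len(rows) > n:  # item 2: tell the reader that output was truncated
        out.append(f"only showing top {n} rows")
    return "\n".join(out)


if __name__ == "__main__":
    print(show_string(["a", "b", "c"], [[1, 2, 3], [11, 22, 3]], n=1))
```

Run as a script, this prints a table whose columns are all 3 characters wide, followed by the truncation footer, because only one of the two rows is shown.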
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572334#comment-14572334 ]

Apache Spark commented on SPARK-7993:
-------------------------------------

User 'akhilthatipamula' has created a pull request for this issue:
https://github.com/apache/spark/pull/6633
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571424#comment-14571424 ]

Reynold Xin commented on SPARK-7993:
------------------------------------

I think the 2nd way is better, since it is slightly more decoupled from the internal types.
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570704#comment-14570704 ]

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

[~rxin] I have come up with 2 methods:
a) We can get the class of a particular data cell of a row using 'getClass' and, if it falls under 'scala.collection' (which is basically the set of all container data types), act accordingly.
b) We can check whether the data type for a given column is primitive [StringType, FloatType, IntegerType, ByteType, ShortType, DoubleType, LongType, BinaryType, BooleanType, DateType, DecimalType, TimestampType] according to the classification given in DataType.scala, and act accordingly.
I have implemented both. Which is the better method?
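The difference between the two methods can be sketched as follows. This is a plain-Python analogy rather than Spark or Scala code: Python lists and tuples stand in for 'scala.collection' values, type names are plain strings standing in for Spark's DataType objects, and the function names are hypothetical.

```python
from typing import Any, Sequence

# Type names listed as primitive in the comment above. They are plain strings
# here, standing in for Spark's DataType objects (an assumption of this sketch).
PRIMITIVE_TYPES = {
    "StringType", "FloatType", "IntegerType", "ByteType", "ShortType",
    "DoubleType", "LongType", "BinaryType", "BooleanType", "DateType",
    "DecimalType", "TimestampType",
}


def brackets(values: Sequence[Any]) -> str:
    """Render a sequence as [a, b, c]."""
    return "[" + ", ".join(str(v) for v in values) + "]"


def format_by_runtime_class(cell: Any) -> str:
    """Method (a): decide from the class of the value itself."""
    if isinstance(cell, (list, tuple)):
        return brackets(cell)
    return str(cell)


def format_by_column_type(cell: Any, column_type: str) -> str:
    """Method (b): decide from the column's declared type in the schema.

    Simplification: every non-primitive column is treated as an array.
    """
    if column_type in PRIMITIVE_TYPES:
        return str(cell)
    return brackets(cell)


if __name__ == "__main__":
    print(format_by_runtime_class([11, 1]))
    print(format_by_column_type([2, 22], "ArrayType"))
```

Method (b) never inspects how the value happens to be represented at runtime, only what the schema declares, which is one way to read the preference for it expressed in the reply.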
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570358#comment-14570358 ]

Reynold Xin commented on SPARK-7993:
------------------------------------

that sounds good.
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568717#comment-14568717 ]

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

I am planning to check whether the data type for a given column is primitive. If it turns out to be non-primitive, I will modify the string produced by 'cell.toString'. Is that legitimate?
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567887#comment-14567887 ]

Reynold Xin commented on SPARK-7993:
------------------------------------

Yes, it would be great to handle those as well. Maybe we can just handle Seq, which is a common base data type.
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567135#comment-14567135 ]

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

[~rxin] Does the 3rd modification affect 'List' as well? For instance,

+--------------------+
|             modules|
+--------------------+
|List(mllib, sql, ...|
+--------------------+

should it be

+--------------------+
|             modules|
+--------------------+
|    [mllib, sql, ...|
+--------------------+

?
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567040#comment-14567040 ]

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

Thanks for mentioning it; I will take care of that.
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567036#comment-14567036 ]

Reynold Xin commented on SPARK-7993:
------------------------------------

Please cc me on your pull request (my github id is @rxin)
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567020#comment-14567020 ]

Reynold Xin commented on SPARK-7993:
------------------------------------

Thanks. Note that once you change the show output, you might need to update some Python unit tests since some of the functions use show's output.
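A small sketch of why such tests are sensitive to the change, assuming they compare the printed table character by character, as doctests do. The `render` helper is hypothetical and only mimics a one-cell table; the comparison uses the standard library's `doctest.OutputChecker`.

```python
import doctest


def render(value: int, min_width: int) -> str:
    """Render a one-column, one-row table (hypothetical helper, not Spark code)."""
    text = str(value)
    width = max(min_width, len(text))
    border = "+" + "-" * width + "+"
    return "\n".join([border, "|" + text.rjust(width) + "|", border]) + "\n"


# Output as it would have been recorded in a test written before the change,
# when a 1-character value produced a 1-character column.
recorded = "+-+\n|1|\n+-+\n"

checker = doctest.OutputChecker()
old_ok = checker.check_output(recorded, render(1, min_width=1), 0)
new_ok = checker.check_output(recorded, render(1, min_width=3), 0)
print(old_ok, new_ok)
```

The first check passes and the second fails: once the minimum width becomes 3, the recorded output no longer matches, so the expected text in each affected test has to be regenerated.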
[jira] [Commented] (SPARK-7993) Improve DataFrame.show() output
[ https://issues.apache.org/jira/browse/SPARK-7993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567017#comment-14567017 ]

Akhil Thatipamula commented on SPARK-7993:
------------------------------------------

[~rxin] I will work on this.