[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18347 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18347#discussion_r123322259

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala ---
    @@ -60,5 +70,23 @@ class ConsoleSinkProvider extends StreamSinkProvider with DataSourceRegister {
         new ConsoleSink(parameters)
       }
    +  def createRelation(
    +      sqlContext: SQLContext,
    +      mode: SaveMode,
    +      parameters: Map[String, String],
    +      data: DataFrame): BaseRelation = {
    +    // Number of rows to display, by default 20 rows
    +    val numRowsToShow = parameters.get("numRows").map(_.toInt).getOrElse(20)
    +
    +    // Truncate the displayed data if it is too long, by default it is true
    +    val isTruncated = parameters.get("truncate").map(_.toBoolean).getOrElse(true)
    +
    +    data.sparkSession.createDataFrame(
    --- End diff --

    You can just call `data.showInternal(numRowsToShow, isTruncated)`. This is a hack in ConsoleSink to avoid using a wrong planner. That's not a problem in the batch DataFrames.
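The option handling in the hunk above can be tried without Spark. The following is a minimal sketch in plain Scala; `ConsoleOptionsSketch`, `ConsoleOptions`, and `parse` are hypothetical names used only for illustration and are not part of the PR. The defaults (20 rows, truncation on) are the ones shown in the diff.

```scala
// Sketch only: mirrors the numRows/truncate parsing from the diff above.
// ConsoleOptionsSketch, ConsoleOptions and parse are hypothetical names.
object ConsoleOptionsSketch {
  final case class ConsoleOptions(numRows: Int, truncate: Boolean)

  def parse(parameters: Map[String, String]): ConsoleOptions =
    ConsoleOptions(
      // Number of rows to display, 20 when the option is absent
      numRows = parameters.get("numRows").map(_.toInt).getOrElse(20),
      // Whether long values are truncated, true when the option is absent
      truncate = parameters.get("truncate").map(_.toBoolean).getOrElse(true))

  def main(args: Array[String]): Unit = {
    println(parse(Map.empty))                                     // ConsoleOptions(20,true)
    println(parse(Map("numRows" -> "5", "truncate" -> "false")))  // ConsoleOptions(5,false)
  }
}
```

Note that `toInt` and `toBoolean` throw on malformed input (for example `"numRows" -> "many"`), so a sink written this way fails fast on a bad option value rather than silently falling back to the default.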
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18347#discussion_r123321846

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala ---
    @@ -51,7 +53,15 @@ class ConsoleSink(options: Map[String, String]) extends Sink with Logging {
       }
     }
    -class ConsoleSinkProvider extends StreamSinkProvider with DataSourceRegister {
    +case class ConsoleRelation(Context: SQLContext, data: DataFrame) extends BaseRelation {
    --- End diff --

    nit: you can use
    ```
    case class ConsoleRelation(override val sqlContext: SQLContext, data: DataFrame) extends BaseRelation {
      override def schema: StructType = data.schema
    }
    ```
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user lubozhan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18347#discussion_r122639236

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -465,6 +465,8 @@ case class DataSource(
         providingClass.newInstance() match {
           case dataSource: CreatableRelationProvider =>
             SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
    +      case dataSource: ConsoleSinkProvider =>
    +        data.show(data.count().toInt, false)
    --- End diff --

    Sorry for the late reply. Yes, it is right to use an underscore since `dataSource` is not used. Since there is no need to create a new ConsoleSink, and its private variables are not accessible from here, I will use `caseInsensitiveOptions` to extract `numRows` and `truncate` instead. Thanks for your comment.
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18347#discussion_r122617147

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -465,6 +465,8 @@ case class DataSource(
         providingClass.newInstance() match {
           case dataSource: CreatableRelationProvider =>
             SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
    +      case dataSource: ConsoleSinkProvider =>
    +        data.show(data.count().toInt, false)
    --- End diff --

    `ConsoleSink` [has two options](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/console.scala#L27-L30) that could be used here -- `numRows` and `truncate`.
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
Github user jaceklaskowski commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18347#discussion_r122616876

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -465,6 +465,8 @@ case class DataSource(
         providingClass.newInstance() match {
           case dataSource: CreatableRelationProvider =>
             SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
    +      case dataSource: ConsoleSinkProvider =>
    --- End diff --

    Underscore `dataSource` since it's not used.
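The underscore suggestion above refers to Scala's typed patterns: `case _: T` tests the type without binding a name, which is the usual form when the matched value is not used in the branch. A minimal sketch in plain Scala follows; `Provider`, `CreatableProvider`, `ConsoleProvider`, and `describe` are hypothetical stand-ins, not Spark classes.

```scala
// Sketch only: stand-in types to illustrate `case _: T` versus `case x: T`.
object TypedPatternSketch {
  sealed trait Provider
  final case class CreatableProvider(name: String) extends Provider
  final class ConsoleProvider extends Provider

  def describe(provider: Provider): String = provider match {
    // The bound name is used on the right-hand side, so it is kept
    case dataSource: CreatableProvider => s"save through ${dataSource.name}"
    // Only the type matters here, so the binding is replaced by an underscore
    case _: ConsoleProvider => "show on console"
  }

  def main(args: Array[String]): Unit = {
    println(describe(CreatableProvider("parquet"))) // save through parquet
    println(describe(new ConsoleProvider))          // show on console
  }
}
```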
[GitHub] spark pull request #18347: [SPARK-20599][SS] ConsoleSink should work with (b...
GitHub user lubozhan opened a pull request:

    https://github.com/apache/spark/pull/18347

    [SPARK-20599][SS] ConsoleSink should work with (batch)

    ## What changes were proposed in this pull request?

    Currently, reading a batch and writing it to the console sink leads to a runtime exception.

    Changes:

    - This PR adds a match rule that checks whether the provider is a ConsoleSinkProvider and, if the console format is used, displays the Dataset.

    ## How was this patch tested?

    spark.read.schema().json(path).write.format("console").save

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lubozhan/spark dev

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18347

----
commit dfd81b22061ab9bcbe5f7b511b929de5d31b636a
Author: Lubo Zhang
Date:   2017-06-15T07:01:31Z

    support console for write batch
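For readers who want to try the batch path described in this pull request, the following is a rough sketch rather than the test used by the author. It assumes a Spark build that already includes this change and a local master; the in-memory DataFrame, the object name `ConsoleBatchWriteSketch`, and the option values are illustrative assumptions. The `numRows` and `truncate` option names are the ones mentioned in the review comments.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a Spark version in which format("console") accepts batch writes.
object ConsoleBatchWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("console-batch-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small in-memory batch DataFrame standing in for spark.read...json(path)
    val df = Seq((1, "first"), (2, "second"), (3, "third")).toDF("id", "value")

    // Batch write to the console format, limiting the rows shown
    df.write
      .format("console")
      .option("numRows", "2")
      .option("truncate", "false")
      .save()

    spark.stop()
  }
}
```

On a Spark version without this change, the same `save()` call is expected to fail at runtime, which is the behaviour the pull request sets out to fix.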