[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support a vertical display mod...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113377134

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -277,43 +279,73 @@ class Dataset[T] private[sql](
     val sb = new StringBuilder
     val numCols = schema.fieldNames.length
+    // We set a minimum column width at '3'
+    val minimumColWidth = 3

-    // Initialise the width of each column to a minimum value of '3'
-    val colWidths = Array.fill(numCols)(3)
+    if (!vertical) {
+      // Initialise the width of each column to a minimum value
+      val colWidths = Array.fill(numCols)(minimumColWidth)

-    // Compute the width of each column
-    for (row <- rows) {
-      for ((cell, i) <- row.zipWithIndex) {
-        colWidths(i) = math.max(colWidths(i), cell.length)
-      }
-    }
-
-    // Create SeparateLine
-    val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
-
-    // column names
-    rows.head.zipWithIndex.map { case (cell, i) =>
-      if (truncate > 0) {
-        StringUtils.leftPad(cell, colWidths(i))
-      } else {
-        StringUtils.rightPad(cell, colWidths(i))
+      // Compute the width of each column
+      for (row <- rows) {
+        for ((cell, i) <- row.zipWithIndex) {
+          colWidths(i) = math.max(colWidths(i), cell.length)
+        }
       }
-    }.addString(sb, "|", "|", "|\n")
-    sb.append(sep)

+      // Create SeparateLine
+      val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()

-    // data
-    rows.tail.map {
-      _.zipWithIndex.map { case (cell, i) =>
+      // column names
+      rows.head.zipWithIndex.map { case (cell, i) =>
         if (truncate > 0) {
-          StringUtils.leftPad(cell.toString, colWidths(i))
+          StringUtils.leftPad(cell, colWidths(i))
         } else {
-          StringUtils.rightPad(cell.toString, colWidths(i))
+          StringUtils.rightPad(cell, colWidths(i))
         }
       }.addString(sb, "|", "|", "|\n")
-    }
-    sb.append(sep)
+      sb.append(sep)
+
+      // data
+      rows.tail.foreach {
+        _.zipWithIndex.map { case (cell, i) =>
+          if (truncate > 0) {
+            StringUtils.leftPad(cell.toString, colWidths(i))
+          } else {
+            StringUtils.rightPad(cell.toString, colWidths(i))
+          }
+        }.addString(sb, "|", "|", "|\n")
+      }
+
+      sb.append(sep)
+    } else {
+      // Extended display mode enabled
+      val fieldNames = rows.head
+      val dataRows = rows.tail
+
+      // Compute the width of field name and data columns
+      val fieldNameColWidth = fieldNames.foldLeft(minimumColWidth) { case (curMax, fieldName) =>
+        math.max(curMax, fieldName.length)
+      }
+      val dataColWidth = dataRows.foldLeft(minimumColWidth) { case (curMax, row) =>
+        math.max(curMax, row.map(_.length).reduceLeftOption[Int] { case (cellMax, cell) =>
+          math.max(cellMax, cell)
+        }.getOrElse(0))
+      }
+
+      dataRows.zipWithIndex.foreach { case (row, i) =>
--- End diff --

+1 for indicating that the result set is empty. About displaying column names if the output is empty, perhaps it'd be best to stick with the postgres/mysql semantics.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
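For context, the vertical mode under discussion prints each row as field-name/value pairs instead of a padded table. The following is a minimal, self-contained sketch of that layout, including the "(0 rows)" behavior for empty results endorsed above; names such as `renderVertical` are illustrative and are not the PR's actual `Dataset.showString` code:

```scala
object VerticalShowSketch {
  // Render rows (head = column names, tail = data) in a vertical,
  // psql-\x-like layout: one "-RECORD i" block per data row.
  def renderVertical(rows: Seq[Seq[String]], minimumColWidth: Int = 3): String = {
    val fieldNames = rows.head
    val dataRows = rows.tail
    if (dataRows.isEmpty) {
      // Follow the postgres/mysql semantics discussed above for empty results
      "(0 rows)"
    } else {
      // Width of the field-name column: the longest name, at least the minimum
      val fieldNameColWidth =
        fieldNames.foldLeft(minimumColWidth)((m, n) => math.max(m, n.length))
      dataRows.zipWithIndex.map { case (row, i) =>
        val header = s"-RECORD $i"
        val body = fieldNames.zip(row).map { case (name, cell) =>
          s" ${name.padTo(fieldNameColWidth, ' ')} | $cell"
        }.mkString("\n")
        header + "\n" + body
      }.mkString("\n")
    }
  }
}
```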
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support a vertical display mod...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113376654

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
(quoted hunk omitted; identical to the Dataset.scala diff quoted above)
--- End diff --

Aha, I see. I'll update. Thanks!
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support a vertical display mod...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113375813

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
(quoted hunk omitted; identical to the Dataset.scala diff quoted above)
--- End diff --

Now, in this PR, we output nothing in this case. Postgres and MySQL at least output a message to indicate that the result set is empty. cc @cloud-fan @sameeragarwal @hvanhovell @rxin Any suggestion here?
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113372131

--- Diff: R/pkg/R/serialize.R ---
@@ -83,6 +83,7 @@ writeObject <- function(con, object, writeType = TRUE) {
          Date = writeDate(con, object),
          POSIXlt = writeTime(con, object),
          POSIXct = writeTime(con, object),
+         bigint = writeDouble(con, object),
--- End diff --

And another thing: there is no `bigint` in R, so I'm not sure how we would hit this path.
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113370732

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -3043,6 +3043,23 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", {
   expect_equal(dbs[[1]], "default")
 })

+test_that("dapply with bigint type", {
+  df <- createDataFrame(
+    list(list(1380742793415240, 1, "1"), list(1380742793415240, 2, "2"),
+         list(1380742793415240, 3, "3")), c("a", "b", "c"))
+  schema <- structType(structField("a", "bigint"), structField("b", "bigint"),
--- End diff --

Actually, I'm not sure. Walking through the code, `createDataFrame` calls `parallelize`, which eventually calls R's `serialize`. Since these big values are actually numeric, not integer, `serialize` writes them in that way.
[GitHub] spark pull request #17760: [SPARK-20439] [SQL] [Backport-2.1] Fix Catalog AP...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/17760
[GitHub] spark pull request #17649: [SPARK-20380][SQL] Output table comment for DESC ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17649#discussion_r113373050

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala ---
@@ -295,7 +295,9 @@ class InMemoryCatalog(
     assert(tableDefinition.identifier.database.isDefined)
     val db = tableDefinition.identifier.database.get
     requireTableExists(db, tableDefinition.identifier.table)
-    catalog(db).tables(tableDefinition.identifier.table).table = tableDefinition
+    val updatedProperties = tableDefinition.properties.filter(kv => kv._1 != "comment")
+    val newTableDefinition = tableDefinition.copy(properties = updatedProperties)
+    catalog(db).tables(tableDefinition.identifier.table).table = newTableDefinition
--- End diff --

This only fixes the issue in InMemoryCatalog; we still have a hole in HiveExternalCatalog. Could you move the fix to `AlterTableSetPropertiesCommand`?
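The fix under review strips the reserved `comment` key from the properties map before storing the table definition. The filtering itself can be sketched in isolation; the object and method names here are illustrative, not Spark's catalog API:

```scala
object TablePropsSketch {
  // Keep the reserved "comment" key out of the stored table properties,
  // mirroring the filter in the quoted InMemoryCatalog change.
  def withoutComment(properties: Map[String, String]): Map[String, String] =
    properties.filter { case (key, _) => key != "comment" }
}
```

Moving this filter up into the command (as suggested) would apply it once for every external catalog implementation rather than per catalog.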
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17649

@wzhfy In HiveClientImpl.scala, we follow Hive and consider the case sensitivity of the property key `comment`.

@sujith71955 Could you resolve the following comment? https://github.com/apache/spark/pull/17649#discussion_r112824524
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support a vertical display mod...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113371912

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
(quoted hunk omitted; identical to the Dataset.scala diff quoted above)
--- End diff --

I checked and found that both pg and mysql output no column names in this mode:

```
// pg
postgres=# create table t(a INT, b TEXT);
CREATE TABLE
postgres=# select * from t;
 a | b
---+---
(0 rows)

postgres=# \x
Expanded display is on.
postgres=# select * from t;
(0 rows)

// mysql
mysql -u root --vertical
mysql> create table t(a INT, b TEXT);
Query OK, 0 rows affected (0.04 sec)

mysql> select * from t;
Empty set (0.00 sec)
```
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17596 Merged build finished. Test FAILed.
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17596 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76169/ Test FAILed.
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17596

**[Test build #76169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76169/testReport)** for PR 17596 at commit [`4df95f2`](https://github.com/apache/spark/commit/4df95f2abc28b362b8330f11efeb801ca00f2f6e).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support a vertical display mod...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113370773

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
(quoted hunk omitted; identical to the Dataset.scala diff quoted above)
--- End diff --

When no row exists, we at least need to output the column names.

```Scala
df.limit(0).show(20, 0, true)
```
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17758 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76167/ Test PASSed.
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17758 Merged build finished. Test PASSed.
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17758

**[Test build #76167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76167/testReport)** for PR 17758 at commit [`05a7a61`](https://github.com/apache/spark/commit/05a7a61259d87b8fa97214c96cedde9dc52dd3ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17768 @joshrosen, could you take a look and see if it makes sense?
[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17768 This actually also happens when the directory exists but the user does not have the permission.
[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17768 **[Test build #76172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76172/testReport)** for PR 17768 at commit [`bf21e3b`](https://github.com/apache/spark/commit/bf21e3bef93cd865744c603e373aef2916b2ce79).
[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17768 **[Test build #76171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76171/testReport)** for PR 17768 at commit [`b9ce248`](https://github.com/apache/spark/commit/b9ce24832dcd0b91e70026cf71d379cd99f26ead).
[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17737 @holdenk do you have bandwidth to review this or ok with me pushing this to master?
[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17130#discussion_r113366111

--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -268,12 +269,8 @@ class FPGrowthModel private[ml] (
     val predictUDF = udf((items: Seq[_]) => {
       if (items != null) {
         val itemset = items.toSet
-        brRules.value.flatMap(rule =>
-          if (items != null && rule._1.forall(item => itemset.contains(item))) {
-            rule._2.filter(item => !itemset.contains(item))
-          } else {
-            Seq.empty
-          }).distinct
+        brRules.value.filter(_._1.forall(itemset.contains))
+          .flatMap(_._2.filter(!itemset.contains(_))).distinct
--- End diff --

Right, two things. First, just calling out that while the PR says doc changes, there is this one code change here. Second, this code previously checked `items != null`; do we not need to consider that now?
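The code change quoted above replaces a flatMap-with-conditional by filter-then-flatMap: keep only the rules whose antecedent is contained in the item set, then emit their not-yet-present consequents. A self-contained sketch of the two forms (with a plain `rules` sequence standing in for the broadcast `brRules.value`, and the redundant inner `items != null` check dropped) showing they agree:

```scala
object RuleRewriteSketch {
  // (antecedent, consequent) pairs, as in the FPGrowth association rules
  type Rule = (Seq[String], Seq[String])

  // Old form: flatMap with an inline conditional per rule
  def predictOld(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.flatMap { rule =>
      if (rule._1.forall(itemset.contains)) rule._2.filter(!itemset.contains(_))
      else Seq.empty
    }.distinct
  }

  // New form: filter the applicable rules first, then flatMap their consequents
  def predictNew(rules: Seq[Rule], items: Seq[String]): Seq[String] = {
    val itemset = items.toSet
    rules.filter(_._1.forall(itemset.contains))
      .flatMap(_._2.filter(!itemset.contains(_))).distinct
  }
}
```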
[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17728
[GitHub] spark pull request #17768: [SPARK-20465][CORE] Throws a proper exception whe...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/17768 [SPARK-20465][CORE] Throws a proper exception when any temp directory could not be got/created (rather than ArrayIndexOutOfBoundsException)

## What changes were proposed in this pull request?

This PR proposes to throw an exception with a better message rather than `ArrayIndexOutOfBoundsException` when temp directories could not be created.

**Before**

```
./bin/spark-shell --conf spark.local.dir=/NONEXISTENT_DIR_ONE,/NONEXISTENT_DIR_TWO
```

```
Exception in thread "main" java.lang.ExceptionInInitializerError
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
...
```

**After**

```
Exception in thread "main" java.lang.ExceptionInInitializerError
...
Caused by: java.io.IOException: Failed to get a temp directory under [/NONEXISTENT_DIR_ONE,/NONEXISTENT_DIR_TWO].
...
```

## How was this patch tested?

Unit tests in `LocalDirsSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark throws-temp-dir-exception

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17768.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17768

commit 4500c2f2d989bfc1e76e07cbd38a9acbd384d3d5
Author: hyukjinkwon
Date: 2017-04-26T04:52:33Z

Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created
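The fix described above boils down to replacing blind indexing of a possibly empty array with an explicit emptiness check that surfaces the offending configuration value. A minimal sketch of that pattern, using a hypothetical `usable` predicate in place of Spark's actual directory-creation logic:

```scala
import java.io.IOException

object TempDirSketch {
  // Filter the configured directories down to the usable ones; if none survive,
  // fail with an informative IOException instead of letting dirs(0) throw
  // ArrayIndexOutOfBoundsException on the empty array.
  def firstUsableDir(configured: Seq[String], usable: String => Boolean): String = {
    val dirs = configured.filter(usable)
    if (dirs.isEmpty) {
      throw new IOException(
        s"Failed to get a temp directory under [${configured.mkString(",")}].")
    }
    dirs.head
  }
}
```

The key design point is that the error message echoes the full configured list, so a user can see at a glance which setting (e.g. `spark.local.dir`) to correct.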
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17728 merged to master
[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17191 **[Test build #76170 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76170/testReport)** for PR 17191 at commit [`7b32f46`](https://github.com/apache/spark/commit/7b32f46b1dd83007f066ebcc4dc92a48da6ca89a).
[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17191#discussion_r113365310

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -136,6 +136,7 @@ class Analyzer(
     ResolveGroupingAnalytics ::
     ResolvePivot ::
     ResolveOrdinalInOrderByAndGroupBy ::
+    ResolveAggAliasInGroupBy ::
--- End diff --

aha, ok. I'll move it there. Thanks!
[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17757 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76168/ Test PASSed.
[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17757 Merged build finished. Test PASSed.
[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17757 **[Test build #76168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76168/testReport)** for PR 17757 at commit [`a87d5c0`](https://github.com/apache/spark/commit/a87d5c0c578542916706745cdbeca58ae24269e8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17596 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76157/ Test FAILed.
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17596 **[Test build #76157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76157/testReport)** for PR 17596 at commit [`7f41155`](https://github.com/apache/spark/commit/7f41155bf5c02485c5606f874c327a9330cb2c9f). * This patch **fails from timeout after a configured wait of \`250m\`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17596 Merged build finished. Test FAILed.
[GitHub] spark pull request #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17757#discussion_r113362917 --- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R --- @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) --- End diff -- checking more closely it looks like earlier tests do call `predict`. I'm good with simplifying this part of the test with weights.
[GitHub] spark pull request #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17757#discussion_r113362748 --- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R --- @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) --- End diff -- I get the point about the unconverged test with maxIter. My main concern at this end is to at least exercise calling from R to the JVM for each public API we export (i.e. by calling `predict` on the MLP model); we have had issues in the past where the API never worked and/or was broken and we didn't know.
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17733 Merged build finished. Test PASSed.
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17733 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76166/ Test PASSed.
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17733 **[Test build #76166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76166/testReport)** for PR 17733 at commit [`f696d35`](https://github.com/apache/spark/commit/f696d357ae9aa2e850f82d408aa413750c4d84b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113362108

--- Diff: R/pkg/inst/tests/testthat/test_Serde.R ---
@@ -28,6 +28,10 @@ test_that("SerDe of primitive types", {
   expect_equal(x, 1)
   expect_equal(class(x), "numeric")
+  x <- callJStatic("SparkRHandler", "echo", 1380742793415240)
--- End diff --

I did some Google searching. R can't specify a `bigint` type, so we can't directly test `bigint`. We can remove the tests above, as we added `schema` tests and Scala API tests.
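The serde concern above comes down to floating-point precision: R's `numeric` is an IEEE-754 double, which represents every integer up to 2^53 exactly, and the literal used in the test sits well below that bound. A small standalone check (plain Scala, no Spark or SparkR involved; names are mine):

```scala
object BigintPrecisionCheck {
  // R's numeric is a 64-bit double: integers with magnitude up to 2^53
  // round-trip through Double exactly, so the literal echoed in the R test
  // above survives serde unchanged. Beyond 2^53 precision is lost.
  val exact: Long = 1380742793415240L // the literal from the R test
  val tooBig: Long = (1L << 53) + 1   // first integer a double cannot hold

  def roundTrips(x: Long): Boolean = x.toDouble.toLong == x
}
```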
[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17503 @srowen I am not sure whether I understand your question correctly. RandomForest uses LearningNode to construct the tree model during training, and converts the nodes to Leaf or InternalNode at the end. Hence, all nodes are of the same type and can be merged during training. However, if two children of a node output the same prediction, does the node keep in step with its children? I don't know.
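The merge question can be made concrete with a toy tree model (the types below are illustrative, not Spark's `Node` API): with a post-order traversal, a parent whose children both collapse to the same leaf prediction collapses as well, so the reduction does propagate upward through the tree:

```scala
object TreeMergeSketch {
  sealed trait Node
  case class Leaf(prediction: Double) extends Node
  case class Internal(left: Node, right: Node, prediction: Double) extends Node

  // Bottom-up merge: reduce the children first; if both reduce to leaves with
  // the same prediction, the parent itself becomes that leaf. Because children
  // are merged before the parent is examined, a collapse at depth d can trigger
  // another collapse at depth d - 1 in the same pass.
  def merge(n: Node): Node = n match {
    case Internal(l, r, p) =>
      (merge(l), merge(r)) match {
        case (Leaf(a), Leaf(b)) if a == b => Leaf(a)
        case (ml, mr)                     => Internal(ml, mr, p)
      }
    case leaf => leaf
  }
}
```

For example, a parent with children `Leaf(1.0)` and `Internal(Leaf(1.0), Leaf(1.0), 1.0)` reduces to a single `Leaf(1.0)` because the inner node collapses first.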
[GitHub] spark pull request #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/17757#discussion_r113361559 --- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R --- @@ -284,22 +284,11 @@ test_that("spark.mlp", { c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) # test initialWeights - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = + model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights = -c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0)) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0")) - - model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2) - mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction")) - expect_equal(head(mlpPredictions$prediction, 10), - c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0")) + c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0")) --- End diff -- Yeah, here we just removed the unconverged test (with ```maxIter = 2```), since we can't guarantee any equality during the iterations. I think the best way to test that the API works well is to check the number of iterations: if we set proper initial weights, the number of iterations to converge would be different from other initial weights or no initial weights. Let's open a separate JIRA to expose the training summary for MLP at the MLlib side, and then we can expose it in SparkR and add a check here. Thanks.
[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17191#discussion_r113360559

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -136,6 +136,7 @@ class Analyzer(
     ResolveGroupingAnalytics ::
     ResolvePivot ::
     ResolveOrdinalInOrderByAndGroupBy ::
+    ResolveAggAliasInGroupBy ::
--- End diff --

we have a `postHocResolutionRules`
[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...
Github user facaiy commented on a diff in the pull request: https://github.com/apache/spark/pull/17503#discussion_r113360409

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala ---
@@ -61,6 +61,8 @@ import org.apache.spark.mllib.tree.impurity.{Entropy, Gini, Impurity, Variance}
  * @param subsamplingRate Fraction of the training data used for learning decision tree.
  * @param useNodeIdCache If this is true, instead of passing trees to executors, the algorithm will
  *                       maintain a separate RDD of node Id cache for each row.
+ * @param canMergeChildren Merge pairs of leaf nodes of the same parent which
--- End diff --

A new parameter is added in the Strategy class, which fails the Mima tests. How should we deal with it?

```bash
[error] * synthetic method $default$13()Int in object org.apache.spark.mllib.tree.configuration.Strategy has a different result type in current version, where it is Boolean rather than Int
```

[see failed logs](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3675/consoleFull)
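For a Mima failure like the one quoted above, one possible route (assuming the binary-incompatible change is intentional for the next release) is an exclusion entry in Spark's `project/MimaExcludes.scala`. The filter below is a sketch derived from the error text, not a verified fix for this PR:

```scala
// Hypothetical entry for project/MimaExcludes.scala; the problem type and the
// synthetic default-argument method name are taken from the Mima error above.
ProblemFilters.exclude[IncompatibleResultTypeProblem](
  "org.apache.spark.mllib.tree.configuration.Strategy.$default$13")
```

An alternative that avoids the exclusion entirely is to add the new parameter via an overloaded constructor or a setter instead of a new default argument, so the existing synthetic `$default$N` methods keep their signatures.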
[GitHub] spark pull request #16618: [SPARK-14409][ML][WIP] Add RankingEvaluator
Github user ebernhardson commented on a diff in the pull request: https://github.com/apache/spark/pull/16618#discussion_r113360277 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RankingMetrics.scala --- @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.evaluation + +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.sql.{Column, DataFrame} +import org.apache.spark.sql.functions.{mean, sum} +import org.apache.spark.sql.functions.udf +import org.apache.spark.sql.types.DoubleType + +@Since("2.2.0") +class RankingMetrics( + predictionAndObservations: DataFrame, predictionCol: String, labelCol: String) + extends Logging with Serializable { + + /** + * Compute the Mean Percentile Rank (MPR) of all the queries. + * + * See the following paper for detail ("Expected percentile rank" in the paper): + * Hu, Y., Y. Koren, and C. Volinsky. "Collaborative Filtering for Implicit Feedback Datasets." + * In 2008 Eighth IEEE International Conference on Data Mining, 263–72, 2008. + * doi:10.1109/ICDM.2008.22.
+ * + * @return the mean percentile rank + */ + lazy val meanPercentileRank: Double = { + +def rank = udf((predicted: Seq[Any], actual: Any) => { + val l_i = predicted.indexOf(actual) + + if (l_i == -1) { +1 + } else { +l_i.toDouble / predicted.size + } +}, DoubleType) + +val R_prime = predictionAndObservations.count() +val predictionColumn: Column = predictionAndObservations.col(predictionCol) +val labelColumn: Column = predictionAndObservations.col(labelCol) + +val rankSum: Double = predictionAndObservations + .withColumn("rank", rank(predictionColumn, labelColumn)) + .agg(sum("rank")).first().getDouble(0) + +rankSum / R_prime + } + + /** + * Compute the average precision of all the queries, truncated at ranking position k. + * + * If for a query, the ranking algorithm returns n (n is less than k) results, the precision + * value will be computed as #(relevant items retrieved) / k. This formula also applies when + * the size of the ground truth set is less than k. + * + * If a query has an empty ground truth set, zero will be used as precision together with + * a log warning. + * + * See the following paper for detail: + * + * IR evaluation methods for retrieving highly relevant documents. K. Jarvelin and J. 
Kekalainen + * + * @param k the position to compute the truncated precision, must be positive + * @return the average precision at the first k ranking positions + */ + @Since("2.2.0") + def precisionAt(k: Int): Double = { +require(k > 0, "ranking position k should be positive") + +def precisionAtK = udf((predicted: Seq[Any], actual: Seq[Any]) => { + val actualSet = actual.toSet + if (actualSet.nonEmpty) { +val n = math.min(predicted.length, k) +var i = 0 +var cnt = 0 +while (i < n) { + if (actualSet.contains(predicted(i))) { +cnt += 1 + } + i += 1 +} +cnt.toDouble / k + } else { +logWarning("Empty ground truth set, check input data") +0.0 + } +}, DoubleType) + +val predictionColumn: Column = predictionAndObservations.col(predictionCol) +val labelColumn: Column = predictionAndObservations.col(labelCol) + +predictionAndObservations + .withColumn("predictionAtK", precisionAtK(predictionColumn, labelColumn)) + .agg(mean("predictionAtK")).first().getDouble(0) + } + + /** + * Returns the mean average precision (MAP) of all the queries. + * If a query has an empty ground truth set
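The `precisionAt` udf quoted above can be modelled with plain collections; this standalone sketch (names are mine, not Spark's) mirrors its behavior, including the convention of dividing by k rather than by the number of retrieved items:

```scala
object PrecisionSketch {
  // Precision at k: count how many of the first k predictions appear in the
  // ground-truth set, then divide by k (not by the number retrieved), matching
  // the convention in the quoted udf. An empty ground-truth set yields 0.0.
  def precisionAt(predicted: Seq[String], actual: Seq[String], k: Int): Double = {
    require(k > 0, "ranking position k should be positive")
    if (actual.isEmpty) 0.0
    else {
      val actualSet = actual.toSet
      predicted.take(k).count(actualSet.contains).toDouble / k
    }
  }
}
```

Note the consequence of dividing by k: a query that retrieves fewer than k items can never reach precision 1.0 at that cutoff, which is exactly the behavior the scaladoc above describes.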
[GitHub] spark pull request #16618: [SPARK-14409][ML][WIP] Add RankingEvaluator
Github user ebernhardson commented on a diff in the pull request: https://github.com/apache/spark/pull/16618#discussion_r113358325 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RankingEvaluator.scala --- @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.evaluation + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.param.{IntParam, Param, ParamMap, ParamValidators} +import org.apache.spark.ml.param.shared.{HasLabelCol, HasPredictionCol} +import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable, SchemaUtils} +import org.apache.spark.sql.{DataFrame, Dataset} +import org.apache.spark.sql.expressions.Window +import org.apache.spark.sql.functions.{coalesce, col, collect_list, row_number, udf} +import org.apache.spark.sql.types.LongType + +/** + * Evaluator for ranking. 
+ */ +@Since("2.2.0") +@Experimental +final class RankingEvaluator @Since("2.2.0")(@Since("2.2.0") override val uid: String) + extends Evaluator with HasPredictionCol with HasLabelCol with DefaultParamsWritable { + + @Since("2.2.0") + def this() = this(Identifiable.randomUID("rankingEval")) + + @Since("2.2.0") + val k = new IntParam(this, "k", "Top-K cutoff", (x: Int) => x > 0) + + /** @group getParam */ + @Since("2.2.0") + def getK: Int = $(k) + + /** @group setParam */ + @Since("2.2.0") + def setK(value: Int): this.type = set(k, value) + + setDefault(k -> 1) + + @Since("2.2.0") + val metricName: Param[String] = { +val allowedParams = ParamValidators.inArray(Array("mpr")) +new Param(this, "metricName", "metric name in evaluation (mpr)", allowedParams) + } + + /** @group getParam */ + @Since("2.2.0") + def getMetricName: String = $(metricName) + + /** @group setParam */ + @Since("2.2.0") + def setMetricName(value: String): this.type = set(metricName, value) + + /** @group setParam */ + @Since("2.2.0") + def setPredictionCol(value: String): this.type = set(predictionCol, value) + + /** @group setParam */ + @Since("2.2.0") + def setLabelCol(value: String): this.type = set(labelCol, value) + + /** + * Param for query column name. 
+ * @group param + */ + val queryCol: Param[String] = new Param[String](this, "queryCol", "query column name") + + setDefault(queryCol, "query") + + /** @group getParam */ + @Since("2.2.0") + def getQueryCol: String = $(queryCol) + + /** @group setParam */ + @Since("2.2.0") + def setQueryCol(value: String): this.type = set(queryCol, value) + + setDefault(metricName -> "mpr") + + @Since("2.2.0") + override def evaluate(dataset: Dataset[_]): Double = { +val schema = dataset.schema +SchemaUtils.checkNumericType(schema, $(predictionCol)) +SchemaUtils.checkNumericType(schema, $(labelCol)) +SchemaUtils.checkNumericType(schema, $(queryCol)) + +val w = Window.partitionBy(col($(queryCol))).orderBy(col($(predictionCol)).desc) + +val topAtk: DataFrame = dataset + .na.drop("all", Seq($(predictionCol))) + .select(col($(predictionCol)), col($(labelCol)).cast(LongType), col($(queryCol))) + .withColumn("rn", row_number().over(w)).where(col("rn") <= $(k)) + .drop("rn") + .groupBy(col($(queryCol))) + .agg(collect_list($(labelCol)).as("topAtk")) + +val mapToEmptyArray_ = udf(() => Array.empty[Long]) + +val predictionAndLabels: DataFrame = dataset + .join(topAtk, Seq($(queryCol)), "outer") + .withColumn("topAtk", coalesce(col("topAtk"), mapToEmptyArray_())) + .select($(labelCol), "topAtk") --- End diff -- Don't we also need to run an aggregation on the label column, roughly the same as the previous aggregation but using labelCol as the sort inst
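The `Window.partitionBy(queryCol).orderBy(predictionCol desc)` plus `row_number <= k` step in the quoted `evaluate` amounts to a per-query top-k selection. A plain-collections analogue (illustrative types, no Spark involved):

```scala
object TopKSketch {
  // Collections analogue of the Window/row_number step: group rows by query,
  // sort each group by descending prediction score, keep the first k labels.
  // Rows are (query, predictionScore, label).
  def topAtK(rows: Seq[(String, Double, Long)], k: Int): Map[String, Seq[Long]] =
    rows.groupBy(_._1).map { case (query, group) =>
      query -> group.sortBy(-_._2).map(_._3).take(k)
    }
}
```

One caveat relative to the DataFrame version: `row_number` breaks score ties by an arbitrary but deterministic row order within the partition, while `sortBy` here is stable with respect to input order, so tied scores may rank differently between the two.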
[GitHub] spark pull request #16618: [SPARK-14409][ML][WIP] Add RankingEvaluator
Github user ebernhardson commented on a diff in the pull request: https://github.com/apache/spark/pull/16618#discussion_r113355473 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/RankingMetrics.scala --- @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.ml.evaluation + +import org.apache.spark.annotation.Since +import org.apache.spark.internal.Logging +import org.apache.spark.sql.{Column, DataFrame} +import org.apache.spark.sql.functions.{mean, sum} +import org.apache.spark.sql.functions.udf +import org.apache.spark.sql.types.DoubleType + +@Since("2.2.0") +class RankingMetrics( + predictionAndObservations: DataFrame, predictionCol: String, labelCol: String) + extends Logging with Serializable { + + /** + * Compute the Mean Percentile Rank (MPR) of all the queries. + * + * See the following paper for detail ("Expected percentile rank" in the paper): + * Hu, Y., Y. Koren, and C. Volinsky. "Collaborative Filtering for Implicit Feedback Datasets." + * In 2008 Eighth IEEE International Conference on Data Mining, 263-72, 2008. + * doi:10.1109/ICDM.2008.22. 
+ * + * @return the mean percentile rank + */ + lazy val meanPercentileRank: Double = { + +def rank = udf((predicted: Seq[Any], actual: Any) => { + val l_i = predicted.indexOf(actual) + + if (l_i == -1) { +1 + } else { +l_i.toDouble / predicted.size + } +}, DoubleType) + +val R_prime = predictionAndObservations.count() +val predictionColumn: Column = predictionAndObservations.col(predictionCol) +val labelColumn: Column = predictionAndObservations.col(labelCol) + +val rankSum: Double = predictionAndObservations + .withColumn("rank", rank(predictionColumn, labelColumn)) + .agg(sum("rank")).first().getDouble(0) + +rankSum / R_prime + } + + /** + * Compute the average precision of all the queries, truncated at ranking position k. + * + * If for a query, the ranking algorithm returns n (n is less than k) results, the precision + * value will be computed as #(relevant items retrieved) / k. This formula also applies when + * the size of the ground truth set is less than k. + * + * If a query has an empty ground truth set, zero will be used as precision together with + * a log warning. + * + * See the following paper for detail: + * + * IR evaluation methods for retrieving highly relevant documents. K. Jarvelin and J. 
Kekalainen + * + * @param k the position to compute the truncated precision, must be positive + * @return the average precision at the first k ranking positions + */ + @Since("2.2.0") + def precisionAt(k: Int): Double = { +require(k > 0, "ranking position k should be positive") + +def precisionAtK = udf((predicted: Seq[Any], actual: Seq[Any]) => { + val actualSet = actual.toSet + if (actualSet.nonEmpty) { +val n = math.min(predicted.length, k) +var i = 0 +var cnt = 0 +while (i < n) { + if (actualSet.contains(predicted(i))) { +cnt += 1 + } + i += 1 +} +cnt.toDouble / k + } else { +logWarning("Empty ground truth set, check input data") +0.0 + } +}, DoubleType) + +val predictionColumn: Column = predictionAndObservations.col(predictionCol) +val labelColumn: Column = predictionAndObservations.col(labelCol) + +predictionAndObservations + .withColumn("predictionAtK", precisionAtK(predictionColumn, labelColumn)) + .agg(mean("predictionAtK")).first().getDouble(0) + } + + /** + * Returns the mean average precision (MAP) of all the queries. + * If a query has an empty ground truth set
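The two metrics in the diff above can be sketched in plain Python; the function names are illustrative and mirror the UDF logic (a label missing from the ranking gets the worst percentile rank, and precision@k divides by k even when fewer than k results are returned):

```python
def mean_percentile_rank(prediction_and_labels):
    """prediction_and_labels: list of (ranked_predictions, actual_label) pairs.
    A label absent from the ranking contributes the worst rank, 1.0."""
    def percentile_rank(predicted, actual):
        if actual not in predicted:
            return 1.0
        return predicted.index(actual) / len(predicted)
    ranks = [percentile_rank(p, a) for p, a in prediction_and_labels]
    return sum(ranks) / len(ranks)

def precision_at_k(predicted, actual, k):
    """Fraction of the first k predictions found in the ground-truth set.
    Note the divisor is k, not min(len(predicted), k), as in the Scala UDF."""
    assert k > 0, "ranking position k should be positive"
    if not actual:
        return 0.0  # empty ground truth: the Scala code logs a warning, returns 0
    actual_set = set(actual)
    hits = sum(1 for item in predicted[:k] if item in actual_set)
    return hits / k
```

A lower MPR is better (0.5 is the expected value for random ranking), whereas a higher precision@k is better.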
[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17596 **[Test build #76169 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76169/testReport)** for PR 17596 at commit [`4df95f2`](https://github.com/apache/spark/commit/4df95f2abc28b362b8330f11efeb801ca00f2f6e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17757 **[Test build #76168 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76168/testReport)** for PR 17757 at commit [`a87d5c0`](https://github.com/apache/spark/commit/a87d5c0c578542916706745cdbeca58ae24269e8).
[GitHub] spark pull request #17693: [SPARK-16548][SQL] Inconsistent error handling in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17693
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17693 thanks, merging to master/2.2
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/17556 fix failed case, please retest it.
[GitHub] spark issue #17760: [SPARK-20439] [SQL] [Backport-2.1] Fix Catalog API listT...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17760 thanks, merging to 2.1
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17758 **[Test build #76167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76167/testReport)** for PR 17758 at commit [`05a7a61`](https://github.com/apache/spark/commit/05a7a61259d87b8fa97214c96cedde9dc52dd3ec).
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113358460 --- Diff: R/pkg/inst/tests/testthat/test_Serde.R --- @@ -28,6 +28,10 @@ test_that("SerDe of primitive types", { expect_equal(x, 1) expect_equal(class(x), "numeric") + x <- callJStatic("SparkRHandler", "echo", 1380742793415240) --- End diff -- I don't know how to specify in R console to enforce bigint type.
[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/17640#discussion_r113358355 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -3043,6 +3043,23 @@ test_that("catalog APIs, currentDatabase, setCurrentDatabase, listDatabases", { expect_equal(dbs[[1]], "default") }) +test_that("dapply with bigint type", { + df <- createDataFrame( +list(list(1380742793415240, 1, "1"), list(1380742793415240, 2, "2"), +list(1380742793415240, 3, "3")), c("a", "b", "c")) + schema <- structType(structField("a", "bigint"), structField("b", "bigint"), --- End diff -- This one tests bigint
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17758 **[Test build #76165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76165/testReport)** for PR 17758 at commit [`de11a5b`](https://github.com/apache/spark/commit/de11a5b9f063953cb77d53f666312a3df6ba9801). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17758 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76165/ Test FAILed.
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17758 Merged build finished. Test FAILed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76163/ Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76163/testReport)** for PR 17765 at commit [`bd13a01`](https://github.com/apache/spark/commit/bd13a0178705bee4237ff30f6eabe7a5383b6dc5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76162/ Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76162/testReport)** for PR 17765 at commit [`609d50e`](https://github.com/apache/spark/commit/609d50ed4568bb2bb8f22869543dacac2b51c42f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Merged build finished. Test PASSed.
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17765 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76160/ Test PASSed.
[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/17744#discussion_r113356306 --- Diff: common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java --- @@ -93,14 +92,25 @@ protected void handleMessage( OpenBlocks msg = (OpenBlocks) msgObj; checkAuth(client, msg.appId); -List blocks = Lists.newArrayList(); -long totalBlockSize = 0; -for (String blockId : msg.blockIds) { - final ManagedBuffer block = blockManager.getBlockData(msg.appId, msg.execId, blockId); - totalBlockSize += block != null ? block.size() : 0; - blocks.add(block); -} -long streamId = streamManager.registerStream(client.getClientId(), blocks.iterator()); +Iterator iter = new Iterator() { + private int index = 0; + + @Override + public boolean hasNext() { +return index < msg.blockIds.length; + } + + @Override + public ManagedBuffer next() { +final ManagedBuffer block = blockManager.getBlockData(msg.appId, msg.execId, + msg.blockIds[index]); --- End diff -- @tgravescs Thanks a lot for taking time looking into this :) In my understanding, the `OpenBlocks` will be kept in heap after initialization(https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java#L84). Yes, `TransportRequestHandler.processRpcRequest` will release the `ByteBuf`, but the `OpenBlocks` will not be released.
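The lazy-initialization pattern under discussion — fetching each block only when the stream consumer actually asks for it, instead of materializing every buffer up front — can be sketched in Python (class and parameter names are illustrative, not Spark's API):

```python
class LazyBlockIterator:
    """Defers each fetch until __next__ is called, so no block buffer
    is created before the stream consumer actually needs it."""
    def __init__(self, block_ids, fetch):
        self._block_ids = block_ids
        self._fetch = fetch  # callable: block_id -> block data
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._block_ids):
            raise StopIteration
        block = self._fetch(self._block_ids[self._index])
        self._index += 1
        return block
```

Registering such an iterator with the stream manager means a request listing many blocks costs only one outstanding buffer at a time, which is the memory-footprint motivation behind the diff.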
[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17765 **[Test build #76160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76160/testReport)** for PR 17765 at commit [`bd13a01`](https://github.com/apache/spark/commit/bd13a0178705bee4237ff30f6eabe7a5383b6dc5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17649 The changes look good to me if we don't care about the case sensitivity issue.
[GitHub] spark issue #17766: [SPARK-20421][core] Mark internal listeners as deprecate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17766 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76154/ Test PASSed.
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17725 Merged build finished. Test FAILed.
[GitHub] spark issue #17766: [SPARK-20421][core] Mark internal listeners as deprecate...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17766 Merged build finished. Test PASSed.
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17725 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76156/ Test FAILed.
[GitHub] spark issue #17766: [SPARK-20421][core] Mark internal listeners as deprecate...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17766 **[Test build #76154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76154/testReport)** for PR 17766 at commit [`d16be2b`](https://github.com/apache/spark/commit/d16be2b004c7b2f7ca34faf8bdd993cf0445694b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `@deprecated(\"This class will be removed in a future release.\", \"2.2.0\")` * `@deprecated(\"This class will be removed in a future release.\", \"2.2.0\")` * `@deprecated(\"This class will be removed in a future release.\", \"2.2.0\")` * `@deprecated(\"This class will be removed in a future release.\", \"2.2.0\")` * `@deprecated(\"This class will be removed in a future release.\", \"2.2.0\")`
[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17725 **[Test build #76156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76156/testReport)** for PR 17725 at commit [`80e40ba`](https://github.com/apache/spark/commit/80e40ba57ab6779604fca87cb696d2e889c4ddd2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17767: Refactoring of the ALS code
Github user hhbyyh commented on the issue: https://github.com/apache/spark/pull/17767 Preparing a PR like this takes a lot of effort. Please try to follow the guidelines in http://spark.apache.org/contributing.html (create a JIRA and rename the title). Like you said, I doubt anyone would be able to review and confidently merge a PR of this scope. Could you please share some reasons for the refactoring, or pain points of the current implementation? Then maybe we can find a way to break it down into smaller changes. Thanks.
[GitHub] spark issue #17738: [SPARK-20422][Spark Core] Worker registration retries sh...
Github user unsleepy22 commented on the issue: https://github.com/apache/spark/pull/17738 Could someone take a look?
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113351508 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -663,8 +695,54 @@ class Dataset[T] private[sql]( * @group action * @since 1.6.0 */ + def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, extendedMode = false) + + /** + * Displays the Dataset in a tabular form. For example: + * {{{ + * year month AVG('Adj Close) MAX('Adj Close) + * 1980 120.5032180.595103 + * 1981 010.5232890.570307 + * 1982 020.4365040.475256 + * 1983 030.4105160.442194 + * 1984 040.4500900.483521 + * }}} + * + * If `extendedMode` is enabled, this command prints a column data per line: + * {{{ + * -RECORD 0- + * c0 | 0.6988392500990668 + * c1 | 0.3035961718851606 + * c2 | 0.2446213804275899 + * c3 | 0.6132556607194246 + * c4 | 0.1904412430355646 + * c5 | 0.8856600775630444 + * -RECORD 1- + * c0 | 0.3942727621020799 + * c1 | 0.6501707200059537 + * c2 | 0.2550059028276454 + * c3 | 0.9806662488156962 + * c4 | 0.8533897091838063 + * c5 | 0.3911189623246518 + * -RECORD 2- + * c0 | 0.9024183805969801 + * c1 | 0.0242018765375147 + * c2 | 0.8508820250344251 + * c3 | 0.4593368817024575 + * c4 | 0.2216918145613194 + * c5 | 0.3756882647319614 + * }}} + * + * @param numRows Number of rows to show + * @param truncate If set to more than 0, truncates strings to `truncate` characters and + *all cells will be aligned right. + * @param extendedMode Enable expanded table formatting mode to print a column data per line. --- End diff -- Yes.
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
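The `extendedMode` ("-RECORD n-") layout quoted in the diff above can be reproduced with a small, self-contained formatter. This is an illustrative sketch only; `VerticalShow` and its signature are hypothetical and not the PR's actual implementation:

```scala
// Hypothetical helper sketching the extendedMode ("-RECORD n-") layout
// from the Dataset.show docs above; not the PR's actual code.
object VerticalShow {
  def format(fieldNames: Seq[String], rows: Seq[Seq[String]]): String = {
    // Pad field names to a common width so the '|' separators line up.
    val nameWidth = fieldNames.map(_.length).max
    rows.zipWithIndex.map { case (row, i) =>
      val header = s"-RECORD $i-"
      val fields = fieldNames.zip(row).map { case (name, cell) =>
        s"${name.padTo(nameWidth, ' ')} | $cell"
      }
      (header +: fields).mkString("\n")
    }.mkString("\n")
  }
}
```

For example, `VerticalShow.format(Seq("c0", "c1"), Seq(Seq("0.69", "0.30")))` yields one `-RECORD 0-` header followed by one `name | value` line per column.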
[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17733 **[Test build #76166 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76166/testReport)** for PR 17733 at commit [`f696d35`](https://github.com/apache/spark/commit/f696d357ae9aa2e850f82d408aa413750c4d84b8).
[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17191#discussion_r113351326 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -136,6 +136,7 @@ class Analyzer( ResolveGroupingAnalytics :: ResolvePivot :: ResolveOrdinalInOrderByAndGroupBy :: + ResolveAggAliasInGroupBy :: --- End diff -- @gatorsmile ping
[GitHub] spark issue #17760: [SPARK-20439] [SQL] [Backport-2.1] Fix Catalog API listT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17760 cc @cloud-fan @sameeragarwal
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76153/ Test PASSed.
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17693 Merged build finished. Test PASSed.
[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17693 **[Test build #76153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76153/testReport)** for PR 17693 at commit [`91bb487`](https://github.com/apache/spark/commit/91bb48708f852ea65ada9ebc48d03e57cd95ebf4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17737#discussion_r113350427 --- Diff: python/pyspark/sql/column.py --- @@ -251,15 +285,16 @@ def __iter__(self): # string methods _rlike_doc = """ -Return a Boolean :class:`Column` based on a regex match. +SQL RLIKE expression (LIKE with Regex). Returns a boolean :class:`Column` based on a regex --- End diff -- Thank you.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17077 👍
[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16781#discussion_r113346208 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala --- @@ -397,13 +392,38 @@ class ParquetHiveCompatibilitySuite extends ParquetCompatibilityTest with TestHi schema = new StructType().add("display", StringType).add("ts", TimestampType), options = options )
- Seq(false, true).foreach { vectorized =>
-   withClue(s"vectorized = $vectorized;") {
+
+ // also write out a partitioned table, to make sure we can access that correctly.
+ // add a column we can partition by (value doesn't particularly matter).
+ val partitionedData = adjustedRawData.withColumn("id", monotonicallyIncreasingId)
+ partitionedData.write.partitionBy("id").parquet(partitionedPath.getCanonicalPath)
+ // unfortunately, catalog.createTable() doesn't let us specify partitioning, so just use
+ // a "CREATE TABLE" stmt.
+ val tblOpts = explicitTz.map { tz => raw"""TBLPROPERTIES ($key="$tz")""" }.getOrElse("")
+ spark.sql(raw"""CREATE EXTERNAL TABLE partitioned_$baseTable (
+   | display string,
+   | ts timestamp
+   |)
+   |PARTITIONED BY (id bigint)
--- End diff -- We should test for the partitioned table like `PARTITIONED BY (ts timestamp)`?
[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/16781#discussion_r113345272 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala --- @@ -17,14 +17,25 @@ package org.apache.spark.sql.hive +import java.io.File import java.sql.Timestamp +import java.util.TimeZone -import org.apache.spark.sql.Row -import org.apache.spark.sql.execution.datasources.parquet.ParquetCompatibilityTest +import org.apache.hadoop.fs.{FileSystem, Path} +import org.apache.parquet.hadoop.ParquetFileReader +import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName +import org.scalatest.BeforeAndAfterEach +
+import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession} +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.execution.datasources.parquet.{ParquetCompatibilityTest, ParquetFileFormat} +import org.apache.spark.sql.functions._ import org.apache.spark.sql.hive.test.TestHiveSingleton import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types.{StringType, StructField, StructType, TimestampType} -class ParquetHiveCompatibilitySuite extends ParquetCompatibilityTest with TestHiveSingleton { +class ParquetHiveCompatibilitySuite extends ParquetCompatibilityTest with TestHiveSingleton +with BeforeAndAfterEach { --- End diff -- We don't need `BeforeAndAfterEach` anymore.
[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17758 **[Test build #76165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76165/testReport)** for PR 17758 at commit [`de11a5b`](https://github.com/apache/spark/commit/de11a5b9f063953cb77d53f666312a3df6ba9801).
[GitHub] spark issue #17767: Refactoring of the ALS code
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17767 Can one of the admins verify this patch?
[GitHub] spark pull request #17767: Als refactor
GitHub user danielyli opened a pull request: https://github.com/apache/spark/pull/17767 Als refactor ## What changes were proposed in this pull request? This is a non-feature-changing refactoring of the ALS code (specifically, the `org.apache.spark.ml.recommendation` package), done to improve code maintainability and to add significant documentation to the existing code. My motivation for this PR is that I've been working on an online streaming ALS implementation [[SPARK-6407](https://issues.apache.org/jira/browse/SPARK-6407)] (PR coming soon), and I've been refactoring the package to help me understand the existing code before adding to it. I've also tried my best to include a fair bit of Scaladocs and inline comments where I felt they would have helped when I was reading the code. I've done a fair bit of rebasing and sausage making to make the commits easy to follow, since no one likes to stare at a 2,700-line PR. Please let me know if I can make anything clearer. I'd be happy to answer any questions. In a few places, you'll find a `PLEASE_ADVISE(danielyli):` tag in the code. These are questions I had in the course of the refactoring. I'd appreciate if the relevant folks could help me with these. Thanks. ## How was this patch tested? As this is a non-feature-changing refactoring, existing tests were used. All existing ALS tests pass. You can merge this pull request into a Git repository by running: $ git pull https://github.com/danielyli/spark als-refactor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17767.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17767 commit deca4db3f234ea60c1494265d4f3ac9375869dd6 Author: Daniel Li Date: 2017-04-05T21:55:21Z Split `ALS.scala` into multiple files This commit moves the classes `ALS` and `ALSModel` and the traits `ALSParams` and `ALSModelParams` into their own files. 
commit 4086bc9d0c7689e0d2047ac17ada29fe236eb6e6 Author: Daniel Li Date: 2017-04-05T22:20:32Z Move solver classes into their own file This commit puts the classes `LeastSquaresNESolver`, `CholeskySolver`, `NNLSSolver`, and `NormalEquation` into a mixin in a separate file in order to reduce the size and improve the readability of `ALS.scala`. commit 8aaa533df6f3c9a4b4e8c5d5023f831daf06fa9e Author: Daniel Li Date: 2017-04-05T22:30:38Z Minor cleanup of imports * import java.util.Arrays * import scala.collection.mutable.ArrayBuilder commit b68680025e71ebd422087ef95d5ecb7af40fa26d Author: Daniel Li Date: 2017-04-05T22:48:50Z Create a package object to hold small type and class definitions commit 83f849ee45fd7c80a1a50fcf12da1eb99d8b6346 Author: Daniel Li Date: 2017-04-06T02:17:14Z Refactor `RatingBlock`-related code This commit moves the following classes and methods into new files, separating and encapsulating them as appropriate: * RatingBlock * RatingBlockBuilder * UncompressedInBlock * UncompressedInBlockBuilder * KeyWrapper * UncompressedInBlockSort * LocalIndexEncoder * partitionRatings * makeBlocks In the course of this refactoring we create a new class, `RatingBlocks`, to hold the user/item in/out block data and associated logic. commit 819a00f7fe7384e588ce78cb65e9413ac6588401 Author: Daniel Li Date: 2017-04-06T07:08:43Z Pull out `RatingBlock` from `RatingBlocks` into its own file This commit puts the `RatingBlock` class into a mixin for the `RatingBlocks` companion object to extend. This is done purely to increase readability by reducing the file size of `RatingBlocks.scala`. 
commit 56d10ba1fa627f343e67525e2a3b08e7287bfe2f Author: Daniel Li Date: 2017-04-06T08:50:06Z Tighten access modifiers where appropriate and make case classes `final` commit b861d18784ba4ce688d3eacaea10169c9ce2d091 Author: Daniel Li Date: 2017-04-06T09:38:54Z Improve code hygiene of `RatingBlocks` Among other things, `while` loops that used manually incremented counters have been changed to `for` loops to increase readability. Performance should be nominally affected. commit 5dfee79a1280d0a72bbe7b8596cdf86654fa0fbc Author: Daniel Li Date: 2017-04-06T09:57:11Z Spruce up `ALS#fit` This commit adds vertical whitespace to improve readability. commit 056d6d0ecc962f94c83f43a6384607bf8833d083 Author: Daniel Li Date: 2017-04-25T23:31:54Z Mark `RatingBlocks` constructor as `private` commit 34df11247ec3fdcf29e220bedcdf28b58d1ac4ec Author: Daniel Li Date: 2017-04-25T23:32:44Z Add Scaladocs to `RatingBlocks.scala` commit 31b0dcd843d8edc16c6d2bf982d3de753b6dc066 Author: Daniel Li Date: 2017-04-06T09
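The commit messages above mention `LocalIndexEncoder` among the classes being moved. As a rough, hedged sketch of the idea that class embodies (packing a block id and a local index into a single `Int` to save memory in the rating blocks; the name `SimpleIndexEncoder` and exact bit widths here are illustrative, not Spark's exact implementation):

```scala
// Illustrative sketch of the bit-packing idea behind ALS's LocalIndexEncoder:
// store (blockId, localIndex) in one Int, using only as many high bits for
// the block id as numBlocks requires; the low bits hold the local index.
class SimpleIndexEncoder(numBlocks: Int) {
  require(numBlocks > 0)
  // Bits left over for the local index once the block id has enough bits.
  private val numLocalIndexBits =
    math.min(java.lang.Integer.numberOfLeadingZeros(numBlocks - 1), 31)
  private val localIndexMask = (1 << numLocalIndexBits) - 1

  def encode(blockId: Int, localIndex: Int): Int =
    (blockId << numLocalIndexBits) | localIndex

  def blockId(encoded: Int): Int = encoded >>> numLocalIndexBits
  def localIndex(encoded: Int): Int = encoded & localIndexMask
}
```

The design choice is memory-driven: one packed `Int` per rating entry is far cheaper than a pair of `Int`s when blocks hold millions of entries.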
[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76164/ Test PASSed.
[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17605 Merged build finished. Test PASSed.
[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17605 **[Test build #76164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76164/testReport)** for PR 17605 at commit [`9d75094`](https://github.com/apache/spark/commit/9d750943860479fab48543038fa89cb1dec4037c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17077 I think we should because branch-2.2 is cut out.
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113347799 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -663,8 +695,54 @@ class Dataset[T] private[sql]( * @group action * @since 1.6.0 */
+ def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, extendedMode = false)
+
+ /**
+ * Displays the Dataset in a tabular form. For example:
+ * {{{
+ *   year  month  AVG('Adj Close)  MAX('Adj Close)
+ *   1980  12     0.503218         0.595103
+ *   1981  01     0.523289         0.570307
+ *   1982  02     0.436504         0.475256
+ *   1983  03     0.410516         0.442194
+ *   1984  04     0.450090         0.483521
+ * }}}
+ *
+ * If `extendedMode` is enabled, this command prints one column value per line:
+ * {{{
+ * -RECORD 0-
+ * c0 | 0.6988392500990668
+ * c1 | 0.3035961718851606
+ * c2 | 0.2446213804275899
+ * c3 | 0.6132556607194246
+ * c4 | 0.1904412430355646
+ * c5 | 0.8856600775630444
+ * -RECORD 1-
+ * c0 | 0.3942727621020799
+ * c1 | 0.6501707200059537
+ * c2 | 0.2550059028276454
+ * c3 | 0.9806662488156962
+ * c4 | 0.8533897091838063
+ * c5 | 0.3911189623246518
+ * -RECORD 2-
+ * c0 | 0.9024183805969801
+ * c1 | 0.0242018765375147
+ * c2 | 0.8508820250344251
+ * c3 | 0.4593368817024575
+ * c4 | 0.2216918145613194
+ * c5 | 0.3756882647319614
+ * }}}
+ *
+ * @param numRows Number of rows to show
+ * @param truncate If set to more than 0, truncates strings to `truncate` characters and
+ *                 all cells will be aligned right.
+ * @param extendedMode Enable expanded table formatting mode to print one column value per line.
--- End diff -- This one? https://dev.mysql.com/doc/refman/5.7/en/mysql-command-options.html `Print query output rows vertically (one line per column value)`?
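The `truncate` semantics described in the `@param` doc above (cut strings to `truncate` characters, right-align cells when truncating, left-align otherwise) can be sketched as a standalone cell formatter. `formatCell` is a hypothetical name, and appending `"..."` when cutting mirrors Spark's convention but is an assumption here, not the PR's exact code:

```scala
// Hedged sketch of the cell-formatting rule in the docs: truncate > 0 cuts
// long strings (appending "...") and right-aligns; truncate <= 0 left-aligns.
def formatCell(cell: String, width: Int, truncate: Int): String = {
  val value =
    if (truncate > 0 && cell.length > truncate) cell.take(truncate - 3) + "..."
    else cell
  if (truncate > 0) value.reverse.padTo(width, ' ').reverse // right-align (leftPad)
  else value.padTo(width, ' ')                              // left-align (rightPad)
}
```

For example, `formatCell("abcdefgh", 8, 6)` cuts to six characters and right-pads on the left, while `formatCell("abc", 5, 0)` simply left-aligns within the column width.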
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113347676 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -663,8 +695,54 @@ class Dataset[T] private[sql]( * @group action * @since 1.6.0 */
+ def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, extendedMode = false)
+
+ /**
+ * Displays the Dataset in a tabular form. For example:
+ * {{{
+ *   year  month  AVG('Adj Close)  MAX('Adj Close)
+ *   1980  12     0.503218         0.595103
+ *   1981  01     0.523289         0.570307
+ *   1982  02     0.436504         0.475256
+ *   1983  03     0.410516         0.442194
+ *   1984  04     0.450090         0.483521
+ * }}}
+ *
+ * If `extendedMode` is enabled, this command prints one column value per line:
+ * {{{
+ * -RECORD 0-
+ * c0 | 0.6988392500990668
+ * c1 | 0.3035961718851606
+ * c2 | 0.2446213804275899
+ * c3 | 0.6132556607194246
+ * c4 | 0.1904412430355646
+ * c5 | 0.8856600775630444
+ * -RECORD 1-
+ * c0 | 0.3942727621020799
+ * c1 | 0.6501707200059537
+ * c2 | 0.2550059028276454
+ * c3 | 0.9806662488156962
+ * c4 | 0.8533897091838063
+ * c5 | 0.3911189623246518
+ * -RECORD 2-
+ * c0 | 0.9024183805969801
+ * c1 | 0.0242018765375147
+ * c2 | 0.8508820250344251
+ * c3 | 0.4593368817024575
+ * c4 | 0.2216918145613194
+ * c5 | 0.3756882647319614 --- End diff -- ok
[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17733#discussion_r113347666 --- Diff: R/pkg/R/DataFrame.R --- @@ -194,6 +194,8 @@ setMethod("isLocal", #' 20 characters will be truncated. However, if set greater than zero, #' truncates strings longer than \code{truncate} characters and all cells #' will be aligned right. +#' @param extendedMode enable expanded table formatting mode to print a column data --- End diff -- yea, SGTM. I'll update. Thanks!
[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/17077 @holdenk, @HyukjinKwon Do we retarget this to 2.3?
[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17728 Merged build finished. Test PASSed.