[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20900 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88566/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20900 Merged build finished. Test PASSed.
[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20900 **[Test build #88566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88566/testReport)** for PR 20900 at commit [`bc49c3c`](https://github.com/apache/spark/commit/bc49c3cc5ae2e23da5cc7b6d7e1a779e9d012c8c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20900 **[Test build #88566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88566/testReport)** for PR 20900 at commit [`bc49c3c`](https://github.com/apache/spark/commit/bc49c3cc5ae2e23da5cc7b6d7e1a779e9d012c8c).
[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20798 That's fine :). Let's close this one.
[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20900 ok to test
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Merged build finished. Test FAILed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88565/ Test FAILed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20886 **[Test build #88565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88565/testReport)** for PR 20886 at commit [`3b882fa`](https://github.com/apache/spark/commit/3b882fa61e94c7025ac035dcc2b483bec59e1ddf).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Merged build finished. Test FAILed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88564/ Test FAILed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20886 **[Test build #88564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88564/testReport)** for PR 20886 at commit [`ff57889`](https://github.com/apache/spark/commit/ff57889c4070e3f47e3f078e28d9d1c40fc8c338).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20842: [SPARK-23162][PySpark][ML] Add r2adj into Python API in ...
Github user tengpeng commented on the issue: https://github.com/apache/spark/pull/20842 Looks good! Thanks!
[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20896 Also merged to 2.2.
[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20896
[GitHub] spark issue #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20896 LGTM. Merging to master and 2.3. Thanks!
[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/20896#discussion_r176925878
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala ---
@@ -550,22 +550,22 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
       .start()
   }
-    val input = MemoryStream[Int]
-    val q1 = startQuery(input.toDS, "stream_serializable_test_1")
-    val q2 = startQuery(input.toDS.map { i =>
+    val input = MemoryStream[Int] :: MemoryStream[Int] :: MemoryStream[Int] :: Nil
--- End diff --
I think this is just to save several lines.
[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20787#discussion_r176925831
--- Diff: R/pkg/R/functions.R ---
@@ -1957,8 +1958,12 @@ setMethod("levenshtein", signature(y = "Column"), })
 #' @details
-#' \code{months_between}: Returns number of months between dates \code{y} and \code{x}.
-#'
+#' \code{months_between}: Returns number of months between dates \code{y} and \code{x}.
+#' If \code{y} is later than \code{x}, then the result is positive.
+#' If \code{y} and \code{x} are on the same day of month, or both are the last day of month,
+#' time of day will be ignored.
+#' Otherwise, the difference is calculated based on 31 days per month, and rounded to
+#' 8 digits.
 #' @rdname column_datetime_diff_functions
--- End diff --
you should leave a line `#'` before this
[GitHub] spark pull request #20787: [MINOR][DOCS] Documenting months_between directio...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20787#discussion_r176925842
--- Diff: R/pkg/R/functions.R ---
@@ -1957,8 +1958,12 @@ setMethod("levenshtein", signature(y = "Column"), })
 #' @details
-#' \code{months_between}: Returns number of months between dates \code{y} and \code{x}.
-#'
+#' \code{months_between}: Returns number of months between dates \code{y} and \code{x}.
+#' If \code{y} is later than \code{x}, then the result is positive.
+#' If \code{y} and \code{x} are on the same day of month, or both are the last day of month,
+#' time of day will be ignored.
+#' Otherwise, the difference is calculated based on 31 days per month, and rounded to
+#' 8 digits.
 #' @rdname column_datetime_diff_functions
--- End diff --
also just as reference, the whitespace/newline will be stripped
```
time of day will be ignored. Otherwise
```
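The semantics documented in the diff above (sign from argument order, whole months when the days of month line up or both dates are month-ends, otherwise a fraction based on a 31-day month rounded to 8 digits) can be sketched in plain Python. `months_between` here is an illustrative re-implementation for dates only, not Spark's code, and it skips the time-of-day handling the real function applies to timestamps:

```python
import calendar
from datetime import date

def months_between(y: date, x: date) -> float:
    # Hypothetical sketch of the documented rule, for dates only.
    months = (y.year - x.year) * 12 + (y.month - x.month)
    y_last = y.day == calendar.monthrange(y.year, y.month)[1]
    x_last = x.day == calendar.monthrange(x.year, x.month)[1]
    if y.day == x.day or (y_last and x_last):
        # Same day of month, or both the last day of their months:
        # a whole number of months (time of day would be ignored).
        return float(months)
    # Otherwise the fractional part assumes a 31-day month, rounded to 8 digits.
    return round(months + (y.day - x.day) / 31.0, 8)
```

Note the sign convention: the result is positive when `y` is later than `x`, so swapping the arguments flips the sign.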
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Merged build finished. Test PASSed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1739/ Test PASSed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20886 **[Test build #88565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88565/testReport)** for PR 20886 at commit [`3b882fa`](https://github.com/apache/spark/commit/3b882fa61e94c7025ac035dcc2b483bec59e1ddf).
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Merged build finished. Test PASSed.
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20886 **[Test build #88564 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88564/testReport)** for PR 20886 at commit [`ff57889`](https://github.com/apache/spark/commit/ff57889c4070e3f47e3f078e28d9d1c40fc8c338).
[GitHub] spark issue #20886: [WIP][SPARK-19724][SQL]create a managed table with an ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1738/ Test PASSed.
[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20851#discussion_r176922300
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -353,6 +353,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
+  val PARQUET_FILTER_PUSHDOWN_DATE_ENABLED = buildConf("spark.sql.parquet.filterPushdown.date")
+    .doc("If true, enables Parquet filter push-down optimization for Date. " +
+      "This configuration only has an effect when 'spark.sql.parquet.filterPushdown' is enabled.")
+    .internal()
+    .booleanConf
+    .createWithDefault(false)
--- End diff --
If you think so, +1. :) BTW, based on the Apache Spark way, I assume that this will not land on `branch-2.3` with `spark.sql.parquet.filterPushdown.date=true`.
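As the `.doc()` string in the diff notes, the new flag is subordinate to the umbrella `spark.sql.parquet.filterPushdown` flag. A minimal sketch of that gating, using a plain dict in place of `SQLConf` (the helper name is hypothetical; the defaults mirror the diff, where the date flag defaults to false):

```python
def parquet_date_pushdown_enabled(conf: dict) -> bool:
    # The date-specific flag only takes effect when the umbrella
    # Parquet filter-pushdown flag is also enabled.
    umbrella = conf.get("spark.sql.parquet.filterPushdown", True)
    date_flag = conf.get("spark.sql.parquet.filterPushdown.date", False)
    return bool(umbrella and date_flag)
```

This two-level pattern lets operators disable one risky pushdown type without turning off Parquet pushdown entirely.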
[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88563/ Test PASSed.
[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20798 Build finished. Test PASSed.
[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20798 **[Test build #88563 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88563/testReport)** for PR 20798 at commit [`b4f7056`](https://github.com/apache/spark/commit/b4f7056187474a2bad16c81e79798214980d7b80).
* This patch passes all tests.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20866#discussion_r176919674
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
     s"$principal at $metastoreUri")
   doAsRealUser {
-    val hive = Hive.get(conf, classOf[HiveConf])
--- End diff --
Ya. What I asked is the following. > Could you be more specific about the scope of this PR in the title and description?
[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user mstewart141 commented on the issue: https://github.com/apache/spark/pull/20900 @HyukjinKwon the old PR, https://github.com/apache/spark/pull/20798, was a disaster from a git-cleanliness perspective, so I've updated here.
[GitHub] spark issue #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20900 Can one of the admins verify this patch?
[GitHub] spark pull request #20900: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `p...
GitHub user mstewart141 opened a pull request: https://github.com/apache/spark/pull/20900 [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_udf` with keyword args
## What changes were proposed in this pull request?
Add documentation about the limitations of `pandas_udf` with keyword arguments and related concepts, like `functools.partial` fn objects.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/mstewart141/spark udfkw2
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20900.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20900
commit 048570f7e5f421288b7c297e4d2e3873626a6adc
Author: Michael (Stu) Stewart
Date: 2018-03-11T20:38:29Z
    [SPARK-23645][PYTHON] Allow python udfs to be called with keyword arguments
commit 9ea2595f0cecb0cd05e0e6b99baf538679332e8b
Author: Michael (Stu) Stewart
Date: 2018-03-18T18:04:21Z
    Incomplete / Show issue with partial fn in pandas_udf
commit acd1cbe53dc7d1bf83b1022a7e36652cd9530b58
Author: Michael (Stu) Stewart
Date: 2018-03-18T18:13:53Z
    Add note RE no keyword args in python UDFs
commit bc49c3cc5ae2e23da5cc7b6d7e1a779e9d012c8c
Author: Michael (Stu) Stewart
Date: 2018-03-24T17:30:15Z
    Address comments
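As background for the `functools.partial` limitation this PR documents: `partial` objects carry no `__name__` attribute, so any UDF-registration path that reads the wrapped callable's name needs special handling. A plain-Python illustration with no Spark required (`scale` and `doubler` are made-up names):

```python
import functools

def scale(v, factor):
    return v * factor

# A partial with a pre-bound keyword argument.
doubler = functools.partial(scale, factor=2)

# Unlike plain functions, partial objects lack __name__,
# which trips up code that introspects the wrapped callable's name.
has_name = hasattr(doubler, "__name__")  # False for a partial
```

The partial still calls fine positionally (`doubler(21)` gives 42); it is only the missing name metadata that complicates registration.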
[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20798 **[Test build #88563 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88563/testReport)** for PR 20798 at commit [`b4f7056`](https://github.com/apache/spark/commit/b4f7056187474a2bad16c81e79798214980d7b80).
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88562/ Test PASSed.
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20866 Merged build finished. Test PASSed.
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20866 **[Test build #88562 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88562/testReport)** for PR 20866 at commit [`f51bf63`](https://github.com/apache/spark/commit/f51bf634d89aaa0a2e1077903e6831a9284aea10).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19876: [ML][SPARK-23783][SPARK-11239] Add PMML export to...
Github user goungoun commented on a diff in the pull request: https://github.com/apache/spark/pull/19876#discussion_r176911201
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -710,15 +711,58 @@ class LinearRegressionModel private[ml] (
   }

   /**
-   * Returns a [[org.apache.spark.ml.util.MLWriter]] instance for this ML instance.
+   * Returns a [[org.apache.spark.ml.util.GeneralMLWriter]] instance for this ML instance.
    *
    * For [[LinearRegressionModel]], this does NOT currently save the training [[summary]].
    * An option to save [[summary]] may be added in the future.
    *
    * This also does not save the [[parent]] currently.
    */
   @Since("1.6.0")
-  override def write: MLWriter = new LinearRegressionModel.LinearRegressionModelWriter(this)
+  override def write: GeneralMLWriter = new GeneralMLWriter(this)
+}
+
+/** A writer for LinearRegression that handles the "internal" (or default) format */
+private class InternalLinearRegressionModelWriter
+  extends MLWriterFormat with MLFormatRegister {
+
+  override def format(): String = "internal"
+  override def stageName(): String = "org.apache.spark.ml.regression.LinearRegressionModel"
+
+  private case class Data(intercept: Double, coefficients: Vector, scale: Double)
+
+  override def write(path: String, sparkSession: SparkSession,
+      optionMap: mutable.Map[String, String], stage: PipelineStage): Unit = {
+    val instance = stage.asInstanceOf[LinearRegressionModel]
+    val sc = sparkSession.sparkContext
+    // Save metadata and Params
+    DefaultParamsWriter.saveMetadata(instance, path, sc)
+    // Save model data: intercept, coefficients, scale
+    val data = Data(instance.intercept, instance.coefficients, instance.scale)
+    val dataPath = new Path(path, "data").toString
+    sparkSession.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
+  }
+}
+
+/** A writer for LinearRegression that handles the "pmml" format */
+private class PMMLLinearRegressionModelWriter
+    extends MLWriterFormat with MLFormatRegister {
--- End diff --
Should be two space indentation `extends MLWriterFormat with MLFormatRegister {`
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Build finished. Test FAILed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88560/ Test FAILed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20891 **[Test build #88560 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88560/testReport)** for PR 20891 at commit [`126e6a8`](https://github.com/apache/spark/commit/126e6a8e7d333ecf99c26b374698d7cd0e1a9d19).
* This patch **fails Spark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88561/ Test FAILed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Merged build finished. Test FAILed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20891 **[Test build #88561 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88561/testReport)** for PR 20891 at commit [`cefd672`](https://github.com/apache/spark/commit/cefd672e79b508e995382ce146cd70a4d130af01).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20899: Bug fix in sendMessage() of pregel implementation in Pag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20899 Can one of the admins verify this patch?
[GitHub] spark issue #20899: fix bug in sendMessage() of pregel implementation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20899 Can one of the admins verify this patch?
[GitHub] spark pull request #20899: fix bug in sendMessage() of pregel implementation
GitHub user mangolzy opened a pull request: https://github.com/apache/spark/pull/20899 fix bug in sendMessage() of pregel implementation
## What changes were proposed in this pull request?
Iterator((edge.dstId, edge.srcAttr._2 * edge.attr)) -> Iterator((edge.dstId, edge.srcAttr._1 * edge.attr))
## How was this patch tested?
Since edge.srcAttr._2 is used to compare with tol, it is the (newPR - oldPR) delta; but in sendMessage the original code sends it as part of the message, when the message should instead carry the newPR, which is edge.srcAttr._1.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/mangolzy/spark patch-1
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20899.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20899
commit be567d83c4069a0592fb61464c795d123bd34116
Author: mangolzy
Date: 2018-03-24T13:32:52Z
    fix bug in sendMessage() of pregel implementation
    Iterator((edge.dstId, edge.srcAttr._2 * edge.attr)) -> Iterator((edge.dstId, edge.srcAttr._1 * edge.attr))
    Since edge.srcAttr._2 is used to compare with tol, it is the (newPR - oldPR) delta; but in sendMessage the original code sends it as part of the message, when the message should instead carry the newPR, which is edge.srcAttr._1.
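The proposed fix can be illustrated outside GraphX with a plain-Python sketch (function and parameter names are hypothetical): each vertex attribute is a pair `(newPR, delta)`; `delta` is only compared against `tol` to decide whether the vertex is still active, while the message itself must carry `newPR` scaled by the edge weight. Sending `delta` (the `_2` field) was the reported bug:

```python
def send_message(src_attr, edge_attr, tol):
    # src_attr is the (newPR, delta) pair; edge_attr is the edge weight.
    new_pr, delta = src_attr
    if abs(delta) > tol:
        # Corrected behavior: propagate newPR (field 1), not delta (field 2).
        return [new_pr * edge_attr]
    return []  # vertex has converged below tol; send nothing
```

Note this only models the message-construction logic, not the full Pregel loop (vertex program, aggregation, iteration).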
[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20866#discussion_r176906868
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
     s"$principal at $metastoreUri")
   doAsRealUser {
-    val hive = Hive.get(conf, classOf[HiveConf])
--- End diff --
1. This [`Hive.get()`](https://github.com/apache/spark/blob/v2.3.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L239) is different from the others; it is loaded by `IsolatedClientLoader`.
2. I cannot start a `HiveThriftServer2` in a kerberized cluster, so I'm not sure whether `CLIService.java` should be updated. How about updating it later?
[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20866#discussion_r176906522
--- Diff: core/src/main/scala/com/../spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
     s"$principal at $metastoreUri")
   doAsRealUser {
-    val hive = Hive.get(conf, classOf[HiveConf])
-    val tokenStr = hive.getDelegationToken(currentUser.getUserName(), principal)
+    metaStoreClient = new HiveMetaStoreClient(conf.asInstanceOf[HiveConf])
+    val tokenStr = metaStoreClient.getDelegationToken(currentUser.getUserName, principal)
--- End diff --
Yes, both HMS 1.x and 2.x
[GitHub] spark pull request #20889: [MINOR][DOC] Fix ml-guide markdown typos
Github user Lemonjing closed the pull request at: https://github.com/apache/spark/pull/20889
[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos
Github user Lemonjing commented on the issue: https://github.com/apache/spark/pull/20889 @felixcheung Thanks a lot, and I closed it.
[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20866#discussion_r176906474
--- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
@@ -92,8 +94,9 @@ private[security] class HiveDelegationTokenProvider
     s"$principal at $metastoreUri")
   doAsRealUser {
-    val hive = Hive.get(conf, classOf[HiveConf])
-    val tokenStr = hive.getDelegationToken(currentUser.getUserName(), principal)
+    metastoreClient = RetryingMetaStoreClient.getProxy(conf.asInstanceOf[HiveConf], null,
--- End diff --
HiveMetaStoreClient -> RetryingMetaStoreClient. In fact, `Hive.get` also uses `RetryingMetaStoreClient`:
```
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
```
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20866 Merged build finished. Test PASSed.
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20866 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1737/ Test PASSed.
[GitHub] spark issue #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible with d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20866 **[Test build #88562 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88562/testReport)** for PR 20866 at commit [`f51bf63`](https://github.com/apache/spark/commit/f51bf634d89aaa0a2e1077903e6831a9284aea10).
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Merged build finished. Test PASSed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1736/ Test PASSed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20891 **[Test build #88561 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88561/testReport)** for PR 20891 at commit [`cefd672`](https://github.com/apache/spark/commit/cefd672e79b508e995382ce146cd70a4d130af01).
[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20701 Any more comments, @holdenk?
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Build finished. Test PASSed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20891 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1735/ Test PASSed.
[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20891 **[Test build #88560 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88560/testReport)** for PR 20891 at commit [`126e6a8`](https://github.com/apache/spark/commit/126e6a8e7d333ecf99c26b374698d7cd0e1a9d19).
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20898 Yes, it's a proxy user:
```
export HADOOP_PROXY_USER=user
spark-sql --master yarn
```
[GitHub] spark pull request #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20838#discussion_r176904365 --- Diff: python/pyspark/cloudpickle.py ---
```
@@ -802,9 +802,8 @@ def save_not_implemented(self, obj):
         self.save_reduce(_gen_not_implemented, ())

 if PY3:
-    dispatch[io.TextIOWrapper] = save_file
-else:
-    dispatch[file] = save_file
+    file = io.TextIOWrapper
+dispatch[file] = save_file
```
--- End diff -- I think this one is actually related to cloudpickle's PR. I was trying to match this file exactly to a specific version of cloudpickle (currently 0.4.3 - https://github.com/cloudpipe/cloudpickle/releases/tag/v0.4.3). So, I thought we could wait for more feedback there. At least, I was thinking that we should match it to https://github.com/cloudpipe/cloudpickle/tree/0.4.x If that one is merged, I could backport that change into the 0.4.x branch.
[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...
Github user rednaxelafx commented on the issue: https://github.com/apache/spark/pull/20779 Sorry for the late comment. This PR itself is LGTM. I'd just like to make some side comments on why this bug is happening. Janino uses a peculiar encoding to implement bridge methods for extended accessibility from an inner class to members of its enclosing class. Here we're actually hitting a bug in Janino where it missed creating bridge methods on the enclosing class (`GeneratedClass...`) for `protected` members inherited from a base class (`BufferedRowIterator`). I've seen this bug in Janino before, and I plan to fix it in Janino soon. Once it's fixed, we can safely use `protected` members such as `append` and `stopEarly` in nested classes within `GeneratedClass...` again. Would anybody be interested in switching these methods back to `protected` once it's fixed in Janino and Spark bumps its Janino dependency to that new version? Thanks!
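To make the failure mode concrete, here is a minimal Scala sketch of the code shape described above (class names mirror `BufferedRowIterator`/`GeneratedClass` but are hypothetical, not Spark's actual generated code): a nested class calls `protected` members that its enclosing class inherits from a base class. Regular compilers synthesize accessor ("bridge") methods on the enclosing class to make this legal at the bytecode level; the Janino bug is that it missed emitting them for inherited members.

```scala
// Base class providing protected members, analogous to BufferedRowIterator.
abstract class BufferedRowIteratorLike {
  protected def stopEarly(): Boolean = false
  protected def append(row: Int): Unit
}

// Enclosing class, analogous to the generated GeneratedClass subclass.
class GeneratedClassLike extends BufferedRowIteratorLike {
  private val buffer = scala.collection.mutable.ArrayBuffer.empty[Int]
  override protected def append(row: Int): Unit = buffer += row

  // Nested class reaching into the enclosing class's *inherited* protected
  // members; scalac/javac generate synthetic accessors on GeneratedClassLike
  // for these calls, which is exactly what Janino failed to do.
  class InnerProcessor {
    def process(rows: Seq[Int]): Int = {
      var appended = 0
      for (r <- rows if !stopEarly()) { append(r); appended += 1 }
      appended
    }
  }

  def run(rows: Seq[Int]): Int = new InnerProcessor().process(rows)
  def bufferedCount: Int = buffer.size
}
```

Under javac/scalac this compiles and runs fine; the workaround in this PR (widening the members' visibility) sidesteps the missing accessors until Janino is fixed.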
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/20898 Proxy or not, I have only found such an issue with a proxy user: https://github.com/apache/spark/pull/20784
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20898 cc @yaooqinn @cloud-fan
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88559/ Test PASSed.
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20898 Merged build finished. Test PASSed.
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20898 **[Test build #88559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88559/testReport)** for PR 20898 at commit [`a382aab`](https://github.com/apache/spark/commit/a382aab4f3a9cabda10ab2aedbbb8d663737348f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20898 **[Test build #88559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88559/testReport)** for PR 20898 at commit [`a382aab`](https://github.com/apache/spark/commit/a382aab4f3a9cabda10ab2aedbbb8d663737348f).
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20898 Merged build finished. Test PASSed.
[GitHub] spark issue #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.uris bef...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20898 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1734/ Test PASSed.
[GitHub] spark pull request #20898: [SPARK-23789][SQL] Shouldn't set hive.metastore.u...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20898 [SPARK-23789][SQL] Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider

## What changes were proposed in this pull request?

`spark-sql` can't connect to the metastore on a secure Hadoop cluster after [SPARK-21428](https://issues.apache.org/jira/browse/SPARK-21428). Before SPARK-21428, `hive.metastore.uris` was `HiveConf.ConfVars.METASTOREURIS.defaultStrVal` here. This PR reverts `hive.metastore.uris` to `HiveConf.ConfVars.METASTOREURIS.defaultStrVal`.

## How was this patch tested?

Manual tests with a secure Hadoop cluster.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23789

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20898.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20898

commit a382aab4f3a9cabda10ab2aedbbb8d663737348f
Author: Yuming Wang
Date: 2018-03-24T07:19:25Z

    Shouldn't set hive.metastore.uris before invoking HiveDelegationTokenProvider
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20897 Merged build finished. Test PASSed.
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20897 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88558/ Test PASSed.
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20897 **[Test build #88558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88558/testReport)** for PR 20897 at commit [`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20866: [SPARK-23749][SQL] Avoid Hive.get() to compatible...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20866#discussion_r176902474 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala ---
```
@@ -92,8 +93,8 @@ private[security] class HiveDelegationTokenProvider
       s"$principal at $metastoreUri")
     doAsRealUser {
-      val hive = Hive.get(conf, classOf[HiveConf])
```
--- End diff -- Thanks @dongjoon-hyun, it seems `RetryingMetaStoreClient` is a better choice and I will try it.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901542 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
```
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
+       [1,2,3,4,5,6]
+  """)
+case class ConcatArrays(children: Seq[Expression]) extends Expression with NullSafeEvaluation {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val arrayCheck = checkInputDataTypesAreArrays
+    if(arrayCheck.isFailure) arrayCheck
```
--- End diff -- Style issue:
```scala
if (...) {
  ...
} else {
  ...
}
```
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901684 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
```
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code generation.
+ */
+trait NullSafeEvaluation extends Expression
```
--- End diff -- Do we need to bring in `NullSafeEvaluation`? If only `ConcatArrays` uses it, we may not need to add this.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901843 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression { * and Hive function wrappers. */ trait UserDefinedExpression + +/** + * The trait covers logic for performing null save evaluation and code generation. + */ +trait NullSafeEvaluation extends Expression +{ + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = children.exists(_.nullable) + + /** + * Default behavior of evaluation according to the default nullability of NullSafeEvaluation. + * If a class utilizing NullSaveEvaluation override [[nullable]], probably should also + * override this. + */ + override def eval(input: InternalRow): Any = + { +val values = children.map(_.eval(input)) +if (values.contains(null)) null +else nullSafeEval(values) + } + + /** + * Called by default [[eval]] implementation. If a class utilizing NullSaveEvaluation keep + * the default nullability, they can override this method to save null-check code. If we need + * full control of evaluation process, we should override [[eval]]. + */ + protected def nullSafeEval(inputs: Seq[Any]): Any = +sys.error(s"The class utilizing NullSaveEvaluation must override either eval or nullSafeEval") + + /** + * Short hand for generating of null save evaluation code. + * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f accepts a sequence of variable names and returns Java code to compute the output. + */ + protected def defineCodeGen( +ctx: CodegenContext, +ev: ExprCode, +f: Seq[String] => String): ExprCode = { +nullSafeCodeGen(ctx, ev, values => { + s"${ev.value} = ${f(values)};" +}) + } + + /** + * Called by expressions to generate null safe evaluation code. 
+ * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f a function that accepts a sequence of non-null evaluation result names of children + * and returns Java code to compute the output. + */ + protected def nullSafeCodeGen( + ctx: CodegenContext, + ev: ExprCode, + f: Seq[String] => String): ExprCode = { +val gens = children.map(_.genCode(ctx)) +val resultCode = f(gens.map(_.value)) + +if (nullable) { + val nullSafeEval = +(s""" + ${ev.isNull} = false; // resultCode could change nullability. + $resultCode +""" /: children.zip(gens)) { --- End diff -- Use `foldLeft` for readability. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
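As a toy illustration of the reviewer's style point (not Spark code): `(z /: xs)(op)` is the symbolic alias of `xs.foldLeft(z)(op)` (deprecated since Scala 2.13), and the named form reads left to right. The codegen snippet above folds the children's null checks around an accumulated code string in the same way:

```scala
// foldLeft threads an accumulator through the list from the left; here each
// step wraps the accumulated "code" string in another (hypothetical) null
// check, mirroring how nullSafeCodeGen nests ctx.nullSafeExec around resultCode.
val wrappers = List("outerNullCheck(%s)", "innerNullCheck(%s)")
val wrapped = wrappers.foldLeft("resultCode") { (acc, wrapper) =>
  wrapper.format(acc)
}
// wrapped == "innerNullCheck(outerNullCheck(resultCode))"
```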
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176902162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression) override def prettyName: String = "array_contains" } + +/** + * Concatenates multiple arrays into one. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.", + examples = """ +Examples: + > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6)); + [1,2,3,4,5,6] + """) +case class ConcatArrays(children: Seq[Expression]) extends Expression with NullSafeEvaluation { + + override def checkInputDataTypes(): TypeCheckResult = { +val arrayCheck = checkInputDataTypesAreArrays +if(arrayCheck.isFailure) arrayCheck +else TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), s"function $prettyName") + } + + private def checkInputDataTypesAreArrays(): TypeCheckResult = + { +val mismatches = children.zipWithIndex.collect { + case (child, idx) if !ArrayType.acceptsType(child.dataType) => +s"argument ${idx + 1} has to be ${ArrayType.simpleString} type, " + + s"however, '${child.sql}' is of ${child.dataType.simpleString} type." 
+} + +if (mismatches.isEmpty) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure(mismatches.mkString(" ")) +} + } + + override def dataType: ArrayType = +children + .headOption.map(_.dataType.asInstanceOf[ArrayType]) + .getOrElse(ArrayType.defaultConcreteType.asInstanceOf[ArrayType]) + + + override protected def nullSafeEval(inputs: Seq[Any]): Any = { +val elements = inputs.flatMap(_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType)) +new GenericArrayData(elements) + } + + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +nullSafeCodeGen(ctx, ev, arrays => { + val elementType = dataType.elementType + if (CodeGenerator.isPrimitiveType(elementType)) { +genCodeForConcatOfPrimitiveElements(ctx, elementType, arrays, ev.value) + } else { +genCodeForConcatOfComplexElements(ctx, arrays, ev.value) + } +}) + } + + private def genCodeForNumberOfElements( +ctx: CodegenContext, +elements: Seq[String] + ) : (String, String) = { +val variableName = ctx.freshName("numElements") +val code = elements + .map(el => s"$variableName += $el.numElements();") + .foldLeft( s"int $variableName = 0;")((acc, s) => acc + "\n" + s) +(code, variableName) + } + + private def genCodeForConcatOfPrimitiveElements( +ctx: CodegenContext, +elementType: DataType, +elements: Seq[String], +arrayDataName: String + ): String = { +val arrayName = ctx.freshName("array") +val arraySizeName = ctx.freshName("size") +val counter = ctx.freshName("counter") +val tempArrayDataName = ctx.freshName("tempArrayData") + +val (numElemCode, numElemName) = genCodeForNumberOfElements(ctx, elements) + +val unsafeArraySizeInBytes = s""" + |int $arraySizeName = UnsafeArrayData.calculateHeaderPortionInBytes($numElemName) + + |${classOf[ByteArrayMethods].getName}.roundNumberOfBytesToNearestWord( + |${elementType.defaultSize} * $numElemName + |); + """.stripMargin +val baseOffset = Platform.BYTE_ARRAY_OFFSET + +val primitiveValueTypeName = 
CodeGenerator.primitiveTypeName(elementType) +val assignments = elements.map { el => + s""" +|for(int z = 0; z < $el.numElements(); z++) { +| if($el.isNullAt(z)) { --- End diff -- Style: `if ()`.
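Stripped of codegen, the logic that `genCodeForNumberOfElements` and the primitive copy loop implement can be sketched in plain Scala (illustrative names, with `Option` standing in for nullable slots; this is not Spark's API):

```scala
// Phase 1 mirrors genCodeForNumberOfElements: sum the input arrays' lengths.
// Phase 2 mirrors the generated per-array loop: copy every slot in order,
// preserving nulls (modeled as None) while advancing one shared counter.
def concatArrays[T](arrays: Seq[Seq[Option[T]]]): Seq[Option[T]] = {
  val numElements = arrays.map(_.length).sum
  val out = Seq.newBuilder[Option[T]]
  out.sizeHint(numElements) // like pre-sizing the UnsafeArrayData buffer
  for (arr <- arrays; slot <- arr) out += slot
  out.result()
}
```

The real generated code does the same two passes, but writes into a pre-sized `UnsafeArrayData` buffer and calls `setNullAt` for null slots.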
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901957 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression { * and Hive function wrappers. */ trait UserDefinedExpression + +/** + * The trait covers logic for performing null save evaluation and code generation. + */ +trait NullSafeEvaluation extends Expression +{ + override def foldable: Boolean = children.forall(_.foldable) + + override def nullable: Boolean = children.exists(_.nullable) + + /** + * Default behavior of evaluation according to the default nullability of NullSafeEvaluation. + * If a class utilizing NullSaveEvaluation override [[nullable]], probably should also + * override this. + */ + override def eval(input: InternalRow): Any = + { +val values = children.map(_.eval(input)) +if (values.contains(null)) null +else nullSafeEval(values) + } + + /** + * Called by default [[eval]] implementation. If a class utilizing NullSaveEvaluation keep + * the default nullability, they can override this method to save null-check code. If we need + * full control of evaluation process, we should override [[eval]]. + */ + protected def nullSafeEval(inputs: Seq[Any]): Any = +sys.error(s"The class utilizing NullSaveEvaluation must override either eval or nullSafeEval") + + /** + * Short hand for generating of null save evaluation code. + * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f accepts a sequence of variable names and returns Java code to compute the output. + */ + protected def defineCodeGen( +ctx: CodegenContext, +ev: ExprCode, +f: Seq[String] => String): ExprCode = { +nullSafeCodeGen(ctx, ev, values => { + s"${ev.value} = ${f(values)};" +}) + } + + /** + * Called by expressions to generate null safe evaluation code. 
+ * If either of the sub-expressions is null, the result of this computation + * is assumed to be null. + * + * @param f a function that accepts a sequence of non-null evaluation result names of children + * and returns Java code to compute the output. + */ + protected def nullSafeCodeGen( + ctx: CodegenContext, + ev: ExprCode, + f: Seq[String] => String): ExprCode = { +val gens = children.map(_.genCode(ctx)) +val resultCode = f(gens.map(_.value)) + +if (nullable) { + val nullSafeEval = +(s""" + ${ev.isNull} = false; // resultCode could change nullability. + $resultCode +""" /: children.zip(gens)) { + case (acc, (child, gen)) => +gen.code + ctx.nullSafeExec(child.nullable, gen.isNull)(acc) --- End diff -- For example, for a binary expression, doesn't this generate code like:
```scala
rightGen.code + ctx.nullSafeExec(right.nullable, rightGen.isNull) {
  leftGen.code + ctx.nullSafeExec(left.nullable, leftGen.isNull) {
    ${ev.isNull} = false; // resultCode could change nullability.
    $resultCode
  }
}
```
Although for deterministic expressions the evaluation order doesn't matter, for non-deterministic ones I'm a little concerned that it may cause unexpected changes.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901989 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
```
@@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression)
   override def prettyName: String = "array_contains"
 }
+
+/**
+ * Concatenates multiple arrays into one.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
```
--- End diff -- The usage should state that the element types of the arrays must be the same.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901348 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
```
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
```
--- End diff -- Spark usually uses a style like:
```scala
override def eval(input: InternalRow): Any = {
  val values = children.map(_.eval(input))
  if (values.contains(null)) {
    null
  } else {
    nullSafeEval(values)
  }
}
```
You could follow the style of the other code.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r176901429 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
```
@@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
  * and Hive function wrappers.
  */
 trait UserDefinedExpression
+
+/**
+ * The trait covers logic for performing null save evaluation and code generation.
+ */
+trait NullSafeEvaluation extends Expression
+{
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  /**
+   * Default behavior of evaluation according to the default nullability of NullSafeEvaluation.
+   * If a class utilizing NullSaveEvaluation override [[nullable]], probably should also
+   * override this.
+   */
+  override def eval(input: InternalRow): Any =
+  {
```
--- End diff -- There are other places where the brace `{}` style doesn't follow Spark's code style. We should keep the code style consistent.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20858#discussion_r176902327

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression)

       override def prettyName: String = "array_contains"
     }
    +
    +/**
    + * Concatenates multiple arrays into one.
    + */
    +@ExpressionDescription(
    +  usage = "_FUNC_(expr, ...) - Concatenates multiple arrays into one.",
    +  examples = """
    +    Examples:
    +      > SELECT _FUNC_(array(1, 2, 3), array(4, 5), array(6));
    +       [1,2,3,4,5,6]
    +  """)
    +case class ConcatArrays(children: Seq[Expression]) extends Expression with NullSafeEvaluation {
    +
    +  override def checkInputDataTypes(): TypeCheckResult = {
    +    val arrayCheck = checkInputDataTypesAreArrays
    +    if(arrayCheck.isFailure) arrayCheck
    +    else TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), s"function $prettyName")
    +  }
    +
    +  private def checkInputDataTypesAreArrays(): TypeCheckResult =
    +  {
    +    val mismatches = children.zipWithIndex.collect {
    +      case (child, idx) if !ArrayType.acceptsType(child.dataType) =>
    +        s"argument ${idx + 1} has to be ${ArrayType.simpleString} type, " +
    +          s"however, '${child.sql}' is of ${child.dataType.simpleString} type."
    +    }
    +
    +    if (mismatches.isEmpty) {
    +      TypeCheckResult.TypeCheckSuccess
    +    } else {
    +      TypeCheckResult.TypeCheckFailure(mismatches.mkString(" "))
    +    }
    +  }
    +
    +  override def dataType: ArrayType =
    +    children
    +      .headOption.map(_.dataType.asInstanceOf[ArrayType])
    +      .getOrElse(ArrayType.defaultConcreteType.asInstanceOf[ArrayType])
    +
    +
    +  override protected def nullSafeEval(inputs: Seq[Any]): Any = {
    +    val elements = inputs.flatMap(_.asInstanceOf[ArrayData].toObjectArray(dataType.elementType))
    +    new GenericArrayData(elements)
    +  }
    +
    +  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    +    nullSafeCodeGen(ctx, ev, arrays => {
    +      val elementType = dataType.elementType
    +      if (CodeGenerator.isPrimitiveType(elementType)) {
    +        genCodeForConcatOfPrimitiveElements(ctx, elementType, arrays, ev.value)
    +      } else {
    +        genCodeForConcatOfComplexElements(ctx, arrays, ev.value)
    +      }
    +    })
    +  }
    +
    +  private def genCodeForNumberOfElements(
    +    ctx: CodegenContext,
    +    elements: Seq[String]
    +  ) : (String, String) = {
    +    val variableName = ctx.freshName("numElements")
    +    val code = elements
    +      .map(el => s"$variableName += $el.numElements();")
    +      .foldLeft( s"int $variableName = 0;")((acc, s) => acc + "\n" + s)
    +    (code, variableName)
    +  }
    +
    +  private def genCodeForConcatOfPrimitiveElements(
    +    ctx: CodegenContext,
    +    elementType: DataType,
    +    elements: Seq[String],
    +    arrayDataName: String
    +  ): String = {
    +    val arrayName = ctx.freshName("array")
    +    val arraySizeName = ctx.freshName("size")
    +    val counter = ctx.freshName("counter")
    +    val tempArrayDataName = ctx.freshName("tempArrayData")
    +
    +    val (numElemCode, numElemName) = genCodeForNumberOfElements(ctx, elements)
    +
    +    val unsafeArraySizeInBytes = s"""
    +      |int $arraySizeName = UnsafeArrayData.calculateHeaderPortionInBytes($numElemName) +
    +      |${classOf[ByteArrayMethods].getName}.roundNumberOfBytesToNearestWord(
    +      |${elementType.defaultSize} * $numElemName
    +      |);
    +      """.stripMargin
    +    val baseOffset = Platform.BYTE_ARRAY_OFFSET
    +
    +    val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    +    val assignments = elements.map { el =>
    +      s"""
    +        |for(int z = 0; z < $el.numElements(); z++) {
    +        |  if($el.isNullAt(z)) {
    +        |    $tempArrayDataName.setNullAt($counter);
    +        |  } else {
    +        |    $tempArrayDataName.set$primitiveValueTypeName(
    +        |      $counter,
    +        |      $el.get$primitiveValueTypeName(z)
    +        |    );
    +        |  }
    +        |  $counter++;
    +        |}
    +      """.stripMargin
    +    }.mkString("\n")
    +
    +    s"""
    +      |$numElemCode
    +      |$unsafeArraySizeInBytes
    +      |byte[] $arrayName = new byte[$arraySizeName];
    +      |UnsafeArrayData $tempArrayDataName = new UnsafeArrayData();
    +      |Platform.putLong($arrayName, $baseOffset, $numElemName);
    +      |$tempArrayDataName.pointTo($arrayName, $baseOffset, $arraySizeName);
    +      |int $counter = 0;
    +      |$assignments
    +      |$arrayDataName = $tempArrayDataName;
    +    """.stripMargin
    +
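The interpreted evaluation path in the diff above can be modeled outside Spark. The sketch below is a hypothetical, simplified stand-in for `ConcatArrays.nullSafeEval` that uses plain Scala collections instead of `ArrayData`/`GenericArrayData`: inputs are flattened in argument order, and (per SQL null semantics) a null input yields a null result.

```scala
// Simplified model of array concatenation with SQL null semantics.
// Hypothetical stand-in: Spark's real code operates on ArrayData, not Seq.
def concatArrays[T](inputs: Seq[Seq[T]]): Seq[T] =
  if (inputs.contains(null)) null // any null argument -> null result
  else inputs.flatten             // concatenate in argument order
```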
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20858#discussion_r176901467

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
      * and Hive function wrappers.
      */
     trait UserDefinedExpression
    +
    +/**
    + * The trait covers logic for performing null save evaluation and code generation.
    + */
    +trait NullSafeEvaluation extends Expression
    +{
    +  override def foldable: Boolean = children.forall(_.foldable)
    +
    +  override def nullable: Boolean = children.exists(_.nullable)
    +
    +  /**
    +   * Default behavior of evaluation according to the default nullability of NullSafeEvaluation.
    +   * If a class utilizing NullSaveEvaluation override [[nullable]], probably should also
    +   * override this.
    +   */
    +  override def eval(input: InternalRow): Any =
    +  {
    +    val values = children.map(_.eval(input))
    --- End diff --

    We probably don't need to evaluate all children. Once any child expression is null, we can just return null.

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
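The reviewer's suggestion can be sketched as follows. This is a hypothetical helper, not Spark's actual `eval`; `evalOne` stands in for `child.eval(input)`, and evaluation stops as soon as any child produces null instead of materializing all child values first.

```scala
// Hypothetical short-circuiting evaluator: children after the first
// null-producing one are never evaluated.
def evalNullSafe[A](children: Seq[A])(evalOne: A => Any)(apply: Seq[Any] => Any): Any = {
  val values = Seq.newBuilder[Any]
  val it = children.iterator
  while (it.hasNext) {
    val v = evalOne(it.next())
    if (v == null) return null // short-circuit: remaining children untouched
    values += v
  }
  apply(values.result())
}
```

Compared with `children.map(_.eval(input))`, this avoids evaluating (possibly expensive) child expressions whose results would be discarded anyway.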
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20858#discussion_r176901317

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala ---
    @@ -699,3 +699,88 @@ abstract class TernaryExpression extends Expression {
      * and Hive function wrappers.
      */
     trait UserDefinedExpression
    +
    +/**
    + * The trait covers logic for performing null save evaluation and code generation.
    --- End diff --

    typo: null safe.
[GitHub] spark pull request #20858: [SPARK-23736][SQL] Implementation of the concat_a...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20858#discussion_r176902161

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
    @@ -287,3 +289,152 @@ case class ArrayContains(left: Expression, right: Expression)
    +    val primitiveValueTypeName = CodeGenerator.primitiveTypeName(elementType)
    +    val assignments = elements.map { el =>
    +      s"""
    +        |for(int z = 0; z < $el.numElements(); z++) {
    --- End diff --

    Style: `for (`
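For reference, the size computation in the generated code (`UnsafeArrayData.calculateHeaderPortionInBytes` plus the word-rounded element bytes) can be sketched in plain Scala. The constants below are assumptions mirroring the general `UnsafeArrayData` layout (8-byte words, an 8-byte element-count field, one null-tracking bit per element), not Spark's actual implementation:

```scala
// Round a byte count up to the next multiple of the 8-byte word size.
def roundToWord(bytes: Long): Long = (bytes + 7) / 8 * 8

// Header = 8-byte numElements field + null bitmap (1 bit per element), word-aligned.
def headerPortionInBytes(numElements: Long): Long =
  8L + roundToWord((numElements + 7) / 8)

// Total backing-array size for an array of fixed-width primitive elements.
def unsafeArraySizeInBytes(numElements: Long, elementSize: Long): Long =
  headerPortionInBytes(numElements) + roundToWord(elementSize * numElements)
```

For the six-element int-array example in the diff, this gives a 16-byte header (8-byte count plus one word of null bits) and 24 bytes of data, 40 bytes in total.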
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20897 **[Test build #88558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88558/testReport)** for PR 20897 at commit [`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20897 jenkins, retest this please
[GitHub] spark pull request #20896: [SPARK-23788][SS] Fix race in StreamingQuerySuite
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20896#discussion_r176902181

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala ---
    @@ -550,22 +550,22 @@ class StreamingQuerySuite extends StreamTest with BeforeAndAfter with Logging wi
           .start()
         }

    -    val input = MemoryStream[Int]
    -    val q1 = startQuery(input.toDS, "stream_serializable_test_1")
    -    val q2 = startQuery(input.toDS.map { i =>
    +    val input = MemoryStream[Int] :: MemoryStream[Int] :: MemoryStream[Int] :: Nil
    --- End diff --

    why build a list and not use 3 separate variables?
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20897 Merged build finished. Test FAILed.
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20897 **[Test build #88557 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88557/testReport)** for PR 20897 at commit [`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20897 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88557/ Test FAILed.
[GitHub] spark issue #20003: [SPARK-22817][R] Use fixed testthat version for SparkR t...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20003 yea, I started doing some work but it stalled, let me check..
[GitHub] spark issue #20897: [MINOR][DOC] Fix a few markdown typos
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20897 **[Test build #88557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88557/testReport)** for PR 20897 at commit [`937bbef`](https://github.com/apache/spark/commit/937bbef522eedddbcb502f7f9692564040a63cd7).
[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20889 @Lemonjing you need to close the PR from github.com - we don't have access to close it