[GitHub] spark issue #19099: [SPARK-21652][SQL] Fix rule confliction between InferFil...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19099

**[Test build #81299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81299/testReport)** for PR 19099 at commit [`7a364a1`](https://github.com/apache/spark/commit/7a364a192f15bc99e362a2615c775730cb11fc24).
[GitHub] spark issue #19099: [SPARK-21652][SQL] Fix rule confliction between InferFil...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19099

cc @gatorsmile
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19060

The previous Parquet link is broken. The official one is https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/example/TestInputOutputFormat.java
[GitHub] spark pull request #19099: [SPARK-21652][SQL] Fix rule confliction between I...
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/19099

[SPARK-21652][SQL] Fix rule confliction between InferFiltersFromConstraints and ConstantPropagation

## What changes were proposed in this pull request?

For the example below, the predicate added by `InferFiltersFromConstraints` is later folded away by `ConstantPropagation`; this leads to a non-converging optimizer iteration:

```
Seq((1, 1)).toDF("col1", "col2").createOrReplaceTempView("t1")
Seq(1, 2).toDF("col").createOrReplaceTempView("t2")
sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col")
```

We can fix this by adjusting the order of the optimizer rules.

## How was this patch tested?

Added a test case in `SQLQuerySuite` that would have failed before this fix.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiangxb1987/spark unconverge-optimization

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19099.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19099

commit 7a364a192f15bc99e362a2615c775730cb11fc24
Author: Xingbo Jiang
Date:   2017-08-31T21:49:02Z

    fix rule confliction between InferFiltersFromConstraints and ConstantPropagation.
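The non-convergence is easy to observe from the snippet above. Here is a self-contained sketch, assuming a Spark build without this fix, on which the optimizer log warns that the maximum number of iterations was reached:

```scala
import org.apache.spark.sql.SparkSession

// Minimal reproduction sketch for SPARK-21652 (assumes a pre-fix build; the
// interesting output is the optimizer's max-iterations warning in the log).
object Spark21652Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-21652 repro")
      .getOrCreate()
    import spark.implicits._

    Seq((1, 1)).toDF("col1", "col2").createOrReplaceTempView("t1")
    Seq(1, 2).toDF("col").createOrReplaceTempView("t2")

    val df = spark.sql(
      "SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 " +
        "AND t1.col1 = t2.col AND t1.col2 = t2.col")

    // Forces optimization; the result itself is not the point.
    df.explain(true)

    spark.stop()
  }
}
```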
[GitHub] spark issue #19060: [WIP][SQL] Add DataSourceSuite validating data sources l...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19060

**[Test build #81298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81298/testReport)** for PR 19060 at commit [`104f24c`](https://github.com/apache/spark/commit/104f24c9ad0743dc7c6329b4c0dde902e8e87de6).
[GitHub] spark pull request #16774: [SPARK-19357][ML] Adding parallel model evaluatio...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/16774#discussion_r136457345

--- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -120,6 +120,33 @@ class CrossValidatorSuite
     }
   }
 
+  test("cross validation with parallel evaluation") {
+    val lr = new LogisticRegression
+    val lrParamMaps = new ParamGridBuilder()
+      .addGrid(lr.regParam, Array(0.001, 1000.0))
+      .addGrid(lr.maxIter, Array(0, 3))
+      .build()
+    val eval = new BinaryClassificationEvaluator
+    val cv = new CrossValidator()
+      .setEstimator(lr)
+      .setEstimatorParamMaps(lrParamMaps)
+      .setEvaluator(eval)
+      .setNumFolds(2)
+      .setParallelism(1)
+    val cvSerialModel = cv.fit(dataset)
+    cv.setParallelism(2)
--- End diff --

It's a little difficult to do this in a unit test without making it flaky. I have run tests manually and verified it is working, both by the expected speedup in timing and by seeing the expected number of tasks run concurrently. I can post some results if that would help.
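A rough version of that manual timing check might look like the sketch below; it reuses `cv` and `dataset` from the quoted test and assumes the `setParallelism` method this PR adds, so it only runs on the patched build.

```scala
// Hedged sketch of a manual speedup check (deliberately not a unit test).
def timeMs[T](body: => T): (T, Long) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1000000L)
}

val (serialModel, serialMs) = timeMs { cv.setParallelism(1).fit(dataset) }
val (parallelModel, parallelMs) = timeMs { cv.setParallelism(2).fit(dataset) }

println(s"serial: ${serialMs} ms, parallel: ${parallelMs} ms")
// On a machine with spare cores one would expect parallelMs < serialMs, but the
// exact numbers are load-dependent -- which is exactly why asserting on them
// in a unit test would be flaky.
```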
[GitHub] spark pull request #16774: [SPARK-19357][ML] Adding parallel model evaluatio...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/16774#discussion_r136456379

--- Diff: mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -120,6 +120,33 @@ class CrossValidatorSuite
     }
   }
 
+  test("cross validation with parallel evaluation") {
+    val lr = new LogisticRegression
+    val lrParamMaps = new ParamGridBuilder()
+      .addGrid(lr.regParam, Array(0.001, 1000.0))
+      .addGrid(lr.maxIter, Array(0, 3))
+      .build()
+    val eval = new BinaryClassificationEvaluator
+    val cv = new CrossValidator()
+      .setEstimator(lr)
+      .setEstimatorParamMaps(lrParamMaps)
+      .setEvaluator(eval)
+      .setNumFolds(2)
+      .setParallelism(1)
--- End diff --

So the seed param here is fixed by default and doesn't need to be set to ensure consistent results. I think that's why it's not set in the other tests in this suite. I'm not a fan of this behavior and I think it's better to explicitly set it in tests, but then we should probably be consistent and set it elsewhere too. What are your thoughts on this, @MLnick?
[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r136455832

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala ---
@@ -244,6 +246,92 @@ case class HashAggregateExec(
 
   protected override val shouldStopRequired = false
 
+  // We assume a prefix has lower cases and a name has camel cases
+  private val variableName = "^[a-z]+_[a-zA-Z]+[0-9]*".r
+
+  // Returns true if a given name id belongs to this `CodegenContext`
+  private def isVariable(nameId: String): Boolean = nameId match {
+    case variableName() => true
+    case _ => false
+  }
+
+  // Extracts all the outer references for a given `aggExpr`. This result will be used to split
+  // aggregation into small functions.
+  private def getOuterReferences(
+      ctx: CodegenContext,
+      aggExpr: Expression,
+      subExprs: Map[Expression, SubExprEliminationState]): Set[(String, String)] = {
+    val stack = mutable.Stack[Expression](aggExpr)
+    val argSet = mutable.Set[(String, String)]()
+    val addIfNotLiteral = (value: String, tpe: String) => {
+      if (isVariable(value)) {
+        argSet += ((tpe, value))
+      }
+    }
+    while (stack.nonEmpty) {
+      stack.pop() match {
+        case e if subExprs.contains(e) =>
+          val exprCode = subExprs(e)
+          addIfNotLiteral(exprCode.value, ctx.javaType(e.dataType))
+          addIfNotLiteral(exprCode.isNull, "boolean")
+          // Since the children possibly has common expressions, we push them here
+          stack.pushAll(e.children)
+        case ref: BoundReference
+            if ctx.currentVars != null && ctx.currentVars(ref.ordinal) != null =>
+          val argVal = ctx.currentVars(ref.ordinal).value
+          addIfNotLiteral(argVal, ctx.javaType(ref.dataType))
+          addIfNotLiteral(ctx.currentVars(ref.ordinal).isNull, "boolean")
+        case _: BoundReference =>
+          argSet += (("InternalRow", ctx.INPUT_ROW))
+        case e =>
+          stack.pushAll(e.children)
+      }
+    }
+
+    argSet.toSet
+  }
+
+  // Splits the aggregation into small functions because the HotSpot does not compile
+  // too long functions.
+  private def splitAggregateExpressions(
+      ctx: CodegenContext,
+      aggExprs: Seq[Expression],
+      evalAndUpdateCodes: Seq[String],
+      subExprs: Map[Expression, SubExprEliminationState],
+      otherArgs: Seq[(String, String)] = Seq.empty): Seq[String] = {
+    aggExprs.zipWithIndex.map { case (aggExpr, i) =>
+      // The maximum number of parameters in Java methods is 255, so this method gives up
+      // splitting the code if the number goes over the limit.
+      // You can find more information about the limit in the JVM specification:
+      //   - The number of method parameters is limited to 255 by the definition of a method
+      //     descriptor, where the limit includes one unit for this in the case of instance
+      //     or interface method invocations.
+      val args = (getOuterReferences(ctx, aggExpr, subExprs) ++ otherArgs).toSeq
+
+      // This is for testing/benchmarking only
+      val maxParamNumInJavaMethod =
+        sqlContext.getConf("spark.sql.codegen.aggregate.maxParamNumInJavaMethod", null) match {
--- End diff --

Can we add a check for the case where a user specifies a value greater than 255?
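A guard of the kind suggested here could be a one-line `require`. The following is only a sketch: the config key comes from the diff above, while the constant name and the error message are illustrative.

```scala
// Sketch of the suggested bound check; only the config key is taken from the
// diff above, everything else is illustrative.
val MaxParamsPerJavaMethod = 255

val maxParamNum = Option(
    sqlContext.getConf("spark.sql.codegen.aggregate.maxParamNumInJavaMethod", null))
  .map(_.toInt)
  .getOrElse(MaxParamsPerJavaMethod)

// require always throws IllegalArgumentException on bad input, unlike assert.
require(maxParamNum > 0 && maxParamNum <= MaxParamsPerJavaMethod,
  s"spark.sql.codegen.aggregate.maxParamNumInJavaMethod must be in " +
    s"(0, $MaxParamsPerJavaMethod] but was $maxParamNum")
```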
[GitHub] spark pull request #18883: [SPARK-21276][CORE] Update lz4-java to the latest...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18883#discussion_r136451595

--- Diff: project/MimaExcludes.scala ---
@@ -41,7 +41,10 @@ object MimaExcludes {
     // [SPARK-19937] Add remote bytes read to disk.
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetrics.this"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this"),
+
+    // [SPARK-21276] Update lz4-java to the latest (v1.4.0)
+    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.io.LZ4BlockInputStream")
--- End diff --

By the way, I'm not sure we want to pursue strict compatibility here; just pointing out the issue.
[GitHub] spark pull request #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19097
[GitHub] spark issue #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19097

Thanks! Merging to master.
[GitHub] spark pull request #18883: [SPARK-21276][CORE] Update lz4-java to the latest...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18883#discussion_r136450775

--- Diff: project/MimaExcludes.scala ---
@@ -41,7 +41,10 @@ object MimaExcludes {
     // [SPARK-19937] Add remote bytes read to disk.
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetrics.this"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this"),
+
+    // [SPARK-21276] Update lz4-java to the latest (v1.4.0)
+    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.io.LZ4BlockInputStream")
--- End diff --

But users may write code that runs different logic depending on the InputStream type.
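For what it's worth, the kind of user code being described might look like the contrived sketch below; it compiles against Spark 2.2 and stops compiling (or fails at runtime under reflection) once `org.apache.spark.io.LZ4BlockInputStream` is removed.

```scala
import java.io.InputStream

// Contrived user code of the kind described above: it dispatches on the
// concrete InputStream type, so removing the Spark class breaks it.
def describeStream(in: InputStream): String = in match {
  case _: org.apache.spark.io.LZ4BlockInputStream => "Spark-internal LZ4 block stream"
  case other => s"other stream: ${other.getClass.getName}"
}
```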
[GitHub] spark pull request #18883: [SPARK-21276][CORE] Update lz4-java to the latest...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18883#discussion_r136448647

--- Diff: project/MimaExcludes.scala ---
@@ -41,7 +41,10 @@ object MimaExcludes {
     // [SPARK-19937] Add remote bytes read to disk.
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetrics.this"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this"),
+
+    // [SPARK-21276] Update lz4-java to the latest (v1.4.0)
+    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.io.LZ4BlockInputStream")
--- End diff --

It's "public" only insofar as it has to be public in Java to be used this way. There's no case where a user should or would use this class directly.
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975

**[Test build #81297 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81297/testReport)** for PR 18975 at commit [`e2db5e1`](https://github.com/apache/spark/commit/e2db5e1e0cc491480828328e07b7bb619dc05bbd).
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19098

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81293/
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19098

Merged build finished. Test PASSed.
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19098

**[Test build #81293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81293/testReport)** for PR 19098 at commit [`567487d`](https://github.com/apache/spark/commit/567487d6089400527a1b30aca054ea517174a08d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136443263

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.datasources._
+
+/**
+ * A command used to write the result of a query to a directory.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE DIRECTORY (path=STRING)?
+ *   USING format OPTIONS ([option1_name "option1_value", option2_name "option2_value", ...])
+ *   SELECT ...
+ * }}}
+ */
+case class InsertIntoDataSourceDirCommand(
+    storage: CatalogStorageFormat,
+    provider: Option[String],
+    query: LogicalPlan,
+    overwrite: Boolean) extends RunnableCommand {
+
+  override def innerChildren: Seq[LogicalPlan] = Seq(query)
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    assert(innerChildren.length == 1)
+    assert(storage.locationUri.nonEmpty, "Directory path is required")
+    assert(provider.isDefined, "Data source is required")
+
+    // Create the relation based on the input logical plan: `data`.
+    val pathOption = storage.locationUri.map("path" -> CatalogUtils.URIToString(_))
+    val dataSource = DataSource(
--- End diff --

@gatorsmile I am not familiar with data sources. Could you give me some hints on how to limit this to "FileFormat" only?
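One possible answer, sketched below: look up the provider's implementation class and reject anything that is not a `FileFormat`. This assumes the 2.2-era `DataSource.lookupDataSource(provider: String)` helper; treat the exact signature and the error message as illustrative, not as the actual fix.

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.execution.datasources.{DataSource, FileFormat}

// Hedged sketch, intended to live inside the command's run() above, where
// `provider: Option[String]` is in scope and the sql package-private
// AnalysisException constructor is accessible.
val providerClass = DataSource.lookupDataSource(provider.get)
if (!classOf[FileFormat].isAssignableFrom(providerClass)) {
  throw new AnalysisException(
    s"Data source ${provider.get} does not support INSERT OVERWRITE DIRECTORY; " +
      "only FileFormat-based data sources can write to a directory.")
}
```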
[GitHub] spark pull request #18883: [SPARK-21276][CORE] Update lz4-java to the latest...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18883#discussion_r136443138

--- Diff: project/MimaExcludes.scala ---
@@ -41,7 +41,10 @@ object MimaExcludes {
     // [SPARK-19937] Add remote bytes read to disk.
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetrics.this"),
-    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this")
+    ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.status.api.v1.ShuffleReadMetricDistributions.this"),
+
+    // [SPARK-21276] Update lz4-java to the latest (v1.4.0)
+    ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.io.LZ4BlockInputStream")
--- End diff --

@srowen This is a breaking change. We should not remove a public class that is in the API docs: http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.io.LZ4BlockInputStream
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136439466

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -534,4 +534,83 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
     }
   }
 }
+
+  test("insert overwrite to dir from hive metastore table") {
+    withTempDir { dir =>
+      val path = dir.toURI.getPath
+
+      checkAnswer(
+        sql(s"INSERT OVERWRITE LOCAL DIRECTORY '${path}' SELECT * FROM src where key < 10"),
+        Seq.empty[Row])
+
+      checkAnswer(
+        sql(
+          s"""
+             |INSERT OVERWRITE LOCAL DIRECTORY '${path}'
+             |STORED AS orc
+             |SELECT * FROM src where key < 10
+           """.stripMargin),
+        Seq.empty[Row])
+
+      // use orc data source to check the data of path is right.
+      withTempView("orc_source") {
+        sql(
+          s"""
+             |CREATE TEMPORARY VIEW orc_source
+             |USING org.apache.spark.sql.hive.orc
+             |OPTIONS (
+             |  PATH '${dir.getCanonicalPath}'
+             |)
+           """.stripMargin)
+
+        checkAnswer(
+          sql("select * from orc_source"),
+          sql("select * from src where key < 10").collect())
+      }
+    }
+  }
+
+  test("insert overwrite to dir from temp table") {
+    withTempView("test_insert_table") {
+      spark.range(10).selectExpr("id", "id AS str").createOrReplaceTempView("test_insert_table")
+
+      withTempDir { dir =>
+        val path = dir.toURI.getPath
+
+        checkAnswer(
+          sql(
+            s"""
+               |INSERT OVERWRITE LOCAL DIRECTORY '${path}'
--- End diff --

added
[GitHub] spark issue #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19097

Merged build finished. Test PASSed.
[GitHub] spark issue #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19097

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81292/
[GitHub] spark issue #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19097

**[Test build #81292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81292/testReport)** for PR 19097 at commit [`c568282`](https://github.com/apache/spark/commit/c5682826710e784e283762e76d2ce0760af142d5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18306: [SPARK-21029][SS] All StreamingQuery should be st...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/18306#discussion_r136436631

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -562,6 +563,8 @@ class SparkContext(config: SparkConf) extends Logging {
     }
     _cleaner.foreach(_.start())
 
+    _stopHooks = new SparkShutdownHookManager()
--- End diff --

The queries also need to be gracefully stopped if someone calls `sc.stop()` without shutting down the JVM.
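For comparison, the graceful variant from the user's side today is roughly the following sketch, using the public `StreamingQueryManager` API rather than the hook machinery this PR adds:

```scala
// Graceful shutdown from user code: stop every active streaming query before
// tearing down the SparkContext. StreamingQuery.stop() blocks until the
// query's execution thread has terminated.
spark.streams.active.foreach(_.stop())
spark.stop()  // now safe: no streaming queries are still running
```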
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136436585

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -534,4 +534,83 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
     }
   }
 }
+
+  test("insert overwrite to dir from hive metastore table") {
+    withTempDir { dir =>
+      val path = dir.toURI.getPath
+
+      checkAnswer(
+        sql(s"INSERT OVERWRITE LOCAL DIRECTORY '${path}' SELECT * FROM src where key < 10"),
+        Seq.empty[Row])
--- End diff --

ok. updated.
[GitHub] spark issue #19089: [SPARK-21728][core] Follow up: fix user config, auth in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19089

Merged build finished. Test PASSed.
[GitHub] spark issue #19089: [SPARK-21728][core] Follow up: fix user config, auth in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19089

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81290/
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19098

The asserts are wrong if this can be called from user code. It should be `require`. The reason is, basically, exactly this: if you run without assertions enabled, this argument is accepted.
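The distinction in miniature, as a sketch: Java's `assert` (which is what `ColumnarBatch` uses) is skipped unless the JVM runs with `-ea`, and Scala's `assert` can likewise be elided at compile time with `-Xelide-below`, while `require` always validates and throws `IllegalArgumentException`.

```scala
// assert: an internal invariant check that may not run at all.
def getRowUnchecked(rowId: Int, numRows: Int): Int = {
  assert(rowId >= 0 && rowId < numRows)  // silently skipped when assertions are off
  rowId
}

// require: argument validation that always runs, regardless of JVM flags.
def getRowChecked(rowId: Int, numRows: Int): Int = {
  require(rowId >= 0 && rowId < numRows,
    s"rowId $rowId is out of range [0, $numRows)")
  rowId
}
```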
[GitHub] spark issue #19089: [SPARK-21728][core] Follow up: fix user config, auth in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19089

**[Test build #81290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81290/testReport)** for PR 19089 at commit [`31d6c77`](https://github.com/apache/spark/commit/31d6c776cfad48c1835effc417ec2116fada757f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19093: [SPARK-21880][web UI]In the SQL table page, modify jobs ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19093

**[Test build #81296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81296/testReport)** for PR 19093 at commit [`6ae5f2b`](https://github.com/apache/spark/commit/6ae5f2b27cc08ec5bf0d6f9986516887e9a4b36a).
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18975

Merged build finished. Test PASSed.
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18975

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81291/
[GitHub] spark issue #19093: [SPARK-21880][web UI]In the SQL table page, modify jobs ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19093

LGTM pending tests
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975

**[Test build #81291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81291/testReport)** for PR 18975 at commit [`b2068ce`](https://github.com/apache/spark/commit/b2068ce27eec36e5970206d48282e36e09ebbec0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #19093: [SPARK-21880][web UI]In the SQL table page, modify jobs ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19093

ok to test
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19098

@srowen, the assertion is from Spark's `ColumnarBatch` [here](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java#L491).

```java
public ColumnarBatch.Row getRow(int rowId) {
  assert(rowId >= 0);
  assert(rowId < numRows);
  row.rowId = rowId;
  return row;
}
```

I'm also not quite sure why this wasn't working; I checked, and `-ea` is an added argument in the pom. Still, I wonder if we should change the asserts in this class to something better. Maybe they are used instead of exceptions for performance?
[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18704

**[Test build #81295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81295/testReport)** for PR 18704 at commit [`097fc05`](https://github.com/apache/spark/commit/097fc0502b059222f4cbc77c4aa0019bf013b6a3).
[GitHub] spark pull request #19098: [SPARK-21583][HOTFIX] Removed intercept in test c...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19098#discussion_r136427951

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1308,10 +1308,6 @@ class ColumnarBatchSuite extends SparkFunSuite {
     }
   }
 
-    intercept[java.lang.AssertionError] {
-      batch.getRow(100)
--- End diff --

Thanks @gatorsmile, I'll put this in another test once I figure out why it wasn't being hit.
[GitHub] spark issue #19093: [SPARK-21880][web UI]In the SQL table page, modify jobs ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/19093

I'm ok with this change
[GitHub] spark issue #18818: [SPARK-21110][SQL] Structs, arrays, and other orderable ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18818

**[Test build #81294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81294/testReport)** for PR 18818 at commit [`6e01186`](https://github.com/apache/spark/commit/6e011860ed800c9f869b66674cb241d3bb2d94fc).
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18692

Sorry for the delay. @jiangxb1987 will submit a simple fix for the issue you mentioned. It will not be a perfect fix, but it partially resolves the issue. In the future, we need to move the filter removal to a separate batch for cost-based optimization, instead of doing it alongside filter inference in the same RBO batch.
[GitHub] spark issue #18692: [SPARK-21417][SQL] Infer join conditions using propagate...
Github user aokolnychyi commented on the issue: https://github.com/apache/spark/pull/18692

@gatorsmile what is our decision here? Shall we wait until SPARK-21652 is resolved? In the meantime, I can add some tests and see how the proposed rule works together with all others.
[GitHub] spark issue #18818: [SPARK-21110][SQL] Structs, arrays, and other orderable ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18818

LGTM except one comment. Thanks for working on it!
[GitHub] spark pull request #18818: [SPARK-21110][SQL] Structs, arrays, and other ord...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/18818#discussion_r136421644

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -582,6 +582,7 @@ class CodegenContext {
       case array: ArrayType => genComp(array, c1, c2) + " == 0"
       case struct: StructType => genComp(struct, c1, c2) + " == 0"
       case udt: UserDefinedType[_] => genEqual(udt.sqlType, c1, c2)
+      case NullType => "true"
--- End diff --

Yea, codegen fails without this. I had originally made the value `false`, but when I noticed that the codegen for comparison (https://github.com/aray/spark/blob/cc2f3eca28ee6b9faa87853568205307567827cc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L606) returned `0`, I changed it to be consistent. Happy to change it back though.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136421034

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/InsertIntoDataSourceDirCommand.scala ---
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.datasources._
+
+/**
+ * A command used to write the result of a query to a directory.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE DIRECTORY (path=STRING)?
+ *   USING format OPTIONS ([option1_name "option1_value", option2_name "option2_value", ...])
+ *   SELECT ...
+ * }}}
+ */
+case class InsertIntoDataSourceDirCommand(
+    storage: CatalogStorageFormat,
+    provider: Option[String],
--- End diff --

updated.
[GitHub] spark pull request #19080: [SPARK-21865][SQL] simplify the distribution sema...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/19080#discussion_r136419947

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
@@ -284,24 +241,17 @@ case class RangePartitioning(ordering: Seq[SortOrder], numPartitions: Int)
   override def nullable: Boolean = false
 
   override def dataType: DataType = IntegerType
 
-  override def satisfies(required: Distribution): Boolean = required match {
-    case UnspecifiedDistribution => true
-    case OrderedDistribution(requiredOrdering) =>
-      val minSize = Seq(requiredOrdering.size, ordering.size).min
-      requiredOrdering.take(minSize) == ordering.take(minSize)
-    case ClusteredDistribution(requiredClustering) =>
-      ordering.map(_.child).forall(x => requiredClustering.exists(_.semanticEquals(x)))
-    case _ => false
-  }
-
-  override def compatibleWith(other: Partitioning): Boolean = other match {
-    case o: RangePartitioning => this.semanticEquals(o)
-    case _ => false
-  }
-
-  override def guarantees(other: Partitioning): Boolean = other match {
-    case o: RangePartitioning => this.semanticEquals(o)
-    case _ => false
+  override def satisfies(required: Distribution): Boolean = {
+    super.satisfies(required) || {
+      required match {
+        case OrderedDistribution(requiredOrdering) =>
+          val minSize = Seq(requiredOrdering.size, ordering.size).min
+          requiredOrdering.take(minSize) == ordering.take(minSize)
--- End diff --

While we are cleaning things up, this needs to be fixed: `RangePartitioning(a+, b+)` does not satisfy `OrderedDistribution(a+)`. It violates the requirement that all rows with the same value of `a` be in the same partition.
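A concrete way to see this is sketched below. The sketch uses `Dataset.repartitionByRange`, which postdates this PR, as a stand-in for `RangePartitioning`; whether the split actually lands mid-`a` depends on the sampled range boundaries, so treat the output as illustrative.

```scala
// Rows range-sorted by (a, b) can cross a partition boundary mid-`a`, e.g.
// [(1,1), (1,2)] | [(1,3), (2,1)] puts a = 1 into two partitions, so
// clustering on `a` alone is not guaranteed. Runnable in a spark-shell:
import spark.implicits._

val df = Seq((1, 1), (1, 2), (1, 3), (2, 1)).toDF("a", "b")
  .repartitionByRange(2, $"a", $"b")

df.rdd
  .mapPartitionsWithIndex { (idx, rows) => rows.map(r => (idx, r.getInt(0))) }
  .collect()
  .foreach(println)  // a = 1 may appear under more than one partition index
```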
[GitHub] spark pull request #18818: [SPARK-21110][SQL] Structs, arrays, and other ord...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18818#discussion_r136419981

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -582,6 +582,7 @@ class CodegenContext {
       case array: ArrayType => genComp(array, c1, c2) + " == 0"
       case struct: StructType => genComp(struct, c1, c2) + " == 0"
       case udt: UserDefinedType[_] => genEqual(udt.sqlType, c1, c2)
+      case NullType => "true"
--- End diff --

I found the test case, but the test case is not affected by the value we generate here, since it is under `nullSafeCodeGen`. However, we should still return `false` when doing `null = null`.
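As background for this exchange, the SQL-level semantics the generated code must preserve: `null = null` evaluates to NULL (not true), while the null-safe operator `<=>` yields true; the literal emitted for `NullType` only matters on paths where the usual null checks are bypassed. A quick check in a spark-shell:

```scala
// SQL semantics the codegen has to respect:
spark.sql("SELECT null = null AS eq, null <=> null AS null_safe_eq").show()
// eq is NULL (not true), while null_safe_eq is true.
```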
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136419843

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala ---
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Properties
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.catalog.CatalogStorageFormat
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.util.Utils
+
+
+case class InsertIntoHiveDirCommand(
--- End diff --

updated
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136418390

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -155,6 +156,9 @@ object HiveAnalysis extends Rule[LogicalPlan] {
     case CreateTable(tableDesc, mode, Some(query)) if DDLUtils.isHiveTable(tableDesc) =>
       CreateHiveTableAsSelectCommand(tableDesc, query, mode)
+
+    case InsertIntoDir(isLocal, storage, _, child, overwrite) =>
--- End diff --

updated.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136418037
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
@@ -140,6 +141,9 @@ case class DataSourceAnalysis(conf: SQLConf) extends Rule[LogicalPlan] with Cast
       parts, query, overwrite, false) if parts.isEmpty =>
       InsertIntoDataSourceCommand(l, query, overwrite)
+
+    case InsertIntoDir(_, storage, provider, query, overwrite) if provider.nonEmpty =>
--- End diff --
updated.
[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18975#discussion_r136417593
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1509,4 +1509,84 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
       query: LogicalPlan): LogicalPlan = {
     RepartitionByExpression(expressions, query, conf.numShufflePartitions)
   }
+
+  /**
+   * Return the parameters for [[InsertIntoDir]] logical plan.
+   *
+   * Expected format:
+   * {{{
+   *   INSERT OVERWRITE DIRECTORY
+   *   [path]
+   *   [OPTIONS table_property_list]
+   *   select_statement;
+   * }}}
+   */
+  override def visitInsertOverwriteDir(
+      ctx: InsertOverwriteDirContext): InsertDirParams = withOrigin(ctx) {
+    val options = Option(ctx.options).map(visitPropertyKeyValues).getOrElse(Map.empty)
+    var storage = DataSource.buildStorageFormatFromOptions(options)
+
+    val path = Option(ctx.path) match {
+      case Some(s) => string(s)
+      case None => ""
+    }
+
+    if (!path.isEmpty && storage.locationUri.isDefined) {
+      throw new ParseException(
+        "Directory path and 'path' in OPTIONS are both used to indicate the directory path, " +
+          "you can only specify one of them.", ctx)
+    }
+    if (path.isEmpty && !storage.locationUri.isDefined) {
+      throw new ParseException(
+        "You need to specify directory path or 'path' in OPTIONS, but not both", ctx)
+    }
+
+    if (!path.isEmpty) {
+      val customLocation = Some(CatalogUtils.stringToURI(path))
+      storage = storage.copy(locationUri = customLocation)
+    }
+
+    val provider = ctx.tableProvider.qualifiedName.getText
+
+    (false, storage, Some(provider))
+  }
+
+  /**
+   * Return the parameters for [[InsertIntoDir]] logical plan.
+   *
+   * Expected format:
+   * {{{
+   *   INSERT OVERWRITE [LOCAL] DIRECTORY
+   *   path
+   *   [ROW FORMAT row_format]
+   *   [STORED AS file_format]
+   *   select_statement;
+   * }}}
+   */
+  override def visitInsertOverwriteHiveDir(
+      ctx: InsertOverwriteHiveDirContext): InsertDirParams = withOrigin(ctx) {
+    validateRowFormatFileFormat(ctx.rowFormat, ctx.createFileFormat, ctx)
+    val rowStorage = Option(ctx.rowFormat).map(visitRowFormat)
+      .getOrElse(CatalogStorageFormat.empty)
+    val fileStorage = Option(ctx.createFileFormat).map(visitCreateFileFormat)
+      .getOrElse(CatalogStorageFormat.empty)
+
+    val path = string(ctx.path)
+    // The path field is required
+    if (path.isEmpty) {
+      operationNotAllowed("INSERT OVERWRITE DIRECTORY must be accompanied by path", ctx)
+    }
+
+    val defaultStorage = HiveSerDe.getDefaultStorage(conf)
+
+    val storage = CatalogStorageFormat(
+      locationUri = Some(CatalogUtils.stringToURI(path)),
+      inputFormat = fileStorage.inputFormat.orElse(defaultStorage.inputFormat),
+      outputFormat = fileStorage.outputFormat.orElse(defaultStorage.outputFormat),
+      serde = rowStorage.serde.orElse(fileStorage.serde).orElse(defaultStorage.serde),
+      compressed = false,
+      properties = rowStorage.properties ++ fileStorage.properties)
+
+    (ctx.LOCAL != null, storage, None)
--- End diff --
got it. updated.
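For orientation, here are hedged examples of the two statement shapes these visitors accept. The paths and the table name `src` are made up, and the `USING` keyword is an assumption inferred from the `tableProvider` rule referenced in the code:

```scala
// Data source form handled by visitInsertOverwriteDir (the path may
// instead be given via OPTIONS (path ...), but not both).
spark.sql(
  """INSERT OVERWRITE DIRECTORY '/tmp/out'
    |USING parquet
    |SELECT * FROM src""".stripMargin)

// Hive-compatible form handled by visitInsertOverwriteHiveDir.
spark.sql(
  """INSERT OVERWRITE LOCAL DIRECTORY '/tmp/out'
    |STORED AS orc
    |SELECT * FROM src""".stripMargin)
```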
[GitHub] spark pull request #18818: [SPARK-21110][SQL] Structs, arrays, and other ord...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18818#discussion_r136416655
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -582,6 +582,7 @@ class CodegenContext {
       case array: ArrayType => genComp(array, c1, c2) + " == 0"
       case struct: StructType => genComp(struct, c1, c2) + " == 0"
       case udt: UserDefinedType[_] => genEqual(udt.sqlType, c1, c2)
+      case NullType => "true"
--- End diff --
Is this required? Will it be covered by any test? BTW, the value should be `false`.
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19098
Hm, I wonder how that results in an assertion? That's a normal error case and shouldn't cause an assert. Is it from a third-party library in this case, like Arrow? Really, we should fix that somehow so that the user-visible contract for this behavior never involves `AssertionError`. Still, assertions ought to be _enabled_ during tests anyway, so I don't see how this doesn't actually fire. If it only affects the Maven build, I'd suspect that maybe the scalatest-maven-plugin somehow doesn't turn on assertions? But it has `-ea` in its command line.
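One way to make that contract independent of JVM flags, sketched with made-up names (`SafeBatch` is not Spark's class): replace the bare Java `assert` with an explicit check that always throws, so tests can intercept it deterministically whether or not `-ea` is set.

```scala
// Hedged sketch: an explicit bounds check that does not depend on -ea,
// unlike a Java `assert`, so the user-visible contract never involves
// AssertionError. Names are illustrative, not Spark's API.
class SafeBatch(numRows: Int) {
  def getRow(rowId: Int): Int = {
    if (rowId < 0 || rowId >= numRows) {
      throw new IndexOutOfBoundsException(s"rowId $rowId not in [0, $numRows)")
    }
    rowId
  }
}
```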
[GitHub] spark pull request #19098: [SPARK-21583][HOTFIX] Removed intercept in test c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19098
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19098
LGTM. Merging to master.
[GitHub] spark pull request #19098: [SPARK-21583][HOTFIX] Removed intercept in test c...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19098#discussion_r136413800
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1308,10 +1308,6 @@ class ColumnarBatchSuite extends SparkFunSuite {
     }
   }
-    intercept[java.lang.AssertionError] {
-      batch.getRow(100)
--- End diff --
Please add another test in a follow-up PR.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136413085
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
         s"vectorized reader"))
     }
   }
+
+  test("create columnar batch from Arrow column vectors") {
+    val allocator = ArrowUtils.rootAllocator.newChildAllocator("int", 0, Long.MaxValue)
+    val vector1 = ArrowUtils.toArrowField("int1", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector1.allocateNew()
+    val mutator1 = vector1.getMutator()
+    val vector2 = ArrowUtils.toArrowField("int2", IntegerType, nullable = true)
+      .createVector(allocator).asInstanceOf[NullableIntVector]
+    vector2.allocateNew()
+    val mutator2 = vector2.getMutator()
+
+    (0 until 10).foreach { i =>
+      mutator1.setSafe(i, i)
+      mutator2.setSafe(i + 1, i)
+    }
+    mutator1.setNull(10)
+    mutator1.setValueCount(11)
+    mutator2.setNull(0)
+    mutator2.setValueCount(11)
+
+    val columnVectors = Seq(new ArrowColumnVector(vector1), new ArrowColumnVector(vector2))
+
+    val schema = StructType(Seq(StructField("int1", IntegerType), StructField("int2", IntegerType)))
+    val batch = new ColumnarBatch(schema, columnVectors.toArray[ColumnVector], 11)
+    batch.setNumRows(11)
+
+    assert(batch.numCols() == 2)
+    assert(batch.numRows() == 11)
+
+    val rowIter = batch.rowIterator().asScala
+    rowIter.zipWithIndex.foreach { case (row, i) =>
+      if (i == 10) {
+        assert(row.isNullAt(0))
+      } else {
+        assert(row.getInt(0) == i)
+      }
+      if (i == 0) {
+        assert(row.isNullAt(1))
+      } else {
+        assert(row.getInt(1) == i - 1)
+      }
+    }
+
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
I just made #19098 to remove this check - it's not really testing the functionality added here anyway, but maybe another test should be added for checking index-out-of-bounds errors.
[GitHub] spark issue #19098: [SPARK-21583][HOTFIX] Removed intercept in test causing ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19098
**[Test build #81293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81293/testReport)** for PR 19098 at commit [`567487d`](https://github.com/apache/spark/commit/567487d6089400527a1b30aca054ea517174a08d).
[GitHub] spark pull request #19098: [SPARK-21583][HOTFIX] Removed intercept in test c...
GitHub user BryanCutler opened a pull request: https://github.com/apache/spark/pull/19098
[SPARK-21583][HOTFIX] Removed intercept in test causing failures
Removing a check in the ColumnarBatchSuite that depended on a Java assertion. This assertion is being compiled out in the Maven builds, causing the test to fail. This part of the test is not specific to the functionality being tested here.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/BryanCutler/spark hotfix-ColumnarBatchSuite-assertion
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19098.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19098
commit 567487d6089400527a1b30aca054ea517174a08d
Author: Bryan Cutler
Date: 2017-08-31T18:21:06Z
    this intercept relies on a Java assertion that could be compiled out, failing the test
[GitHub] spark pull request #18306: [SPARK-21029][SS] All StreamingQuery should be st...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18306#discussion_r136410312
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -562,6 +563,8 @@ class SparkContext(config: SparkConf) extends Logging {
   }
   _cleaner.foreach(_.start())
+
+    _stopHooks = new SparkShutdownHookManager()
--- End diff --
there's already a shutdown hook that calls sc.stop() - perhaps just add the cleanup in stop() https://github.com/aray/spark/blob/005472ed10fad3d1bc8feff12fc55c5682724a0e/core/src/main/scala/org/apache/spark/SparkContext.scala#L584
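A shape sketch of this suggestion, with made-up names: rather than creating a second `SparkShutdownHookManager`, register the new cleanup so it runs inside `stop()`, which the existing shutdown hook already invokes.

```scala
// Hedged sketch, not SparkContext code: cleanup callbacks registered
// here run wherever stop() runs, including from the existing shutdown
// hook, so no second hook manager is needed.
class Server {
  private val cleanups = scala.collection.mutable.Buffer[() => Unit]()
  def registerCleanup(f: () => Unit): Unit = cleanups += f
  def stop(): Unit = {
    cleanups.foreach(f => f())
    cleanups.clear()
  }
}
```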
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136409601
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
I think the problem is that if the Java assertion is compiled out, then no error is produced and the test fails.
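A runnable Scala-side sketch of this pitfall: Java's `assert` is disabled at run time without `-ea`, while Scala's `Predef.assert` can be removed at compile time with `-Xelide-below`; either way, a test that intercepts `AssertionError` silently stops testing anything. The flag value below assumes scalac's elision levels, where assertions sit at 2000.

```scala
// Compile normally and the assertion fires; compile with
// `scalac -Xelide-below 2001 ElideDemo.scala` and the assert call is
// removed entirely, so no AssertionError can ever be intercepted --
// the analogue of a Java assert running without -ea.
object ElideDemo {
  def main(args: Array[String]): Unit = {
    try {
      assert(false, "rowId out of bounds")
      println("assertion elided: nothing was checked")
    } catch {
      case e: AssertionError => println(s"assertion fired: ${e.getMessage}")
    }
  }
}
```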
[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18697
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81286/
[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18697
Merged build finished. Test PASSed.
[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697
**[Test build #81286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81286/testReport)** for PR 18697 at commit [`0f21237`](https://github.com/apache/spark/commit/0f21237b61a59bfcbf384866e06323a667154924).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136408878
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
Maybe?
```scala
val m = intercept[java.lang.AssertionError] {
  ...
}.getMessage
assert(m.contains(...))
```
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136408531
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
Then, please check the error message here.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136408063
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
It's probably because the assert is being compiled out. This should probably not be in the test then.
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136407451
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
Thanks! It seems to happen on Maven only; sbt-hadoop-2.6 passed.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/3480/
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136406559
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
Hmm, that is strange. I'll take a look, thanks.
[GitHub] spark pull request #18837: [Spark-20812][Mesos] Add secrets support to the d...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18837
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18270
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81289/
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18270
Merged build finished. Test FAILed.
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18270
**[Test build #81289 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81289/testReport)** for PR 18270 at commit [`2c6ed67`](https://github.com/apache/spark/commit/2c6ed672aeb075243e453cadccaf24c9611735c6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18837: [Spark-20812][Mesos] Add secrets support to the dispatch...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18837
There are still some small issues (minor style nits, duplicating conf keys instead of using `CONSTANT.key`), but so be it. I can't really comment on the functionality itself, so I'll trust your judgment since you're way more familiar with Mesos. Merging to master.
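For readers outside the review, the `CONSTANT.key` nit looks roughly like this, a sketch with made-up names (the config key and `ConfEntry` type are illustrative, not the Mesos code under review):

```scala
// Illustrative only: define the config key once, then reference its
// .key field instead of re-typing the string literal elsewhere.
object DispatcherConf {
  final case class ConfEntry(key: String, default: Option[String])
  val SECRET_NAMES = ConfEntry("spark.mesos.driver.secret.names", None)
}

// Preferred: the literal lives in exactly one place.
//   conf.getOption(DispatcherConf.SECRET_NAMES.key)
// Discouraged: a duplicated key string that can drift out of sync.
//   conf.getOption("spark.mesos.driver.secret.names")
```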
[GitHub] spark issue #17014: [SPARK-18608][ML] Fix double-caching in ML algorithms
Github user smurching commented on the issue: https://github.com/apache/spark/pull/17014
@WeichenXu123 That approach sounds reasonable to me. My main thought (& this might be obvious) is on the implementation level -- as long as we implement this by adding an `org.apache.spark.ml.Param` named `handlePersistence`, I think we can maintain binary compatibility. I'd be concerned about making `handlePersistence` an argument to `fit()`, which seems like it might [break binary compatibility](https://wiki.eclipse.org/Evolving_Java-based_APIs_2#Evolving_API_classes_-_API_methods_and_constructors).
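A minimal sketch of the `Param`-based approach, assuming Spark ML's `param` package; the trait name and doc string are illustrative, not from the PR:

```scala
import org.apache.spark.ml.param.{BooleanParam, Params}

// Hedged sketch: because the new knob travels through the ParamMap,
// fit(dataset) keeps its existing signature and compiled callers keep
// linking, which is the binary-compatibility point made above.
trait HasHandlePersistence extends Params {
  final val handlePersistence = new BooleanParam(this, "handlePersistence",
    "whether the algorithm should cache its input before fitting")
  def getHandlePersistence: Boolean = $(handlePersistence)
  setDefault(handlePersistence -> true)
}
```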
[GitHub] spark pull request #18787: [SPARK-21583][SQL] Create a ColumnarBatch from Ar...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18787#discussion_r136404583
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
@@ -1261,4 +1264,55 @@ class ColumnarBatchSuite extends SparkFunSuite {
+    intercept[java.lang.AssertionError] {
+      batch.getRow(100)
--- End diff --
Hi, @BryanCutler and @ueshin. This seems to make the master branch fail. Could you take a look once more? Thank you in advance!
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/3696/testReport/
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/3730/testReport/
[GitHub] spark issue #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19097
**[Test build #81292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81292/testReport)** for PR 19097 at commit [`c568282`](https://github.com/apache/spark/commit/c5682826710e784e283762e76d2ce0760af142d5).
[GitHub] spark pull request #19097: [SPARK-17107][SQL][FOLLOW-UP] Remove redundant pu...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/19097
[SPARK-17107][SQL][FOLLOW-UP] Remove redundant pushdown rule for Union
## What changes were proposed in this pull request?
Also remove the useless function `partitionByDeterministic` after the changes of https://github.com/apache/spark/pull/14687
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/gatorsmile/spark followupSPARK-17107
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19097.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19097
commit c5682826710e784e283762e76d2ce0760af142d5
Author: gatorsmile
Date: 2017-08-31T17:39:06Z
    fix.
[GitHub] spark issue #19072: [SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19072
Merged build finished. Test PASSed.
[GitHub] spark issue #19072: [SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19072
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81283/
[GitHub] spark issue #19072: [SPARK-17139][ML][FOLLOW-UP] Add convenient method `asBi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19072
**[Test build #81283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81283/testReport)** for PR 19072 at commit [`e185d37`](https://github.com/apache/spark/commit/e185d37b9814c67d4e6d7f6404dc0900740bfded).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953
Hi, @marmbrus, @liancheng, @yhuai. Could you give me some advice about this ORC upgrade PR? I tried to minimize the diff of this PR, so I didn't remove the now-unused old code. Thank you in advance.
[GitHub] spark issue #19095: [SPARK-21886][SQL] Use SparkSession.internalCreateDataFr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19095
@jaceklaskowski Maybe you can fix the PR title next time. Thanks for your work!
[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19065
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81288/
[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19065
Merged build finished. Test PASSed.
[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19065
**[Test build #81288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81288/testReport)** for PR 19065 at commit [`f13cd73`](https://github.com/apache/spark/commit/f13cd73926e80173228637da2015c7d6e7a0e848).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19095: [SPARK-21886][SQL] Use SparkSession.internalCreateDataFr...
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/19095
That was really quick! Thanks a lot @gatorsmile
[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975
**[Test build #81291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81291/testReport)** for PR 18975 at commit [`b2068ce`](https://github.com/apache/spark/commit/b2068ce27eec36e5970206d48282e36e09ebbec0).
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538
Merged build finished. Test PASSed.
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18538
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81287/
[GitHub] spark issue #19089: [SPARK-21728][core] Follow up: fix user config, auth in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19089
**[Test build #81290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81290/testReport)** for PR 19089 at commit [`31d6c77`](https://github.com/apache/spark/commit/31d6c776cfad48c1835effc417ec2116fada757f).
[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18538
**[Test build #81287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81287/testReport)** for PR 18538 at commit [`45d1380`](https://github.com/apache/spark/commit/45d1380574ece58ff63c34ff31af6243aff16c3c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19095: [SPARK-21886][SQL] Use SparkSession.internalCreat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19095
[GitHub] spark issue #19095: [SPARK-21886][SQL] Use SparkSession.internalCreateDataFr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19095
Thanks! Merging to master.
[GitHub] spark issue #19095: [SPARK-21886][SQL] Use SparkSession.internalCreateDataFr...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19095
LGTM
[GitHub] spark issue #19078: [SPARK-21862][ML] Add overflow check in PCA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19078
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81285/
[GitHub] spark issue #19078: [SPARK-21862][ML] Add overflow check in PCA
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19078
Merged build finished. Test PASSed.
[GitHub] spark issue #19078: [SPARK-21862][ML] Add overflow check in PCA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19078
**[Test build #81285 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81285/testReport)** for PR 19078 at commit [`3304092`](https://github.com/apache/spark/commit/33040929a0332853f5999b750714ce4be2c2b19d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18270
**[Test build #81289 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81289/testReport)** for PR 18270 at commit [`2c6ed67`](https://github.com/apache/spark/commit/2c6ed672aeb075243e453cadccaf24c9611735c6).
[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18270
That commit contains the code changes I suggested.