[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92996/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202506240 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -803,6 +804,67 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex // Test inverseCanDrop() has taken effect testStringStartsWith(spark.range(1024).map(c => "100").toDF(), "value not like '10%'") } + + test("SPARK-17091: Convert IN predicate to Parquet filter push-down") { +val schema = StructType(Seq( + StructField("a", IntegerType, nullable = false) +)) + +val parquetSchema = new SparkToParquetSchemaConverter(conf).convert(schema) + +assertResult(Some(FilterApi.eq(intColumn("a"), null: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(null))) +} + +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10))) +} + +// Remove duplicates +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 10))) +} + +assertResult(Some(or(or( + FilterApi.eq(intColumn("a"), 10: Integer), + FilterApi.eq(intColumn("a"), 20: Integer)), + FilterApi.eq(intColumn("a"), 30: Integer))) +) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 20, 30))) +} + +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold).toArray)).isDefined) +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold + 1).toArray)).isEmpty) + +import testImplicits._ +withTempPath { path => + val data = 0 to 1024 + data.toDF("a").selectExpr("if (a = 1024, null, a) AS a") // convert 1024 to null +.coalesce(1).write.option("parquet.block.size", 512) +.parquet(path.getAbsolutePath) + val df = spark.read.parquet(path.getAbsolutePath) + Seq(true, false).foreach { pushEnabled => +withSQLConf( + SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushEnabled.toString) { + Seq(1, 5, 10, 11).foreach { count => +val filter = s"a in(${Range(0, count).mkString(",")})" +assert(df.where(filter).count() === count) +val actual = stripSparkFilter(df.where(filter)).collect().length +if (pushEnabled && count <= conf.parquetFilterPushDownInFilterThreshold) { + assert(actual > 1 && actual < data.length) --- End diff -- ah okay this tests block level filtering. lgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21556 **[Test build #92996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92996/testReport)** for PR 21556 at commit [`e713698`](https://github.com/apache/spark/commit/e7136984b482b3652e7ecfa542aec83aadc58320). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202505997 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -803,6 +804,67 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex // Test inverseCanDrop() has taken effect testStringStartsWith(spark.range(1024).map(c => "100").toDF(), "value not like '10%'") } + + test("SPARK-17091: Convert IN predicate to Parquet filter push-down") { +val schema = StructType(Seq( + StructField("a", IntegerType, nullable = false) +)) + +val parquetSchema = new SparkToParquetSchemaConverter(conf).convert(schema) + +assertResult(Some(FilterApi.eq(intColumn("a"), null: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(null))) +} + +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10))) +} + +// Remove duplicates +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 10))) +} + +assertResult(Some(or(or( + FilterApi.eq(intColumn("a"), 10: Integer), + FilterApi.eq(intColumn("a"), 20: Integer)), + FilterApi.eq(intColumn("a"), 30: Integer))) +) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 20, 30))) +} + +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold).toArray)).isDefined) +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold + 1).toArray)).isEmpty) + +import testImplicits._ +withTempPath { path => + val data = 0 to 1024 + data.toDF("a").selectExpr("if (a = 1024, null, a) AS a") // convert 1024 to null +.coalesce(1).write.option("parquet.block.size", 512) +.parquet(path.getAbsolutePath) + val df = spark.read.parquet(path.getAbsolutePath) + Seq(true, false).foreach { pushEnabled => +withSQLConf( + SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushEnabled.toString) { + Seq(1, 5, 10, 11).foreach { count => +val filter = s"a in(${Range(0, count).mkString(",")})" +assert(df.where(filter).count() === count) +val actual = stripSparkFilter(df.where(filter)).collect().length +if (pushEnabled && count <= conf.parquetFilterPushDownInFilterThreshold) { + assert(actual > 1 && actual < data.length) --- End diff -- If you intended to test block level filtering, you should disable record level filtering explicitly. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202505899 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -803,6 +804,67 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex // Test inverseCanDrop() has taken effect testStringStartsWith(spark.range(1024).map(c => "100").toDF(), "value not like '10%'") } + + test("SPARK-17091: Convert IN predicate to Parquet filter push-down") { +val schema = StructType(Seq( + StructField("a", IntegerType, nullable = false) +)) + +val parquetSchema = new SparkToParquetSchemaConverter(conf).convert(schema) + +assertResult(Some(FilterApi.eq(intColumn("a"), null: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(null))) +} + +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10))) +} + +// Remove duplicates +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 10))) +} + +assertResult(Some(or(or( + FilterApi.eq(intColumn("a"), 10: Integer), + FilterApi.eq(intColumn("a"), 20: Integer)), + FilterApi.eq(intColumn("a"), 30: Integer))) +) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 20, 30))) +} + +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold).toArray)).isDefined) +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold + 1).toArray)).isEmpty) + +import testImplicits._ +withTempPath { path => + val data = 0 to 1024 + data.toDF("a").selectExpr("if (a = 1024, null, a) AS a") // convert 1024 to null +.coalesce(1).write.option("parquet.block.size", 512) +.parquet(path.getAbsolutePath) + val df = spark.read.parquet(path.getAbsolutePath) + Seq(true, false).foreach { pushEnabled => +withSQLConf( + SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> pushEnabled.toString) { + Seq(1, 5, 10, 11).foreach { count => +val filter = s"a in(${Range(0, count).mkString(",")})" +assert(df.where(filter).count() === count) +val actual = stripSparkFilter(df.where(filter)).collect().length +if (pushEnabled && count <= conf.parquetFilterPushDownInFilterThreshold) { + assert(actual > 1 && actual < data.length) --- End diff -- @wangyum, I think if we don't push down it should be the same length of `data.length`, no? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...
Github user yuanboliu commented on the issue: https://github.com/apache/spark/pull/21690 Thanks very much --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21766 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92995/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21766: [SPARK-24803] add support for numeric
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21766 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21603 **[Test build #92995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92995/testReport)** for PR 21603 at commit [`c386e02`](https://github.com/apache/spark/commit/c386e02c8bfbf1eb43dcf95717a9e2ea30123d57). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21766: [SPARK-24803] add support for numeric
GitHub user wangtao605 opened a pull request: https://github.com/apache/spark/pull/21766 [SPARK-24803] add support for numeric NUMERIC is the same as DECIMAL. Spark already supports DECIMAL, so I think we should add support for NUMERIC to align with the SQL standard. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangtao605/spark numeric Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21766.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21766 commit b0d6ffdbaed7b0638f8859f4a26926761764d95e Author: çæ·10181990 Date: 2018-07-14T04:18:53Z add support for numeric --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
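For context, a minimal sketch of what this change is meant to allow, assuming NUMERIC(p, s) becomes an alias for DECIMAL(p, s) in the SQL type syntax (the queries below are only illustrative, not taken from the PR):

```scala
import org.apache.spark.sql.SparkSession

object NumericAliasSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("numeric-alias-sketch")
      .getOrCreate()

    // With the patch, both casts should yield a column of DecimalType(10, 2);
    // without it, the NUMERIC spelling is rejected by the parser.
    spark.sql("SELECT CAST('123.45' AS DECIMAL(10, 2)) AS d").printSchema()
    spark.sql("SELECT CAST('123.45' AS NUMERIC(10, 2)) AS n").printSchema()

    spark.stop()
  }
}
```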
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21589 sgtm --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202503533 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -2336,6 +2336,18 @@ class SparkContext(config: SparkConf) extends Logging { */ def defaultMinPartitions: Int = math.min(defaultParallelism, 2) + /** + * Total number of CPU cores of all executors registered in the cluster at the moment. + * The number reflects current status of the cluster and can change in the future. + */ --- End diff -- that means https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/common/tags/src/main/java/org/apache/spark/annotation/Experimental.java --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21589: [SPARK-24591][CORE] Number of cores and executors...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21589#discussion_r202503503 --- Diff: R/pkg/R/context.R --- @@ -435,3 +435,31 @@ setCheckpointDir <- function(directory) { sc <- getSparkContext() invisible(callJMethod(sc, "setCheckpointDir", suppressWarnings(normalizePath(directory } + +#' Total number of CPU cores of all executors registered in the cluster at the moment. --- End diff -- It's okay for this PR itself for now. I can test. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/21698 OK, we can treat it as data loss. However, it's not caused by Spark but by the user himself. If a user calls `zip`, then uses a custom function to compute keys from the zipped pairs, and finally calls `groupByKey`, there is nothing Spark can guarantee if the RDDs are unsorted. I think in this case the user should fix his business logic; Spark does nothing wrong here. Even if the tasks never fail, the user can still get a different result/cardinality if he runs his query multiple times. `repartition` is different because there is nothing wrong with the user's business logic: he just wants to repartition the data, and Spark should not add/remove/update the existing records. Anyway, if we do want to "fix" the `zip` problem, I think this should be a different topic: we would need to write all the input data somewhere and make sure the retried task gets exactly the same input, which is very expensive and very different from this approach. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
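As a rough illustration of the pattern being discussed (the RDDs, key function, and values below are made up, not from the PR), this is the kind of user code where retrying a task over RDDs with non-deterministic ordering can change which elements get zipped together, and therefore which keys `groupByKey` sees:

```scala
import org.apache.spark.sql.SparkSession

object ZipThenGroupByKeySketch {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder()
      .master("local[*]")
      .appName("zip-sketch")
      .getOrCreate()
      .sparkContext

    // Two RDDs with matching sizes and partition counts, as `zip` requires.
    val left  = sc.parallelize(1 to 1000, 8)
    val right = sc.parallelize((1 to 1000).map(_ * 2), 8)

    // `zip` pairs elements by position within each partition. If the upstream
    // ordering is not deterministic (e.g. the RDDs come out of a shuffle), a
    // retried task may pair elements differently, so the derived keys -- and
    // the resulting groups -- can change between runs.
    val keyed = left.zip(right).map { case (a, b) => (a + b, a) }
    println(keyed.groupByKey().count())

    sc.stop()
  }
}
```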
[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20915 Ah, I am happy too that it's added back. Was just wondering if I can fix the fix version in the SPARK-12850 JIRA. Nothing more than this. Thank you @cloud-fan. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21720 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21720 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92993/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21720 **[Test build #92993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92993/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21762: [SPARK-24800][SQL] Refactor Avro Serializer and D...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21762#discussion_r202502621 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala --- @@ -0,0 +1,348 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.avro + +import java.nio.ByteBuffer + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +import org.apache.avro.{Schema, SchemaBuilder} +import org.apache.avro.Schema.Type._ +import org.apache.avro.generic._ +import org.apache.avro.util.Utf8 + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{SpecificInternalRow, UnsafeArrayData} +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, ArrayData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +/** + * A deserializer to deserialize data in avro format to data in catalyst format. + */ +class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) { + private val converter: Any => Any = rootCatalystType match { +// A shortcut for empty schema. +case st: StructType if st.isEmpty => + (data: Any) => InternalRow.empty + +case st: StructType => + val resultRow = new SpecificInternalRow(st.map(_.dataType)) + val fieldUpdater = new RowUpdater(resultRow) + val writer = getRecordWriter(rootAvroType, st, Nil) + (data: Any) => { +val record = data.asInstanceOf[GenericRecord] +writer(fieldUpdater, record) +resultRow + } + +case _ => + val tmpRow = new SpecificInternalRow(Seq(rootCatalystType)) + val fieldUpdater = new RowUpdater(tmpRow) + val writer = newWriter(rootAvroType, rootCatalystType, Nil) + (data: Any) => { +writer(fieldUpdater, 0, data) +tmpRow.get(0, rootCatalystType) + } + } + + def deserialize(data: Any): Any = converter(data) + + /** + * Creates a writer to writer avro values to Catalyst values at the given ordinal with the given + * updater. + */ + private def newWriter( + avroType: Schema, + catalystType: DataType, + path: List[String]): (CatalystDataUpdater, Int, Any) => Unit = +(avroType.getType, catalystType) match { + case (NULL, NullType) => (updater, ordinal, _) => +updater.setNullAt(ordinal) + + // TODO: we can avoid boxing if future version of avro provide primitive accessors. 
+ case (BOOLEAN, BooleanType) => (updater, ordinal, value) => +updater.setBoolean(ordinal, value.asInstanceOf[Boolean]) + + case (INT, IntegerType) => (updater, ordinal, value) => +updater.setInt(ordinal, value.asInstanceOf[Int]) + + case (LONG, LongType) => (updater, ordinal, value) => +updater.setLong(ordinal, value.asInstanceOf[Long]) + + case (LONG, TimestampType) => (updater, ordinal, value) => +updater.setLong(ordinal, value.asInstanceOf[Long] * 1000) + + case (LONG, DateType) => (updater, ordinal, value) => +updater.setInt(ordinal, (value.asInstanceOf[Long] / DateTimeUtils.MILLIS_PER_DAY).toInt) + + case (FLOAT, FloatType) => (updater, ordinal, value) => +updater.setFloat(ordinal, value.asInstanceOf[Float]) + + case (DOUBLE, DoubleType) => (updater, ordinal, value) => +updater.setDouble(ordinal, value.asInstanceOf[Double]) + + case (STRING, StringType) => (updater, ordinal, value) => +val str = value match { + case s: String => UTF8String.fromString(s) + case s: Utf8 => +val bytes = new Array[Byte](s.getByteLength) +System.arraycopy(s.getBytes, 0, bytes, 0, s.getByteLength) +
[GitHub] spark pull request #21762: [SPARK-24800][SQL] Refactor Avro Serializer and D...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21762#discussion_r202502724 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala --- @@ -17,34 +17,30 @@ package org.apache.spark.sql.avro -import java.nio.ByteBuffer -import java.sql.{Date, Timestamp} - import scala.collection.JavaConverters._ import org.apache.avro.{Schema, SchemaBuilder} import org.apache.avro.Schema.Type._ -import org.apache.avro.SchemaBuilder._ -import org.apache.avro.generic.{GenericData, GenericRecord} -import org.apache.avro.generic.GenericFixed -import org.apache.spark.sql.catalyst.expressions.GenericRow import org.apache.spark.sql.types._ /** * This object contains method that are used to convert sparkSQL schemas to avro schemas and vice * versa. */ object SchemaConverters { - - class IncompatibleSchemaException(msg: String, ex: Throwable = null) extends Exception(msg, ex) - case class SchemaType(dataType: DataType, nullable: Boolean) + /** + * This function takes an avro schema and returns a sql schema. + * An alias name of toCatalystType. + */ + def toSqlType(avroSchema: Schema): SchemaType = toCatalystType(avroSchema) + /** * This function takes an avro schema and returns a sql schema. */ - def toSqlType(avroSchema: Schema): SchemaType = { + def toCatalystType(avroSchema: Schema): SchemaType = { --- End diff -- maybe we don't need to rename it? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21762: [SPARK-24800][SQL] Refactor Avro Serializer and D...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21762#discussion_r202502456 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala --- @@ -0,0 +1,348 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.avro + +import java.nio.ByteBuffer + +import scala.collection.JavaConverters._ +import scala.collection.mutable.ArrayBuffer + +import org.apache.avro.{Schema, SchemaBuilder} +import org.apache.avro.Schema.Type._ +import org.apache.avro.generic._ +import org.apache.avro.util.Utf8 + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{SpecificInternalRow, UnsafeArrayData} +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, ArrayData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +/** + * A deserializer to deserialize data in avro format to data in catalyst format. + */ +class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) { + private val converter: Any => Any = rootCatalystType match { +// A shortcut for empty schema. +case st: StructType if st.isEmpty => + (data: Any) => InternalRow.empty + +case st: StructType => + val resultRow = new SpecificInternalRow(st.map(_.dataType)) + val fieldUpdater = new RowUpdater(resultRow) + val writer = getRecordWriter(rootAvroType, st, Nil) + (data: Any) => { +val record = data.asInstanceOf[GenericRecord] +writer(fieldUpdater, record) +resultRow + } + +case _ => + val tmpRow = new SpecificInternalRow(Seq(rootCatalystType)) + val fieldUpdater = new RowUpdater(tmpRow) + val writer = newWriter(rootAvroType, rootCatalystType, Nil) + (data: Any) => { +writer(fieldUpdater, 0, data) +tmpRow.get(0, rootCatalystType) + } + } + + def deserialize(data: Any): Any = converter(data) + + /** + * Creates a writer to writer avro values to Catalyst values at the given ordinal with the given --- End diff -- nit `a writer to write` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20915 IIUC the bucket pruning was accidentally removed during refactoring, so I'm happy to see it's added back. The migration must keep all the existing features; I think we have enough tests to guarantee that now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21764 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21764 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92992/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21764 **[Test build #92992 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92992/testReport)** for PR 21764 at commit [`eaec2f5`](https://github.com/apache/spark/commit/eaec2f5f2b4e3193de41655b84a1dc936b0e50a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21556 **[Test build #92996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92996/testReport)** for PR 21556 at commit [`e713698`](https://github.com/apache/spark/commit/e7136984b482b3652e7ecfa542aec83aadc58320). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/946/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21556 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/21556 cc @gatorsmile @cloud-fan @gengliangwang @michal-databricks @mswit-databricks --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21589 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92989/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21589 **[Test build #92989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92989/testReport)** for PR 21589 at commit [`7533114`](https://github.com/apache/spark/commit/7533114d00110f7350280378b8a3e78f39c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/945/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r202500542 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala --- @@ -747,6 +748,66 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex // Test inverseCanDrop() has taken effect testStringStartsWith(spark.range(1024).map(c => "100").toDF(), "value not like '10%'") } + + test("SPARK-17091: Convert IN predicate to Parquet filter push-down") { +val schema = StructType(Seq( + StructField("a", IntegerType, nullable = false) +)) + +val parquetSchema = new SparkToParquetSchemaConverter(conf).convert(schema) + +assertResult(Some(FilterApi.eq(intColumn("a"), null: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(null))) +} + +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10))) +} + +// Remove duplicates +assertResult(Some(FilterApi.eq(intColumn("a"), 10: Integer))) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 10))) +} + +assertResult(Some(or( + FilterApi.eq(intColumn("a"), 10: Integer), + FilterApi.eq(intColumn("a"), 20: Integer))) +) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 20))) +} + +assertResult(Some(or(or( + FilterApi.eq(intColumn("a"), 10: Integer), + FilterApi.eq(intColumn("a"), 20: Integer)), + FilterApi.eq(intColumn("a"), 30: Integer))) +) { + parquetFilters.createFilter(parquetSchema, sources.In("a", Array(10, 20, 30))) +} + +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold).toArray)).isDefined) +assert(parquetFilters.createFilter(parquetSchema, sources.In("a", + Range(0, conf.parquetFilterPushDownInFilterThreshold + 1).toArray)).isEmpty) + +import testImplicits._ +withTempPath { path => + (0 to 1024).toDF("a").selectExpr("if (a = 1024, null, a) AS a") // convert 1024 to null +.coalesce(1).write.option("parquet.block.size", 512) +.parquet(path.getAbsolutePath) + val df = spark.read.parquet(path.getAbsolutePath) + Seq(true, false).foreach { pushEnabled => --- End diff -- Updated to: ```scala val actual = stripSparkFilter(df.where(filter)).collect().length if (pushEnabled && count <= conf.parquetFilterPushDownInFilterThreshold) { assert(actual > 1 && actual < data.length) } else { assert(actual === data.length) } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21603 **[Test build #92995 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92995/testReport)** for PR 21603 at commit [`c386e02`](https://github.com/apache/spark/commit/c386e02c8bfbf1eb43dcf95717a9e2ea30123d57). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/944/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/944/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21583: [SPARK-23984][K8S][Test] Added Integration Tests ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21583 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92994/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92994 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92994/testReport)** for PR 21748 at commit [`ee5c267`](https://github.com/apache/spark/commit/ee5c267206f574d1a18d03f27aa6d7f9cf16eb95). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/944/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21583 Yeah let's merge this - think there might be some work to clean this up a bit later. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 Ok documentation is done, this is fully ready for review now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92994 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92994/testReport)** for PR 21748 at commit [`ee5c267`](https://github.com/apache/spark/commit/ee5c267206f574d1a18d03f27aa6d7f9cf16eb95). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92986/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21439 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21439: [SPARK-24391][SQL] Support arrays of any types by from_j...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21439 **[Test build #92986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92986/testReport)** for PR 21439 at commit [`758d1df`](https://github.com/apache/spark/commit/758d1dfebc604715aac826c88c7ff8421095b05e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21720 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/943/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21720 **[Test build #92993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92993/testReport)** for PR 21720 at commit [`b27245e`](https://github.com/apache/spark/commit/b27245e3e2ca021815e6b353036925e57f665e7a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21720: [SPARK-24163][SPARK-24164][SQL] Support column list as t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21720 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21765: [MINOR][CORE] Add test cases for RDD.cartesian
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21765 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21765: [MINOR][CORE] Add test cases for RDD.cartesian
GitHub user NiharS opened a pull request: https://github.com/apache/spark/pull/21765 [MINOR][CORE] Add test cases for RDD.cartesian ## What changes were proposed in this pull request? While looking through the codebase, it appeared that the scala code for RDD.cartesian does not have any tests for correctness. This adds a couple basic tests to verify cartesian yields correct values. While the implementation for RDD.cartesian is pretty simple, it always helps to have a few tests! ## How was this patch tested? The new test cases pass, and the scala style tests from running dev/run-tests all pass. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/NiharS/spark cartesianTests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21765.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21765 commit 9df4c3b4a71082181aa979c3bddf2c3d99db256e Author: Nihar Sheth Date: 2018-07-13T20:37:59Z [MINOR][CORE] Add test cases for RDD.cartesian The scala code for RDD.cartesian does not have any tests for correctness. This adds a couple basic tests to verify cartesian yields correct values. Passes the added test cases, and passes the scala style tests. Author: Nihar Sheth --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
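For reference, a correctness check along the lines this PR describes might look roughly like the sketch below; it is not the actual test added in the PR, just an illustration of verifying that `cartesian` produces every pair exactly once:

```scala
import org.apache.spark.sql.SparkSession

object CartesianCheckSketch {
  def main(args: Array[String]): Unit = {
    val sc = SparkSession.builder()
      .master("local[*]")
      .appName("cartesian-sketch")
      .getOrCreate()
      .sparkContext

    val left  = sc.parallelize(Seq(1, 2, 3), 2)
    val right = sc.parallelize(Seq("a", "b"), 2)

    // Every (left, right) combination should appear exactly once.
    val result   = left.cartesian(right).collect().toSet
    val expected = (for (x <- Seq(1, 2, 3); y <- Seq("a", "b")) yield (x, y)).toSet
    assert(result == expected)

    // The total size should be the product of the two input sizes.
    assert(left.cartesian(right).count() == left.count() * right.count())

    sc.stop()
  }
}
```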
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 Ok, integration test is green. Probably should add some docs though - following up in a moment. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21764 **[Test build #92992 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92992/testReport)** for PR 21764 at commit [`eaec2f5`](https://github.com/apache/spark/commit/eaec2f5f2b4e3193de41655b84a1dc936b0e50a3). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21764 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/942/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21764: [SPARK-24802] Optimization Rule Exclusion
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21764 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion
GitHub user maryannxue opened a pull request: https://github.com/apache/spark/pull/21764 [SPARK-24802] Optimization Rule Exclusion ## What changes were proposed in this pull request? Since Spark has provided fairly clear interfaces for adding user-defined optimization rules, it would be nice to have an easy-to-use interface for excluding an optimization rule from the Spark query optimizer as well. This would make customizing the Spark optimizer easier and could sometimes make debugging issues easier too. - Add a new config spark.sql.optimizer.excludedRules, with the value being a comma-separated list of rule names. - Modify the current batches method to remove the excluded rules from the default batches. Log the rules that have been excluded. - Split the existing default batches into "post-analysis batches" and "optimization batches" so that only rules in the "optimization batches" can be excluded. ## How was this patch tested? Add a new test suite: OptimizerRuleExclusionSuite You can merge this pull request into a Git repository by running: $ git pull https://github.com/maryannxue/spark rule-exclusion Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21764.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21764 commit eaec2f5f2b4e3193de41655b84a1dc936b0e50a3 Author: maryannxue Date: 2018-07-13T21:32:01Z [SPARK-24802] Optimization Rule Exclusion --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
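To make the proposed interface concrete, here is a minimal sketch of how the new config might be used, assuming it lands as described above; the specific rule class names are only examples of the expected fully-qualified, comma-separated format, not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

object ExcludedRulesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("excluded-rules-sketch")
      // Proposed config: comma-separated optimizer rule names to skip.
      .config("spark.sql.optimizer.excludedRules",
        "org.apache.spark.sql.catalyst.optimizer.PushDownPredicate," +
          "org.apache.spark.sql.catalyst.optimizer.CollapseProject")
      .getOrCreate()

    // With the rules excluded, the optimized plan should keep the constructs
    // those rules would normally rewrite (e.g. the Filter stays above the Project).
    spark.range(10)
      .selectExpr("id", "id * 2 AS doubled")
      .where("doubled > 4")
      .explain(true)

    spark.stop()
  }
}
```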
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92978/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #92978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92978/testReport)** for PR 20611 at commit [`9ceeb30`](https://github.com/apache/spark/commit/9ceeb30ae0f0b04ac46980c499c9c286ba68e20a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/941/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/941/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/941/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92991/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92991 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92991/testReport)** for PR 21748 at commit [`69bf2a4`](https://github.com/apache/spark/commit/69bf2a46ae0ce473c813a94ad7e8339a4a5fe599). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20944: [SPARK-23831][SQL] Add org.apache.derby to Isolat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20944 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20944: [SPARK-23831][SQL] Add org.apache.derby to IsolatedClien...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20944 LGTM Thanks! Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/940/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/940/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92990/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92990/testReport)** for PR 21748 at commit [`846f093`](https://github.com/apache/spark/commit/846f093fa2dac36fca10b3d9d5a1a2d865fde056). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92991 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92991/testReport)** for PR 21748 at commit [`69bf2a4`](https://github.com/apache/spark/commit/69bf2a46ae0ce473c813a94ad7e8339a4a5fe599). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/940/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18542: [SPARK-21317][SQL] Avoid sorting on bucket expression if...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18542 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/21583 @mccheah @foxish Can we merge? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21095: [SPARK-23529][K8s] Support mounting hostPath volumes
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21095 @madanadit Can you close this, as #21260 has been merged? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
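[Editor's note] The superseding change referenced above (#21260, SPARK-23529) introduced generic volume mounting for Spark on Kubernetes via `spark.kubernetes.{driver,executor}.volumes.*` properties. The Scala sketch below shows roughly how a hostPath mount might be configured under that naming scheme; the volume name (`scratch`) and both paths are illustrative placeholders, not values from either pull request.

```scala
import org.apache.spark.SparkConf

// Property pattern (as introduced by the generic volume support):
//   spark.kubernetes.<role>.volumes.<type>.<name>.mount.path
//   spark.kubernetes.<role>.volumes.<type>.<name>.mount.readOnly
//   spark.kubernetes.<role>.volumes.<type>.<name>.options.path   (hostPath source)
// "scratch", "/tmp/scratch" and "/mnt/host-scratch" are hypothetical.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.volumes.hostPath.scratch.mount.path", "/tmp/scratch")
  .set("spark.kubernetes.executor.volumes.hostPath.scratch.mount.readOnly", "false")
  .set("spark.kubernetes.executor.volumes.hostPath.scratch.options.path", "/mnt/host-scratch")
// In practice these would usually be passed as --conf flags to spark-submit
// rather than set programmatically.
```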
[GitHub] spark pull request #20722: [SPARK-23571][K8S] Delete auxiliary Kubernetes re...
Github user liyinan926 closed the pull request at: https://github.com/apache/spark/pull/20722 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92990 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92990/testReport)** for PR 21748 at commit [`846f093`](https://github.com/apache/spark/commit/846f093fa2dac36fca10b3d9d5a1a2d865fde056). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 The integration test isn't ready for review yet, but the main code is. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
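[Editor's note] PR #21748 is about letting the driver run in client mode on Kubernetes, i.e. the launching JVM itself acts as the driver and only executor pods are created. The sketch below is a minimal illustration of what that usage could look like, assuming the existing `k8s://` master URL scheme and container-image property; the API server address, image name, and driver host are placeholders, and this is not the exact interface added by the PR.

```scala
import org.apache.spark.sql.SparkSession

// In client mode there is no driver pod: this process is the driver.
// All host names and the image below are hypothetical.
val spark = SparkSession.builder()
  .master("k8s://https://kubernetes.example.com:6443")
  .appName("client-mode-sketch")
  .config("spark.kubernetes.container.image", "spark:latest")
  .config("spark.executor.instances", "2")
  // Executor pods must be able to reach back to this driver process.
  .config("spark.driver.host", "driver.example.com")
  .getOrCreate()

spark.range(100).count()
spark.stop()
```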
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/939/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/939/ --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92988/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #92988 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92988/testReport)** for PR 21748 at commit [`75db063`](https://github.com/apache/spark/commit/75db0632ff124e5df0d8dc98ff13ed2f1aaa440e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on the issue: https://github.com/apache/spark/pull/21748 @mccheah Is this ready for review? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 > in this cluster do we really mean cores allocated to the "application" or "job"? @felixcheung What about `number of CPUs/Executors potentially available to a job submitted via the Spark Context`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
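[Editor's note] The discussion above is about how to describe counts that #21589 proposes exposing directly. The sketch below is not that PR's API; it only approximates the same numbers with existing public APIs (`SparkStatusTracker.getExecutorInfos` and `SparkContext.defaultParallelism`), which also illustrates why "potentially available" is fuzzy: with dynamic allocation the registered executor set changes over time.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("resource-count-sketch").getOrCreate()
val sc = spark.sparkContext

// getExecutorInfos lists the executors the driver currently knows about;
// the driver's own entry is typically included, hence the "- 1".
val approxExecutors = math.max(0, sc.statusTracker.getExecutorInfos.length - 1)

// For coarse-grained backends, defaultParallelism tracks the total number of
// cores registered with the scheduler, so it is a rough proxy for cluster cores.
val approxCores = sc.defaultParallelism

println(s"executors=$approxExecutors, approxTotalCores=$approxCores")
```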