[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21705#discussion_r199986545

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/ExecutorSideSQLConfSuite.scala ---
@@ -63,4 +71,15 @@ class ExecutorSideSQLConfSuite extends SparkFunSuite with SQLTestUtils {
       }
     }
   }
+
+  test("SPARK-24727 CODEGEN_CACHE_SIZE is correctly referenced at the executor side") {
--- End diff --

very nit: `CODEGEN_CACHE_SIZE` -> `CODEGEN_CACHE_MAX_ENTRIES`

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21673: SPARK-24697: Fix the reported start offsets in streaming...
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21673

@arunmahadevan We should follow the style guide for pull requests: please change the title so that the JIRA issue number is wrapped in `[]`, and also add `[SS]`. http://spark.apache.org/contributing.html

> The PR title should be of the form [SPARK-][COMPONENT] Title, where SPARK- is the relevant JIRA number, COMPONENT is one of the PR categories shown at spark-prs.appspot.com and Title may be the JIRA's title or a more specific title describing the PR itself.
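The title convention quoted above can be checked mechanically. As an illustration only (this is not an official Spark tool, and the exact pattern is an assumption), a small regex along these lines validates a title:

```python
import re

# Hypothetical checker for the "[SPARK-NNNNN][COMPONENT] Title" convention
# described in the contributing guide; the pattern is an assumption, not an
# official Spark validation rule.
TITLE_RE = re.compile(r"^\[SPARK-\d+\](\[[A-Z0-9 ]+\])+ .+")

def is_valid_pr_title(title: str) -> bool:
    """Return True if the title follows the [SPARK-NNNNN][COMPONENT] form."""
    return TITLE_RE.match(title) is not None

print(is_valid_pr_title("[SPARK-24697][SS] Fix the reported start offsets"))  # True
print(is_valid_pr_title("SPARK-24697: Fix the reported start offsets"))       # False
```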
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92586/testReport)** for PR 21705 at commit [`0c4644e`](https://github.com/apache/spark/commit/0c4644e3b03457bad09b7abc415b151d9998bbf5).
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/653/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21433 **[Test build #4206 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4206/testReport)** for PR 21433 at commit [`245181a`](https://github.com/apache/spark/commit/245181a6ebb03b4f394097297ae245705aaf9b0f).
[GitHub] spark issue #21706: [SPARK-24702] Fix Unable to cast to calendar interval in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21706 Add tests in `sql/core/src/test/resources/sql-tests/inputs/cast.sql`, then run `SPARK_GENERATE_GOLDEN_FILES=1 ./build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.SQLQueryTestSuite test` to regenerate the golden result files.
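The `SPARK_GENERATE_GOLDEN_FILES` flag follows the common golden-file testing pattern: when the flag is set, the suite rewrites the expected-output files instead of comparing against them. A stand-alone sketch of that pattern (illustrative Python, not Spark's actual implementation; `run_case` and the file name are made up):

```python
import os
import tempfile

def run_case(name):
    """Stand-in for executing one SQL test case and capturing its output."""
    return f"result-for-{name}\n"

def check_or_regenerate(name, golden_path, regenerate=False):
    """Golden-file pattern: record the output when regenerating, else compare."""
    actual = run_case(name)
    if regenerate:
        with open(golden_path, "w") as f:
            f.write(actual)   # overwrite the expected output
        return "REGENERATED"
    with open(golden_path) as f:
        expected = f.read()
    return "PASS" if actual == expected else "FAIL"

golden = os.path.join(tempfile.mkdtemp(), "cast.sql.out")
print(check_or_regenerate("cast", golden, regenerate=True))
print(check_or_regenerate("cast", golden))
```

In Spark's case the environment variable plays the role of the `regenerate` flag.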
[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21686 **[Test build #92580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92580/testReport)** for PR 21686 at commit [`dc35731`](https://github.com/apache/spark/commit/dc35731f4cd99c3687be486fcf15e9f9883cb139).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21686 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92580/ Test PASSed.
[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21686 Merged build finished. Test PASSed.
[GitHub] spark pull request #21659: [SPARK-24530][PYTHON] Add a control to force Pyth...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21659#discussion_r199989581

--- Diff: python/docs/Makefile ---
@@ -1,19 +1,44 @@
 # Makefile for Sphinx documentation
 #
+ifndef SPHINXBUILD
+ifndef SPHINXPYTHON
+SPHINXBUILD = sphinx-build
+endif
+endif
+
+ifdef SPHINXBUILD
--- End diff --

I couldn't find an easy way.
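The Makefile fragment above encodes a simple precedence rule: an explicitly set `SPHINXBUILD` wins; otherwise `SPHINXPYTHON` selects the interpreter used to run Sphinx; otherwise plain `sphinx-build` is the default. A sketch of the same resolution logic (the precedence is read off the quoted fragment; the function and the `-msphinx` invocation are illustrative assumptions, not the exact Makefile behavior):

```python
import os

def resolve_sphinx_command(env=None):
    """Pick the Sphinx build command the way the Makefile fragment does:
    SPHINXBUILD if set; else one derived from SPHINXPYTHON; else the default."""
    env = env if env is not None else os.environ
    if env.get("SPHINXBUILD"):
        return env["SPHINXBUILD"]
    if env.get("SPHINXPYTHON"):
        # Assumption: run Sphinx through the chosen Python interpreter.
        return f'{env["SPHINXPYTHON"]} -msphinx'
    return "sphinx-build"

print(resolve_sphinx_command({}))
print(resolve_sphinx_command({"SPHINXPYTHON": "python3"}))
```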
[GitHub] spark issue #21659: [SPARK-24530][PYTHON] Add a control to force Python vers...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21659 Ah, I mean PRs in spark-website to fix the API documentation.
[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21686 Merged to master.
[GitHub] spark issue #21702: [SPARK-23698] Remove raw_input() from Python 2
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21702 Merged to master
[GitHub] spark pull request #21686: [SPARK-24709][SQL] schema_of_json() - schema infe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21686
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21700 **[Test build #92587 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92587/testReport)** for PR 21700 at commit [`c50da7b`](https://github.com/apache/spark/commit/c50da7b40645e8c6d8c1530cf3497ef2d3a09857).
[GitHub] spark pull request #21702: [SPARK-23698] Remove raw_input() from Python 2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21702
[GitHub] spark issue #21321: [SPARK-24268][SQL] Use datatype.simpleString in error me...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21321 @mgaido91, mind rebasing this please? Let me just get this in.
[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r11109

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -19,166 +19,186 @@ package org.apache.spark.sql.execution.datasources.parquet

 import java.sql.Date

+import scala.collection.JavaConverters._
+
 import org.apache.parquet.filter2.predicate._
 import org.apache.parquet.filter2.predicate.FilterApi._
 import org.apache.parquet.io.api.Binary
-import org.apache.parquet.schema.PrimitiveComparator
+import org.apache.parquet.schema._
+import org.apache.parquet.schema.OriginalType._
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._

 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
 import org.apache.spark.sql.sources
-import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
 private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {

+  case class ParquetSchemaType(
+      originalType: OriginalType,
+      primitiveTypeName: PrimitiveType.PrimitiveTypeName,
+      decimalMetadata: DecimalMetadata)
+
   private def dateToDays(date: Date): SQLDate = {
     DateTimeUtils.fromJavaDate(date)
   }

-  private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    // BooleanType
+    case ParquetSchemaType(null, BOOLEAN, null) =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    // IntegerType
+    case ParquetSchemaType(null, INT32, null) =>
--- End diff --

the safest way is to look at both the file's type and Spark's type, and deal with type mismatches. We can do it later since it's an existing problem.

Currently Spark tries its best to guarantee that the types match (except for missing/extra columns). The only case I can think of that may break the assumption is: the parquet files have conflicting schemas and Spark reads them using a user-specified schema (so that we can skip schema inference) that doesn't match all of the parquet files.
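The refactor under review keys the pushdown builders on the Parquet file's physical type (original type, primitive type name, decimal metadata) rather than on Spark's `DataType`, so a filter is only built when the file's type actually supports it. A minimal Python sketch of that dispatch idea (all names and the string-valued "predicates" are illustrative; the real code is the Scala in the quoted diff):

```python
from collections import namedtuple

# Illustrative stand-in for the ParquetSchemaType key from the diff:
# (originalType, primitiveTypeName, decimalMetadata).
ParquetSchemaType = namedtuple(
    "ParquetSchemaType", ["original_type", "primitive_type", "decimal_metadata"])

# Dispatch table keyed on the *file's* physical type, not Spark's DataType.
MAKE_EQ = {
    ParquetSchemaType(None, "BOOLEAN", None): lambda n, v: f"eq(booleanColumn({n}), {v})",
    ParquetSchemaType(None, "INT32", None):   lambda n, v: f"eq(intColumn({n}), {v})",
}

def make_eq(schema_type, name, value):
    builder = MAKE_EQ.get(schema_type)
    # Unsupported physical type: no pushdown, fall back to a full scan.
    return builder(name, value) if builder is not None else None

print(make_eq(ParquetSchemaType(None, "INT32", None), "a", 1))
```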
[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r11024

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -42,6 +42,14 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
     case BooleanType =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
+    case ByteType =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(b => b.asInstanceOf[java.lang.Byte].toInt.asInstanceOf[Integer]).orNull)
+    case ShortType =>
--- End diff --

How about like this:

- `makeEq` and `makeNotEq`:
```scala
case ByteType | ShortType =>
  (n: String, v: Any) => FilterApi.notEq(
    intColumn(n),
    Option(v).map(_.asInstanceOf[Number].intValue.asInstanceOf[Integer]).orNull)
case IntegerType =>
  (n: String, v: Any) => FilterApi.notEq(intColumn(n), v.asInstanceOf[Integer])
```
- `makeLt`, `makeLtEq`, `makeGt` and `makeGtEq`:
```scala
case ByteType | ShortType =>
  (n: String, v: Any) => FilterApi.gtEq(
    intColumn(n),
    v.asInstanceOf[Number].intValue.asInstanceOf[Integer])
case IntegerType =>
  (n: String, v: Any) => FilterApi.gtEq(intColumn(n), v.asInstanceOf[java.lang.Integer])
```
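The suggestion widens byte/short filter values to boxed integers: null-safely for eq/notEq, where `null` is a legal comparison value, and directly for the ordering comparators, where it is not. The widening itself reduces to one small helper (hedged, illustrative Python mirror of the Scala expression):

```python
def widen_to_int(value):
    """Null-safe widening of a byte/short filter value to int, mirroring
    Option(v).map(_.asInstanceOf[Number].intValue).orNull in the Scala above."""
    # eq/notEq may legitimately compare a column against null, so preserve it.
    return None if value is None else int(value)

print(widen_to_int(None))
print(widen_to_int(5))
```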
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21708 Merged to master.
[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r11214

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -19,166 +19,186 @@ package org.apache.spark.sql.execution.datasources.parquet

 import java.sql.Date

+import scala.collection.JavaConverters._
+
 import org.apache.parquet.filter2.predicate._
 import org.apache.parquet.filter2.predicate.FilterApi._
 import org.apache.parquet.io.api.Binary
-import org.apache.parquet.schema.PrimitiveComparator
+import org.apache.parquet.schema._
+import org.apache.parquet.schema.OriginalType._
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._

 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
 import org.apache.spark.sql.sources
-import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
 private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {

+  case class ParquetSchemaType(
+      originalType: OriginalType,
+      primitiveTypeName: PrimitiveType.PrimitiveTypeName,
+      decimalMetadata: DecimalMetadata)
+
   private def dateToDays(date: Date): SQLDate = {
     DateTimeUtils.fromJavaDate(date)
   }

-  private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    // BooleanType
+    case ParquetSchemaType(null, BOOLEAN, null) =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    // IntegerType
+    case ParquetSchemaType(null, INT32, null) =>
       (n: String, v: Any) => FilterApi.eq(intColumn(n), v.asInstanceOf[Integer])
-    case LongType =>
+    // LongType
+    case ParquetSchemaType(null, INT64, null) =>
       (n: String, v: Any) => FilterApi.eq(longColumn(n), v.asInstanceOf[java.lang.Long])
-    case FloatType =>
+    // FloatType
+    case ParquetSchemaType(null, FLOAT, null) =>
       (n: String, v: Any) => FilterApi.eq(floatColumn(n), v.asInstanceOf[java.lang.Float])
-    case DoubleType =>
+    // DoubleType
+    case ParquetSchemaType(null, DOUBLE, null) =>
       (n: String, v: Any) => FilterApi.eq(doubleColumn(n), v.asInstanceOf[java.lang.Double])

+    // StringType
     // Binary.fromString and Binary.fromByteArray don't accept null values
-    case StringType =>
+    case ParquetSchemaType(UTF8, BINARY, null) =>
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(s => Binary.fromString(s.asInstanceOf[String])).orNull)
-    case BinaryType =>
+    // BinaryType
+    case ParquetSchemaType(null, BINARY, null) =>
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
-    case DateType if pushDownDate =>
+    // DateType
+    case ParquetSchemaType(DATE, INT32, null) if pushDownDate =>
       (n: String, v: Any) => FilterApi.eq(
         intColumn(n),
         Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
   }

-  private val makeNotEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeNotEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    case ParquetSchemaType(null, BOOLEAN, null) =>
       (n: String, v: Any) => FilterApi.notEq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    case ParquetSchemaType(null, INT32, null) =>
       (n: String, v: Any) => FilterApi.notEq(intColumn(n), v.asInstanceOf[Integer])
-    case LongType =>
+    case ParquetSchemaType(null, INT64, null) =>
       (n: String, v: Any) => FilterApi.notEq(longColumn(n), v.asInstanceOf[java.lang.Long])
-    case FloatType =>
+    case ParquetSchemaType(null, FLOAT, null) =>
       (n: String, v: Any) => FilterApi.notEq(floatColumn(n), v.asInstanceOf[java.lang.Float])
-    case DoubleType =>
+    case ParquetSchemaType(null, DOUBLE, null) =>
       (n: String, v: Any) => FilterApi.notEq(doubleColumn(n), v.asInstanceOf[java.lang.Double])
-    case StringType =>
+    case ParquetSchemaType(UTF8, BINARY, null) =>
       (n: String, v: Any) => FilterApi.notEq
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21403 **[Test build #92581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92581/testReport)** for PR 21403 at commit [`d3e39ed`](https://github.com/apache/spark/commit/d3e39ed3f442958cfaaa1ef056cb72fedf0fce1c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92581/ Test PASSed.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Merged build finished. Test PASSed.
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21700 **[Test build #92587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92587/testReport)** for PR 21700 at commit [`c50da7b`](https://github.com/apache/spark/commit/c50da7b40645e8c6d8c1530cf3497ef2d3a09857).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #14291: [SPARK-16658][GRAPHX] Add EdgePartition.withVerte...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14291
[GitHub] spark pull request #20919: Feature/apply func to rdd
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20919
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21700 Merged build finished. Test FAILed.
[GitHub] spark pull request #21507: Branch 1.6
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21507
[GitHub] spark pull request #20809: [SPARK-23667][CORE] Better scala version check
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20809
[GitHub] spark pull request #20932: [SPARK-23812][SQL] DFS should be removed from uns...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20932
[GitHub] spark pull request #21708: [BUILD] Close stale PRs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21708
[GitHub] spark pull request #18766: [SPARK-8288][SQL] ScalaReflection can use compani...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18766
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21700 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92587/ Test FAILed.
[GitHub] spark pull request #21336: [SPARK-24286][Documentation] DataFrameReader.csv ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21336
[GitHub] spark pull request #21691: Branch 2.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21691
[GitHub] spark pull request #8849: [SPARK-9883][MLlib] Distance to each kmean cluster...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8849
[GitHub] spark pull request #13477: [SPARK-15739][GraphX] Expose aggregateMessagesWit...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13477
[GitHub] spark pull request #21681: Pin tag 210
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21681
[GitHub] spark pull request #17843: [Streaming] groupByKey should also disable map si...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17843
[GitHub] spark pull request #17907: SPARK-7856 Principal components and variance usin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17907
[GitHub] spark pull request #21076: Creating KafkaStreamToCassandra
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21076
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r11918

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.sql.sources.v2.catalog.DataSourceCatalog;
+
+/**
+ * A mix-in interface for {@link DataSourceV2} catalog support. Data sources can implement this
+ * interface to provide the ability to load, create, alter, and drop tables.
+ *
+ * Data sources must implement this interface to support logical operations that combine writing
+ * data with catalog tasks, like create-table-as-select.
+ */
+public interface CatalogSupport {
--- End diff --

After thinking about it more, what we really need in the near future is all about tables: create/alter/lookup/drop tables, rather than how the tables are organized (like databases) or how other information is stored (like views and functions). How about we call it `TableSupport`?
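The mix-in pattern being discussed lets a data source opt in to catalog operations by implementing an extra interface, which the engine detects at runtime. A hedged Python sketch of that detection pattern (`TableSupport` follows the rename proposed in the comment; every name here is illustrative, not Spark's actual API):

```python
from abc import ABC, abstractmethod

class DataSourceV2:
    """Illustrative stand-in for the base data source marker interface."""

class TableSupport(ABC):
    """Opt-in mix-in: implementors can load/create/alter/drop tables."""
    @abstractmethod
    def load_table(self, name):
        ...

class MySource(DataSourceV2, TableSupport):
    def load_table(self, name):
        return {"name": name, "schema": ["id: int"]}

def lookup_table(source, name):
    # The engine checks for the mix-in before attempting any catalog work.
    if isinstance(source, TableSupport):
        return source.load_table(name)
    raise TypeError("data source does not support table operations")

print(lookup_table(MySource(), "events"))
```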
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21700 **[Test build #92588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92588/testReport)** for PR 21700 at commit [`d8b4bb8`](https://github.com/apache/spark/commit/d8b4bb84bc9216ebe9b31f8992e6d59e975b377d).
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21306#discussion_r12200

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.catalog;
+
+import org.apache.spark.sql.catalyst.expressions.Expression;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link DataSourceCatalog}.
+ */
+public interface Table {
--- End diff --

This is something we should decide now. IMO `schema` and `properties` are must-haves, but the others may not be. E.g. if a data source uses a path to look up a table, then there is no database/table name for it. And we don't have a story for dealing with partitions yet.
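The comment argues for a minimal `Table` contract: schema and properties required, the rest optional. A sketch of that shape (illustrative Python; field names and types are assumptions made for the example, not Spark's interface):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Minimal contract argued for above: schema and properties are required;
# a name is optional, since a path-based source has no database/table name.
@dataclass
class Table:
    schema: List[str]                          # e.g. ["id: int", "data: string"]
    properties: Dict[str, str] = field(default_factory=dict)
    name: Optional[str] = None                 # None for path-based sources

t = Table(schema=["id: int"], properties={"provider": "parquet"})
print(t.name)
```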
[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r12316

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -93,6 +101,10 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   }

   private val makeLt: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
+    case ByteType | ShortType =>
+      (n: String, v: Any) => FilterApi.lt(
+        intColumn(n),
+        v.asInstanceOf[Number].intValue.asInstanceOf[Integer])
--- End diff --

value cannot be `null`.
[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21682#discussion_r12294

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -42,6 +42,10 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
     case BooleanType =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
+    case ByteType | ShortType =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(_.asInstanceOf[Number].intValue.asInstanceOf[Integer]).orNull)
--- End diff --

value may be `null`.
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r12550

  --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
  @@ -0,0 +1,173 @@
  +/*
  + * Licensed to the Apache Software Foundation (ASF) under one or more
  + * contributor license agreements. See the NOTICE file distributed with
  + * this work for additional information regarding copyright ownership.
  + * The ASF licenses this file to You under the Apache License, Version 2.0
  + * (the "License"); you may not use this file except in compliance with
  + * the License. You may obtain a copy of the License at
  + *
  + *    http://www.apache.org/licenses/LICENSE-2.0
  + *
  + * Unless required by applicable law or agreed to in writing, software
  + * distributed under the License is distributed on an "AS IS" BASIS,
  + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  + * See the License for the specific language governing permissions and
  + * limitations under the License.
  + */
  +
  +package org.apache.spark.sql.sources.v2.catalog;
  +
  +import org.apache.spark.sql.types.DataType;
  +
  +/**
  + * TableChange subclasses represent requested changes to a table. These are passed to
  + * {@link DataSourceCatalog#alterTable}.
  + */
  +public interface TableChange {
  +
  +  /**
  +   * Create a TableChange for adding a top-level column to a table.
  +   *
  +   * Because "." may be interpreted as a field path separator or may be used in field names, it is
  +   * not allowed in names passed to this method. To add to nested types or to add fields with
  +   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
  +   *
  +   * @param name the new top-level column name
  +   * @param dataType the new column's data type
  +   * @return a TableChange for the addition
  +   */
  +  static TableChange addColumn(String name, DataType dataType) {
  +    return new AddColumn(null, name, dataType);
  +  }
  +
  +  /**
  +   * Create a TableChange for adding a nested column to a table.
  +   *
  +   * The parent name is used to find the parent struct type where the nested field will be added.
  +   * If the parent name is null, the new column will be added to the root as a top-level column.
  +   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
  +   * the column is added to the list element struct, and if it identifies a map, the new column is
  +   * added to the map's value struct.
  +   *
  +   * The given name is used to name the new column and names containing "." are not handled
  +   * differently.
  +   *
  +   * @param parent the new field's parent
  +   * @param name the new field name
  +   * @param dataType the new field's data type
  +   * @return a TableChange for the addition
  +   */
  +  static TableChange addColumn(String parent, String name, DataType dataType) {
  +    return new AddColumn(parent, name, dataType);
  +  }
  +
  +  /**
  +   * Create a TableChange for renaming a field.
  +   *
  +   * The name is used to find the field to rename. The new name will replace the name of the type.
  +   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
  --- End diff --

  It's great to have an example to show how to use this API; can we add an example to all the methods here?

---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
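To illustrate the kind of per-method example cloud-fan is asking for, here is a minimal, hypothetical sketch of how the proposed API could be exercised. The real `TableChange` lives in `org.apache.spark.sql.sources.v2.catalog` and takes Spark's `DataType`; plain strings stand in for data types here, and the concrete `AddColumn`/`RenameColumn` classes are illustrative stand-ins.

```scala
// Simplified stand-ins for the proposed TableChange API (hypothetical names).
sealed trait TableChange
case class AddColumn(parent: String, name: String, dataType: String) extends TableChange
case class RenameColumn(name: String, newName: String) extends TableChange

object TableChange {
  // Top-level add: "." is reserved as the field path separator, so it is rejected here.
  def addColumn(name: String, dataType: String): TableChange = {
    require(!name.contains("."), s"top-level column name must not contain '.': $name")
    AddColumn(null, name, dataType)
  }

  // Nested add: parent locates the struct to add to; name may contain ".".
  def addColumn(parent: String, name: String, dataType: String): TableChange =
    AddColumn(parent, name, dataType)

  // renameColumn("a.b.c", "x") renames the leaf field: a.b.c becomes a.b.x.
  def renameColumn(name: String, newName: String): TableChange =
    RenameColumn(name, newName)
}

val topLevel = TableChange.addColumn("id", "long")        // adds top-level column id
val nested   = TableChange.addColumn("points", "z", "int") // adds points.z
val rename   = TableChange.renameColumn("a.b.c", "x")      // a.b.c -> a.b.x
```

Each factory call returns a change descriptor that a catalog's `alterTable` would later interpret.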
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21306#discussion_r12580

  --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
  @@ -0,0 +1,173 @@
  +/*
  + * Licensed to the Apache Software Foundation (ASF) under one or more
  + * contributor license agreements. See the NOTICE file distributed with
  + * this work for additional information regarding copyright ownership.
  + * The ASF licenses this file to You under the Apache License, Version 2.0
  + * (the "License"); you may not use this file except in compliance with
  + * the License. You may obtain a copy of the License at
  + *
  + *    http://www.apache.org/licenses/LICENSE-2.0
  + *
  + * Unless required by applicable law or agreed to in writing, software
  + * distributed under the License is distributed on an "AS IS" BASIS,
  + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  + * See the License for the specific language governing permissions and
  + * limitations under the License.
  + */
  +
  +package org.apache.spark.sql.sources.v2.catalog;
  +
  +import org.apache.spark.sql.types.DataType;
  +
  +/**
  + * TableChange subclasses represent requested changes to a table. These are passed to
  + * {@link DataSourceCatalog#alterTable}.
  + */
  +public interface TableChange {
  --- End diff --

  This is great!
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/654/ Test PASSed.
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21682 **[Test build #92589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92589/testReport)** for PR 21682 at commit [`b16a607`](https://github.com/apache/spark/commit/b16a6076b55bd2e1f01ed66ea7f53d2f915888be).
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21682 Merged build finished. Test PASSed.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test PASSed.
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92590/testReport)** for PR 21657 at commit [`fc2108e`](https://github.com/apache/spark/commit/fc2108e52ee987d7ca3d4cce811ad9fc4e462c47).
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/655/ Test PASSed.
[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix containsNull of Concat for...
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21704#discussion_r14021

  --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
  @@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
       }
     }

  -  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
  +  override def dataType: DataType = {
  +    val dataTypes = children.map(_.dataType)
  +    dataTypes.headOption.map {
  +      case ArrayType(et, _) =>
  +        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
  +      case dt => dt
  +    }.getOrElse(StringType)
  +  }
  --- End diff --

  Can't we handle this case in type coercion (analysis phase)?
[GitHub] spark issue #21709: [SPARK-5152][CORE] Read metrics config file from Hadoop ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21709 Hi @jzhuge, what is the purpose of supporting reading the metrics conf from HDFS/S3?
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21708 **[Test build #92579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92579/testReport)** for PR 21708 at commit [`dad2b46`](https://github.com/apache/spark/commit/dad2b4602f4854ab941014cc4ec7535d3d74d2f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21708 Merged build finished. Test PASSed.
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21708 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92579/ Test PASSed.
[GitHub] spark issue #21709: [SPARK-5152][CORE] Read metrics config file from Hadoop ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21709 If you want the metrics conf to be centralized, without needing to put it onto different nodes, you can set it through `SparkConf` with the prefix "spark.metrics.conf."; `MetricsSystem` also supports getting configurations from `SparkConf`.
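The prefix convention jerryshao describes can be sketched as follows. This is a minimal illustration, not Spark's actual `MetricsConfig` code: a plain `Map` stands in for `SparkConf`, and the sink property names are illustrative examples of metrics.properties-style keys.

```scala
// Metrics properties embedded in SparkConf keys instead of a metrics.properties file.
val sparkConf = Map(
  "spark.app.name" -> "demo",
  "spark.metrics.conf.*.sink.console.class" -> "org.apache.spark.metrics.sink.ConsoleSink",
  "spark.metrics.conf.*.sink.console.period" -> "10"
)

val prefix = "spark.metrics.conf."

// Strip the prefix to recover the metrics.properties-style keys,
// ignoring unrelated Spark settings.
val metricsProps: Map[String, String] = sparkConf.collect {
  case (k, v) if k.startsWith(prefix) => k.stripPrefix(prefix) -> v
}
```

Because the configuration travels inside `SparkConf`, it reaches every node with the application and no shared file is needed.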
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21701 **[Test build #92582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92582/testReport)** for PR 21701 at commit [`c0d1c6e`](https://github.com/apache/spark/commit/c0d1c6e0a5532eeab0848834d2dc348808e54069). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `sealed trait MultipleWatermarkPolicy ` * `case class WatermarkTracker(policy: MultipleWatermarkPolicy) extends Logging `
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21701 Merged build finished. Test PASSed.
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92582/ Test PASSed.
[GitHub] spark issue #21655: [SPARK-24675][SQL]Rename table: validate existence of ne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21655 **[Test build #92591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92591/testReport)** for PR 21655 at commit [`18418c9`](https://github.com/apache/spark/commit/18418c902590b066c6173c1cb33d58a2aef5d6c6).
[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21619 @cloud-fan Is there anything else I should do to get this merged? Thanks.
[GitHub] spark issue #21655: [SPARK-24675][SQL]Rename table: validate existence of ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21655 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/656/ Test PASSed.
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21611 ping @cloud-fan for taking a look again.
[GitHub] spark issue #21655: [SPARK-24675][SQL]Rename table: validate existence of ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21655 Merged build finished. Test PASSed.
[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21696#discussion_r19002

  --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
  @@ -19,166 +19,186 @@ package org.apache.spark.sql.execution.datasources.parquet

   import java.sql.Date

  +import scala.collection.JavaConverters._
  +
   import org.apache.parquet.filter2.predicate._
   import org.apache.parquet.filter2.predicate.FilterApi._
   import org.apache.parquet.io.api.Binary
  -import org.apache.parquet.schema.PrimitiveComparator
  +import org.apache.parquet.schema._
  +import org.apache.parquet.schema.OriginalType._
  +import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._

   import org.apache.spark.sql.catalyst.util.DateTimeUtils
   import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
   import org.apache.spark.sql.sources
  -import org.apache.spark.sql.types._
   import org.apache.spark.unsafe.types.UTF8String

   /**
    * Some utility function to convert Spark data source filters to Parquet filters.
    */
   private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {

  +  case class ParquetSchemaType(
  +      originalType: OriginalType,
  +      primitiveTypeName: PrimitiveType.PrimitiveTypeName,
  +      decimalMetadata: DecimalMetadata)
  +
     private def dateToDays(date: Date): SQLDate = {
       DateTimeUtils.fromJavaDate(date)
     }

  -  private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
  -    case BooleanType =>
  +  private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
  +    // BooleanType
  +    case ParquetSchemaType(null, BOOLEAN, null) =>
         (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
  -    case IntegerType =>
  +    // IntegerType
  +    case ParquetSchemaType(null, INT32, null) =>
         (n: String, v: Any) => FilterApi.eq(intColumn(n), v.asInstanceOf[Integer])
  -    case LongType =>
  +    // LongType
  +    case ParquetSchemaType(null, INT64, null) =>
         (n: String, v: Any) => FilterApi.eq(longColumn(n), v.asInstanceOf[java.lang.Long])
  -    case FloatType =>
  +    // FloatType
  +    case ParquetSchemaType(null, FLOAT, null) =>
         (n: String, v: Any) => FilterApi.eq(floatColumn(n), v.asInstanceOf[java.lang.Float])
  -    case DoubleType =>
  +    // DoubleType
  +    case ParquetSchemaType(null, DOUBLE, null) =>
         (n: String, v: Any) => FilterApi.eq(doubleColumn(n), v.asInstanceOf[java.lang.Double])

  +    // StringType
       // Binary.fromString and Binary.fromByteArray don't accept null values
  -    case StringType =>
  +    case ParquetSchemaType(UTF8, BINARY, null) =>
         (n: String, v: Any) => FilterApi.eq(
           binaryColumn(n),
           Option(v).map(s => Binary.fromString(s.asInstanceOf[String])).orNull)
  -    case BinaryType =>
  +    // BinaryType
  +    case ParquetSchemaType(null, BINARY, null) =>
         (n: String, v: Any) => FilterApi.eq(
           binaryColumn(n),
           Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
  -    case DateType if pushDownDate =>
  +    // DateType
  +    case ParquetSchemaType(DATE, INT32, null) if pushDownDate =>
         (n: String, v: Any) => FilterApi.eq(
           intColumn(n),
           Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
     }

  -  private val makeNotEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
  -    case BooleanType =>
  +  private val makeNotEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
  +    case ParquetSchemaType(null, BOOLEAN, null) =>
         (n: String, v: Any) => FilterApi.notEq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
  -    case IntegerType =>
  +    case ParquetSchemaType(null, INT32, null) =>
         (n: String, v: Any) => FilterApi.notEq(intColumn(n), v.asInstanceOf[Integer])
  -    case LongType =>
  +    case ParquetSchemaType(null, INT64, null) =>
         (n: String, v: Any) => FilterApi.notEq(longColumn(n), v.asInstanceOf[java.lang.Long])
  -    case FloatType =>
  +    case ParquetSchemaType(null, FLOAT, null) =>
         (n: String, v: Any) => FilterApi.notEq(floatColumn(n), v.asInstanceOf[java.lang.Float])
  -    case DoubleType =>
  +    case ParquetSchemaType(null, DOUBLE, null) =>
         (n: String, v: Any) => FilterApi.notEq(doubleColumn(n), v.asInstanceOf[java.lang.Double])
  -    case StringType =>
  +    case ParquetSchemaType(UTF8, BINARY, null) =>
         (n: String, v: Any) => FilterApi.notEq(
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92584/testReport)** for PR 21705 at commit [`bc8a21a`](https://github.com/apache/spark/commit/bc8a21af0407f106ee64d3e5b6d4aed8bbf80688). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92584/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix containsNull of Concat for...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21704#discussion_r19346

  --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
  @@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
       }
     }

  -  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
  +  override def dataType: DataType = {
  +    val dataTypes = children.map(_.dataType)
  +    dataTypes.headOption.map {
  +      case ArrayType(et, _) =>
  +        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
  +      case dt => dt
  +    }.getOrElse(StringType)
  +  }
  --- End diff --

  Actually, `Concat` for array types has type coercion that adds casts to make all children the same type, but we also have the `SimplifyCasts` optimization, which removes unnecessary casts and might remove casts from arrays that do not contain null to arrays that do ([optimizer/expressions.scala#L611](https://github.com/apache/spark/blob/d87a8c6c0d1a4db5c9444781160a65562f8ea738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L611)). E.g., `concat(array(1,2,3), array(4,null,6))` might produce a wrong data type during execution.
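The behavior the fix enforces can be shown with a small sketch. This is an illustration of the rule, not Catalyst's actual code: a simplified `ArrayType` stands in for Spark's, and the result's `containsNull` is OR-ed across all children, since `SimplifyCasts` can leave children whose element nullability differs.

```scala
// Simplified stand-in for Catalyst's ArrayType.
case class ArrayType(elementType: String, containsNull: Boolean)

// Result element type comes from the first child; containsNull is true
// if ANY child's array may contain null, as in the PR's dataType fix.
def concatDataType(childTypes: Seq[ArrayType]): ArrayType =
  ArrayType(childTypes.head.elementType, childTypes.exists(_.containsNull))

// concat(array(1,2,3), array(4,null,6)): only the second child contains
// null, but the result type must still report containsNull = true.
val result = concatDataType(Seq(ArrayType("int", false), ArrayType("int", true)))
```

Taking only the head child's `containsNull` (the pre-fix behavior) would report `false` here and mislead downstream code generation.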
[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21667#discussion_r19766

  --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala ---
  @@ -152,6 +152,12 @@ trait FileFormat {
       }
     }

  +  /**
  +   * Validate the given [[DataType]] in the read/write path for this file format.
  +   * If the [[DataType]] is not supported, an exception will be thrown.
  +   * By default all data types are supported.
  +   */
  +  def validateDataType(dataType: DataType, isReadPath: Boolean): Unit = {}
  --- End diff --

  Yes, that is what I did in the first commit. If the unsupported type is nested inside a struct/array, the error message is not as accurate as with the current approach. I am OK with reverting to returning a Boolean, though.
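The trade-off gengliangwang describes can be made concrete with a sketch: a `Unit`-returning validator can report the exact nested field that is unsupported, which a plain `Boolean` check cannot. The types below and the choice of "unsupported" type are purely illustrative assumptions, not Spark's actual `FileFormat` logic.

```scala
// Simplified stand-ins for Catalyst data types (hypothetical).
sealed trait DataType
case object IntType extends DataType
case object IntervalType extends DataType // assume this format rejects it
case class StructType(fields: Map[String, DataType]) extends DataType

// Recursive validator: throws with the path to the offending field,
// instead of merely returning false.
def validateDataType(dt: DataType, path: String = "root"): Unit = dt match {
  case StructType(fields) =>
    fields.foreach { case (name, t) => validateDataType(t, s"$path.$name") }
  case IntervalType =>
    throw new IllegalArgumentException(s"unsupported type at $path")
  case _ => () // all other types are supported by default
}
```

With a `Boolean` return the caller would only learn "some field is unsupported"; the exception carries `root.b`-style paths for free.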
[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan
GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/21710

    [SPARK-24207][R]add R API for PrefixSpan

## What changes were proposed in this pull request?

Add an R API for PrefixSpan.

## How was this patch tested?

Added a test in test_mllib_fpm.R.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-24207

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21710.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21710

commit 5ed11e67703dc7dfb23fb7ff68acffde33c13a30
Author: Huaxin Gao
Date: 2018-07-04T03:18:08Z

    [SPARK-24207][R]add R API for PrefixSpan
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Merged build finished. Test PASSed.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/657/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92583/testReport)** for PR 21705 at commit [`cb95ea9`](https://github.com/apache/spark/commit/cb95ea918fd6f41eb057f890c0e5579a6083a2c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21710 **[Test build #92592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92592/testReport)** for PR 21710 at commit [`5ed11e6`](https://github.com/apache/spark/commit/5ed11e67703dc7dfb23fb7ff68acffde33c13a30).
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92583/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21707 test this please
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21707 **[Test build #92593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92593/testReport)** for PR 21707 at commit [`cd9bdca`](https://github.com/apache/spark/commit/cd9bdcaeaab8d5e20747db21b7d6d9653cddaccb).
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21707 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/658/ Test PASSed.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21707 Merged build finished. Test PASSed.
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21703 Merged to master.
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21703
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21703#discussion_r25737

  --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
  @@ -179,6 +179,12 @@ object TypeCoercion {
         .orElse((t1, t2) match {
           case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
             findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
  +        case (MapType(kt1, vt1, valueContainsNull1), MapType(kt2, vt2, valueContainsNull2)) =>
  --- End diff --

  Not related to this PR, but shall we also handle struct types here?
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21703#discussion_r25967

  --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
  @@ -179,6 +179,12 @@ object TypeCoercion {
         .orElse((t1, t2) match {
           case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
             findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
  +        case (MapType(kt1, vt1, valueContainsNull1), MapType(kt2, vt2, valueContainsNull2)) =>
  --- End diff --

  Sure, let me think about it.
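The coercion rule discussed in this diff can be sketched as follows. This is a hedged, self-contained illustration with simplified `DataType` stand-ins; Spark's real rule recurses through `findWiderTypeForTwo` over the full type lattice. Two `MapType`s widen by widening keys and values recursively and OR-ing `valueContainsNull`.

```scala
// Simplified stand-ins for Catalyst types (illustrative only).
sealed trait DataType
case object IntType extends DataType
case object LongType extends DataType
case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
  extends DataType

def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
  case _ if t1 == t2 => Some(t1)
  // Toy numeric widening: Int widens to Long.
  case (IntType, LongType) | (LongType, IntType) => Some(LongType)
  // The new MapType case: widen key and value types, OR the nullability.
  case (MapType(kt1, vt1, n1), MapType(kt2, vt2, n2)) =>
    for {
      kt <- findWiderTypeForTwo(kt1, kt2)
      vt <- findWiderTypeForTwo(vt1, vt2)
    } yield MapType(kt, vt, n1 || n2)
  case _ => None
}
```

If either the keys or the values have no common wider type, the whole map coercion fails, which matches the `Option`-based contract of the surrounding rule.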
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/21711

    [SPARK-24681][SQL] Verify nested column names in Hive metastore

## What changes were proposed in this pull request?

This PR adds code to check that nested column names do not include ',', ':', or ';', because the Hive metastore cannot handle these characters in nested column names; ref: https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239

## How was this patch tested?

Added tests in `SQLQuerySuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-24681

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21711.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21711

commit dbc300edb56b6e813c926b061e780378ee564778
Author: Takeshi Yamamuro
Date: 2018-07-04T04:07:04Z

    Fix
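The check this PR proposes can be sketched in a few lines. The function name and error message below are illustrative assumptions, not the PR's actual code: the essential point is that Hive metastore's type-string parser treats ',', ':', and ';' as delimiters, so nested field names containing them must be rejected up front.

```scala
// Characters Hive's TypeInfoUtils parser treats as delimiters.
val invalidChars = Seq(',', ':', ';')

// Hypothetical validator: throw on the first offending nested column name.
def verifyNestedColumnNames(names: Seq[String]): Unit =
  names.find(n => invalidChars.exists(c => n.contains(c))).foreach { bad =>
    throw new IllegalArgumentException(
      s"Invalid character in nested column name for Hive metastore: $bad")
  }
```

Failing at DDL time with a clear message beats letting Hive silently mis-parse the serialized type string later.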
[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21711 Merged build finished. Test PASSed.
[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21711 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/659/ Test PASSed.
[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21711 **[Test build #92594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92594/testReport)** for PR 21711 at commit [`dbc300e`](https://github.com/apache/spark/commit/dbc300edb56b6e813c926b061e780378ee564778).
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21710 **[Test build #92592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92592/testReport)** for PR 21710 at commit [`5ed11e6`](https://github.com/apache/spark/commit/5ed11e67703dc7dfb23fb7ff68acffde33c13a30). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Merged build finished. Test FAILed.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92592/ Test FAILed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92586/testReport)** for PR 21705 at commit [`0c4644e`](https://github.com/apache/spark/commit/0c4644e3b03457bad09b7abc415b151d9998bbf5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.