[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21433 **[Test build #4206 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4206/testReport)** for PR 21433 at commit [`245181a`](https://github.com/apache/spark/commit/245181a6ebb03b4f394097297ae245705aaf9b0f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21707 Merged build finished. Test PASSed.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21707 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92593/ Test PASSed.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21707 **[Test build #92593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92593/testReport)** for PR 21707 at commit [`cd9bdca`](https://github.com/apache/spark/commit/cd9bdcaeaab8d5e20747db21b7d6d9653cddaccb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21542 **[Test build #92596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92596/testReport)** for PR 21542 at commit [`b027d62`](https://github.com/apache/spark/commit/b027d62e0d7c66fb1bd94698fc585c7399283071).
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/661/ Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test PASSed.
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92588/ Test PASSed.
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21700 Merged build finished. Test PASSed.
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21700 **[Test build #92588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92588/testReport)** for PR 21700 at commit [`d8b4bb8`](https://github.com/apache/spark/commit/d8b4bb84bc9216ebe9b31f8992e6d59e975b377d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92589/ Test PASSed.
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21682 Merged build finished. Test PASSed.
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21682 **[Test build #92589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92589/testReport)** for PR 21682 at commit [`b16a607`](https://github.com/apache/spark/commit/b16a6076b55bd2e1f01ed66ea7f53d2f915888be). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21696 **[Test build #92595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92595/testReport)** for PR 21696 at commit [`9b489ec`](https://github.com/apache/spark/commit/9b489ecf732d2e8a455d1d7ba5fd96a17295292c).
[GitHub] spark issue #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21696 Merged build finished. Test PASSed.
[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21696#discussion_r200011264

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala

```diff
@@ -19,187 +19,200 @@ package org.apache.spark.sql.execution.datasources.parquet

 import java.sql.Date

+import scala.collection.JavaConverters.asScalaBufferConverter
+
 import org.apache.parquet.filter2.predicate._
 import org.apache.parquet.filter2.predicate.FilterApi._
 import org.apache.parquet.io.api.Binary
-import org.apache.parquet.schema.PrimitiveComparator
+import org.apache.parquet.schema.{DecimalMetadata, MessageType, OriginalType, PrimitiveComparator, PrimitiveType}
+import org.apache.parquet.schema.OriginalType._
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._

 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
 import org.apache.spark.sql.sources
-import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String

 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
 private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {

+  private case class ParquetSchemaType(
+      originalType: OriginalType,
+      primitiveTypeName: PrimitiveTypeName,
+      decimalMetadata: DecimalMetadata)
+
+  private val ParquetBooleanType = ParquetSchemaType(null, BOOLEAN, null)
+  private val ParquetIntegerType = ParquetSchemaType(null, INT32, null)
+  private val ParquetLongType = ParquetSchemaType(null, INT64, null)
+  private val ParquetFloatType = ParquetSchemaType(null, FLOAT, null)
+  private val ParquetDoubleType = ParquetSchemaType(null, DOUBLE, null)
+  private val ParquetStringType = ParquetSchemaType(UTF8, BINARY, null)
+  private val ParquetBinaryType = ParquetSchemaType(null, BINARY, null)
+  private val ParquetDateType = ParquetSchemaType(DATE, INT32, null)
+
   private def dateToDays(date: Date): SQLDate = {
     DateTimeUtils.fromJavaDate(date)
   }

-  private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    case ParquetBooleanType =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    case ParquetIntegerType =>
       (n: String, v: Any) => FilterApi.eq(intColumn(n), v.asInstanceOf[Integer])
-    case LongType =>
+    case ParquetLongType =>
       (n: String, v: Any) => FilterApi.eq(longColumn(n), v.asInstanceOf[java.lang.Long])
-    case FloatType =>
+    case ParquetFloatType =>
       (n: String, v: Any) => FilterApi.eq(floatColumn(n), v.asInstanceOf[java.lang.Float])
-    case DoubleType =>
+    case ParquetDoubleType =>
       (n: String, v: Any) => FilterApi.eq(doubleColumn(n), v.asInstanceOf[java.lang.Double])

     // Binary.fromString and Binary.fromByteArray don't accept null values
-    case StringType =>
+    case ParquetStringType =>
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(s => Binary.fromString(s.asInstanceOf[String])).orNull)
-    case BinaryType =>
+    case ParquetBinaryType =>
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
-    case DateType if pushDownDate =>
+    case ParquetDateType if pushDownDate =>
       (n: String, v: Any) => FilterApi.eq(
         intColumn(n),
         Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
   }

-  private val makeNotEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeNotEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    case ParquetBooleanType =>
       (n: String, v: Any) => FilterApi.notEq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    case ParquetIntegerType =>
       (n: String, v: Any) => FilterApi.notEq(intColumn(n), v.asInstanceOf[Integer])
-    case LongType =>
+    case ParquetLongType =>
       (n: String, v: Any) => FilterApi.notEq(longColumn(n), v.asInstanceOf[java.lang.Long])
-    case FloatType =>
+    case ParquetFloatType =>
       (n: String, v: Any) =>
```

(diff truncated in the archive)
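The core of the refactor in the diff above is dispatching the filter builders on a Parquet-level schema key (a case class) instead of Catalyst's `DataType`. The pattern can be shown with a minimal standalone sketch; the names below are simplified stand-ins, not the actual Spark or Parquet classes:

```scala
// Simplified model of the ParquetSchemaType dispatch pattern from the diff above.
// OriginalType / PrimitiveTypeName stand in for the Parquet classes of the same names;
// the builder result is a String here instead of a real FilterPredicate.
object ParquetDispatchSketch {
  sealed trait OriginalType
  case object UTF8 extends OriginalType
  case object DATE extends OriginalType

  sealed trait PrimitiveTypeName
  case object INT32 extends PrimitiveTypeName
  case object BINARY extends PrimitiveTypeName

  // A case class gives structural equality for free, so named instances
  // like ParquetIntegerType work directly as patterns in a match.
  case class ParquetSchemaType(
      originalType: OriginalType,  // null when the Parquet field has no logical type
      primitiveTypeName: PrimitiveTypeName)

  val ParquetIntegerType = ParquetSchemaType(null, INT32)
  val ParquetStringType = ParquetSchemaType(UTF8, BINARY)
  val ParquetDateType = ParquetSchemaType(DATE, INT32)

  // Builders keyed on the Parquet schema, mirroring makeEq in the diff.
  val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => String] = {
    case ParquetIntegerType =>
      (n: String, v: Any) => s"eq(intColumn($n), $v)"
    case ParquetStringType =>
      (n: String, v: Any) => s"eq(binaryColumn($n), $v)"
  }
}
```

Because `makeEq` is a `PartialFunction`, callers can use `makeEq.lift(schemaType)` to get `None` for types with no registered builder, which in the real code means the predicate simply isn't pushed down.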
[GitHub] spark issue #21668: [SPARK-24690][SQL] Add a new config to control plan stat...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21668 One of the refactoring ideas is to inject the functionality of `ReorderJoin` (= `StarSchemaDetection`) into `CostBasedJoinReorder`: in [the batch rule `Join Reorder` (`Once` strategy)](https://github.com/apache/spark/blob/7c08eb6d61d55ce45229f3302e6d463e7669183d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L139), if `spark.sql.cbo.starSchemaDetection` is enabled (false by default), the rule applies star schema detection first. If a fact table is found, [dimension tables are reordered](https://github.com/apache/spark/blob/7c08eb6d61d55ce45229f3302e6d463e7669183d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/StarSchemaDetection.scala#L342) by the cost-based algorithm. If `spark.sql.cbo.starSchemaDetection` is disabled, the rule just uses `CostBasedJoinReorder`. Currently, we have `ReorderJoin` (= `StarSchemaDetection`) in the batch rule with the `fixedPoint` strategy, so I think that, if we could remove this rule from there, we would skip unnecessary checks caused by `ReorderJoin` on every rule iteration. @cloud-fan WDYT?
[GitHub] spark issue #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21696 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/660/ Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92585/ Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21542 Merged build finished. Test PASSed.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21542 **[Test build #92585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92585/testReport)** for PR 21542 at commit [`1250a92`](https://github.com/apache/spark/commit/1250a924e4d6116169b443c5346a5ee7fb6a4e40). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92586/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92586/testReport)** for PR 21705 at commit [`0c4644e`](https://github.com/apache/spark/commit/0c4644e3b03457bad09b7abc415b151d9998bbf5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92592/ Test FAILed.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Merged build finished. Test FAILed.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21710 **[Test build #92592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92592/testReport)** for PR 21710 at commit [`5ed11e6`](https://github.com/apache/spark/commit/5ed11e67703dc7dfb23fb7ff68acffde33c13a30). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21711 **[Test build #92594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92594/testReport)** for PR 21711 at commit [`dbc300e`](https://github.com/apache/spark/commit/dbc300edb56b6e813c926b061e780378ee564778).
[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21711 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/659/ Test PASSed.
[GitHub] spark issue #21711: [SPARK-24681][SQL] Verify nested column names in Hive me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21711 Merged build finished. Test PASSed.
[GitHub] spark pull request #21711: [SPARK-24681][SQL] Verify nested column names in ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/21711

[SPARK-24681][SQL] Verify nested column names in Hive metastore

## What changes were proposed in this pull request?

This pr added code to check that nested column names do not include ',', ':', and ';' because the Hive metastore can't handle these characters in nested column names; ref: https://github.com/apache/hive/blob/release-1.2.1/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/TypeInfoUtils.java#L239

## How was this patch tested?

Added tests in `SQLQuerySuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-24681

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21711.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21711

commit dbc300edb56b6e813c926b061e780378ee564778
Author: Takeshi Yamamuro
Date: 2018-07-04T04:07:04Z

    Fix
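The check described in this PR amounts to a recursive walk over the schema that rejects any nested field name containing a metastore delimiter character. A minimal sketch of that idea, using a simplified type model (the names `NestedNameCheckSketch` and `verifyNestedColumnNames` are hypothetical, not Spark's actual helper):

```scala
// Hypothetical sketch of the nested-column-name check: reject ',', ':' and ';'
// anywhere inside a nested field name, since Hive metastore's type-string
// parser treats those characters as delimiters.
object NestedNameCheckSketch {
  sealed trait DataType
  case object IntType extends DataType
  case class StructType(fields: Map[String, DataType]) extends DataType

  private val invalidChars = Seq(",", ":", ";")

  // Returns Left(message) on the first invalid nested name, Right(()) otherwise.
  def verifyNestedColumnNames(dt: DataType): Either[String, Unit] = dt match {
    case StructType(fields) =>
      fields.foldLeft[Either[String, Unit]](Right(())) {
        case (err @ Left(_), _) => err  // already failed: short-circuit
        case (_, (name, child)) =>
          if (invalidChars.exists(name.contains)) {
            Left(s"invalid nested column name: $name")
          } else {
            verifyNestedColumnNames(child)  // recurse into nested structs
          }
      }
    case _ => Right(())  // leaf types carry no nested names
  }
}
```

Top-level column names are a separate concern; only names *inside* a struct hit the Hive type-string parser referenced above.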
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21703#discussion_r25967

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

```diff
@@ -179,6 +179,12 @@ object TypeCoercion {
       .orElse((t1, t2) match {
         case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
           findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
+        case (MapType(kt1, vt1, valueContainsNull1), MapType(kt2, vt2, valueContainsNull2)) =>
```

--- End diff --

Sure, let me think of it.
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21703#discussion_r25737

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala

```diff
@@ -179,6 +179,12 @@ object TypeCoercion {
       .orElse((t1, t2) match {
         case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
           findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
+        case (MapType(kt1, vt1, valueContainsNull1), MapType(kt2, vt2, valueContainsNull2)) =>
```

--- End diff --

not related to this PR, but shall we also handle struct type here?
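The `MapType` case being discussed widens the key and value types recursively and ORs the `valueContainsNull` flags, by analogy with the `ArrayType` case just above it. A standalone sketch of that shape, using a toy type lattice rather than Spark's actual `TypeCoercion`:

```scala
// Sketch of the MapType coercion rule from the diff: widen keys and values
// recursively and OR the valueContainsNull flags. Simplified type model;
// only Int -> Long widening is modeled here.
object MapCoercionSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType
  case class MapType(kt: DataType, vt: DataType, valueContainsNull: Boolean) extends DataType

  // Toy widening for atomic types: equal types stay, Int widens to Long.
  private def widerAtomic(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
    case (a, b) if a == b => Some(a)
    case (IntType, LongType) | (LongType, IntType) => Some(LongType)
    case _ => None
  }

  def findWiderTypeForTwo(t1: DataType, t2: DataType): Option[DataType] = (t1, t2) match {
    case (MapType(kt1, vt1, n1), MapType(kt2, vt2, n2)) =>
      // Both components must widen; the result is nullable if either side is.
      for {
        kt <- findWiderTypeForTwo(kt1, kt2)
        vt <- findWiderTypeForTwo(vt1, vt2)
      } yield MapType(kt, vt, n1 || n2)
    case _ => widerAtomic(t1, t2)
  }
}
```

Because the rule recurses through `findWiderTypeForTwo`, nested maps (maps of maps) widen for free; the struct-type question raised above would add one more recursive case of the same shape.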
[GitHub] spark pull request #21703: [SPARK-24732][SQL] Type coercion between MapTypes...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21703
[GitHub] spark issue #21703: [SPARK-24732][SQL] Type coercion between MapTypes.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21703 Merged to master.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21707 Merged build finished. Test PASSed.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21707 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/658/ Test PASSed.
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21707 **[Test build #92593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92593/testReport)** for PR 21707 at commit [`cd9bdca`](https://github.com/apache/spark/commit/cd9bdcaeaab8d5e20747db21b7d6d9653cddaccb).
[GitHub] spark issue #21707: Update for spark 2.2.2 release
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21707 test this please
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92583/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21710 **[Test build #92592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92592/testReport)** for PR 21710 at commit [`5ed11e6`](https://github.com/apache/spark/commit/5ed11e67703dc7dfb23fb7ff68acffde33c13a30).
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92583/testReport)** for PR 21705 at commit [`cb95ea9`](https://github.com/apache/spark/commit/cb95ea918fd6f41eb057f890c0e5579a6083a2c2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/657/ Test PASSed.
[GitHub] spark issue #21710: [SPARK-24207][R]add R API for PrefixSpan
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21710 Merged build finished. Test PASSed.
[GitHub] spark pull request #21710: [SPARK-24207][R]add R API for PrefixSpan
GitHub user huaxingao opened a pull request: https://github.com/apache/spark/pull/21710

[SPARK-24207][R]add R API for PrefixSpan

## What changes were proposed in this pull request?

Add R API for PrefixSpan.

## How was this patch tested?

Added a test in test_mllib_fpm.R.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-24207

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21710.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21710

commit 5ed11e67703dc7dfb23fb7ff68acffde33c13a30
Author: Huaxin Gao
Date: 2018-07-04T03:18:08Z

    [SPARK-24207][R]add R API for PrefixSpan
[GitHub] spark pull request #21667: [SPARK-24691][SQL]Dispatch the type support check...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/21667#discussion_r19766

Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala

```diff
@@ -152,6 +152,12 @@ trait FileFormat {
     }
   }

+  /**
+   * Validate the given [[DataType]] in read/write path for this file format.
+   * If the [[DataType]] is not supported, an exception will be thrown.
+   * By default all data types are supported.
+   */
+  def validateDataType(dataType: DataType, isReadPath: Boolean): Unit = {}
```

--- End diff --

Yes, that was what I did in the first commit. If the unsupported type is inside a struct/array, then the error message is not as accurate as with the current approach. I am OK with reverting to return a Boolean, though.
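The trade-off being debated is between a `Unit`-returning validator that throws (and can therefore name the exact offending leaf type inside a struct or array) and a `Boolean` check that can only say "unsupported". A small sketch of the throwing variant, with a hypothetical simplified type model rather than Spark's real `FileFormat` API:

```scala
// Sketch of the exception-based validateDataType discussed above.
// The recursion descends into containers, so the error message can point
// at the precise unsupported leaf type. Types here are hypothetical.
object ValidateSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object IntervalType extends DataType          // stand-in for an unsupported type
  case class ArrayType(element: DataType) extends DataType

  def validateDataType(dataType: DataType, isReadPath: Boolean): Unit = dataType match {
    case ArrayType(e) =>
      validateDataType(e, isReadPath)                // recurse into the element type
    case IntervalType =>
      val path = if (isReadPath) "read" else "write"
      throw new IllegalArgumentException(s"unsupported type in $path path: $dataType")
    case _ => ()                                     // everything else is supported
  }
}
```

A `Boolean`-returning version would simply replace the throw with `false` and the recursion with a conjunction, at the cost of the precise error message, which is exactly the point made in the comment.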
[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix containsNull of Concat for...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/21704#discussion_r19346

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala

```diff
@@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
     }
   }

-  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
+  override def dataType: DataType = {
+    val dataTypes = children.map(_.dataType)
+    dataTypes.headOption.map {
+      case ArrayType(et, _) =>
+        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+      case dt => dt
+    }.getOrElse(StringType)
+  }
```

--- End diff --

Actually, `Concat` for array type has type coercion that adds casts to make all children the same type, but we also have the `SimplifyCasts` optimization, which removes unnecessary casts and might remove casts from arrays that do not contain null to arrays that do ([optimizer/expressions.scala#L611](https://github.com/apache/spark/blob/d87a8c6c0d1a4db5c9444781160a65562f8ea738/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala#L611)). E.g., `concat(array(1,2,3), array(4,null,6))` might generate a wrong data type during execution.
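The fix in the diff above makes the result's `containsNull` the OR across all children instead of taking the first child's flag verbatim. The essential logic can be shown in isolation with a simplified model (not Spark's actual `Concat` expression):

```scala
// Sketch of the containsNull fix: the concatenated array's element type must be
// nullable if ANY input array's element type is nullable, not just the first.
// ArrayType here is a simplified stand-in for Spark's ArrayType.
object ConcatTypeSketch {
  case class ArrayType(elementType: String, containsNull: Boolean)

  // Mirrors the corrected dataType computation: element type from the head,
  // nullability OR-ed over every child. None for an empty child list.
  def concatDataType(children: Seq[ArrayType]): Option[ArrayType] =
    children.headOption.map { head =>
      ArrayType(head.elementType, children.exists(_.containsNull))
    }
}
```

With the pre-fix behavior, `concat(array(1,2,3), array(4,null,6))` would report a non-nullable element type (taken from the first, null-free argument) even though the result contains a null, which is the bug the comment describes.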
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Merged build finished. Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21705 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92584/ Test PASSed.
[GitHub] spark issue #21705: [SPARK-24727][SQL] Add a static config to control cache ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21705 **[Test build #92584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92584/testReport)** for PR 21705 at commit [`bc8a21a`](https://github.com/apache/spark/commit/bc8a21af0407f106ee64d3e5b6d4aed8bbf80688). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21696: [SPARK-24716][SQL] Refactor ParquetFilters
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/21696#discussion_r19002

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -19,166 +19,186 @@ package org.apache.spark.sql.execution.datasources.parquet
 
 import java.sql.Date
 
+import scala.collection.JavaConverters._
+
 import org.apache.parquet.filter2.predicate._
 import org.apache.parquet.filter2.predicate.FilterApi._
 import org.apache.parquet.io.api.Binary
-import org.apache.parquet.schema.PrimitiveComparator
+import org.apache.parquet.schema._
+import org.apache.parquet.schema.OriginalType._
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName._
 
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.catalyst.util.DateTimeUtils.SQLDate
 import org.apache.spark.sql.sources
-import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
 /**
  * Some utility function to convert Spark data source filters to Parquet filters.
  */
 private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith: Boolean) {
 
+  case class ParquetSchemaType(
+      originalType: OriginalType,
+      primitiveTypeName: PrimitiveType.PrimitiveTypeName,
+      decimalMetadata: DecimalMetadata)
+
   private def dateToDays(date: Date): SQLDate = {
     DateTimeUtils.fromJavaDate(date)
   }
 
-  private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    // BooleanType
+    case ParquetSchemaType(null, BOOLEAN, null) =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    // IntegerType
+    case ParquetSchemaType(null, INT32, null) =>
       (n: String, v: Any) => FilterApi.eq(intColumn(n), v.asInstanceOf[Integer])
-    case LongType =>
+    // LongType
+    case ParquetSchemaType(null, INT64, null) =>
       (n: String, v: Any) => FilterApi.eq(longColumn(n), v.asInstanceOf[java.lang.Long])
-    case FloatType =>
+    // FloatType
+    case ParquetSchemaType(null, FLOAT, null) =>
       (n: String, v: Any) => FilterApi.eq(floatColumn(n), v.asInstanceOf[java.lang.Float])
-    case DoubleType =>
+    // DoubleType
+    case ParquetSchemaType(null, DOUBLE, null) =>
       (n: String, v: Any) => FilterApi.eq(doubleColumn(n), v.asInstanceOf[java.lang.Double])
 
+    // StringType
     // Binary.fromString and Binary.fromByteArray don't accept null values
-    case StringType =>
+    case ParquetSchemaType(UTF8, BINARY, null) =>
       (n: String, v: Any) => FilterApi.eq(
         binaryColumn(n),
         Option(v).map(s => Binary.fromString(s.asInstanceOf[String])).orNull)
-    case BinaryType =>
+    // BinaryType
+    case ParquetSchemaType(null, BINARY, null) =>
      (n: String, v: Any) => FilterApi.eq(
        binaryColumn(n),
        Option(v).map(b => Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
-    case DateType if pushDownDate =>
+    // DateType
+    case ParquetSchemaType(DATE, INT32, null) if pushDownDate =>
       (n: String, v: Any) => FilterApi.eq(
         intColumn(n),
         Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
   }
 
-  private val makeNotEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
-    case BooleanType =>
+  private val makeNotEq: PartialFunction[ParquetSchemaType, (String, Any) => FilterPredicate] = {
+    case ParquetSchemaType(null, BOOLEAN, null) =>
       (n: String, v: Any) => FilterApi.notEq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
-    case IntegerType =>
+    case ParquetSchemaType(null, INT32, null) =>
       (n: String, v: Any) => FilterApi.notEq(intColumn(n), v.asInstanceOf[Integer])
-    case LongType =>
+    case ParquetSchemaType(null, INT64, null) =>
       (n: String, v: Any) => FilterApi.notEq(longColumn(n), v.asInstanceOf[java.lang.Long])
-    case FloatType =>
+    case ParquetSchemaType(null, FLOAT, null) =>
       (n: String, v: Any) => FilterApi.notEq(floatColumn(n), v.asInstanceOf[java.lang.Float])
-    case DoubleType =>
+    case ParquetSchemaType(null, DOUBLE, null) =>
       (n: String, v: Any) => FilterApi.notEq(doubleColumn(n), v.asInstanceOf[java.lang.Double])
-    case StringType =>
+    case ParquetSchemaType(UTF8, BINARY, null) =>
       (n: String, v: Any) => FilterApi.notEq(
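The core idea of the diff above, dispatching filter builders on a Parquet-style schema key instead of Spark's `DataType`, can be modeled in isolation. The types below are simplified stand-ins, not the real Parquet classes, and the builders return strings rather than `FilterPredicate`s:

```scala
// Simplified model of the ParquetFilters refactor: key the builders by a
// (originalType, primitiveTypeName) descriptor. Names are illustrative.
sealed trait PrimitiveTypeName
case object INT32 extends PrimitiveTypeName
case object BINARY extends PrimitiveTypeName

case class ParquetSchemaType(originalType: String, primitiveTypeName: PrimitiveTypeName)

// A PartialFunction lets callers probe support with isDefinedAt before applying,
// mirroring how makeEq is used when deciding whether a filter can be pushed down.
val makeEq: PartialFunction[ParquetSchemaType, (String, Any) => String] = {
  // plain INT32 with no original type (covers Spark's IntegerType)
  case ParquetSchemaType(null, INT32) =>
    (n: String, v: Any) => s"eq(intColumn($n), $v)"
  // BINARY annotated as UTF8 (covers Spark's StringType)
  case ParquetSchemaType("UTF8", BINARY) =>
    (n: String, v: Any) => s"eq(binaryColumn($n), $v)"
}
```

Matching on the Parquet schema rather than the Catalyst type means the pushdown logic follows what is physically stored in the file, which is what the refactor in the diff is after.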
[GitHub] spark issue #21655: [SPARK-24675][SQL]Rename table: validate existence of ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21655 Merged build finished. Test PASSed.
[GitHub] spark issue #21655: [SPARK-24675][SQL]Rename table: validate existence of ne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21655 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/656/
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21611 ping @cloud-fan to take another look.
[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21619 @cloud-fan Is there anything more I should do to get this merged? Thanks.
[GitHub] spark issue #21655: [SPARK-24675][SQL]Rename table: validate existence of ne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21655 **[Test build #92591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92591/testReport)** for PR 21655 at commit [`18418c9`](https://github.com/apache/spark/commit/18418c902590b066c6173c1cb33d58a2aef5d6c6).
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21701 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92582/
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21701 Merged build finished. Test PASSed.
[GitHub] spark issue #21701: [SPARK-24730][SS] Add policy to choose max as global wat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21701 **[Test build #92582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92582/testReport)** for PR 21701 at commit [`c0d1c6e`](https://github.com/apache/spark/commit/c0d1c6e0a5532eeab0848834d2dc348808e54069). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `sealed trait MultipleWatermarkPolicy ` * `case class WatermarkTracker(policy: MultipleWatermarkPolicy) extends Logging `
[GitHub] spark issue #21709: [SPARK-5152][CORE] Read metrics config file from Hadoop ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21709 If you want the metrics conf to be centralized without having to distribute it to different nodes, you can set it through `SparkConf` with the prefix "spark.metrics.conf."; MetricsSystem also supports reading its configuration from `SparkConf`.
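The "spark.metrics.conf." prefix mechanism described above can be sketched with a plain `Map` standing in for `SparkConf`; the sink keys below are illustrative examples, not required settings:

```scala
// Hedged sketch: metrics properties ride along as ordinary SparkConf entries
// under the "spark.metrics.conf." prefix and are recovered by stripping it.
val prefix = "spark.metrics.conf."

// A flat SparkConf-style key/value map (illustrative keys).
val sparkConf = Map(
  "spark.app.name" -> "demo",
  prefix + "*.sink.console.class" -> "org.apache.spark.metrics.sink.ConsoleSink",
  prefix + "*.sink.console.period" -> "10"
)

// Keep only the prefixed entries, with the prefix removed - roughly what the
// metrics system does when it reads its configuration from SparkConf.
val metricsProps: Map[String, String] = sparkConf.collect {
  case (k, v) if k.startsWith(prefix) => k.stripPrefix(prefix) -> v
}
```

Because the properties travel with the application's configuration, no per-node metrics file is needed, which addresses the centralization goal without reading the file from HDFS/S3.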
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21708 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92579/
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21708 Merged build finished. Test PASSed.
[GitHub] spark issue #21708: [BUILD] Close stale PRs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21708 **[Test build #92579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92579/testReport)** for PR 21708 at commit [`dad2b46`](https://github.com/apache/spark/commit/dad2b4602f4854ab941014cc4ec7535d3d74d2f5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21709: [SPARK-5152][CORE] Read metrics config file from Hadoop ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21709 Hi @jzhuge, what is the purpose of supporting reading the metrics conf from HDFS/S3?
[GitHub] spark pull request #21704: [SPARK-24734][SQL] Fix containsNull of Concat for...
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21704#discussion_r14021

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -2007,7 +2007,14 @@ case class Concat(children: Seq[Expression]) extends Expression {
     }
   }
 
-  override def dataType: DataType = children.map(_.dataType).headOption.getOrElse(StringType)
+  override def dataType: DataType = {
+    val dataTypes = children.map(_.dataType)
+    dataTypes.headOption.map {
+      case ArrayType(et, _) =>
+        ArrayType(et, dataTypes.exists(_.asInstanceOf[ArrayType].containsNull))
+      case dt => dt
+    }.getOrElse(StringType)
+  }
--- End diff --

Can't we handle this case in type coercion (the analysis phase)?
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/655/
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21657 **[Test build #92590 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92590/testReport)** for PR 21657 at commit [`fc2108e`](https://github.com/apache/spark/commit/fc2108e52ee987d7ca3d4cce811ad9fc4e462c47).
[GitHub] spark issue #21657: [SPARK-24676][SQL] Project required data from CSV parsed...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21657 Merged build finished. Test PASSed.
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21682 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/654/
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21682 **[Test build #92589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92589/testReport)** for PR 21682 at commit [`b16a607`](https://github.com/apache/spark/commit/b16a6076b55bd2e1f01ed66ea7f53d2f915888be).
[GitHub] spark issue #21682: [SPARK-24706][SQL] ByteType and ShortType support pushdo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21682 Merged build finished. Test PASSed.
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r12580

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.catalog;
+
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * TableChange subclasses represent requested changes to a table. These are passed to
+ * {@link DataSourceCatalog#alterTable}.
+ */
+public interface TableChange {
--- End diff --

This is great!
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r12550

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/TableChange.java ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.catalog;
+
+import org.apache.spark.sql.types.DataType;
+
+/**
+ * TableChange subclasses represent requested changes to a table. These are passed to
+ * {@link DataSourceCatalog#alterTable}.
+ */
+public interface TableChange {
+
+  /**
+   * Create a TableChange for adding a top-level column to a table.
+   *
+   * Because "." may be interpreted as a field path separator or may be used in field names, it is
+   * not allowed in names passed to this method. To add to nested types or to add fields with
+   * names that contain ".", use {@link #addColumn(String, String, DataType)}.
+   *
+   * @param name the new top-level column name
+   * @param dataType the new column's data type
+   * @return a TableChange for the addition
+   */
+  static TableChange addColumn(String name, DataType dataType) {
+    return new AddColumn(null, name, dataType);
+  }
+
+  /**
+   * Create a TableChange for adding a nested column to a table.
+   *
+   * The parent name is used to find the parent struct type where the nested field will be added.
+   * If the parent name is null, the new column will be added to the root as a top-level column.
+   * If parent identifies a struct, a new column is added to that struct. If it identifies a list,
+   * the column is added to the list element struct, and if it identifies a map, the new column is
+   * added to the map's value struct.
+   *
+   * The given name is used to name the new column and names containing "." are not handled
+   * differently.
+   *
+   * @param parent the new field's parent
+   * @param name the new field name
+   * @param dataType the new field's data type
+   * @return a TableChange for the addition
+   */
+  static TableChange addColumn(String parent, String name, DataType dataType) {
+    return new AddColumn(parent, name, dataType);
+  }
+
+  /**
+   * Create a TableChange for renaming a field.
+   *
+   * The name is used to find the field to rename. The new name will replace the name of the type.
+   * For example, renameColumn("a.b.c", "x") should produce column a.b.x.
--- End diff --

It's great to have an example showing how to use this API; can we add an example to all the methods here?
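The rename semantics described in the javadoc above (the last segment of a dot-separated field path is replaced by the new name) can be modeled in a few lines. This sketches only the documented behaviour; it is not the actual `TableChange` implementation:

```scala
// Hedged sketch: renameColumn("a.b.c", "x") keeps the path "a.b" and replaces
// the final segment "c" with "x", yielding "a.b.x".
def renameColumn(path: String, newName: String): String = {
  val parts = path.split('.')
  (parts.dropRight(1) :+ newName).mkString(".")
}
```

For a top-level column (no dots), the whole name is simply replaced.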
[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/21682#discussion_r12294

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -42,6 +42,10 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   private val makeEq: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
     case BooleanType =>
       (n: String, v: Any) => FilterApi.eq(booleanColumn(n), v.asInstanceOf[java.lang.Boolean])
+    case ByteType | ShortType =>
+      (n: String, v: Any) => FilterApi.eq(
+        intColumn(n),
+        Option(v).map(_.asInstanceOf[Number].intValue.asInstanceOf[Integer]).orNull)
--- End diff --

value may be `null`.
[GitHub] spark pull request #21682: [SPARK-24706][SQL] ByteType and ShortType support...
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/21682#discussion_r12316

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -93,6 +101,10 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean, pushDownStartWith:
   }
 
   private val makeLt: PartialFunction[DataType, (String, Any) => FilterPredicate] = {
+    case ByteType | ShortType =>
+      (n: String, v: Any) => FilterApi.lt(
+        intColumn(n),
+        v.asInstanceOf[Number].intValue.asInstanceOf[Integer])
--- End diff --

value cannot be `null`.
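The null-handling distinction raised in these two review comments can be isolated from the filter-building code. Method names below are illustrative, but the conversion expressions are the ones from the diffs: for eq/notEq the pushed-down value may legitimately be null, so the numeric widening is wrapped in `Option(...).orNull`; for lt/gt a null comparison value is never pushed down, so the direct cast is safe.

```scala
// Null-safe widening of a Byte/Short value to java.lang.Integer (eq/notEq path).
def toIntegerOrNull(v: Any): Integer =
  Option(v).map(_.asInstanceOf[Number].intValue.asInstanceOf[Integer]).orNull

// Direct widening, assuming the caller guarantees v is non-null (lt/gt path).
def toInteger(v: Any): Integer =
  v.asInstanceOf[Number].intValue.asInstanceOf[Integer]
```

Calling `toInteger(null)` would throw a `NullPointerException` when `intValue` is invoked, which is exactly why the eq/notEq builders need the `Option` wrapper.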
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r12200

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/catalog/Table.java ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.catalog;
+
+import org.apache.spark.sql.catalyst.expressions.Expression;
+import org.apache.spark.sql.types.StructType;
+
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Represents table metadata from a {@link DataSourceCatalog}.
+ */
+public interface Table {
--- End diff --

This is something we should decide now. IMO `schema` and `properties` are must-haves, but the others may not be. E.g. if a data source uses a path to look up a table, then there is no database/table name for it. And we don't have a story for dealing with partitions yet.
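The minimal metadata surface argued for above (schema and properties required, a name optional because some sources address tables purely by path) can be sketched as a trait. This is a simplified illustrative model, not the actual DataSourceV2 `Table` interface:

```scala
// Illustrative sketch: only schema and properties are abstract (must-have);
// name defaults to None for path-based sources.
trait Table {
  def schema: Seq[(String, String)]    // column name -> type name, standing in for StructType
  def properties: Map[String, String]
  def name: Option[String] = None      // absent for path-based data sources
}

// A path-based table: no name, but schema and properties are still available.
val pathTable = new Table {
  def schema = Seq("id" -> "bigint", "data" -> "string")
  def properties = Map("path" -> "/warehouse/events")
}
```

Keeping the optional pieces out of the abstract surface is what lets a path-addressed source implement the interface without inventing a database/table name.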
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21700 **[Test build #92588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92588/testReport)** for PR 21700 at commit [`d8b4bb8`](https://github.com/apache/spark/commit/d8b4bb84bc9216ebe9b31f8992e6d59e975b377d).
[GitHub] spark pull request #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for ca...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21306#discussion_r11918

--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/CatalogSupport.java ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2;
+
+import org.apache.spark.sql.sources.v2.catalog.DataSourceCatalog;
+
+/**
+ * A mix-in interface for {@link DataSourceV2} catalog support. Data sources can implement this
+ * interface to provide the ability to load, create, alter, and drop tables.
+ *
+ * Data sources must implement this interface to support logical operations that combine writing
+ * data with catalog tasks, like create-table-as-select.
+ */
+public interface CatalogSupport {
--- End diff --

After thinking about it more, what we really need in the near future is all about tables: create/alter/lookup/drop, rather than how the tables are organized (like databases) or how other information is stored (like views/functions). How about we call it `TableSupport`?
[GitHub] spark pull request #21681: Pin tag 210
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21681
[GitHub] spark pull request #17843: [Streaming] groupByKey should also disable map si...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17843
[GitHub] spark pull request #21336: [SPARK-24286][Documentation] DataFrameReader.csv ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21336
[GitHub] spark pull request #21691: Branch 2.2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21691
[GitHub] spark pull request #8849: [SPARK-9883][MLlib] Distance to each kmean cluster...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8849
[GitHub] spark pull request #13477: [SPARK-15739][GraphX] Expose aggregateMessagesWit...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13477
[GitHub] spark pull request #17907: SPARK-7856 Principal components and variance usin...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17907
[GitHub] spark pull request #21076: Creating KafkaStreamToCassandra
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21076
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21700 Merged build finished. Test FAILed.
[GitHub] spark pull request #21507: Branch 1.6
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21507
[GitHub] spark pull request #20809: [SPARK-23667][CORE] Better scala version check
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20809
[GitHub] spark pull request #20932: [SPARK-23812][SQL] DFS should be removed from uns...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20932
[GitHub] spark pull request #21708: [BUILD] Close stale PRs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21708
[GitHub] spark pull request #18766: [SPARK-8288][SQL] ScalaReflection can use compani...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18766
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21700 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92587/
[GitHub] spark pull request #14291: [SPARK-16658][GRAPHX] Add EdgePartition.withVerte...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14291
[GitHub] spark pull request #20919: Feature/apply func to rdd
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20919
[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21700 **[Test build #92587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92587/testReport)** for PR 21700 at commit [`c50da7b`](https://github.com/apache/spark/commit/c50da7b40645e8c6d8c1530cf3497ef2d3a09857). * This patch **fails Java style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Merged build finished. Test PASSed.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92581/
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21403 **[Test build #92581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92581/testReport)** for PR 21403 at commit [`d3e39ed`](https://github.com/apache/spark/commit/d3e39ed3f442958cfaaa1ef056cb72fedf0fce1c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.