[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16826#discussion_r103073487

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ExperimentalMethods.scala ---
```
@@ -46,4 +46,10 @@ class ExperimentalMethods private[sql]() {

   @volatile var extraOptimizations: Seq[Rule[LogicalPlan]] = Nil

+  override def clone(): ExperimentalMethods = {
```
--- End diff --

It sounds like we also need to add synchronization for both `extraStrategies` and `extraOptimizations`.
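For context, a minimal sketch of what that suggestion amounts to: a `clone()` that copies both mutable fields. The body below is an illustration written for this digest, not the code actually merged in the PR:
```
override def clone(): ExperimentalMethods = {
  val result = new ExperimentalMethods
  // Copy both @volatile fields so the cloned session does not share its
  // planner strategies and optimizer rules with the original by reference.
  result.extraStrategies = extraStrategies
  result.extraOptimizations = extraOptimizations
  result
}
```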
[GitHub] spark pull request #16826: [SPARK-19540][SQL] Add ability to clone SparkSess...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16826#discussion_r103073400

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
```
@@ -1178,4 +1181,36 @@ class SessionCatalog(
     }
   }

+  /**
+   * Get an identical copy of the `SessionCatalog`.
+   * The temporary tables and function registry are retained.
```
--- End diff --

`temporary tables` -> `temporary views`
[GitHub] spark pull request #17027: [SPARK-19650] Commands should not trigger a Spark...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17027
[GitHub] spark issue #17027: [SPARK-19650] Commands should not trigger a Spark job
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17027

merging to master!
[GitHub] spark pull request #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catal...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17063
[GitHub] spark pull request #17027: [SPARK-19650] Commands should not trigger a Spark...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17027#discussion_r103073282

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
```
@@ -175,19 +175,14 @@ class Dataset[T] private[sql](
   }

   @transient private[sql] val logicalPlan: LogicalPlan = {
-    def hasSideEffects(plan: LogicalPlan): Boolean = plan match {
-      case _: Command |
-           _: InsertIntoTable => true
-      case _ => false
-    }
-
+    // For various commands (like DDL) and queries with side effects, we force query execution
+    // to happen right away to let these side effects take place eagerly.
     queryExecution.analyzed match {
       // For various commands (like DDL) and queries with side effects, we force query execution
```
--- End diff --

actually let me remove it while merging
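For readers without the full file: the pattern under discussion makes commands run eagerly when a `Dataset` is constructed. A rough sketch of the idea (simplified; the exact merged code may differ):
```
@transient private[sql] val logicalPlan: LogicalPlan = {
  queryExecution.analyzed match {
    // Commands (like DDL) have side effects, so execute them right away and
    // keep the collected result instead of deferring to the first action.
    case c: Command =>
      LocalRelation(c.output, queryExecution.executedPlan.executeCollect())
    case other =>
      other
  }
}
```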
[GitHub] spark issue #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17063

LGTM, merging to master!
[GitHub] spark issue #17065: [SPARK-17075][SQL][followup] fix some minor issues and c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17065

**[Test build #73466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73466/testReport)** for PR 17065 at commit [`5fa69b3`](https://github.com/apache/spark/commit/5fa69b36294940e1406bb3b1515539decbb9e03a).
[GitHub] spark issue #17065: [SPARK-17075][SQL][followup] fix some minor issues and c...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17065

CC @ron8hu @wzhfy
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17060

**[Test build #73463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73463/testReport)** for PR 17060 at commit [`4837de8`](https://github.com/apache/spark/commit/4837de867c0e93a9b2801cce352573d68684aa35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17060

Merged build finished. Test PASSed.
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17060

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73463/
Test PASSed.
[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103073176

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
```
@@ -361,57 +343,52 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
    */
   def evaluateInSet(
-      attrRef: AttributeReference,
+      attr: Attribute,
       hSet: Set[Any],
-      update: Boolean)
-    : Option[Double] = {
-    if (!mutableColStats.contains(attrRef.exprId)) {
-      logDebug("[CBO] No statistics for " + attrRef)
+      update: Boolean): Option[Double] = {
+    if (!colStatsMap.contains(attr)) {
+      logDebug("[CBO] No statistics for " + attr)
       return None
     }

-    val aColStat = mutableColStats(attrRef.exprId)
-    val ndv = aColStat.distinctCount
-    val aType = attrRef.dataType
-    var newNdv: Long = 0
+    val colStat = colStatsMap(attr)
+    val ndv = colStat.distinctCount
+    val dataType = attr.dataType
+    var newNdv = ndv

     // use [min, max] to filter the original hSet
-    aType match {
-      case _: NumericType | DateType | TimestampType =>
-        val statsRange =
-          Range(aColStat.min, aColStat.max, aType).asInstanceOf[NumericRange]
-
-        // To facilitate finding the min and max values in hSet, we map hSet values to BigDecimal.
-        // Using hSetBigdec, we can find the min and max values quickly in the ordered hSetBigdec.
-        val hSetBigdec = hSet.map(e => BigDecimal(e.toString))
-        val validQuerySet = hSetBigdec.filter(e => e >= statsRange.min && e <= statsRange.max)
-        // We use hSetBigdecToAnyMap to help us find the original hSet value.
-        val hSetBigdecToAnyMap: Map[BigDecimal, Any] =
-          hSet.map(e => BigDecimal(e.toString) -> e).toMap
+    dataType match {
+      case _: NumericType | BooleanType | DateType | TimestampType =>
```
--- End diff --

add boolean type
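The estimation idea behind `evaluateInSet`, as a self-contained simplification (this is not the Spark implementation; it assumes numeric values and elides the BigDecimal mapping): keep only the set values inside the column's `[min, max]`, then take the fraction of distinct values matched.
```
def inSetSelectivity(ndv: Long, min: Double, max: Double, hSet: Set[Double]): Option[Double] = {
  val validSet = hSet.filter(v => v >= min && v <= max)   // prune with [min, max]
  if (validSet.isEmpty) Some(0.0)
  else Some(math.min(validSet.size.toDouble / ndv, 1.0))  // matched NDVs / total NDVs
}
```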
[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103073163

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
```
@@ -297,6 +278,8 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
         Some(DateTimeUtils.toJavaDate(litValue.toString.toInt))
       case TimestampType =>
         Some(DateTimeUtils.toJavaTimestamp(litValue.toString.toLong))
+      case _: DecimalType =>
+        Some(litValue.asInstanceOf[Decimal].toJavaBigDecimal)
```
--- End diff --

@ron8hu the external value type of `DecimalType` is java decimal, and the internal value type is `Decimal`, we need to convert it.
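The conversion being added, in isolation (`Decimal` is Spark's internal decimal representation; `toJavaBigDecimal` converts it to the external form):
```
import org.apache.spark.sql.types.Decimal

val internal: Decimal = Decimal("123.45")                      // internal value type
val external: java.math.BigDecimal = internal.toJavaBigDecimal // external value type
```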
[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103073122

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
```
@@ -258,27 +246,20 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
    */
   def evaluateBinary(
       op: BinaryComparison,
-      attrRef: AttributeReference,
+      attr: Attribute,
       literal: Literal,
-      update: Boolean)
-    : Option[Double] = {
-    if (!mutableColStats.contains(attrRef.exprId)) {
-      logDebug("[CBO] No statistics for " + attrRef)
-      return None
-    }
-
-    op match {
-      case EqualTo(l, r) => evaluateEqualTo(attrRef, literal, update)
+      update: Boolean): Option[Double] = {
+    attr.dataType match {
+      case _: NumericType | DateType | TimestampType =>
+        evaluateBinaryForNumeric(op, attr, literal, update)
+      case StringType | BinaryType =>
+        // TODO: It is difficult to support other binary comparisons for String/Binary
+        // type without min/max and advanced statistics like histogram.
+        logDebug("[CBO] No range comparison statistics for String/Binary type " + attr)
+        None
       case _ =>
-        attrRef.dataType match {
-          case _: NumericType | DateType | TimestampType =>
-            evaluateBinaryForNumeric(op, attrRef, literal, update)
-          case StringType | BinaryType =>
```
--- End diff --

previously we totally missed `BooleanType` and would throw a `MatchError` if the attribute is boolean. But the logic in `evaluateBinaryForNumeric` doesn't work for booleans, so I treat it as unsupported for now. @wzhfy do you have time to work on it?
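A small illustration of why the catch-all arm matters: without a final `case _`, a pattern match over `DataType` throws `scala.MatchError` on any type it does not list, such as `BooleanType`. This toy predicate is written for this digest, not taken from the PR:
```
import org.apache.spark.sql.types._

def supportsRangeComparison(dt: DataType): Boolean = dt match {
  case _: NumericType | DateType | TimestampType => true
  case StringType | BinaryType => false // no min/max range stats kept
  case _ => false                       // e.g. BooleanType: unsupported for now
}
```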
[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103073081

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
```
@@ -140,56 +129,56 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
    * @param condition a single logical expression
    * @param update a boolean flag to specify if we need to update ColumnStat of a column
    *               for subsequent conditions
-   * @return Option[Double] value to show the percentage of rows meeting a given condition.
+   * @return an optional double value to show the percentage of rows meeting a given condition.
    *         It returns None if the condition is not supported.
    */
   def calculateSingleCondition(condition: Expression, update: Boolean): Option[Double] = {
     condition match {
       // For evaluateBinary method, we assume the literal on the right side of an operator.
       // So we will change the order if not.
-      // EqualTo does not care about the order
-      case op @ EqualTo(ar: AttributeReference, l: Literal) =>
-        evaluateBinary(op, ar, l, update)
-      case op @ EqualTo(l: Literal, ar: AttributeReference) =>
-        evaluateBinary(op, ar, l, update)
+      // EqualTo/EqualNullSafe does not care about the order
```
--- End diff --

also support `EqualNullSafe`
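The order-normalization trick referenced by the comment "we assume the literal on the right side", as a standalone toy (the `Attr`/`Lit` types are stand-ins, not Catalyst classes):
```
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Any) extends Expr

// Symmetric operators like = and <=> allow flipping, so both orders reduce
// to the canonical (attribute, literal) shape.
def normalize(left: Expr, right: Expr): Option[(Attr, Lit)] = (left, right) match {
  case (a: Attr, l: Lit) => Some((a, l))
  case (l: Lit, a: Attr) => Some((a, l))
  case _ => None
}
```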
[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17065#discussion_r103073076

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala ---
```
@@ -95,15 +84,16 @@ case class FilterEstimation(plan: Filter, catalystConf: CatalystConf) extends Lo
    * @param condition the compound logical expression
    * @param update a boolean flag to specify if we need to update ColumnStat of a column
    *               for subsequent conditions
-   * @return a double value to show the percentage of rows meeting a given condition.
+   * @return an optional double value to show the percentage of rows meeting a given condition.
    *         It returns None if the condition is not supported.
    */
   def calculateFilterSelectivity(condition: Expression, update: Boolean = true): Option[Double] = {
-
     condition match {
       case And(cond1, cond2) =>
-        (calculateFilterSelectivity(cond1, update), calculateFilterSelectivity(cond2, update))
-        match {
+        // For ease of debugging, we compute percent1 and percent2 in 2 statements.
+        val percent1 = calculateFilterSelectivity(cond1, update)
+        val percent2 = calculateFilterSelectivity(cond2, update)
+        (percent1, percent2) match {
           case (Some(p1), Some(p2)) => Some(p1 * p2)
           case (Some(p1), None) => Some(p1)
```
--- End diff --

here we are actually over-estimating: if a condition inside `And` is unsupported, we assume it has 100% selectivity, which may lead to under-estimation if this `And` is wrapped by `Not`. We should:
1. treat the whole `And` as unsupported if one of its conditions is unsupported
2. not handle nested `Not`

cc @wzhfy @ron8hu
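A sketch of the stricter combination the review proposes (an assumption about the eventual fix, not merged code): an `And` is only estimable when both sides are, which prevents a wrapping `Not` from turning the 100%-selectivity fallback into an under-estimate.
```
// Returns None unless both conditions are supported; otherwise multiplies
// the two selectivities as before.
def andSelectivity(p1: Option[Double], p2: Option[Double]): Option[Double] =
  for (a <- p1; b <- p2) yield a * b
```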
[GitHub] spark issue #17065: [SPARK-17075][SQL][followup] fix some minor issues and c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17065

**[Test build #73465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73465/testReport)** for PR 17065 at commit [`04cc681`](https://github.com/apache/spark/commit/04cc6811445790a636a534c42e2caf053a238eb8).
[GitHub] spark pull request #17065: [SPARK-17075][SQL][followup] fix some minor issue...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/17065

[SPARK-17075][SQL][followup] fix some minor issues and clean up the code

## What changes were proposed in this pull request?

This fixes some code style issues, naming issues, some missing cases in pattern matches, etc.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark follow-up

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17065.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17065

commit 04cc6811445790a636a534c42e2caf053a238eb8
Author: Wenchen Fan
Date: 2017-02-25T01:06:37Z

    fix some minor issues and clean up the code
[GitHub] spark issue #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17063

Merged build finished. Test PASSed.
[GitHub] spark issue #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17063

cc @cloud-fan @yhuai @sameeragarwal
[GitHub] spark issue #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17063

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73462/
Test PASSed.
[GitHub] spark issue #16592: [SPARK-19235] [SQL] [TESTS] Enable Test Cases in DDLSuit...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16592

Let me continue the work now
[GitHub] spark issue #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17063

**[Test build #73462 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73462/testReport)** for PR 17063 at commit [`d4ac13c`](https://github.com/apache/spark/commit/d4ac13cf3a0ac3d4734eaa54dad50501374b7405).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17064: [SPARK-19736][SQL] refreshByPath should clear all cached...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17064

**[Test build #73464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73464/testReport)** for PR 17064 at commit [`dd6d8ca`](https://github.com/apache/spark/commit/dd6d8ca1c091d00bfc29363ebc0d518b12927325).
[GitHub] spark pull request #17064: [SPARK-19736][SQL] refreshByPath should clear all...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/17064

[SPARK-19736][SQL] refreshByPath should clear all cached plans with the specified path

## What changes were proposed in this pull request?

`Catalog.refreshByPath` can refresh the cache entry and the associated metadata for all dataframes (if any) that contain the given data source path. However, `CacheManager.invalidateCachedPath` doesn't clear all cached plans with the specified path. This causes the strange behaviors reported in SPARK-15678.

## How was this patch tested?

Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 fix-refreshByPath

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17064.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17064

commit dd6d8ca1c091d00bfc29363ebc0d518b12927325
Author: Liang-Chi Hsieh
Date: 2017-02-25T05:58:40Z

    refreshByPath should clear all cached plans with the specified path.
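Illustrative usage of the public API in question (the path and session setup are made up for this example):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
val df = spark.read.parquet("/tmp/events")  // hypothetical dataset
df.cache()

// After files under /tmp/events change on disk, this should invalidate every
// cached plan that reads from that path, which is what the PR fixes.
spark.catalog.refreshByPath("/tmp/events")
```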
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17060

**[Test build #73463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73463/testReport)** for PR 17060 at commit [`4837de8`](https://github.com/apache/spark/commit/4837de867c0e93a9b2801cce352573d68684aa35).
[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14789

Build finished. Test FAILed.
[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14789

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73461/
Test FAILed.
[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14789

**[Test build #73461 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73461/consoleFull)** for PR 14789 at commit [`b1cac5c`](https://github.com/apache/spark/commit/b1cac5c95f3c7f77a8253a88aff48d2a1934072c).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103071356

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```
@@ -241,7 +243,8 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private def listTestCases(): Seq[TestCase] = {
     listFilesRecursively(new File(inputFilePath)).map { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
-      TestCase(file.getName, file.getAbsolutePath, resultFile)
+      val absPath = file.getAbsolutePath
+      TestCase(absPath.stripPrefix(inputFilePath + File.separator), absPath, resultFile)
```
--- End diff --

@srowen Sure. Although I think we are safe, I would go ahead and make that change.
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17056

Merged build finished. Test PASSed.
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17056

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73460/
Test PASSed.
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17056

**[Test build #73460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73460/testReport)** for PR 17056 at commit [`8595305`](https://github.com/apache/spark/commit/8595305c2dc3b276d6390724ca1f1469794540f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103071094

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```
@@ -98,7 +98,9 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {

   /** List of test cases to ignore, in lower cases. */
   private val blackList = Set(
-    "blacklist.sql" // Do NOT remove this one. It is here to test the blacklist functionality.
+    "blacklist.sql", // Do NOT remove this one. It is here to test the blacklist functionality.
+    ".ds_store"  // A meta-file that may be created on Mac by Finder App.
```
--- End diff --

That's right, you'd have to lower-case both. I think that's clearer.
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103070909

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```
@@ -98,7 +98,9 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {

   /** List of test cases to ignore, in lower cases. */
   private val blackList = Set(
-    "blacklist.sql" // Do NOT remove this one. It is here to test the blacklist functionality.
+    "blacklist.sql", // Do NOT remove this one. It is here to test the blacklist functionality.
+    ".ds_store"  // A meta-file that may be created on Mac by Finder App.
```
--- End diff --

@srowen I did think about it. If we want to keep the mixed case here in this list, then we have to change the code later to lower-case both sides before comparing. Would you prefer to change the comparison to the following?
```
if (blackList.exists(t => testCase.name.toLowerCase.contains(t.toLowerCase))) {
  ...
}
```
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103070856

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```
@@ -241,7 +243,8 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private def listTestCases(): Seq[TestCase] = {
     listFilesRecursively(new File(inputFilePath)).map { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
-      TestCase(file.getName, file.getAbsolutePath, resultFile)
+      val absPath = file.getAbsolutePath
+      TestCase(absPath.stripPrefix(inputFilePath + File.separator), absPath, resultFile)
```
--- End diff --

Does this get into problems where inputFilePath might already have a trailing separator and then isn't a prefix? Maybe it can't happen, but it might be easier to strip leading separators from the final result directly, if that's what you mean to do.
[GitHub] spark issue #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17063

**[Test build #73462 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73462/testReport)** for PR 17063 at commit [`d4ac13c`](https://github.com/apache/spark/commit/d4ac13cf3a0ac3d4734eaa54dad50501374b7405).
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976

Merged build finished. Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73457/
Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16976

**[Test build #73457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73457/testReport)** for PR 16976 at commit [`b7d56c6`](https://github.com/apache/spark/commit/b7d56c6b6db7df7fe12831efb8338787b3108c0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17063: [SPARK-19735][SQL] Remove HOLD_DDLTIME from Catal...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17063

[SPARK-19735][SQL] Remove HOLD_DDLTIME from Catalog APIs

### What changes were proposed in this pull request?

As explained in Hive JIRA https://issues.apache.org/jira/browse/HIVE-12224, HOLD_DDLTIME was broken as soon as it landed. Hive 2.0 removes HOLD_DDLTIME from the API. In Spark SQL, we always set it to FALSE. Like Hive, we should also remove it from our Catalog APIs.

### How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark removalHoldDDLTime

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17063.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17063

commit d4ac13cf3a0ac3d4734eaa54dad50501374b7405
Author: Xiao Li
Date: 2017-02-25T04:14:34Z

    fix.
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103070711

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```
@@ -241,7 +243,8 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private def listTestCases(): Seq[TestCase] = {
     listFilesRecursively(new File(inputFilePath)).map { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
-      TestCase(file.getName, file.getAbsolutePath, resultFile)
+      val absPath = file.getAbsolutePath
+      TestCase(absPath.stripPrefix(inputFilePath + File.separator), absPath, resultFile)
```
--- End diff --

@srowen I am adding a separator to make sure the test case name does not have a leading separator character.

Example:
```
file = /home/spark/sql/core/src/test/resources/sql-tests/inputs/subquery/exists-subquery/exists-basic.sql
inputFilePath = /home/spark/sql/core/src/test/resources/sql-tests/inputs
```
So I wanted the test case name to be `subquery/exists-subquery/exists-basic.sql` and not `/subquery/exists-subquery/exists-basic.sql`.
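A quick demonstration of the naming fix, using the paths from the example above:
```
import java.io.File

val inputFilePath = "/home/spark/sql/core/src/test/resources/sql-tests/inputs"
val file = new File(inputFilePath + "/subquery/exists-subquery/exists-basic.sql")
val name = file.getAbsolutePath.stripPrefix(inputFilePath + File.separator)
// name == "subquery/exists-subquery/exists-basic.sql" -- no leading separator
```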
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103070622

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
```
@@ -121,7 +123,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   }

   private def createScalaTestCase(testCase: TestCase): Unit = {
-    if (blackList.contains(testCase.name.toLowerCase)) {
+    if (blackList.exists(testCase.name.toLowerCase.contains(_))) {
```
--- End diff --

@srowen Thanks !! I will change.
[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17062

Merged build finished. Test FAILed.
[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17062

Test FAILed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73459/
Test FAILed.
[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17062

**[Test build #73459 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73459/testReport)** for PR 17062 at commit [`332475c`](https://github.com/apache/spark/commit/332475c1641f61080aa41dda9f1ceec237351d75).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73456/
Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976

Merged build finished. Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16976

**[Test build #73456 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73456/testReport)** for PR 16976 at commit [`6187f6c`](https://github.com/apache/spark/commit/6187f6c0617fa0aec75d1eb638a00c6e7a350d61).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17061: [SPARK-13446] [SQL] Support reading data from Hive 2.0.1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17061

Merged build finished. Test PASSed.
[GitHub] spark issue #17061: [SPARK-13446] [SQL] Support reading data from Hive 2.0.1...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17061

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73458/
Test PASSed.
[GitHub] spark issue #17061: [SPARK-13446] [SQL] Support reading data from Hive 2.0.1...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17061

**[Test build #73458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73458/testReport)** for PR 17061 at commit [`60af17f`](https://github.com/apache/spark/commit/60af17f0178ba8ab7d7881c118915334e2c824eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73455/
Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976

Merged build finished. Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16976

**[Test build #73455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73455/testReport)** for PR 16976 at commit [`b4e6983`](https://github.com/apache/spark/commit/b4e6983351192b788708ba3cd2a4c8e7321f34b0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17047: [SPARK-19720][SPARK SUBMIT] Redact sensitive info...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17047#discussion_r103069680

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
```
@@ -2574,13 +2575,30 @@ private[spark] object Utils extends Logging {

   def redact(conf: SparkConf, kvs: Seq[(String, String)]): Seq[(String, String)] = {
     val redactionPattern = conf.get(SECRET_REDACTION_PATTERN).r
+    redact(redactionPattern, kvs)
+  }
+
+  private def redact(redactionPattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] = {
     kvs.map { kv =>
       redactionPattern.findFirstIn(kv._1)
         .map { ignore => (kv._1, REDACTION_REPLACEMENT_TEXT) }
         .getOrElse(kv)
     }
   }

+  /**
+   * Looks up the redaction regex from within the key value pairs and uses it to redact the rest
+   * of the key value pairs. No care is taken to make sure the redaction property itself is not
+   * redacted. So theoretically, the property itself could be configured to redact its own value
+   * when printing.
+   * @param kvs
+   * @return
+   */
+  def redact(kvs: Map[String, String]): Seq[(String, String)] = {
```
--- End diff --

(Nit: I'd omit `@param` and `@return` if they're not filled in.) So this is used in cases where there isn't a conf object available yet, but the argument itself has the redaction config? I was slightly worried about the parallel implementation, but that would be a reasonable reason to do it.
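The redaction mechanism from the diff, reduced to a standalone sketch (the replacement constant's value is an assumption here):
```
import scala.util.matching.Regex

val REDACTION_REPLACEMENT_TEXT = "*********(redacted)" // assumed value

// Any key matching the pattern has its value masked; everything else passes through.
def redact(pattern: Regex, kvs: Seq[(String, String)]): Seq[(String, String)] =
  kvs.map { case (k, v) =>
    pattern.findFirstIn(k).map(_ => (k, REDACTION_REPLACEMENT_TEXT)).getOrElse((k, v))
  }

// redact("(?i)secret|password".r, Seq("spark.my.password" -> "hunter2"))
//   => Seq(("spark.my.password", "*********(redacted)"))
```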
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73454/
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16976 Merged build finished. Test PASSed.
[GitHub] spark issue #16976: [SPARK-19610][SQL] Support parsing multiline CSV files
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16976 **[Test build #73454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73454/testReport)** for PR 16976 at commit [`321d082`](https://github.com/apache/spark/commit/321d082e4f0dbac34c39618adbded99d119e06ac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14963 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73451/
[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14963 Merged build finished. Test PASSed.
[GitHub] spark issue #14963: [SPARK-16992][PYSPARK] Virtualenv for Pylint and pep8 in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14963 **[Test build #73451 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73451/testReport)** for PR 14963 at commit [`215b7b3`](https://github.com/apache/spark/commit/215b7b34170f112c4448fba98b02a50dbb19b2a7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17052: [SPARK-19690][SS] Join a streaming DataFrame with a batc...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/17052 @zsxwing got it
[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r103068598 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,339 @@
[Apache license header, package declaration, and imports elided]
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with HasPredictionCol {
+
+  /**
+   * Minimal support level of the frequent pattern. [0.0, 1.0]. Any pattern that appears
+   * more than (minSupport * size-of-the-dataset) times will be output.
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+    "the minimal support level of a frequent pattern",
+    ParamValidators.inRange(0.0, 1.0))
+  setDefault(minSupport -> 0.3)
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
+  /**
+   * Number of partitions (>=1) used by parallel FP-growth. By default the param is not set, and
+   * partition number of the input dataset is used.
+   * @group expertParam
+   */
+  @Since("2.2.0")
+  val numPartitions: IntParam = new IntParam(this, "numPartitions",
+    "Number of partitions used by parallel FP-growth", ParamValidators.gtEq[Int](1))
+
+  /** @group expertGetParam */
+  @Since("2.2.0")
+  def getNumPartitions: Int = $(numPartitions)
+
+  /**
+   * Minimal confidence for generating Association Rule.
+   * Note that minConfidence has no effect during fitting.
+   * Default: 0.8
+   * @group param
+   */
+  @Since("2.2.0")
+  val minConfidence: DoubleParam = new DoubleParam(this, "minConfidence",
+    "minimal confidence for generating Association Rule",
+    ParamValidators.inRange(0.0, 1.0))
+  setDefault(minConfidence -> 0.8)
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinConfidence: Double = $(minConfidence)
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  @Since("2.2.0")
+  protected def validateAndTransformSchema(schema: StructType): StructType = {
+    val inputType = schema($(featuresCol)).dataType
+    require(inputType.isInstanceOf[ArrayType],
+      s"The input column must be ArrayType, but got $inputType.")
+    SchemaUtils.appendColumn(schema, $(predictionCol), schema($(featuresCol)).dataType)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
+ * Li et al., "PFP: Parallel FP-Growth for Query Recommendation"
+ * (http://dx.doi.org/10.1145/1454008.1454027). PFP distributes computation in such a way that
+ * each worker executes an independent group of mining tasks. The FP-Growth algorithm is
+ * described in Han et al., "Mining frequent patterns without candidate generation"
+ * (http://dx.doi.org/10.1145/335191.335372). Note null values in the feature column are ignored
+ * during fit().
+ *
+ * @see Association rule learning (Wikipedia):
+ *      http://en.wikipedia.org/wiki/Association_rule_learning
[the quote and the review comment itself are truncated in the archive]
[GitHub] spark pull request #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/15415#discussion_r103068619 --- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala --- @@ -0,0 +1,339 @@ [quotes the same FPGrowth.scala hunk as the previous entry; the review comment itself is truncated in the archive]
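Read together, the params quoted in these hunks suggest usage along the following lines. This is a sketch only, assuming the usual `set*` setters for the params shown and an existing SparkSession named `spark`; the column and accessor names are illustrative, not confirmed by the quoted diff.

import org.apache.spark.ml.fpm.FPGrowth

val dataset = spark.createDataFrame(Seq(
  (0, Array("a", "b", "c")),
  (1, Array("a", "b")),
  (2, Array("a"))
)).toDF("id", "features")

val model = new FPGrowth()
  .setFeaturesCol("features")   // must be ArrayType, per validateAndTransformSchema
  .setMinSupport(0.5)           // keep patterns appearing in at least half the rows
  .setMinConfidence(0.6)        // affects rule generation only, not fit()
  .fit(dataset)

model.transform(dataset).show() // appends the prediction column per the schema helper

Note that, as the param doc says, minConfidence plays no role during fitting; it only gates which association rules the fitted model generates.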
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16626 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73450/
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16626 Merged build finished. Test PASSed.
[GitHub] spark issue #16944: [SPARK-19611][SQL] Introduce configurable table schema i...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16944 A few small comments left. LGTM
[GitHub] spark issue #16626: [SPARK-19261][SQL] Alter add columns for Hive serde and ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16626 **[Test build #73450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73450/testReport)** for PR 16626 at commit [`4bd04fc`](https://github.com/apache/spark/commit/4bd04fc556fde3805f58e39a0e6ad50c2a1e8aec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068578 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala ---
@@ -0,0 +1,192 @@
[Apache license header, package declaration, and imports elided]
+class HiveSchemaInferenceSuite
+  extends QueryTest with TestHiveSingleton with SQLTestUtils with BeforeAndAfterEach {
+
+  import HiveSchemaInferenceSuite._
+
+  override def beforeEach(): Unit = {
+    super.beforeEach()
+    FileStatusCache.resetForTesting()
+  }
+
+  override def afterEach(): Unit = {
+    super.afterEach()
+    FileStatusCache.resetForTesting()
+  }
+
+  private val externalCatalog = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog]
+  private val lowercaseSchema = StructType(Seq(
+    StructField("fieldone", LongType),
+    StructField("partcol1", IntegerType),
+    StructField("partcol2", IntegerType)))
+  private val caseSensitiveSchema = StructType(Seq(
+    StructField("fieldOne", LongType),
+    // Partition columns remain case-insensitive
+    StructField("partcol1", IntegerType),
+    StructField("partcol2", IntegerType)))
+
+  // Create a CatalogTable instance modeling an external Hive Metastore table backed by
+  // Parquet data files.
+  private def hiveExternalCatalogTable(
+      tableName: String,
+      location: String,
+      schema: StructType,
+      partitionColumns: Seq[String],
+      properties: Map[String, String] = Map.empty): CatalogTable = {
+    CatalogTable(
+      identifier = TableIdentifier(table = tableName, database = Option(DATABASE)),
+      tableType = CatalogTableType.EXTERNAL,
+      storage = CatalogStorageFormat(
+        locationUri = Option(location),
+        inputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
+        outputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"),
+        serde = Option("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"),
+        compressed = false,
+        properties = Map("serialization.format" -> "1")),
+      schema = schema,
+      provider = Option("hive"),
+      partitionColumnNames = partitionColumns,
+      properties = properties)
+  }
+
+  // Creates CatalogTablePartition instances for adding partitions of data to our test table.
+  private def hiveCatalogPartition(location: String, index: Int): CatalogTablePartition =
+    CatalogTablePartition(
+      spec = Map("partcol1" -> index.toString, "partcol2" -> index.toString),
+      storage = CatalogStorageFormat(
+        locationUri = Option(s"${location}/partCol1=$index/partCol2=$index/"),
+        inputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
+        outputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"),
+        serde = Option("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"),
+        compressed = false,
+        properties = Map("serialization.format" -> "1")))
[the quote and the review comment itself are truncated in the archive]
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17056 **[Test build #73460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73460/testReport)** for PR 17056 at commit [`8595305`](https://github.com/apache/spark/commit/8595305c2dc3b276d6390724ca1f1469794540f5).
[GitHub] spark issue #14789: [SPARK-17209][YARN] Add the ability to manually update c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14789 **[Test build #73461 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73461/consoleFull)** for PR 14789 at commit [`b1cac5c`](https://github.com/apache/spark/commit/b1cac5c95f3c7f77a8253a88aff48d2a1934072c).
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068501 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala --- @@ -0,0 +1,200 @@ [quotes a 200-line version of the same suite shown above, which additionally imports HiveExternalCatalog.SPARK_SQL_PREFIX; the quote and the review comment itself are truncated in the archive]
[GitHub] spark issue #15415: [SPARK-14503][ML] spark.ml API for FPGrowth
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15415 I don't think we need to support the default prediction (for empty/null inputs) now. I agree we could use an Imputer or add something as an option later on. Will take a final look now
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 @srowen Yes, the behaviour of the method remains the same. This patch helped me get a measurable improvement on GC overhead in Spark 2.0, so I thought it would be beneficial for others. Anyway, thanks for the comments. :)
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068442 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala --- @@ -0,0 +1,200 @@ [quotes the same 200-line hunk as the entry for r103068501; the review comment itself is truncated in the archive]
[GitHub] spark issue #17056: [SPARK-17495] [SQL] Support Decimal type in Hive-hash
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17056 retest this please
[GitHub] spark pull request #16824: [SPARK-18069][PYTHON] Make PySpark doctests for S...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/16824
[GitHub] spark issue #16824: [SPARK-18069][PYTHON] Make PySpark doctests for SQL self...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16824 Thanks @holdenk, let me close this for now. @davies, please give me your opinion. If you think it is worthwhile, I will reopen it. If not, I will resolve the JIRA.
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068403 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala --- @@ -0,0 +1,200 @@ [quotes the same 200-line hunk as the entry for r103068501; the review comment itself is truncated in the archive]
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Merged build finished. Test PASSed.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73448/
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068378 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala --- @@ -0,0 +1,200 @@ [quotes the same 200-line hunk as the entry for r103068501; the review comment itself is truncated in the archive]
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16826 **[Test build #73448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73448/testReport)** for PR 16826 at commit [`437b0bc`](https://github.com/apache/spark/commit/437b0bca7bc29809083f26b8a4848d53d999d097).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103068302 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -121,7 +123,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private def createScalaTestCase(testCase: TestCase): Unit = {
-    if (blackList.contains(testCase.name.toLowerCase)) {
+    if (blackList.exists(testCase.name.toLowerCase.contains(_))) {
--- End diff --
Nit, you can remove the `(_)` here
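The nit refers to Scala's eta-expansion: when the placeholder merely forwards the single argument, the bare method reference is enough. A REPL-sized illustration with hypothetical values:

val blackList = Set("blacklist.sql", ".ds_store")
val name = "subquery/blacklist.sql".toLowerCase
blackList.exists(name.contains(_))  // explicit placeholder lambda
blackList.exists(name.contains)     // equivalent: the method is eta-expanded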
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103068318 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -241,7 +243,8 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   private def listTestCases(): Seq[TestCase] = {
     listFilesRecursively(new File(inputFilePath)).map { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
-      TestCase(file.getName, file.getAbsolutePath, resultFile)
+      val absPath = file.getAbsolutePath
+      TestCase(absPath.stripPrefix(inputFilePath + File.separator), absPath, resultFile)
--- End diff --
Why add the separator here?
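For context on the question: the appended separator determines whether the computed test-case name keeps a leading path separator. A small illustration with hypothetical paths (results shown for a Unix File.separator of "/"):

import java.io.File

val inputFilePath = "/repo/sql/core/src/test/resources/sql-tests/inputs"
val absPath = inputFilePath + File.separator + "subquery/in-subquery.sql"
absPath.stripPrefix(inputFilePath)                  // "/subquery/in-subquery.sql"
absPath.stripPrefix(inputFilePath + File.separator) // "subquery/in-subquery.sql"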
[GitHub] spark pull request #17060: [SQL] Duplicate test exception in SQLQueryTestSui...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17060#discussion_r103068296 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -98,7 +98,9 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   /** List of test cases to ignore, in lower cases. */
   private val blackList = Set(
-    "blacklist.sql" // Do NOT remove this one. It is here to test the blacklist functionality.
+    "blacklist.sql", // Do NOT remove this one. It is here to test the blacklist functionality.
+    ".ds_store" // A meta-file that may be created on Mac by Finder App.
--- End diff --
`.DS_Store`, right? I know you compare lower-case later, but might as well write it as it appears
[GitHub] spark issue #17059: [SPARK-19733][ML]Removed unnecessary castings and refact...
Github user datumbox commented on the issue: https://github.com/apache/spark/pull/17059 jenkins retest this please
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068277 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala --- @@ -0,0 +1,200 @@ [quotes the same 200-line hunk as the entry for r103068501; the review comment itself is truncated in the archive]
[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17062 **[Test build #73459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73459/testReport)** for PR 17062 at commit [`332475c`](https://github.com/apache/spark/commit/332475c1641f61080aa41dda9f1ceec237351d75).
[GitHub] spark pull request #16944: [SPARK-19611][SQL] Introduce configurable table s...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16944#discussion_r103068259

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.util.concurrent.{Executors, TimeUnit}
+
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.metrics.source.HiveCatalogMetrics
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog._
+import org.apache.spark.sql.execution.datasources.FileStatusCache
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.hive.client.HiveClient
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.internal.SQLConf.HiveCaseSensitiveInferenceMode
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.sql.types._
+
+class HiveSchemaInferenceSuite
+  extends QueryTest with TestHiveSingleton with SQLTestUtils with BeforeAndAfterEach {
+
+  import HiveSchemaInferenceSuite._
+  import HiveExternalCatalog.SPARK_SQL_PREFIX
+
+  override def beforeEach(): Unit = {
+    super.beforeEach()
+    FileStatusCache.resetForTesting()
+  }
+
+  override def afterEach(): Unit = {
+    super.afterEach()
+    FileStatusCache.resetForTesting()
+  }
+
+  private val externalCatalog = spark.sharedState.externalCatalog.asInstanceOf[HiveExternalCatalog]
+  private val lowercaseSchema = StructType(Seq(
+    StructField("fieldone", LongType),
+    StructField("partcol1", IntegerType),
+    StructField("partcol2", IntegerType)))
+  private val caseSensitiveSchema = StructType(Seq(
+    StructField("fieldOne", LongType),
+    // Partition columns remain case-insensitive
+    StructField("partcol1", IntegerType),
+    StructField("partcol2", IntegerType)))
+
+  // Create a CatalogTable instance modeling an external Hive Metastore table backed by
+  // Parquet data files.
+  private def hiveExternalCatalogTable(
+      tableName: String,
+      location: String,
+      schema: StructType,
+      partitionColumns: Seq[String],
+      properties: Map[String, String] = Map.empty): CatalogTable = {
+    CatalogTable(
+      identifier = TableIdentifier(table = tableName, database = Option(DATABASE)),
+      tableType = CatalogTableType.EXTERNAL,
+      storage = CatalogStorageFormat(
+        locationUri = Option(location),
+        inputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
+        outputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"),
+        serde = Option("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"),
+        compressed = false,
+        properties = Map("serialization.format" -> "1")),
+      schema = schema,
+      provider = Option("hive"),
+      partitionColumnNames = partitionColumns,
+      properties = properties)
+  }
+
+  // Creates CatalogTablePartition instances for adding partitions of data to our test table.
+  private def hiveCatalogPartition(location: String, index: Int): CatalogTablePartition
+    = CatalogTablePartition(
+      spec = Map("partcol1" -> index.toString, "partcol2" -> index.toString),
+      storage = CatalogStorageFormat(
+        locationUri = Option(s"${location}/partCol1=$index/partCol2=$index/"),
+        inputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"),
+        outputFormat = Option("org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"),
+        serde = Option("org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"),
+        compressed = false,
+        properties = Map("serializa
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17060 Merged build finished. Test PASSed.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Merged build finished. Test PASSed.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16826 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73447/ Test PASSed.
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17060 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73449/ Test PASSed.
[GitHub] spark issue #16826: [SPARK-19540][SQL] Add ability to clone SparkSession whe...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16826

**[Test build #73447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73447/testReport)** for PR 16826 at commit [`fd11ee2`](https://github.com/apache/spark/commit/fd11ee2289ae26b3061659dc26b1f09ded32d039).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #17060: [SQL] Duplicate test exception in SQLQueryTestSuite due ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17060

**[Test build #73449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73449/testReport)** for PR 17060 at commit [`5f2a6a8`](https://github.com/apache/spark/commit/5f2a6a82244a623fed142555f55dd34060426e5a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/17062

[SPARK-17495] [SQL] Support date, timestamp and interval types in Hive hash

## What changes were proposed in this pull request?

- Timestamp hashing is done as per [TimestampWritable.hashCode()](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java#L406) in Hive.
- Interval hashing is done as per [HiveIntervalDayTime.hashCode()](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveIntervalDayTime.java#L178). Note that there are inherent differences in how Hive and Spark store intervals under the hood, which limit the ability to stay completely in sync with Hive's hashing function. I have explained this in the method doc.
- The date type was already supported; this PR adds tests for it.

## How was this patch tested?

Added unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark SPARK-17495_time_related_types

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17062.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #17062

commit cc359fc45547b7ba3fd4c1d11d3dcfbaf71ea66a
Author: Tejas Patil
Date: 2017-02-25T00:18:03Z
[SPARK-17495] [SQL] Support date, timestamp datatypes in Hive hash

commit 332475c1641f61080aa41dda9f1ceec237351d75
Author: Tejas Patil
Date: 2017-02-25T02:23:41Z
minor refac
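For reference, the TimestampWritable.hashCode() scheme the PR follows packs the epoch seconds and the nanosecond component of a timestamp into a single long and then folds the two 32-bit halves together. A minimal Scala sketch of that scheme (the method name and the pre-split seconds/nanos inputs are illustrative assumptions, not the PR's actual code):

    // Sketch of Hive's TimestampWritable.hashCode() logic, assuming the
    // timestamp has already been split into epoch seconds and nanoseconds.
    def hiveTimestampHash(seconds: Long, nanos: Int): Int = {
      var bits = seconds
      bits <<= 30                  // the nanosecond part fits in 30 bits
      bits |= nanos                // pack seconds and nanos into one long
      ((bits >>> 32) ^ bits).toInt // fold the high word into the low word
    }

Matching this bit layout, rather than hashing Spark's internal microsecond representation directly, is what lets the Spark-side hash agree with Hive's for timestamp values.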
[GitHub] spark issue #17062: [SPARK-17495] [SQL] Support date, timestamp and interval...
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/17062 ok to test