[GitHub] spark pull request #18324: [SPARK-21045][PYSPARK]Fixed executor blocked beca...
Github user dataknocker commented on a diff in the pull request: https://github.com/apache/spark/pull/18324#discussion_r197355680

--- Diff: python/pyspark/worker.py ---
@@ -177,8 +180,11 @@ def process():
             process()
         except Exception:
             try:
+                exc_info = traceback.format_exc()
+                if isinstance(exc_info, unicode):
+                    exc_info = exc_info.encode('utf-8')
--- End diff --

cc @jiangxb1987
[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r197355161

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   //
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

@maropu Thank you for taking the time to think about this thoroughly. A couple of questions/comments:

1) It looks like on the read path we give precedence to dbtable over query. I feel it's better to explicitly disallow this with a clear message in case of ambiguity.

2) The usage of lazy here (especially to trigger errors) makes me a little nervous (see the sketch after this comment). For example, if we wanted to introduce a debug statement to print the variables inside the QueryOptions class, things would not work any more, right? That's the reason I had opted to check for the invalid query option in the write path in the write function itself (i.e., when I am sure of the calling context). Perhaps that's how it's used everywhere, in which case it may be okay to follow the same approach here. I am okay with this. Let's get some opinions from @gatorsmile. Once I have the final set of comments, I will make the changes. Thanks again.
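To illustrate the `lazy val` concern in point 2, a minimal Scala sketch (not the PR's code; `QueryOptions` and the option name are stand-ins): validation hidden behind a `lazy val` fires only when the field is first touched, so an innocent debug print changes when, or whether, the error surfaces.

```scala
class QueryOptions(params: Map[String, String]) {
  // Nothing is checked at construction time; the require-style check
  // runs only when `query` is first forced.
  lazy val query: String = params.getOrElse("query",
    throw new IllegalArgumentException("Option 'query' is required."))
}

object LazyValDemo extends App {
  val opts = new QueryOptions(Map.empty)
  println("constructed without error") // the lazy val was never forced
  // println(s"debug: ${opts.query}")  // uncommenting this line throws
}
```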
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197352302

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala ---
@@ -256,6 +283,22 @@ object EmptyBlock extends Block with Serializable {
   override def + (other: Block): Block = other
 }

+/**
+ * A block inlines all types of input arguments into a string without
+ * tracking any reference of `JavaCode` instances.
+ */
+case class InlineBlock(block: String) extends Block {
+  override val code: String = block
+  override val exprValues: Set[ExprValue] = Set.empty
+
+  override def + (other: Block): Block = other match {
+    case c: CodeBlock => Blocks(Seq(this, c))
+    case i: InlineBlock => InlineBlock(block + i.block)
+    case b: Blocks => Blocks(Seq(this) ++ b.blocks)
--- End diff --

Ok. I will do that PR first. Will ping you on the PR when it's ready.
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r197352254

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -1004,26 +1012,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String]
   private[this] def castToIntervalCode(from: DataType): CastFunction = from match {
     case StringType =>
       (c, evPrim, evNull) =>
-        s"""$evPrim = CalendarInterval.fromString($c.toString());
+        code"""$evPrim = CalendarInterval.fromString($c.toString());
           if(${evPrim} == null) {
             ${evNull} = true;
           }
         """.stripMargin
   }

-  private[this] def decimalToTimestampCode(d: String): String =
-    s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()"
-  private[this] def longToTimeStampCode(l: String): String = s"$l * 100L"
-  private[this] def timestampToIntegerCode(ts: String): String =
-    s"java.lang.Math.floor((double) $ts / 100L)"
-  private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0"
+  private[this] def decimalToTimestampCode(d: ExprValue): Block = {
+    val block = code"new java.math.BigDecimal(100L)"
+    code"($d.toBigDecimal().bigDecimal().multiply($block)).longValue()"
+  }
+  private[this] def longToTimeStampCode(l: ExprValue): Block = code"$l * 100L"
+  private[this] def timestampToIntegerCode(ts: ExprValue): Block =
+    code"java.lang.Math.floor((double) $ts / 100L)"
+  private[this] def timestampToDoubleCode(ts: ExprValue): Block =
+    code"$ts / 100.0"

   private[this] def castToBooleanCode(from: DataType): CastFunction = from match {
     case StringType =>
-      val stringUtils = StringUtils.getClass.getName.stripSuffix("$")
+      val stringUtils = inline"${StringUtils.getClass.getName.stripSuffix("$")}"
--- End diff --

inline is just used as a wrapper for a string, since we disallow silent string interpolation. The content of an inline is expanded into the string of a code block.
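For readers unfamiliar with custom interpolators, a simplified Scala 2 sketch of the mechanism (Spark's actual `inline` in javaCode.scala returns its own `Block`/`ExprValue` types; this stand-in just wraps the rendered string):

```scala
object InlineSketch {
  // Marker type for text spliced into generated code verbatim.
  case class Inline(code: String)

  implicit class InlineHelper(val sc: StringContext) extends AnyVal {
    // inline"..." renders the interpolation and wraps it explicitly,
    // instead of relying on silent string interpolation.
    def inline(args: Any*): Inline = Inline(sc.s(args: _*))
  }

  def main(args: Array[String]): Unit = {
    val cls = "org.apache.spark.sql.catalyst.util.StringUtils"
    println(inline"$cls".code) // prints the class name unchanged
  }
}
```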
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #92201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92201/testReport)** for PR 21061 at commit [`37cee1f`](https://github.com/apache/spark/commit/37cee1f5b81dcce12026f1db0320a1090b87).
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/401/ Test PASSed.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve Analyze Table command
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21608 cc: @wzhfy @gatorsmile
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve Analyze Table command
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21608 OK, can you put the result in the description? Also, can you make the title more precise? e.g., "Parallelize size computation in ANALYZE command".
[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r197349327

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   //
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

Since the `tableName` and `query` variables don't need to be exposed to other classes, can we remove them? Btw, I feel that sharing the `tableName` variable in both the write and read paths makes the code somewhat complicated, so how about splitting the variable into two parts: `tableOrQuery` for reading and `outputName` for writing? e.g., https://github.com/apache/spark/commit/d62372ab0e855c359122609f1805ce83661d510e
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21611 Merged build finished. Test PASSed.
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/400/ Test PASSed.
[GitHub] spark pull request #21611: [SPARK-24569][SQL] Aggregator with output type Op...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21611#discussion_r197348976

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ---
@@ -148,6 +148,79 @@ object VeryComplexResultAgg extends Aggregator[Row, String, ComplexAggData] {
 }

+case class OptionBooleanData(name: String, isGood: Option[Boolean])
+case class OptionBooleanIntData(name: String, isGood: Option[(Boolean, Int)])
+
+case class OptionBooleanAggregator(colName: String)
+    extends Aggregator[Row, Option[Boolean], Option[Boolean]] {
+
+  override def zero: Option[Boolean] = None
+
+  override def reduce(buffer: Option[Boolean], row: Row): Option[Boolean] = {
+    val index = row.fieldIndex(colName)
+    val value = if (row.isNullAt(index)) {
+      Option.empty[Boolean]
+    } else {
+      Some(row.getBoolean(index))
+    }
+    merge(buffer, value)
+  }
+
+  override def merge(b1: Option[Boolean], b2: Option[Boolean]): Option[Boolean] = {
+    if ((b1.isDefined && b1.get) || (b2.isDefined && b2.get)) {
+      Some(true)
+    } else if (b1.isDefined) {
+      b1
+    } else {
+      b2
+    }
+  }
+
+  override def finish(reduction: Option[Boolean]): Option[Boolean] = reduction
+
+  override def bufferEncoder: Encoder[Option[Boolean]] = OptionalBoolEncoder
+  override def outputEncoder: Encoder[Option[Boolean]] = OptionalBoolEncoder
+
+  def OptionalBoolEncoder: Encoder[Option[Boolean]] = ExpressionEncoder()
+}
+
+case class OptionBooleanIntAggregator(colName: String)
+    extends Aggregator[Row, Option[(Boolean, Int)], Option[(Boolean, Int)]] {
+
+  override def zero: Option[(Boolean, Int)] = None
+
+  override def reduce(buffer: Option[(Boolean, Int)], row: Row): Option[(Boolean, Int)] = {
+    val index = row.fieldIndex(colName)
+    val value = if (row.isNullAt(index)) {
+      Option.empty[(Boolean, Int)]
+    } else {
+      val nestedRow = row.getStruct(index)
+      Some((nestedRow.getBoolean(0), nestedRow.getInt(1)))
+    }
+    merge(buffer, value)
+  }
+
+  override def merge(
+      b1: Option[(Boolean, Int)],
+      b2: Option[(Boolean, Int)]): Option[(Boolean, Int)] = {
+    if ((b1.isDefined && b1.get._1) || (b2.isDefined && b2.get._1)) {
+      val newInt = b1.map(_._2).getOrElse(0) + b2.map(_._2).getOrElse(0)
+      Some((true, newInt))
+    } else if (b1.isDefined) {
+      b1
+    } else {
+      b2
+    }
+  }
+
+  override def finish(reduction: Option[(Boolean, Int)]): Option[(Boolean, Int)] = reduction
+
+  override def bufferEncoder: Encoder[Option[(Boolean, Int)]] = OptionalBoolIntEncoder
+  override def outputEncoder: Encoder[Option[(Boolean, Int)]] = OptionalBoolIntEncoder
+
+  def OptionalBoolIntEncoder: Encoder[Option[(Boolean, Int)]] = ExpressionEncoder(topLevel = false)
--- End diff --

We can create a Dataset like:
```scala
scala> Seq((1, Some(1, 2)), (2, Some(3, 4))).toDS.printSchema
root
 |-- _1: integer (nullable = false)
 |-- _2: struct (nullable = true)
 |    |-- _1: integer (nullable = false)
 |    |-- _2: integer (nullable = false)
```
But we can't use that as the buffer/output encoding here, because the encoder here is not for a top-level object.
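For context, a usage sketch of the aggregator above (assuming a local session; this mirrors how the suite applies a `Row`-typed aggregator to a relational `groupBy`, but it is not code from the PR):

```scala
import org.apache.spark.sql.SparkSession

object OptionBooleanAggDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("agg-demo").getOrCreate()
    import spark.implicits._

    val df = Seq(
      OptionBooleanData("bob", Some(true)),
      OptionBooleanData("bob", Some(false)),
      OptionBooleanData("ann", None)).toDF()

    val result = df.groupBy("name")
      .agg(OptionBooleanAggregator("isGood").toColumn.alias("isGood"))

    // With SPARK-24569 fixed, isGood comes back as a flat nullable
    // boolean column rather than a single-field struct.
    result.printSchema()
    result.show()
    spark.stop()
  }
}
```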
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve Analyze Table command
Github user Achuth17 commented on the issue: https://github.com/apache/spark/pull/21608 Yes, in the case where the data is stored in S3 I noticed a significant difference. Some rough numbers: for a table in S3 with 1000 partitions, the calculateTotalSize method took about 90 seconds when run serially vs. 30-40 seconds when run in parallel.
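A rough sketch of the approach (not the PR's actual code): the serial cost is one filesystem round trip per partition location, so issuing those calls in parallel, here with Scala parallel collections, is what buys the speedup on high-latency stores like S3.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object ParallelSizeSketch {
  // Sums the size of every partition location, issuing the per-path
  // getContentSummary calls in parallel instead of one at a time.
  def totalSize(partitionPaths: Seq[Path], hadoopConf: Configuration): Long = {
    partitionPaths.par.map { path =>
      val fs = path.getFileSystem(hadoopConf)
      fs.getContentSummary(path).getLength
    }.sum
  }
}
```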
[GitHub] spark issue #21611: [SPARK-24569][SQL] Aggregator with output type Option sh...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21611 **[Test build #92200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92200/testReport)** for PR 21611 at commit [`dd4ea61`](https://github.com/apache/spark/commit/dd4ea61ac1c2beaf8ee897b1533e2088c6f8364a).
[GitHub] spark pull request #21611: [SPARK-24569][SQL] Aggregator with output type Op...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/21611

[SPARK-24569][SQL] Aggregator with output type Option should produce consistent schema

## What changes were proposed in this pull request?

SQL `Aggregator` with output type `Option[Boolean]` creates a column of type `StructType`. This is inconsistent with a Dataset of a similar Java class. This changes the way `definedByConstructorParams` checks a given type: for `Option[_]`, it goes on to check its type argument.

## How was this patch tested?

Added test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-24569

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21611.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21611

commit dd4ea61ac1c2beaf8ee897b1533e2088c6f8364a
Author: Liang-Chi Hsieh
Date: 2018-06-22T03:44:33Z

    Aggregator with output type Option[Boolean] should produce consistent schema.
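Roughly, the inconsistency being fixed, in spark-shell style (the struct layout in the trailing comment is what the aggregator path used to produce; the exact inner field name may differ):

```scala
scala> case class OptionBooleanData(name: String, isGood: Option[Boolean])

scala> Seq(OptionBooleanData("bob", Some(true))).toDS.printSchema
root
 |-- name: string (nullable = true)
 |-- isGood: boolean (nullable = true)

// An Aggregator whose outputEncoder is for Option[Boolean] previously
// yielded a struct-typed column for the same logical value, e.g.:
// |-- isGood: struct (nullable = true)
// |    |-- value: boolean (nullable = true)
```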
[GitHub] spark pull request #21610: Updates to LICENSE and NOTICE
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21610#discussion_r197348104

--- Diff: NOTICE ---
@@ -1,667 +1,11 @@
 Apache Spark
-Copyright 2014 and onwards The Apache Software Foundation.
+Copyright 2014 - 20018 The Apache Software Foundation.

 This product includes software developed at
 The Apache Software Foundation (http://www.apache.org/).

+Android Code
+Copyright 2005-2008 The Android Open Source Project
-
-Common Development and Distribution License 1.0
-
-The following components are provided under the Common Development and Distribution License 1.0. See project link for details.
-
-     (CDDL 1.0) Glassfish Jasper (org.mortbay.jetty:jsp-2.1:6.1.14 - http://jetty.mortbay.org/project/modules/jsp-2.1)
-     (CDDL 1.0) JAX-RS (https://jax-rs-spec.java.net/)
-     (CDDL 1.0) Servlet Specification 2.5 API (org.mortbay.jetty:servlet-api-2.5:6.1.14 - http://jetty.mortbay.org/project/modules/servlet-api-2.5)
-     (CDDL 1.0) (GPL2 w/ CPE) javax.annotation API (https://glassfish.java.net/nonav/public/CDDL+GPL.html)
-     (COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0) (GNU General Public Library) Streaming API for XML (javax.xml.stream:stax-api:1.0-2 - no url defined)
-     (Common Development and Distribution License (CDDL) v1.0) JavaBeans Activation Framework (JAF) (javax.activation:activation:1.1 - http://java.sun.com/products/javabeans/jaf/index.jsp)
-
-Common Development and Distribution License 1.1
-
-The following components are provided under the Common Development and Distribution License 1.1. See project link for details.
-
-     (CDDL 1.1) (GPL2 w/ CPE) org.glassfish.hk2 (https://hk2.java.net)
-     (CDDL 1.1) (GPL2 w/ CPE) JAXB API bundle for GlassFish V3 (javax.xml.bind:jaxb-api:2.2.2 - https://jaxb.dev.java.net/)
-     (CDDL 1.1) (GPL2 w/ CPE) JAXB RI (com.sun.xml.bind:jaxb-impl:2.2.3-1 - http://jaxb.java.net/)
-     (CDDL 1.1) (GPL2 w/ CPE) Jersey 2 (https://jersey.java.net)
-
-Common Public License 1.0
-
-The following components are provided under the Common Public 1.0 License. See project link for details.
-
-     (Common Public License Version 1.0) JUnit (junit:junit-dep:4.10 - http://junit.org)
-     (Common Public License Version 1.0) JUnit (junit:junit:3.8.1 - http://junit.org)
-     (Common Public License Version 1.0) JUnit (junit:junit:4.8.2 - http://junit.org)
-
-Eclipse Public License 1.0
-
-The following components are provided under the Eclipse Public License 1.0. See project link for details.
-
-     (Eclipse Public License v1.0) Eclipse JDT Core (org.eclipse.jdt:core:3.1.1 - http://www.eclipse.org/jdt/)
-
-Mozilla Public License 1.0
-
-The following components are provided under the Mozilla Public License 1.0. See project link for details.
-
-     (GPL) (LGPL) (MPL) JTransforms (com.github.rwl:jtransforms:2.4.0 - http://sourceforge.net/projects/jtransforms/)
-     (Mozilla Public License Version 1.1) jamon-runtime (org.jamon:jamon-runtime:2.3.1 - http://www.jamon.org/jamon-runtime/)
-
-NOTICE files
-
-The following NOTICEs are pertain to software distributed with this project.
-
-// ------------------------------------------------------------------
-// NOTICE file corresponding to the section 4d of The Apache License,
-// Version 2.0, in this case for
-// ------------------------------------------------------------------
-
-Apache Avro
-Copyright 2009-2013 The Apache Software Foundation
-
-This product includes software developed at
-The Apache Software Foundation (http://www.apache.org/).
-
-Apache Commons Codec
-Copyright 2002-2009 The Apache Software Foundation
-
-This product includes software developed by
-The Apache Software Foundation (http://ww
[GitHub] spark issue #21610: Updates to LICENSE and NOTICE
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21610 **[Test build #92199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92199/testReport)** for PR 21610 at commit [`b9d12d7`](https://github.com/apache/spark/commit/b9d12d700b9cb83402e42f264f21bca090e0d1e3).
[GitHub] spark pull request #21610: Updates to LICENSE and NOTICE
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21610#discussion_r197347713

--- Diff: NOTICE ---
@@ -1,667 +1,11 @@
 Apache Spark
-Copyright 2014 and onwards The Apache Software Foundation.
+Copyright 2014 - 20018 The Apache Software Foundation.
--- End diff --

2018?
[GitHub] spark issue #21610: Updates to LICENSE and NOTICE
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21610 ok to test
[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/21590#discussion_r197347130

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   //
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are conflicting in nature.
+  require(
+    tableName.isDefined || query.isDefined,
+    s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+    !(tableName.isDefined && query.isDefined),
+    s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
+    // We have ensured in the code above that either dbtable or query is specified.
+    query.get match {
+      case subQuery if subQuery.nonEmpty => s"(${subQuery}) spark_gen_${curId.getAndIncrement()}"
+      case subQuery => subQuery
+    }
+  }
+
+  require(tableOrQuery.nonEmpty,
+    s"Empty string is not allowed in either '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' options"
+  )
+
-  //
--- End diff --

nit: revert this line
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92197/ Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test FAILed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061

**[Test build #92197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92197/testReport)** for PR 21061 at commit [`195f3bd`](https://github.com/apache/spark/commit/195f3bd6b47da19b27cd0c8140bcd9aa6a063843).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve Analyze Table command
Github user maropu commented on the issue: https://github.com/apache/spark/pull/21608 Does this PR improve actual performance numbers? (My question is: is the calculation a bottleneck?)
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92192/ Test FAILed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test FAILed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606

**[Test build #92192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92192/testReport)** for PR 21606 at commit [`a16d9f9`](https://github.com/apache/spark/commit/a16d9f907b3ce0078da72b7e7bcc56e187cbc8f9).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21482 I have no more comments except the one above.
[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r197340906

--- Diff: python/pyspark/sql/functions.py ---
@@ -468,6 +468,18 @@ def input_file_name():
     return Column(sc._jvm.functions.input_file_name())

+@since(2.4)
+def isinf(col):
--- End diff --

Yes, please, because I see it's exposed in Column.scala.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92194/ Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594 Merged build finished. Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594

**[Test build #92194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92194/testReport)** for PR 21594 at commit [`2f00f2f`](https://github.com/apache/spark/commit/2f00f2fe0e1cf9a0d44285aab306ed55bd176d9c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21603 **[Test build #92198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92198/testReport)** for PR 21603 at commit [`b9b3160`](https://github.com/apache/spark/commit/b9b3160061ef1e17ae32599ed9fbcfd44b0565b4).
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Merged build finished. Test PASSed.
[GitHub] spark issue #21603: [SPARK-17091][SQL] Add rule to convert IN predicate to e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/399/ Test PASSed.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r197338867

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +270,11 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
       case sources.Not(pred) =>
         createFilter(schema, pred).map(FilterApi.not)

+      case sources.In(name, values) if canMakeFilterOn(name) && values.length < 20 =>
--- End diff --

It seems that the push-down performance is better when the threshold is less than `300`:

https://user-images.githubusercontent.com/5399861/41757743-7e411532-7616-11e8-8844-45132c50c535.png

The code:
```scala
withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
  import testImplicits._
  withTempPath { path =>
    val total = 1000
    (0 to total).toDF().coalesce(1)
      .write.option("parquet.block.size", 512)
      .parquet(path.getAbsolutePath)
    val df = spark.read.parquet(path.getAbsolutePath)
    // scalastyle:off println
    var lastSize = -1
    var i = 16000
    while (i < total) {
      val filter = Range(0, total).filter(_ % i == 0)
      i += 100
      if (lastSize != filter.size) {
        if (lastSize == -1) println(s"start size: ${filter.size}")
        lastSize = filter.size

        sql("set spark.sql.parquet.pushdown.inFilterThreshold=100")
        val begin1 = System.currentTimeMillis()
        df.where(s"id in(${filter.mkString(",")})").count()
        val end1 = System.currentTimeMillis()
        val time1 = end1 - begin1

        sql("set spark.sql.parquet.pushdown.inFilterThreshold=10")
        val begin2 = System.currentTimeMillis()
        df.where(s"id in(${filter.mkString(",")})").count()
        val end2 = System.currentTimeMillis()
        val time2 = end2 - begin2

        if (time1 <= time2) println(s"Max threshold: $lastSize")
      }
    }
  }
}
```
[GitHub] spark issue #21610: Updates to LICENSE and NOTICE
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21610 Can one of the admins verify this patch?
[GitHub] spark pull request #21610: Updates to LICENSE and NOTICE
GitHub user justinmclean opened a pull request: https://github.com/apache/spark/pull/21610

Updates to LICENSE and NOTICE

## What changes were proposed in this pull request?

LICENSE and NOTICE changes as per ASF policy.

## How was this patch tested?

N/A

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/justinmclean/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21610.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21610

commit b9d12d700b9cb83402e42f264f21bca090e0d1e3
Author: Justin Mclean
Date: 2018-06-22T04:20:59Z

    Updates to LICENSE and NOTICE
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92196/ Test PASSed.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609 Merged build finished. Test PASSed.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21609

**[Test build #92196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92196/testReport)** for PR 21609 at commit [`3040763`](https://github.com/apache/spark/commit/3040763e51c8d32309f2dc38ce8b9fcc740ceb3d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21603: [SPARK-17091][SQL] Add rule to convert IN predica...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21603#discussion_r197336527

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -270,6 +270,11 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
       case sources.Not(pred) =>
         createFilter(schema, pred).map(FilterApi.not)

+      case sources.In(name, values) if canMakeFilterOn(name) && values.length < 20 =>
--- End diff --

+1
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92195/ Test FAILed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607

**[Test build #92195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92195/testReport)** for PR 21607 at commit [`9d7e6ea`](https://github.com/apache/spark/commit/9d7e6eafff3daa519f7fda0b1f219f74d499874d).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test FAILed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92193/ Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607

**[Test build #92193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92193/testReport)** for PR 21607 at commit [`0520d60`](https://github.com/apache/spark/commit/0520d60b44987369fa62d7237427cb0cf022ed41).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92190/ Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606

**[Test build #92190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92190/testReport)** for PR 21606 at commit [`227d513`](https://github.com/apache/spark/commit/227d513ade176fd56f7e6d75a16deb6c654982db).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92189/ Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606

**[Test build #92189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92189/testReport)** for PR 21606 at commit [`5efaae7`](https://github.com/apache/spark/commit/5efaae74bf340fed4223b5209bed63475cc35516).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92191/ Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Merged build finished. Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21320

**[Test build #92191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92191/testReport)** for PR 21320 at commit [`a255bcb`](https://github.com/apache/spark/commit/a255bcb4c480d3c97f7ff0590bca0c20de034a31).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/21609 Can this PR be merged ASAP? Currently there is an error on branch-2.2.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/398/ Test PASSed.
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21061 Merged build finished. Test PASSed.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 Yup, will fix the hive fork thing and be back.
[GitHub] spark pull request #21570: [SPARK-24564][TEST] Add test suite for RecordBina...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21570#discussion_r197328626

--- Diff: sql/core/src/test/java/test/org/apache/spark/sql/execution/sort/RecordBinaryComparatorSuite.java ---
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package test.org.apache.spark.sql.execution.sort;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.memory.TaskMemoryManager;
--- End diff --

cc @jiangxb1987
[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21061 **[Test build #92197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92197/testReport)** for PR 21061 at commit [`195f3bd`](https://github.com/apache/spark/commit/195f3bd6b47da19b27cd0c8140bcd9aa6a063843).
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/21588 @HyukjinKwon, I'm in favor of @vanzin's comment: we should fix things first and then come back to this one.
[GitHub] spark pull request #21548: [SPARK-24518][CORE] Using Hadoop credential provi...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/21548#discussion_r197327620

--- Diff: core/src/main/scala/org/apache/spark/SSLOptions.scala ---
@@ -179,9 +185,11 @@ private[spark] object SSLOptions extends Logging {
       .orElse(defaults.flatMap(_.keyStore))

     val keyStorePassword = conf.getWithSubstitution(s"$ns.keyStorePassword")
+      .orElse(Option(hadoopConf.getPassword(s"$ns.keyStorePassword")).map(new String(_)))
--- End diff --

Hi @vanzin, I checked the JDK 8 docs again, and I don't find a String constructor that takes both a char array and a charset as parameters.
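For reference, a small sketch of what the JDK does offer: the charset-taking `String` constructors accept `byte[]`, while a `char[]` is already decoded text, so `new String(char[])` involves no charset at all.

```scala
import java.nio.charset.StandardCharsets

object PasswordStringDemo {
  def main(args: Array[String]): Unit = {
    val password: Array[Char] = Array('s', '3', 'c', 'r', '3', 't')

    // char[] -> String: no charset parameter exists (none is needed).
    val fromChars = new String(password)

    // byte[] -> String is where a charset comes into play.
    val bytes = fromChars.getBytes(StandardCharsets.UTF_8)
    val fromBytes = new String(bytes, StandardCharsets.UTF_8)

    assert(fromChars == fromBytes)
  }
}
```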
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92187/ Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606 Merged build finished. Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606

**[Test build #92187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92187/testReport)** for PR 21606 at commit [`c884f4f`](https://github.com/apache/spark/commit/c884f4f27199b3c91f56ba0042b42d09bc243883).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21598#discussion_r197326162

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1314,6 +1314,13 @@ object SQLConf {
         "Other column values can be ignored during parsing even if they are malformed.")
       .booleanConf
       .createWithDefault(true)
+
+  val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
--- End diff --

That's basically the same except that the postfix includes a specific version, which was just a rough idea.
[GitHub] spark issue #21577: [SPARK-24589][core] Correctly identify tasks in output c...
Github user zzcclp commented on the issue: https://github.com/apache/spark/pull/21577 @vanzin @tgravescs, after merging this PR into branch-2.2, there is an error "stageAttemptNumber is not a member of org.apache.spark.TaskContext" in SparkHadoopMapRedUtil. I think PR-20082 needs to be merged first.
[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21598 My assumption was that the PR and JIRA claim that it's the right behaviour, as I said multiple times. If there's no such claim, there is of course no need to argue about the default value, as I said above.
[GitHub] spark issue #21542: [SPARK-24529][Build][test-maven] Add spotbugs into maven...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21542 Even when we stopped forking SpotBugs, the same error occurred. @HyukjinKwon, do you have any ideas? I would appreciate your thoughts.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92188/ Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607 Merged build finished. Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607

**[Test build #92188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92188/testReport)** for PR 21607 at commit [`d1f3219`](https://github.com/apache/spark/commit/d1f3219a58f4dc4f1e65a793c6d01572b25a609e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588 Will try to fix it then. We can just re-enable these tests later: if we want to support those Hive versions in Hadoop 3, we could simply re-enable them with some fixes at that time. Adding that support sounds like an incremental improvement.
[GitHub] spark pull request #21061: [SPARK-23914][SQL] Add array_union function
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/21061#discussion_r197319579

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -2355,3 +2355,347 @@ case class ArrayRemove(left: Expression, right: Expression)
   override def prettyName: String = "array_remove"
 }
+
+object ArraySetLike {
+  def useGenericArrayData(elementSize: Int, length: Int): Boolean = {
+    // Use the same calculation in UnsafeArrayData.fromPrimitiveArray()
+    val headerInBytes = UnsafeArrayData.calculateHeaderPortionInBytes(length)
+    val valueRegionInBytes = elementSize.toLong * length
+    val totalSizeInLongs = (headerInBytes + valueRegionInBytes + 7) / 8
+    totalSizeInLongs > Integer.MAX_VALUE / 8
+  }
+
+  def throwUnionLengthOverflowException(length: Int): Unit = {
+    throw new RuntimeException(s"Unsuccessful try to union arrays with $length " +
+      s"elements due to exceeding the array size limit " +
+      s"${ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH}.")
+  }
+}
+
+
+abstract class ArraySetLike extends BinaryArrayExpressionWithImplicitCast {
+  override def dataType: DataType = left.dataType
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val typeCheckResult = super.checkInputDataTypes()
+    if (typeCheckResult.isSuccess) {
+      TypeUtils.checkForOrderingExpr(dataType.asInstanceOf[ArrayType].elementType,
+        s"function $prettyName")
+    } else {
+      typeCheckResult
+    }
+  }
+
+  @transient protected lazy val ordering: Ordering[Any] =
+    TypeUtils.getInterpretedOrdering(elementType)
+
+  @transient protected lazy val elementTypeSupportEquals = elementType match {
+    case BinaryType => false
+    case _: AtomicType => true
+    case _ => false
+  }
+}
+
+/**
+ * Returns an array of the elements in the union of x and y, without duplicates
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(array1, array2) - Returns an array of the elements in the union of array1 and array2,
+      without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(1, 2, 3, 5)
+  """,
+  since = "2.4.0")
+case class ArrayUnion(left: Expression, right: Expression) extends ArraySetLike {
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getInt(idx)
+    if (!hsInt.contains(elem)) {
+      resultArray.setInt(pos, elem)
+      hsInt.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: Int): Boolean = {
+    val elem = array.getLong(idx)
+    if (!hsLong.contains(elem)) {
+      resultArray.setLong(pos, elem)
+      hsLong.add(elem)
+      true
+    } else {
+      false
+    }
+  }
+
+  def evalPrimitiveType(
+      array1: ArrayData,
+      array2: ArrayData,
+      size: Int,
+      resultArray: ArrayData,
+      isLongType: Boolean): ArrayData = {
+    // store elements into resultArray
+    var foundNullElement = false
+    var pos = 0
+    Seq(array1, array2).foreach(array => {
+      var i = 0
+      while (i < array.numElements()) {
+        if (array.isNullAt(i)) {
+          if (!foundNullElement) {
+            resultArray.setNullAt(pos)
+            pos += 1
+            foundNullElement = true
+          }
+        } else {
+          val assigned = if (!isLongType) {
+            assignInt(array, i, resultArray, pos)
+          } else {
+            assignLong(array, i, resultArray, pos)
+          }
+          if (assigned) {
+            pos += 1
+          }
+        }
+        i += 1
+      }
+    })
+    resultArray
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+    val array1 = input1.asInstanceOf[ArrayData]
+    val array2 = input2.asInstanceOf[ArrayData]
+
+    if (elementTypeSupportEquals) {
+      elementType match {
+        case IntegerType =>
+          // avoid boxing of primitive int array elements
+          // calculate result array size
+          val hsSize = new OpenHashSet[Int]
+          Seq(array1, array2).foreach(array => {
+            var i = 0
+            while (i < array.numElements()) {
+              if (hsSize.size > ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH) {
+                ArraySetLi
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607

Merged build finished. Test PASSed.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21609

+1 pending tests.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/397/
Test PASSed.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21609

**[Test build #92196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92196/testReport)** for PR 21609 at commit [`3040763`](https://github.com/apache/spark/commit/3040763e51c8d32309f2dc38ce8b9fcc740ceb3d).
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/396/
Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607

**[Test build #92195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92195/testReport)** for PR 21607 at commit [`9d7e6ea`](https://github.com/apache/spark/commit/9d7e6eafff3daa519f7fda0b1f219f74d499874d).
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21609

Merged build finished. Test PASSed.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/21588

> The tests passed in this PR builder

Against your private build of the Hive stuff. Again, fix that and this will become a lot easier to discuss.

I'm also against disabling these tests without a proper discussion of what that means, as I've said multiple times. If we want to support those Hive versions in Hadoop 3, then this is the wrong change.
[GitHub] spark issue #21609: [SPARK-22897][CORE] Expose stageAttemptId in TaskContext
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21609

Backport to branch-2.2; the only changes were to MimaExcludes and a test file that had one more call to TaskContext. @vanzin
[GitHub] spark pull request #21609: [SPARK-22897][CORE] Expose stageAttemptId in Task...
GitHub user tgravescs opened a pull request:

    https://github.com/apache/spark/pull/21609

[SPARK-22897][CORE] Expose stageAttemptId in TaskContext

stageAttemptId added in TaskContext and corresponding construction modification.

Added a new test in TaskContextSuite; two cases are tested:
1. Normal case without failure
2. Exception case with resubmitted stages

Link to [SPARK-22897](https://issues.apache.org/jira/browse/SPARK-22897)

Author: Xianjin YE

Closes #20082 from advancedxy/SPARK-22897.

Conflicts:
    project/MimaExcludes.scala

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tgravescs/spark SPARK-22897

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21609.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21609

commit 4bc8d2805949b6b9d4d06ff4ad0493d9b33c7063
Author: Xianjin YE
Date: 2018-01-02T15:30:38Z

    [SPARK-22897][CORE] Expose stageAttemptId in TaskContext

    stageAttemptId added in TaskContext and corresponding construction modification.

    Added a new test in TaskContextSuite; two cases are tested:
    1. Normal case without failure
    2. Exception case with resubmitted stages

    Link to [SPARK-22897](https://issues.apache.org/jira/browse/SPARK-22897)

    Author: Xianjin YE

    Closes #20082 from advancedxy/SPARK-22897.

    Conflicts:
        project/MimaExcludes.scala
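For readers following the backport, a minimal sketch of what the new field enables from user code. The accessor name is an assumption: `stageAttemptNumber()` matches the upstream commit being backported, but the exact name on this branch is not confirmed here:

```scala
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

// Minimal sketch of reading the stage attempt from inside a task.
// Assumption: the backport exposes stageAttemptNumber(), as upstream does.
object StageAttemptDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("stage-attempt-demo").setMaster("local[2]"))
    sc.parallelize(1 to 4, 2).foreach { _ =>
      val ctx = TaskContext.get()
      // On a resubmitted stage (e.g. after a fetch failure), the stage
      // attempt increments while stageId stays the same; task-level retries
      // are tracked separately by attemptNumber().
      println(s"stage=${ctx.stageId()} stageAttempt=${ctx.stageAttemptNumber()} " +
        s"partition=${ctx.partitionId()} taskAttempt=${ctx.attemptNumber()}")
    }
    sc.stop()
  }
}
```

The point of the change is that stage re-attempts and task re-attempts become distinguishable from inside a running task, which previously was not possible from TaskContext alone.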
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594

Merged build finished. Test PASSed.
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21594

**[Test build #92194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92194/testReport)** for PR 21594 at commit [`2f00f2f`](https://github.com/apache/spark/commit/2f00f2fe0e1cf9a0d44285aab306ed55bd176d9c).
[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21594

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/395/
Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/394/
Test PASSed.
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21606

Merged build finished. Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607

Test PASSed. Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/393/
Test PASSed.
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21607

**[Test build #92193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92193/testReport)** for PR 21607 at commit [`0520d60`](https://github.com/apache/spark/commit/0520d60b44987369fa62d7237427cb0cf022ed41).
[GitHub] spark issue #21607: branch-2.1: backport SPARK-24589 and SPARK-22897
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21607

Merged build finished. Test PASSed.
[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21588

The tests passed in this PR builder. The only hack I used is that I landed a one-liner fix to an artifact so it could be used in this PR; that fix is already in Hive, and it is proposed in Hive's fork, where it is blocked for non-technical reasons. I am working on getting this through.

Okay, if you think it should be blocked, let me get that through first. I am not dropping it.

Isn't this what we already cover? I believe this is the most minimised and conservative fix to make Hadoop 3 work within Spark, since we already added it.

FWIW, we haven't documented the Hadoop 3 profile yet, so my impression is that it's still in progress.
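For readers trying to reproduce the discussion locally, a hedged sketch of the build invocation at issue. The profile name `hadoop-3.1` is an assumption about the pom at this point in time, precisely because, as noted above, the profile is not yet documented:

```
# Sketch, not an endorsed recipe: build Spark against the Hadoop 3 profile.
# Assumption: the profile is named hadoop-3.1 in the current pom.
./build/mvn -DskipTests -Phadoop-3.1 -Phive -Phive-thriftserver clean package
```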
[GitHub] spark issue #21606: [SPARK-24552][core][SQL] Use task ID instead of attempt ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21606

**[Test build #92192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92192/testReport)** for PR 21606 at commit [`a16d9f9`](https://github.com/apache/spark/commit/a16d9f907b3ce0078da72b7e7bcc56e187cbc8f9).