[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/16011 As far as I recall, the idea is that `Bucketizer` can be used standalone, and because `QuantileDiscretizer` itself produces the same thing as a bucketizer, it was used as the model rather than adding a dedicated `QuantileDiscretizerModel`. `Bucketizer` is already a separate transformer (it is not required to be produced by a `QuantileDiscretizer`), since it's a `Model` and its constructor is public (by design). So it can be used in a pipeline on its own, and the `splits` param could be selected via cross-validation (for example). What you propose here makes it impossible to use `QuantileDiscretizer` with a non-default `handleInvalid` param together with cross-validation. In addition, as you've pointed out in your code example above, this would force a pretty clunky workaround to set the `handleInvalid` param in a pipeline. Why do this? What is the actual problem with what exists currently? To me it seems better the way it is. Also, I don't see any major benefit to adding a new `QuantileDiscretizerModel`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
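For context on the discussion above: the bucketing a `Bucketizer` applies is just an interval lookup against its `splits` param, which is why the param is a natural target for cross-validation. A pure-Python sketch of those semantics (the `bucketize` helper is illustrative only, not the Spark API):

```python
import bisect
import math

def bucketize(value, splits):
    """Map a value to a bucket index given sorted split points.

    Buckets are [splits[i], splits[i+1]); the last bucket is closed on
    the right, mirroring Spark Bucketizer semantics. NaN handling is
    what the handleInvalid param governs, so it is left as an error here.
    """
    if math.isnan(value):
        raise ValueError("NaN encountered; handleInvalid decides this case")
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"value {value} outside splits range")
    if value == splits[-1]:
        return len(splits) - 2  # last bucket is closed on the right
    return bisect.bisect_right(splits, value) - 1

splits = [float("-inf"), 0.0, 10.0, float("inf")]
print(bucketize(-5.0, splits))  # 0
print(bucketize(3.0, splits))   # 1
print(bucketize(42.0, splits))  # 2
```

Selecting `splits` via cross-validation then just means evaluating a pipeline per candidate splits array, which is the standalone use the comment defends.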
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15995 Merged build finished. Test FAILed.
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #69220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69220/consoleFull)** for PR 15995 at commit [`354a860`](https://github.com/apache/spark/commit/354a8605b5e539341f67f59ea507cc6f07a23eb3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15995 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69220/ Test FAILed.
[GitHub] spark pull request #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16008
[GitHub] spark issue #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if possible...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16008 Thanks - merging in master/branch-2.1. We should look into the Janino change in the future.
[GitHub] spark issue #16013: [SPARK-3359][DOCS] Make javadoc8 working for unidoc/genj...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16013 I think it is ready to be reviewed - @srowen. Thank you for your close look.
[GitHub] spark issue #16008: [SPARK-18585][SQL] Use `ev.isNull = "false"` if possible...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16008 Hmm, it is a great idea, but I think it would be very hard to submit such a patch (at least for me now).
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727420 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/JdbcDialects.scala --- @@ -40,14 +40,9 @@ case class JdbcType(databaseTypeDefinition : String, jdbcNullType : Int) * SQL dialect of a certain database or jdbc driver. * Lots of databases define types that aren't explicitly supported * by the JDBC spec. Some JDBC drivers also report inaccurate - * information---for instance, - * - * {{{ - * BIT(n>1) - * }}} - * - * being reported as a BIT type is quite common, even though BIT in JDBC is meant for single-bit - * values. Also, there does not appear to be a standard name for an unbounded string or binary + * information---for instance, BIT(n{@literal >}1) being reported as a BIT type is quite --- End diff -- - Java ![2016-11-28 4 20 51](https://cloud.githubusercontent.com/assets/6477701/20659399/b19d2c4c-b586-11e6-9098-f56c75676711.png) - Scala ![2016-11-28 4 21 03](https://cloud.githubusercontent.com/assets/6477701/20659401/b3f8dfa4-b586-11e6-9792-e8a8a1f85ca3.png)
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89727312 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { case v => throw new IllegalArgumentException( s"Unsupported value type ${v.getClass.getName} ($v).") } + + /** + * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified + * numeric, bool, string columns. If a specified column is not a numeric, boolean, string column, + * it is ignored. + * + * @since 2.1.0 + */ + private def fill1[T](value: T, cols: Seq[String]): DataFrame = { +value match { + case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Boolean | _: jl.Long | _: String => + case _ => +throw new IllegalArgumentException( + s"Unsupported value type ${value.getClass.getName} ($value).") +} + +val targetColumnType = value match { + case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Long => NumericType + case _: jl.Boolean => BooleanType + case _: String => StringType +} + +val columnEquals = df.sparkSession.sessionState.analyzer.resolver +val projections = df.schema.fields.map { f => + // Only fill if the column is part of the cols list. + if (((f.dataType.isInstanceOf[NumericType] && targetColumnType == NumericType) --- End diff -- Thanks! I have addressed everything except one point: if T is a Double, should this apply to all numeric columns (including LongType/IntegerType), or only to fractional columns? fill(value: Double) applies to all numeric columns, and I think fill(value: Long) should keep the same logic.
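The semantics under discussion — fill only those requested columns whose declared type matches the fill value's type family, and ignore the rest — can be sketched in plain Python over a dict-of-lists "frame". This is an illustrative paraphrase of the PR's logic, not the Spark implementation; all names here are made up:

```python
import math

# Illustrative type families, mirroring the PR's targetColumnType match.
NUMERIC, BOOLEAN, STRING = "numeric", "boolean", "string"

def type_family(value):
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return BOOLEAN
    if isinstance(value, (int, float)):
        return NUMERIC
    if isinstance(value, str):
        return STRING
    raise ValueError(f"Unsupported value type {type(value).__name__} ({value})")

def fill(frame, schema, value, cols):
    """Replace None (and NaN, for numeric columns) in the columns listed in
    `cols` whose declared type matches the fill value's family; all other
    columns are returned untouched."""
    family = type_family(value)
    out = {}
    for name, data in frame.items():
        if name in cols and schema[name] == family:
            out[name] = [
                value
                if v is None or (family == NUMERIC
                                 and isinstance(v, float) and math.isnan(v))
                else v
                for v in data
            ]
        else:
            out[name] = list(data)
    return out

frame = {"a": [1.0, float("nan"), None], "b": ["x", None, "y"]}
schema = {"a": NUMERIC, "b": STRING}
print(fill(frame, schema, 0.0, ["a", "b"]))  # only "a" is filled
```

The open question in the comment maps to whether an integral fill value's family should span all numeric columns or only integral ones.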
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727297 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -41,7 +41,7 @@ private[ml] trait VectorIndexerParams extends Params with HasInputCol with HasOu /** * Threshold for the number of values a categorical feature can take. - * If a feature is found to have greater than maxCategories values, then it is declared + * If a feature is found to have {@literal >} maxCategories values, then it is declared --- End diff -- Scaladoc/javadoc not found.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727185 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala --- @@ -24,9 +24,7 @@ import org.apache.spark.Partition /** * Enumeration to manage state transitions of an RDD through checkpointing * - * {{{ - * [ Initialized --> checkpointing in progress --> checkpointed ] - * }}} + * [ Initialized --{@literal >} checkpointing in progress --{@literal >} checkpointed ] --- End diff -- - Java ![2016-11-28 3 41 11](https://cloud.githubusercontent.com/assets/6477701/20658544/1c962f72-b581-11e6-9126-1b0a6fc8354a.png) Scaladoc not found.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727091 --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala --- @@ -422,13 +422,8 @@ private[spark] object UIUtils extends Logging { * the whole string will rendered as a simple escaped text. * * Note: In terms of security, only anchor tags with root relative links are supported. So any - * attempts to embed links outside Spark UI, or other tags like - * - * {{{ - *
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727059 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala --- @@ -33,9 +33,9 @@ import org.apache.spark.sql.types.StructType * use Spark SQL built-in function and UDFs to operate on these selected columns. * For example, [[SQLTransformer]] supports statements like: * {{{ - * - SELECT a, a + b AS a_b FROM __THIS__ - * - SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 - * - SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b + * SELECT a, a + b AS a_b FROM __THIS__ + * SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 + * SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b --- End diff -- - Java ![2016-11-28 2 28 18](https://cloud.githubusercontent.com/assets/6477701/20657138/f208e7ae-b576-11e6-82ee-bb709ee03c2f.png) - Scala ![2016-11-28 2 28 24](https://cloud.githubusercontent.com/assets/6477701/20657139/f35c7242-b576-11e6-9a4e-1b68943ad4b1.png)
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89727120 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -41,9 +41,9 @@ private[spark] class JdbcPartition(idx: Int, val lower: Long, val upper: Long) e * The RDD takes care of closing the connection. * @param sql the text of the query. * The query must contain two ? placeholders for parameters used to partition the results. - * + * For example, * {{{ - * E.g. "select title, author from books where ? <= id and id <= ?" + * select title, author from books where ? <= id and id <= ? * }}} --- End diff -- - Java ![2016-11-28 2 21 59](https://cloud.githubusercontent.com/assets/6477701/20657043/29e4d314-b576-11e6-9946-fc6502920d4b.png) - Scala ![2016-11-28 2 22 34](https://cloud.githubusercontent.com/assets/6477701/20657044/29eaa686-b576-11e6-86cd-c0fcb9449e0f.png)
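For readers unfamiliar with the JdbcRDD contract above: the two `?` placeholders are bound to each partition's lower and upper id bound. The same pattern can be exercised with the standard-library sqlite3 module (a standalone sketch, not JdbcRDD itself; table contents are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER, title TEXT, author TEXT)")
conn.executemany("INSERT INTO books VALUES (?, ?, ?)",
                 [(i, f"title{i}", f"author{i}") for i in range(1, 7)])

# The query from the doc comment, with its two partition-bound placeholders.
sql = "select title, author from books where ? <= id and id <= ?"

# JdbcRDD-style partitioning: split the id range [1, 6] into two
# inclusive sub-ranges, one query per partition.
partitions = [(1, 3), (4, 6)]
for lower, upper in partitions:
    rows = conn.execute(sql, (lower, upper)).fetchall()
    print(lower, upper, rows)
```

Each partition's query is independent, which is what lets the real RDD fetch ranges in parallel across executors.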
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69228/consoleFull)** for PR 16013 at commit [`d2c6e86`](https://github.com/apache/spark/commit/d2c6e8606fd61e21f5bbe9bee4f70b7599b525f4).
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69229/consoleFull)** for PR 15994 at commit [`2043283`](https://github.com/apache/spark/commit/2043283f84fe046aa80232f3921918a176b06540).
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89726626 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -153,11 +153,9 @@ class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable { /** * Compute a histogram using the provided buckets. The buckets are all open * to the right except for the last which is closed. - * {{{ * e.g. for the array * [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50] - * e.g 1<=x<10 , 10<=x<20, 20<=x<=50 - * }}} + * e.g {@code <=x<10, 10<=x<20, 20<=x<=50} --- End diff -- - Java ![2016-11-28 3 03 44](https://cloud.githubusercontent.com/assets/6477701/20657813/4176191a-b57c-11e6-92b5-72e88667354f.png) - Scala ![2016-11-28 3 03 31](https://cloud.githubusercontent.com/assets/6477701/20657814/4177599c-b57c-11e6-83b1-83f98a57ef2a.png)
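The bucket semantics the doc comment describes — every interval half-open to the right except the last, which is closed — can be checked with a small Python sketch (illustrative only, not Spark's implementation):

```python
def histogram(values, buckets):
    """Count values per bucket for sorted bucket boundaries.

    Buckets are [b0, b1), [b1, b2), ..., with the last interval
    [b_{n-1}, b_n] closed on the right, so the maximum value counts.
    """
    counts = [0] * (len(buckets) - 1)
    for v in values:
        if v == buckets[-1]:
            counts[-1] += 1  # last bucket is closed: e.g. 20 <= x <= 50
            continue
        for i in range(len(buckets) - 1):
            if buckets[i] <= v < buckets[i + 1]:
                counts[i] += 1
                break
    return counts

# For boundaries [1, 10, 20, 50] the buckets are [1,10) [10,20) [20,50]:
print(histogram([1, 9, 10, 20, 50], [1, 10, 20, 50]))  # [2, 1, 2]
```

Closing the last bucket is what keeps the array maximum (50 here) from falling outside every interval.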
[GitHub] spark issue #16029: [MINOR][ML] Remove duplicate import in GLR
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16029 **[Test build #69227 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69227/consoleFull)** for PR 16029 at commit [`2e01a62`](https://github.com/apache/spark/commit/2e01a622ce06a2d92390d3e32da145c556231520).
[GitHub] spark pull request #16029: [MINOR][ML] Remove duplicate import in GLR
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16029 [MINOR][ML] Remove duplicate import in GLR ## What changes were proposed in this pull request? there were two `import GeneralizedLinearRegression._` in trait GLR.GeneralizedLinearRegressionBase ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark del_duplicate_import Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16029.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16029 commit 2e01a622ce06a2d92390d3e32da145c556231520 Author: Zheng RuiFeng Date: 2016-11-28T07:04:29Z create pr
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #69226 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69226/consoleFull)** for PR 16028 at commit [`ae74a3e`](https://github.com/apache/spark/commit/ae74a3e3272e8a9e40cc3225f65ae80e87e7e0ed). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69226/ Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16028 Merged build finished. Test FAILed.
[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16028 **[Test build #69226 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69226/consoleFull)** for PR 16028 at commit [`ae74a3e`](https://github.com/apache/spark/commit/ae74a3e3272e8a9e40cc3225f65ae80e87e7e0ed).
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Merged build finished. Test PASSed.
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69218/ Test PASSed.
[GitHub] spark pull request #16028: [SPARK-18518][ML] HasSolver supports override
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/16028 [SPARK-18518][ML] HasSolver supports override ## What changes were proposed in this pull request? 1, make param support non-final with `finalFields` option 2, generate `HasSolver` with `finalFields = false` 3, override `solver` in LiR, GLR, and make MLPC inherit `HasSolver` ## How was this patch tested? existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhengruifeng/spark param_non_final Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16028.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16028 commit 58349ca56267350241b9714810aa1411dd3a5d71 Author: Zheng RuiFeng Date: 2016-11-25T11:18:55Z create pr commit ae74a3e3272e8a9e40cc3225f65ae80e87e7e0ed Author: Zheng RuiFeng Date: 2016-11-28T06:16:15Z create pr
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15986 **[Test build #69218 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69218/consoleFull)** for PR 15986 at commit [`9c6ce7e`](https://github.com/apache/spark/commit/9c6ce7e8ceadcdae3ce36a147aac7cf680d5a86f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89724422 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala --- @@ -23,7 +23,8 @@ import org.apache.spark.Partition /** * Enumeration to manage state transitions of an RDD through checkpointing - * [ Initialized --> checkpointing in progress --> checkpointed ]. + * + * [ Initialized --{@literal >} checkpointing in progress --{@literal >} checkpointed ] --- End diff -- - Java ![2016-11-28 3 41 11](https://cloud.githubusercontent.com/assets/6477701/20658544/1c962f72-b581-11e6-9126-1b0a6fc8354a.png) Scaladoc not found.
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69225/consoleFull)** for PR 16013 at commit [`7b13fad`](https://github.com/apache/spark/commit/7b13fad10fe93a8ee2c6f84626209d98745dc313).
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15983 **[Test build #69223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69223/consoleFull)** for PR 15983 at commit [`ca75331`](https://github.com/apache/spark/commit/ca753311a6d61452d7c29a349b8c34e66998f5ee).
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15976 **[Test build #69224 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69224/consoleFull)** for PR 15976 at commit [`6db5af9`](https://github.com/apache/spark/commit/6db5af95e456d6529a37c243f41a4632a69f40d0).
[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15979 retest it please
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15983 retest this please
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15837 @kiszk Do you mean to avoid subexpression elimination?
[GitHub] spark issue #15976: [SPARK-18403][SQL] Fix unsafe data false sharing issue i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15976 retest this please
[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15975 LGTM except https://github.com/apache/spark/pull/15975/files#r89722356, what's the status of it?
[GitHub] spark pull request #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15975#discussion_r89722356 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -404,6 +425,7 @@ class JDBCSuite extends SparkFunSuite numPartitions = 0, --- End diff -- it's merged, has it been fixed?
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15837 @cloud-fan Sure, no problem.
[GitHub] spark issue #15837: [SPARK-18395][SQL] Evaluate common subexpression like la...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15837 Sorry for the delay, but I may not have time to review it before the 2.1 release, can you hold it off until 2.1 release? thanks!
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69222/consoleFull)** for PR 16013 at commit [`0e6ed2b`](https://github.com/apache/spark/commit/0e6ed2b5098af4c5d2abbdeca6e2ed45523e00e5).
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89721863 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -405,13 +406,14 @@ case class WrapOption(child: Expression, optType: DataType) * A place holder for the loop variable used in [[MapObjects]]. This should never be constructed * manually, but will instead be passed into the provided lambda function. */ -case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression +case class LambdaVariable(value: String, isNull: String, dataType: DataType, +valueNullable: Boolean = true) extends LeafExpression --- End diff -- I meant that we could use the parameter name `nullable` like:

```scala
case class LambdaVariable(value: String, isNull: String, dataType: DataType, nullable: Boolean = true)
  extends LeafExpression
```

and remove `override def nullable: Boolean = valueNullable`.
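The naming suggestion above relies on a Scala feature worth making explicit: a case-class constructor parameter is a `val`, and a `val` can directly implement (or override) a parent's parameterless `def`, so no forwarding method is needed. The following standalone sketch illustrates that mechanism with a simplified stand-in trait; `Expr` and `LambdaVar` are illustrative names, not Spark's actual classes.

```scala
// Minimal illustration: a constructor parameter named `nullable`
// satisfies the abstract `def nullable` in the parent trait,
// so no separate `override def nullable = ...` forwarder is needed.
trait Expr {
  def nullable: Boolean
}

// The `nullable` parameter implements Expr.nullable directly.
case class LambdaVar(value: String, isNull: String, nullable: Boolean = true)
  extends Expr

object LambdaVarDemo {
  def main(args: Array[String]): Unit = {
    assert(LambdaVar("v", "n").nullable)                     // default is true
    assert(!LambdaVar("v", "n", nullable = false).nullable)  // overridable at call site
    println("ok")
  }
}
```

This is the same pattern the reviewer proposes for `LambdaVariable`: rename `valueNullable` to `nullable` and let the parameter itself serve as the override.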
[GitHub] spark issue #15780: [SPARK-18284][SQL] Make ExpressionEncoder.serializer.nul...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15780 **[Test build #69221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69221/consoleFull)** for PR 15780 at commit [`214c6bb`](https://github.com/apache/spark/commit/214c6bb2d7aaf773d01a846795eb78f1e07e4ed1).
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89721435 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala --- @@ -177,9 +177,10 @@ case class Invoke( functionName: String, dataType: DataType, arguments: Seq[Expression] = Nil, -propagateNull: Boolean = true) extends InvokeLike { +propagateNull: Boolean = true, +returnNullable : Boolean = true) extends InvokeLike { - override def nullable: Boolean = true + override def nullable: Boolean = targetObject.nullable || returnNullable --- End diff -- i see. done
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721405 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { case v => throw new IllegalArgumentException( s"Unsupported value type ${v.getClass.getName} ($v).") } + + /** + * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified + * numeric, bool, string columns. If a specified column is not a numeric, boolean, string column, + * it is ignored. + * + * @since 2.1.0 + */ + private def fill1[T](value: T, cols: Seq[String]): DataFrame = { +value match { + case _: jl.Double | _: jl.Integer | _: jl.Float | _: jl.Boolean | _: jl.Long | _: String => + case _ => +throw new IllegalArgumentException( + s"Unsupported value type ${value.getClass.getName} ($value).") +} + +val targetColumnType = value match { --- End diff -- nit: we can combine the check here:

```
val targetType = value match {
  case _: Long => LongType
  case _: Double => DoubleType
  case _: String => StringType
  case _ => throw ...
}
```
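The review comment's point is that a single pattern match can both validate the value's type and select the target type, instead of a separate validation match followed by a mapping match. A self-contained sketch of that shape follows; the `TargetType` objects and `targetTypeOf` helper are hypothetical stand-ins for illustration, not Spark's `DataType` hierarchy or `DataFrameNaFunctions` code.

```scala
// Sketch of the "combine the check" suggestion: one match both
// validates the fill value and picks its target type; unsupported
// values fall through to the IllegalArgumentException case.
object FillValueCheck {
  sealed trait TargetType
  case object LongType extends TargetType
  case object DoubleType extends TargetType
  case object StringType extends TargetType
  case object BooleanType extends TargetType

  def targetTypeOf(value: Any): TargetType = value match {
    case _: java.lang.Long | _: java.lang.Integer => LongType
    case _: java.lang.Double | _: java.lang.Float => DoubleType
    case _: String                                => StringType
    case _: java.lang.Boolean                     => BooleanType
    case v => throw new IllegalArgumentException(
      s"Unsupported value type ${v.getClass.getName} ($v).")
  }

  def main(args: Array[String]): Unit = {
    assert(targetTypeOf(1L) == LongType)
    assert(targetTypeOf("a") == StringType)
    assert(targetTypeOf(true) == BooleanType)
    println("ok")
  }
}
```

The benefit is that validation and mapping cannot drift apart: any type accepted by the check necessarily has a target type, because they are the same match.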
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721142 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -437,4 +444,38 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { case v => throw new IllegalArgumentException( s"Unsupported value type ${v.getClass.getName} ($v).") } + + /** + * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified --- End diff -- we don't need `(Scala-specific)` and the `since` tag for private methods.
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721112 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -153,19 +168,20 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { * (Scala-specific) Returns a new [[DataFrame]] that replaces null or NaN values in specified * numeric columns. If a specified column is not a numeric column, it is ignored. * + * @since 2.1.0 + */ + def fill(value: Long, cols: Seq[String]): DataFrame = { +fill1[Long](value, cols) --- End diff -- nit: `fill1(value, cols)` should work, scala has type inference.
[GitHub] spark pull request #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss ...
Github user windpiger commented on a diff in the pull request: https://github.com/apache/spark/pull/15994#discussion_r89721087 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala --- @@ -128,66 +128,49 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) { } /** - * Returns a new [[DataFrame]] that replaces null or NaN values in numeric columns with `value`. + * Returns a new [[DataFrame]] that replaces null or NaN values + * in numeric, boolean, string columns with `value`. * * @since 1.3.1 */ - def fill(value: Double): DataFrame = fill(value, df.columns) + def fill[T](value: T): DataFrame = fill(value, df.columns) --- End diff -- ok, thanks a lot ! I have put it as a private. @cloud-fan
[GitHub] spark pull request #16003: [SPARK-18482][SQL] make sure Spark can access the...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16003
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15995 **[Test build #69220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69220/consoleFull)** for PR 15995 at commit [`354a860`](https://github.com/apache/spark/commit/354a8605b5e539341f67f59ea507cc6f07a23eb3).
[GitHub] spark issue #15995: [SPARK-18566][SQL] remove OverwriteOptions
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15995 @ericl you are right, I pushed a new commit to do `convertStaticPartitions` right before we convert `InsertIntoTable` to `InsertIntoHadoopFsRelation`, so the partitioning information won't be erased.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16003 Merging in master/branch-2.1.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16003 Merged build finished. Test PASSed.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16003 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69217/ Test PASSed.
[GitHub] spark issue #16003: [SPARK-18482][SQL] make sure Spark can access the table ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16003 **[Test build #69217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69217/consoleFull)** for PR 16003 at commit [`117f532`](https://github.com/apache/spark/commit/117f5321cac62f01a5726c308efaf7369a9cdc9d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719801 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/optimization/NNLS.scala --- @@ -53,8 +53,13 @@ private[spark] object NNLS { * projected gradient method. That is, find x minimising ||Ax - b||_2 given A^T A and A^T b. * * We solve the problem - * min_x 1/2 x^T ata x^T - x^T atb - * subject to x = 0 + * + * + *$$ + *min_x 1/2 x^T ata x^T - x^T atb + *$$ + * --- End diff -- - Java ![2016-11-28 2 32 13](https://cloud.githubusercontent.com/assets/6477701/20657212/799229f6-b577-11e6-9616-30a1e3f7ee1f.png) - Scala (not found but manually built after changing the access modifier) ![2016-11-28 2 03 57](https://cloud.githubusercontent.com/assets/6477701/20657201/5a8d37a8-b577-11e6-8af5-8ed07c65a0ac.png)
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Merged build finished. Test FAILed.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719600 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/SQLTransformer.scala --- @@ -32,9 +32,11 @@ import org.apache.spark.sql.types.StructType * the output, it can be any select clause that Spark SQL supports. Users can also * use Spark SQL built-in function and UDFs to operate on these selected columns. * For example, [[SQLTransformer]] supports statements like: - * - SELECT a, a + b AS a_b FROM __THIS__ - * - SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 - * - SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b + * {{{ + * SELECT a, a + b AS a_b FROM __THIS__ + * SELECT a, SQRT(b) AS b_sqrt FROM __THIS__ where a > 5 + * SELECT a, b, SUM(c) AS c_sum FROM __THIS__ GROUP BY a, b + * }}} --- End diff -- - Java ![2016-11-28 2 28 18](https://cloud.githubusercontent.com/assets/6477701/20657138/f208e7ae-b576-11e6-82ee-bb709ee03c2f.png) - Scala ![2016-11-28 2 28 24](https://cloud.githubusercontent.com/assets/6477701/20657139/f35c7242-b576-11e6-9a4e-1b68943ad4b1.png)
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15986 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69216/ Test FAILed.
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15986 **[Test build #69216 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69216/consoleFull)** for PR 15986 at commit [`3392903`](https://github.com/apache/spark/commit/3392903734bf5f00258f0652c971938846e64bcd).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719290 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -41,7 +41,10 @@ private[spark] class JdbcPartition(idx: Int, val lower: Long, val upper: Long) e * The RDD takes care of closing the connection. * @param sql the text of the query. * The query must contain two ? placeholders for parameters used to partition the results. - * E.g. "select title, author from books where ? <= id and id <= ?" + * For example, + * {{{ + * select title, author from books where ? <= id and id <= ? + * }}} --- End diff -- - Java ![2016-11-28 2 21 59](https://cloud.githubusercontent.com/assets/6477701/20657043/29e4d314-b576-11e6-9946-fc6502920d4b.png) - Scala ![2016-11-28 2 22 34](https://cloud.githubusercontent.com/assets/6477701/20657044/29eaa686-b576-11e6-86cd-c0fcb9449e0f.png)
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89719223 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -152,10 +152,10 @@ class DoubleRDDFunctions(self: RDD[Double]) extends Logging with Serializable { /** * Compute a histogram using the provided buckets. The buckets are all open - * to the right except for the last which is closed + * to the right except for the last which is closed. * e.g. for the array * [1, 10, 20, 50] the buckets are [1, 10) [10, 20) [20, 50] - * e.g 1<=x<10 , 10<=x<20, 20<=x<=50 + * e.g {@code <=x<10 , 10<=x<20, 20<=x<=50} --- End diff -- - Java ![2016-11-28 2 20 56](https://cloud.githubusercontent.com/assets/6477701/20657020/f32869bc-b575-11e6-968b-59642b5edfc6.png) - Scala ![2016-11-28 2 21 14](https://cloud.githubusercontent.com/assets/6477701/20657021/f48b5cce-b575-11e6-9d0e-d399a539acb4.png)
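The doc comment under discussion describes a specific bucket semantics: for buckets `[1, 10, 20, 50]` the ranges are `[1, 10)`, `[10, 20)`, `[20, 50]`, i.e. half-open on the right except for the last bucket, which is closed. A small standalone sketch of that rule follows; `bucketOf` is an illustrative helper written for this example, not Spark's actual histogram implementation.

```scala
// Sketch of the bucket semantics described in the doc comment:
// each bucket i spans [buckets(i), buckets(i+1)), except the last,
// which also includes its right endpoint.
object BucketDemo {
  def bucketOf(buckets: Array[Double], x: Double): Option[Int] = {
    val last = buckets.length - 2  // index of the last bucket
    (0 to last).find { i =>
      val upperOk =
        if (i == last) x <= buckets(i + 1)  // last bucket is closed
        else x < buckets(i + 1)             // others are open on the right
      x >= buckets(i) && upperOk
    }
  }

  def main(args: Array[String]): Unit = {
    val b = Array(1.0, 10.0, 20.0, 50.0)
    assert(bucketOf(b, 1.0).contains(0))   // 1 <= x < 10
    assert(bucketOf(b, 10.0).contains(1))  // boundary goes to the right bucket
    assert(bucketOf(b, 50.0).contains(2))  // 50 is included in the last bucket
    assert(bucketOf(b, 51.0).isEmpty)      // outside all buckets
    println("ok")
  }
}
```

This also makes visible why the javadoc escaping matters: the `<=` and `<` characters in the original comment are exactly what javadoc 8 rejects without `{@code ...}`.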
[GitHub] spark issue #10942: [SPARK-12850] [SQL] Support Bucket Pruning (Predicate Pu...
Github user yucai commented on the issue: https://github.com/apache/spark/pull/10942 @gatorsmile, seems like getBuckets() is removed by the PR below, which makes this feature no longer work, could you kindly help check? [SPARK-14535][SQL] Remove buildInternalScan from FileFormat Much thanks!
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16013 **[Test build #69219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69219/consoleFull)** for PR 16013 at commit [`29d65cc`](https://github.com/apache/spark/commit/29d65cce3e5f2e29010609c9323cd79ca889b9f8).
[GitHub] spark issue #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for unidoc...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16013 Let me leave some images that I changed and some comments to double check.
[GitHub] spark pull request #16013: [WIP][SPARK-3359][DOCS] Make javadoc8 working for...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16013#discussion_r89718720

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -57,9 +57,17 @@ import org.apache.spark.util.SerializableJobConf
  * @param partition a map from the partition key to the partition value (optional). If the partition
  *                  value is optional, dynamic partition insert will be performed.
  *                  As an example, `INSERT INTO tbl PARTITION (a=1, b=2) AS ...` would have
- *                  Map('a' - Some('1'), 'b' - Some('2')),
+ *
+ *                  {{{
+ *                  Map('a' -> Some('1'), 'b' -> Some('2'))
+ *                  }}},
+ *
  *                  and `INSERT INTO tbl PARTITION (a=1, b) AS ...`
- *                  would have Map('a' - Some('1'), 'b' - None).
+ *                  would have
+ *
+ *                  {{{
+ *                  Map('a' -> Some('1'), 'b' -> None)
+ *                  }}}.
--- End diff --

Fixed.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718517

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

Yes, that's right. When `needNullCheck == true`, i.e. `propagateNull && arguments.exists(_.nullable)`, if there is a null argument, `Invoke` returns null.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718401

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

OK, I see. What you want is to have `Invoke` be null if any input argument is null, no matter what the invoked method returns.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718250

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

When `returnNullable` is false, meaning the invoked method doesn't return null, and `targetObject` is also not null, why is `Invoke.nullable` true?
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89718035

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

What do you think the behavior of `Invoke` should be when `returnNullable == false && needNullCheck == true`? IMO, `returnNullable` is for the called method itself, not for `Invoke`. I think there will be cases where we want to propagate null even if the method won't return a null value.
[GitHub] spark issue #16011: [SPARK-18587][ML] Remove handleInvalid from QuantileDisc...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16011 @MLnick Yeah, I think this is the most common case: copying params from estimators to models. However, I also found some algorithms that do not comply with this rule, such as ```ALS```, which has ```ALSParams``` and ```ALSModelParams``` for the estimator and model separately. I think we can set params on models without going through the estimator, for example:
```
val discretizer = new QuantileDiscretizer()
val pipeline = new Pipeline().setStages(Array(discretizer))
val model = pipeline.fit(df)
model.stages(0).asInstanceOf[Bucketizer].setHandleInvalid("skip")
```
I know this way is a little tricky; a better way may be to have a ```QuantileDiscretizerModel``` which is produced by ```QuantileDiscretizer```. Thinking more about it, ```Bucketizer``` is a separate transformer which mainly has two params (```splits``` and ```handleInvalid```) that can be set. Users can provide candidates for these two params when doing cross-validation to select the best model. But if we constrain it to be produced by ```QuantileDiscretizer```, then ```splits``` would be a member variable of the model rather than a param. From this perspective, it makes more sense to see ```Bucketizer``` as a separate transformer.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89716676

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

If `returnNullable` is given as false by the caller, I think it means we are sure this `Invoke` won't return a null value, e.g., for a primitive type. And that should hold even when `needNullCheck` is true. In this case (`returnNullable == false`), the only way this `Invoke` can return null is if `targetObject` is nullable.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69214/ Test PASSed.
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89716522

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -350,20 +350,16 @@ private[spark] class TaskSchedulerImpl(
           removeExecutor(execId, reason.get)
           failedExecutor = Some(execId)
         }
+      }
+      if (TaskState.isFinished(state)) {
+        cleanupTaskState(tid)
--- End diff --

I don't think that's necessary, because all access to the TSM is gated on the TaskSchedulerImpl, so even though the TaskResultGetter might do some stuff, the TSM's state won't be accessed until the later handleSuccessfulTask call to the TaskSchedulerImpl.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Merged build finished. Test PASSed.
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69214 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69214/consoleFull)** for PR 15994 at commit [`508aaa0`](https://github.com/apache/spark/commit/508aaa0b68f049fb463f1334784b6417d739e816).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16027: Make sure CollapseWindow returns the attributes in the s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16027 Could you add `[SPARK-18604][SQL]` before merging?
[GitHub] spark issue #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskSetManag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15986 **[Test build #69218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69218/consoleFull)** for PR 15986 at commit [`9c6ce7e`](https://github.com/apache/spark/commit/9c6ce7e8ceadcdae3ce36a147aac7cf680d5a86f).
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Merged build finished. Test PASSed.
[GitHub] spark pull request #16027: Make sure CollapseWindow returns the attributes i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16027#discussion_r89716395

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/CollapseWindowSuite.scala ---
@@ -46,12 +46,15 @@ class CollapseWindowSuite extends PlanTest {
       .window(Seq(sum(b).as('sum_b)), partitionSpec1, orderSpec1)
       .window(Seq(avg(b).as('avg_b)), partitionSpec1, orderSpec1)

-    val optimized = Optimize.execute(query.analyze)
+    val analyzed = query.analyze
+    val optimized = Optimize.execute(analyzed)
+    assert(analyzed.output === optimized.output)
+
     val correctAnswer = testRelation.window(Seq(
-      avg(b).as('avg_b),
-      sum(b).as('sum_b),
-      max(a).as('max_a),
-      min(a).as('min_a)), partitionSpec1, orderSpec1)
--- End diff --

While making this, I didn't notice this was strange. :(
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15994 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69213/ Test PASSed.
[GitHub] spark issue #16027: Make sure CollapseWindow returns the attributes in the s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16027 LGTM. Thank you for correcting this.
[GitHub] spark pull request #16027: Make sure CollapseWindow returns the attributes i...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16027#discussion_r89716308

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -545,7 +545,7 @@ object CollapseRepartition extends Rule[LogicalPlan] {
 object CollapseWindow extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
     case w @ Window(we1, ps1, os1, Window(we2, ps2, os2, grandChild)) if ps1 == ps2 && os1 == os2 =>
-      w.copy(windowExpressions = we1 ++ we2, child = grandChild)
--- End diff --

Thank you for fixing this!
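The ordering issue under discussion can be illustrated with a small toy model (plain Scala, not Spark code; the attribute names are made up for illustration). A `Window` node's output is its child's output followed by its own window expressions, so when `Window(we1, Window(we2, child))` is collapsed into one node, concatenating `we1 ++ we2` reverses the two groups, while `we2 ++ we1` preserves the original attribute order:

```scala
// Toy model of collapsing two stacked Window nodes.
object CollapseOrder {
  def main(args: Array[String]): Unit = {
    val childOutput = Seq("a", "b")     // grandchild attributes
    val we2 = Seq("max_a", "min_a")     // inner window expressions
    val we1 = Seq("sum_b", "avg_b")     // outer window expressions

    // Output of the original plan Window(we1, Window(we2, child)):
    // the inner node emits child ++ we2, then the outer node appends we1.
    val original = childOutput ++ we2 ++ we1

    // Collapsed node with the old we1 ++ we2 concatenation:
    val collapsedOld = childOutput ++ (we1 ++ we2)
    // Collapsed node with we2 ++ we1:
    val collapsedNew = childOutput ++ (we2 ++ we1)

    assert(collapsedNew == original)   // order preserved
    assert(collapsedOld != original)   // order changed by the old rule
    println(collapsedNew.mkString(", "))
  }
}
```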
[GitHub] spark issue #15994: [SPARK-18555][SQL]DataFrameNaFunctions.fill miss up orig...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15994 **[Test build #69213 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69213/consoleFull)** for PR 15994 at commit [`7117447`](https://github.com/apache/spark/commit/71174472e1d01be450162cd22843345e4d14b00c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89716251

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -350,20 +350,16 @@ private[spark] class TaskSchedulerImpl(
           removeExecutor(execId, reason.get)
           failedExecutor = Some(execId)
         }
+      }
+      if (TaskState.isFinished(state)) {
+        cleanupTaskState(tid)
--- End diff --

I can do that, but we might want to make sure that `taskSet.removeRunningTask` is called prior to the `taskResultGetter` call.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Merged build finished. Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69215/ Test PASSed.
[GitHub] spark issue #15874: [Spark-18408][ML] API Improvements for LSH
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15874 **[Test build #69215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69215/consoleFull)** for PR 15874 at commit [`e198080`](https://github.com/apache/spark/commit/e198080557c598286363184855a6f368d60b45e3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ClusteringSummary(JavaWrapper):`
  * `class GaussianMixtureSummary(ClusteringSummary):`
  * `class BisectingKMeansSummary(ClusteringSummary):`
  * `trait CollectionGenerator extends Generator `
  * `case class Stack(children: Seq[Expression]) extends Generator `
  * `abstract class ExplodeBase extends UnaryExpression with CollectionGenerator with Serializable `
  * `case class Explode(child: Expression) extends ExplodeBase `
  * `case class PosExplode(child: Expression) extends ExplodeBase `
  * `case class Inline(child: Expression) extends UnaryExpression with CollectionGenerator `
  * `case class OuterReference(e: NamedExpression)`
  * `trait InvokeLike extends Expression with NonSQLExpression `
  * `case class ColumnStat(`
  * `case class UncacheTableCommand(`
  * `case class OffsetSeq(offsets: Seq[Option[Offset]], metadata: Option[String] = None) `
  * `case class SparkListenerDriverAccumUpdates(`
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89715587

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

-      val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+      // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

Yes, I agree that this part is a little tricky. After waiting for other comments, I will rephrase the comment on Tuesday.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89715546

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
           "cannot be used as field name\n" + walkedTypePath.mkString("\n"))
       }

-      val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+      // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

I think we can only guarantee that the `inputObject` is not null in the false case, so we should use `AssertNotNull()` for `inputObject`, and the `fieldValue` will be like:
```scala
val fieldValue = Invoke(
  AssertNotNull(inputObject, walkedTypePath),
  fieldName,
  dataTypeFor(fieldType),
  returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
```
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89712321

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
    dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {

-  override def nullable: Boolean = true
+  override def nullable: Boolean = targetObject.nullable || returnNullable
--- End diff --

I think this would be `targetObject.nullable || needNullCheck || returnNullable`.
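The rule proposed here can be checked with a small self-contained model (plain Scala, independent of Spark; the names mirror `Invoke` but the classes are toys): the result can be null when the target is nullable, when null propagation may fire (`needNullCheck`), or when the called method itself may return null.

```scala
// Toy model of the Invoke nullability rule discussed in this thread.
object InvokeNullability {
  final case class Expr(nullable: Boolean)

  final case class Invoke(
      targetObject: Expr,
      arguments: Seq[Expr] = Nil,
      propagateNull: Boolean = true,
      returnNullable: Boolean = true) {
    // Null check is needed when null propagation is on and any argument may be null.
    def needNullCheck: Boolean = propagateNull && arguments.exists(_.nullable)
    // The rule: any of the three sources of null makes the result nullable.
    def nullable: Boolean = targetObject.nullable || needNullCheck || returnNullable
  }

  def main(args: Array[String]): Unit = {
    // Non-null target, method never returns null, no nullable args: not nullable.
    assert(!Invoke(Expr(false), Seq(Expr(false)), returnNullable = false).nullable)
    // Same, but one nullable argument triggers null propagation: nullable.
    assert(Invoke(Expr(false), Seq(Expr(true)), returnNullable = false).nullable)
    // Nullable target alone is enough: nullable.
    assert(Invoke(Expr(true), Nil, returnNullable = false).nullable)
  }
}
```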
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89713304

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -177,9 +177,10 @@ case class Invoke(
     functionName: String,
     dataType: DataType,
     arguments: Seq[Expression] = Nil,
-    propagateNull: Boolean = true) extends InvokeLike {
+    propagateNull: Boolean = true,
+    returnNullable : Boolean = true) extends InvokeLike {
--- End diff --

Add a `@param` document for `returnNullable`.
[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/15780#discussion_r89712401

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ---
@@ -405,13 +406,14 @@ case class WrapOption(child: Expression, optType: DataType)
 * A place holder for the loop variable used in [[MapObjects]]. This should never be constructed
 * manually, but will instead be passed into the provided lambda function.
 */
-case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression
+case class LambdaVariable(value: String, isNull: String, dataType: DataType,
+    valueNullable: Boolean = true) extends LeafExpression
--- End diff --

We can use `nullable: Boolean` here.
[GitHub] spark issue #16027: Make sure CollapseWindow returns the attributes in the s...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16027 Oh, I missed this. Yep. I'll take a look at this, too.
[GitHub] spark pull request #16026: [SPARK-18597][SQL] Do push-down predicates to rig...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/16026#discussion_r89715298

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala ---

```scala
@@ -514,6 +514,39 @@ class FilterPushdownSuite extends PlanTest {
     comparePlans(optimized, analysis.EliminateSubqueryAliases(correctAnswer))
   }

+  test("joins: push down where clause into left anti join") {
+    val x = testRelation.subquery('x)
+    val y = testRelation.subquery('y)
+    val originalQuery =
+      x.join(y, LeftAnti, Some("x.b".attr === "y.b".attr))
+        .where("x.a".attr > 10)
+        .analyze
+    val optimized = Optimize.execute(originalQuery)
+    val correctAnswer =
+      x.where("x.a".attr > 10)
+        .join(y, LeftAnti, Some("x.b".attr === "y.b".attr))
+        .analyze
+    comparePlans(optimized, analysis.EliminateSubqueryAliases(correctAnswer))
+  }
+
+  test("joins: only push down to the right of a left anti join") {
```

--- End diff --

Do we need a JIRA issue number here?
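The test above checks the optimizer rewrite that the PR enables: a `WHERE` clause referencing only left-side columns can be evaluated before a left anti join instead of after it, without changing the result. A minimal sketch of why the two plans are equivalent, in plain Python rather than Spark (the `left_anti_join` helper and the sample rows are hypothetical, purely for illustration):

```python
# Hypothetical illustration of predicate pushdown through a LEFT ANTI join.
def left_anti_join(left, right, on):
    """Rows of `left` that have no matching row in `right` on column `on`."""
    right_keys = {row[on] for row in right}
    return [row for row in left if row[on] not in right_keys]

x = [{"a": 5, "b": 1}, {"a": 20, "b": 2}, {"a": 30, "b": 3}]
y = [{"a": 0, "b": 2}]

# Plan 1: filter applied after the join (the original query shape).
after = [row for row in left_anti_join(x, y, "b") if row["a"] > 10]

# Plan 2: filter pushed below the join (the optimized query shape).
# This is safe because the predicate only references left-side columns,
# and an anti join never adds, modifies, or duplicates left-side rows.
before = left_anti_join([row for row in x if row["a"] > 10], y, "b")

assert after == before  # both plans return [{"a": 30, "b": 3}]
```

Pushing the filter down is the cheaper plan, since the join then processes fewer rows.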
[GitHub] spark issue #16026: [SPARK-18597][SQL] Do push-down predicates to right side...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16026 LGTM, @hvanhovell.
[GitHub] spark issue #16026: [SPARK-18597][SQL] Do push-down predicates to right side...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16026 LGTM except a minor comment.
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89715118

--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---

```scala
@@ -350,20 +350,16 @@ private[spark] class TaskSchedulerImpl(
           removeExecutor(execId, reason.get)
           failedExecutor = Some(execId)
         }
+      }
+      if (TaskState.isFinished(state)) {
+        cleanupTaskState(tid)
```

--- End diff --

ok last comment: do you think it's more readable to structure this code as:

```scala
if (TaskState.isFinished(state)) {
  if (state == TaskState.LOST) {
    taskResultGetter.enqueueFailed
  } else if (Set(TaskState.FAILED, TaskState.KILLED).contains(state)) {
    taskResultGetter.enqueueFailedTask(taskSet, tid, state, serializedData)
  } else if (state == TaskState.FINISHED) {
    taskResultGetter.enqueueSuccessful(...)
  }
  cleanupTaskState
  taskSet.removeRunningTask
}
```
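The structure suggested above has one property worth spelling out: because the cleanup calls sit after the state dispatch but inside the same `isFinished` guard, every terminal state runs the cleanup exactly once, and no non-terminal state runs it at all. A small sketch of that shape in plain Python (the state names and event strings are hypothetical stand-ins, not the Spark API):

```python
# Hypothetical sketch of the "dispatch, then unconditional cleanup" structure.
FINISHED_STATES = {"LOST", "FAILED", "KILLED", "FINISHED"}

def handle_status_update(state, events):
    if state in FINISHED_STATES:
        # Dispatch on the specific terminal state first...
        if state == "LOST":
            events.append("enqueue_failed")
        elif state in {"FAILED", "KILLED"}:
            events.append("enqueue_failed_task")
        elif state == "FINISHED":
            events.append("enqueue_successful")
        # ...then clean up unconditionally, so no terminal state can skip it.
        events.append("cleanup_task_state")
        events.append("remove_running_task")
    return events
```

This is exactly the leak-prevention argument of SPARK-18553: cleanup is tied to "the task reached a terminal state" rather than to each individual branch.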
[GitHub] spark pull request #15986: [SPARK-18553][CORE][branch-2.0] Fix leak of TaskS...
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/15986#discussion_r89715211

--- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---

```scala
@@ -274,4 +276,70 @@ class TaskSchedulerImplSuite extends SparkFunSuite with LocalSparkContext with L
     assert("executor1" === taskDescriptions3(0).executorId)
   }

+  test("if an executor is lost then the state for its running tasks is cleaned up (SPARK-18553)") {
+    sc = new SparkContext("local", "TaskSchedulerImplSuite")
+    val taskScheduler = new TaskSchedulerImpl(sc)
+    taskScheduler.initialize(new FakeSchedulerBackend)
+    // Need to initialize a DAGScheduler for the taskScheduler to use for callbacks.
+    new DAGScheduler(sc, taskScheduler) {
+      override def taskStarted(task: Task[_], taskInfo: TaskInfo) {}
+      override def executorAdded(execId: String, host: String) {}
+    }
+
+    val e0Offers = Seq(WorkerOffer("executor0", "host0", 1))
+    val attempt1 = FakeTask.createTaskSet(1)
+
+    // submit attempt 1, offer resources, task gets scheduled
+    taskScheduler.submitTasks(attempt1)
+    val taskDescriptions = taskScheduler.resourceOffers(e0Offers).flatten
+    assert(1 === taskDescriptions.length)
+
+    // mark executor0 as dead
+    taskScheduler.executorLost("executor0", SlaveLost())
+    assert(!taskScheduler.isExecutorAlive("executor0"))
+    assert(!taskScheduler.hasExecutorsAliveOnHost("host0"))
+    assert(taskScheduler.getExecutorsAliveOnHost("host0").isEmpty)
+
+    // Check that state associated with the lost task attempt is cleaned up:
+    assert(taskScheduler.taskIdToExecutorId.isEmpty)
+    assert(taskScheduler.taskIdToTaskSetManager.isEmpty)
+    assert(taskScheduler.runningTasksByExecutors().get("executor0").isEmpty)
+  }
+
+  test("if a task finishes with TaskState.LOST then mark its executor as dead") {
```

--- End diff --

super nit but can you write this as "if a task finishes with TaskState.LOST its executor is marked as dead"
[GitHub] spark pull request #16017: [SPARK-18592][ML] Move DT/RF/GBT Param setter met...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/16017#discussion_r89715218

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---

```scala
@@ -52,33 +52,49 @@ class DecisionTreeClassifier @Since("1.4.0") (

   // Override parameter setters from parent trait for Java API compatibility.

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value)
+  override def setMaxDepth(value: Int): this.type = set(maxDepth, value)

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+  override def setMaxBins(value: Int): this.type = set(maxBins, value)

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMinInstancesPerNode(value: Int): this.type =
-    super.setMinInstancesPerNode(value)
+  override def setMinInstancesPerNode(value: Int): this.type = set(minInstancesPerNode, value)

+  /** @group setParam */
   @Since("1.4.0")
-  override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value)
+  override def setMinInfoGain(value: Double): this.type = set(minInfoGain, value)

+  /** @group expertSetParam */
   @Since("1.4.0")
-  override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value)
+  override def setMaxMemoryInMB(value: Int): this.type = set(maxMemoryInMB, value)

+  /** @group expertSetParam */
   @Since("1.4.0")
-  override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value)
+  override def setCacheNodeIds(value: Boolean): this.type = set(cacheNodeIds, value)

+  /**
+   * Specifies how often to checkpoint the cached node IDs.
+   * E.g. 10 means that the cache will get checkpointed every 10 iterations.
+   * This is only used if cacheNodeIds is true and if the checkpoint directory is set in
+   * [[org.apache.spark.SparkContext]].
+   * Must be >= 1.
+   * (default = 10)
+   * @group setParam
```

--- End diff --

This change was suggested at https://github.com/apache/spark/pull/15913#discussion_r89662469, because Param setter methods defined in traits have the wrong return type in Java. We would like to remove the setter methods from the traits, since it does not make sense to have them in the Model classes; instead we can define the setters in each subclass and deprecate them in the Model classes. Once the setters are removed from the traits we can no longer inherit their docs, which is why each subclass carries its own. BTW, the current change is consistent with other ML algorithms that inherit from these traits.