[GitHub] spark pull request: [SPARK-14676] Wrap and re-throw Await.result e...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12433#discussion_r60180864

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -802,7 +807,12 @@ private[spark] class BlockManager(
       logDebug("Put block %s locally took %s".format(blockId, Utils.getUsedTimeMs(startTimeMs)))
       if (level.replication > 1) {
         // Wait for asynchronous replication to finish
-        Await.ready(replicationFuture, Duration.Inf)
+        try {
+          Await.ready(replicationFuture, Duration.Inf)
--- End diff --

@ScrapCodes, regarding your other comment, I think timeouts in this case may already be covered by the network / RPC timeouts within the `replicationFuture`'s code.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
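The following is a minimal sketch of the "wrap and re-throw" idea named in the subject line. It is not code from this PR. `AwaitSketch.awaitResult` and `AwaitFailedException` are hypothetical names. The sketch uses `Await.result`, which rethrows the future's own failure, whereas the diff above wraps `Await.ready`, which throws only on timeout or interrupt. Wrapping the failure puts the waiting caller's stack trace on the thrown exception and keeps the original failure as its cause. `NonFatal` does not match `InterruptedException`, so an interrupted wait still propagates unwrapped.

```scala
import scala.concurrent.{Await, Awaitable}
import scala.concurrent.duration._
import scala.util.control.NonFatal

object AwaitSketch {
  // Hypothetical wrapper type; a real codebase would use its own exception class.
  final class AwaitFailedException(message: String, cause: Throwable)
    extends Exception(message, cause)

  // Blocks until `awaitable` completes. Any non-fatal failure (the future's own
  // exception, or a TimeoutException for a finite `atMost`) is rethrown wrapped, so
  // the thrown exception's stack trace starts at this waiting call site while the
  // original exception, with its own stack trace, survives as the cause.
  // InterruptedException is treated as fatal by NonFatal and is therefore not wrapped.
  def awaitResult[T](awaitable: Awaitable[T], atMost: Duration): T =
    try {
      Await.result(awaitable, atMost)
    } catch {
      case NonFatal(t) => throw new AwaitFailedException("Exception thrown in awaitResult", t)
    }
}
```

Call it as `AwaitSketch.awaitResult(someFuture, Duration.Inf)`. Callers that only need completion could wrap `Await.ready` the same way.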
[GitHub] spark pull request: [SPARK-14676] Wrap and re-throw Await.result e...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12433#discussion_r60180666

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala ---
@@ -260,7 +260,12 @@ private[spark] class BlockManager(
   def waitForAsyncReregister(): Unit = {
     val task = asyncReregisterTask
     if (task != null) {
-      Await.ready(task, Duration.Inf)
+      try {
+        Await.ready(task, Duration.Inf)
--- End diff --

According to the Scaladoc (and actual usages), this particular `waitForAsyncReregister` method appears to be used only in test code, and I'm guessing it is probably called from within an interrupt-based timeout block. The other usages would have to be considered on a case-by-case basis.
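An interrupt-based timeout is enough even with `Duration.Inf` only if the infinite wait responds to `Thread.interrupt()`. The following minimal sketch, a hypothetical `InterruptibleWaitSketch` object, checks that property for the standard-library `Future`/`Promise` implementation. A never-completed promise stands in for a hung task; `asyncReregisterTask` itself is not modeled.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.Duration

object InterruptibleWaitSketch {
  // Returns true if a thread blocked in Await.ready(_, Duration.Inf) was released
  // by an interrupt, i.e. it observed an InterruptedException.
  def interruptReleasesInfiniteWait(): Boolean = {
    val neverCompleted = Promise[Unit]().future // stands in for a task that hangs
    val sawInterrupt = new AtomicBoolean(false)
    val waiter = new Thread(new Runnable {
      def run(): Unit =
        try Await.ready(neverCompleted, Duration.Inf)
        catch { case _: InterruptedException => sawInterrupt.set(true) }
    })
    waiter.start()
    waiter.interrupt() // what a timeout mechanism that interrupts the waiter would do
    waiter.join(5000)
    sawInterrupt.get
  }
}
```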
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-211760253

When a member method is used as a UDF (for example, `def createTransformFunc` in `org.apache.spark.ml.Transformer`), the Jenkins tests always hit an exception. Otherwise, it works well. BTW, I can't reproduce that exception locally; maybe the Java version matters.
[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/12460#discussion_r60180497

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -254,6 +251,21 @@ class SparkSqlAstBuilder extends AstBuilder {
     }
   }
+  /**
+   * A column path can be specified as an parameter to describe command. It is a dot separated
+   * elements where the last element can be a String.
+   * TODO - check with Herman
--- End diff --

Yeah, Herman, not supporting it would certainly simplify things. FYI, I checked, and the test case describe_xpath.q, which exercises this syntax, is not being run in HiveCompatibilitySuite.
[GitHub] spark pull request: [SPARK-14719] WriteAheadLogBasedBlockHandler s...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/12484#discussion_r60180310

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/ReceivedBlockHandlerSuite.scala ---
@@ -204,26 +222,26 @@ class ReceivedBlockHandlerSuite
     sparkConf.set("spark.storage.unrollFraction", "0.4")
     // Block Manager with 12000 * 0.4 = 4800 bytes of free space for unroll
     blockManager = createBlockManager(12000, sparkConf)
+    // This block is way too large to possibly be cached in memory:
+    def hugeBlock: IteratorBlock = IteratorBlock(List.fill(100)(new Array[Byte](1000)).iterator)
     // there is not enough space to store this block in MEMORY,
     // But BlockManager will be able to serialize this block to WAL
     // and hence count returns correct value.
--- End diff --

@dibbhatt, I'm confused. Your comment seems to say that we should fail a job if blocks cannot be persisted, because without that persistence the job will not work correctly even if the WAL is enabled. However, that claim seems to be contradicted by the comment describing this test case, which suggests that the job should succeed even though the block is far too large to be stored. In the old test case, however, the block appeared to be too small to exercise this path and actually _was_ being stored in memory, meaning that this comment didn't describe the actual behavior of the test.
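The following minimal sketch is a rough check of the test comment's claim, using numbers copied from the diff. The unroll budget is 12000 * 0.4 = 4800 bytes, while `hugeBlock` carries 100 arrays of 1,000 bytes each, or 100,000 bytes of raw payload, roughly 20 times the budget. The object name is hypothetical. The sketch ignores JVM object and array headers, which would only make the block's estimated in-memory size larger.

```scala
object HugeBlockSizeSketch {
  // Values copied from the test in the diff.
  val maxMem: Long = 12000L
  val unrollFraction: Double = 0.4

  // Unroll budget described by the test comment: 12000 * 0.4 = 4800 bytes.
  val unrollBudget: Long = math.round(maxMem * unrollFraction)

  // hugeBlock = List.fill(100)(new Array[Byte](1000)): raw payload only,
  // ignoring per-object overhead.
  val rawPayload: Long = 100L * 1000L

  def fitsInUnrollSpace: Boolean = rawPayload <= unrollBudget
}
```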
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12352
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12352#issuecomment-211759198

Merging into master.
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12352#discussion_r60180155

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala ---
@@ -53,33 +55,77 @@ class FileScanRDD(
   override def compute(split: Partition, context: TaskContext): Iterator[InternalRow] = {
     val iterator = new Iterator[Object] with AutoCloseable {
+      private val inputMetrics = context.taskMetrics().inputMetrics
+      private val existingBytesRead = inputMetrics.bytesRead
+
+      // Find a function that will return the FileSystem bytes read by this thread. Do this before
+      // apply readFunction, because it might read some bytes.
+      private val getBytesReadCallback: Option[() => Long] =
+        SparkHadoopUtil.get.getFSBytesReadOnThreadCallback()
+
+      // For Hadoop 2.5+, we get our input bytes from thread-local Hadoop FileSystem statistics.
+      // If we do a coalesce, however, we are likely to compute multiple partitions in the same
+      // task and in the same thread, in which case we need to avoid override values written by
+      // previous partitions (SPARK-13071).
+      private def updateBytesRead(): Unit = {
+        getBytesReadCallback.foreach { getBytesRead =>
+          inputMetrics.setBytesRead(existingBytesRead + getBytesRead())
+        }
+      }
+
+      // If we can't get the bytes read from the FS stats, fall back to the file size,
+      // which may be inaccurate.
+      private def updateBytesReadWithFileSize(): Unit = {
+        if (getBytesReadCallback.isEmpty && currentFile != null) {
+          inputMetrics.incBytesRead(currentFile.length)
+        }
+      }
+
       private[this] val files = split.asInstanceOf[FilePartition].files.toIterator
+      private[this] var currentFile: PartitionedFile = null
       private[this] var currentIterator: Iterator[Object] = null
       def hasNext = (currentIterator != null && currentIterator.hasNext) || nextIterator()
-      def next() = currentIterator.next()
+      def next() = {
+        val nextElement = currentIterator.next()
+        // TODO: we should have a better separation of row based and batch based scan, so that we
--- End diff --

I think in the future we should maybe just make everything batch-based; then this problem goes away.
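The offsetting in `updateBytesRead()` is easiest to see with two partitions computed back to back on one thread, as happens after a coalesce. The following minimal sketch uses hypothetical stand-ins: `InputMetricsSketch` for the metrics object, a `ThreadLocal` counter in place of Hadoop's thread-local FileSystem statistics, and an assumed callback that captures a baseline when created. It is not Spark's `SparkHadoopUtil` code.

```scala
object BytesReadSketch {
  // Hypothetical stand-in for the task's input metrics: only the setter and
  // incrementer that the diff calls are modeled.
  final class InputMetricsSketch {
    private var current = 0L
    def bytesRead: Long = current
    def setBytesRead(v: Long): Unit = current = v
    def incBytesRead(v: Long): Unit = current += v
  }

  // Hypothetical stand-in for thread-local FileSystem statistics: a per-thread
  // counter that only grows as the thread reads bytes.
  private val threadBytes = new ThreadLocal[Long] {
    override protected def initialValue(): Long = 0L
  }
  def simulateRead(n: Long): Unit = threadBytes.set(threadBytes.get + n)

  // Assumed shape of a "bytes read on this thread" callback: capture a baseline
  // when the callback is created and report the growth since then.
  def bytesReadCallback(): () => Long = {
    val baseline = threadBytes.get
    () => threadBytes.get - baseline
  }

  // One partition scan following the updateBytesRead() pattern: snapshot the
  // metric first, then publish snapshot + bytes read by this scan.
  def scanPartition(metrics: InputMetricsSketch, bytesInPartition: Long): Unit = {
    val existingBytesRead = metrics.bytesRead
    val getBytesRead = bytesReadCallback()
    simulateRead(bytesInPartition)
    metrics.setBytesRead(existingBytesRead + getBytesRead())
  }
}
```

Suppose `existingBytesRead` were dropped and `metrics.setBytesRead(getBytesRead())` called instead. Scanning partitions of 100 and 250 bytes would then leave 250 rather than 350, which is the overwrite the SPARK-13071 comment warns about.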
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12352#issuecomment-211759031

Test PASSed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56194/
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12352#issuecomment-211759029

Merged build finished. Test PASSed.
[GitHub] spark pull request: Support single argument version of sqlContext....
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12488#issuecomment-211759087

Just a minor doc comment. LGTM otherwise.
[GitHub] spark pull request: Support single argument version of sqlContext....
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12488#discussion_r60180069

--- Diff: python/pyspark/sql/context.py ---
@@ -147,12 +147,24 @@ def setConf(self, key, value):
         self._ssql_ctx.setConf(key, value)
     @since(1.3)
-    def getConf(self, key, defaultValue):
+    def getConf(self, key, defaultValue=None):
         """Returns the value of Spark SQL configuration property for the given key.
-        If the key is not set, returns defaultValue.
+        If the key is not set, returns defaultValue, if set, otherwise, return the
--- End diff --

Maybe
```
If the key is not set and defaultValue is not None, return defaultValue.
If the key is not set and defaultValue is None, return the system default value.
```
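The following minimal sketch models the lookup order in the suggested docstring, written in Scala for consistency with the rest of this thread rather than as the PySpark implementation. Python's `None` maps to Scala's `None`. The `GetConfSketch` object, its two maps, and the `example.key.*` names are hypothetical placeholders for the user-set values and the registered system defaults.

```scala
object GetConfSketch {
  // Hypothetical stand-ins: values the user has set, and the system defaults
  // registered for known keys. Key names are made up for illustration.
  val userSettings: Map[String, String] = Map("example.key.set" -> "user-value")
  val systemDefaults: Map[String, String] = Map(
    "example.key.set" -> "system-default",
    "example.key.unset" -> "system-default")

  // Lookup order from the suggested docstring:
  // 1. the key is set: return its value;
  // 2. the key is not set and defaultValue is given: return defaultValue;
  // 3. otherwise: return the system default (None if there is none).
  def getConf(key: String, defaultValue: Option[String] = None): Option[String] =
    userSettings.get(key).orElse(defaultValue).orElse(systemDefaults.get(key))
}
```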
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12352#issuecomment-211758857

**[Test build #56194 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56194/consoleFull)** for PR 12352 at commit [`c265546`](https://github.com/apache/spark/commit/c26554639f4a2615907d7b46af3005ff3f335d08).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60180011

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCodegenSuite.scala ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.SimpleCatalystConf
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.Literal._
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical._
+import org.apache.spark.sql.catalyst.rules._
+
+
+class OptimizeCodegenSuite extends PlanTest {
+
+  object Optimize extends RuleExecutor[LogicalPlan] {
+    val batches = Batch("OptimizeCodegen", Once, OptimizeCodegen(SimpleCatalystConf(true))) :: Nil
+  }
+
+  protected def assertEquivalent(e1: Expression, e2: Expression): Unit = {
+    val correctAnswer = Project(Alias(e2, "out")() :: Nil, OneRowRelation).analyze
+    val actual = Optimize.execute(Project(Alias(e1, "out")() :: Nil, OneRowRelation).analyze)
+    comparePlans(actual, correctAnswer)
+  }
+
+  test("Codegen only when the number of branches is small.") {
--- End diff --

Oh, sure. I'll add those test cases, too.
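The suite's test name says codegen should happen only when the number of branches is small. The following minimal sketch shows that decision on a hypothetical mini expression tree rather than on Catalyst's classes. `CaseWhenExpr` stands in for the interpreted form and `CaseWhenCodegenExpr` for the code-generated one. The strict `<` comparison mirrors the `shouldCodegen` check (`branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN`) removed elsewhere in this PR. Unlike a real optimizer rule, the sketch rewrites only the node it is given and does not walk the whole plan.

```scala
object OptimizeCodegenSketch {
  sealed trait Expr
  final case class Lit(value: Any) extends Expr
  // Interpreted CASE WHEN (hypothetical stand-in for CaseWhen with CodegenFallback).
  final case class CaseWhenExpr(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr
  // Code-generated variant (hypothetical stand-in for CaseWhenCodegen).
  final case class CaseWhenCodegenExpr(branches: Seq[(Expr, Expr)], elseValue: Option[Expr]) extends Expr

  // Switch to the codegen variant only when the branch count is below the limit;
  // every other expression is returned unchanged.
  def optimizeCodegen(maxCaseBranches: Int)(e: Expr): Expr = e match {
    case CaseWhenExpr(bs, ev) if bs.length < maxCaseBranches => CaseWhenCodegenExpr(bs, ev)
    case other => other
  }
}
```

A configurable limit passed in, as `maxCaseBranches` is here, plays the role the new `CatalystConf` setting is meant to play.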
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60179863

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -142,16 +139,54 @@ case class CaseWhen(branches: Seq[(Expression, Expression)], elseValue: Option[E
     }
   }
-  def shouldCodegen: Boolean = {
-    branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN
+  override def toString: String = {
+    val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" }.mkString
+    val elseCase = elseValue.map(" ELSE " + _).getOrElse("")
+    "CASE" + cases + elseCase + " END"
   }
+  override def sql: String = {
+    val cases = branches.map { case (c, v) => s" WHEN ${c.sql} THEN ${v.sql}" }.mkString
+    val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
+    "CASE" + cases + elseCase + " END"
+  }
+}
+
+
+/**
+ * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END".
+ * When a = true, returns b; when c = true, returns d; else returns e.
+ *
+ * @param branches seq of (branch condition, branch value)
+ * @param elseValue optional value for the else branch
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = true, returns b; when c = true, return d; else return e.")
+// scalastyle:on line.size.limit
+case class CaseWhen(
+    val branches: Seq[(Expression, Expression)],
+    val elseValue: Option[Expression] = None)
+  extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable {
--- End diff --

That would be right. `CaseWhenCodegen` is always generated from `CaseWhen`.
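The following minimal sketch makes two things concrete for the new `CaseWhen` code: the string shape built by its `toString`/`sql` overrides, and the evaluation order stated in its Scaladoc, where the first true condition wins and otherwise the ELSE value is used. It works over plain strings and Booleans instead of Catalyst `Expression`s. `CaseWhenSqlSketch` and its methods are hypothetical. A missing ELSE yields `None`, standing in for SQL NULL.

```scala
object CaseWhenSqlSketch {
  // Same string-building steps as the `sql` override in the diff, over plain strings.
  def caseWhenSql(branches: Seq[(String, String)], elseValue: Option[String]): String = {
    val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" }.mkString
    val elseCase = elseValue.map(" ELSE " + _).getOrElse("")
    "CASE" + cases + elseCase + " END"
  }

  // Evaluation order from the Scaladoc: return the value of the first branch whose
  // condition is true; otherwise the ELSE value (None plays the role of NULL).
  def evalCaseWhen[T](branches: Seq[(Boolean, T)], elseValue: Option[T]): Option[T] =
    branches.collectFirst { case (true, v) => v }.orElse(elseValue)
}
```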
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60179727

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala ---
@@ -29,6 +29,7 @@ trait CatalystConf {
   def groupByOrdinal: Boolean
   def optimizerMaxIterations: Int
+  def maxCaseBranches: Int
--- End diff --

Thank you for the quick review. Sure, and also `maxCaseBranchesForCodegen` in SQLConf.scala.
[GitHub] spark pull request: [SPARK-14719] WriteAheadLogBasedBlockHandler s...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/12484#issuecomment-211758183

@dibbhatt, are you suggesting that this pull request introduces a bug? If so, are there any regression tests that demonstrate it? I'm still unclear on precisely what the problem is from Spark Streaming's point of view. Your linked PR only adds unit tests for BlockManager functionality and has no end-to-end, application-level tests showing how the old BlockManager behavior caused problems for streaming. The PR discussion that you linked to is really long and has a somewhat unclear resolution. If a bug motivated that PR, do you know whether it was previously resolved through another patch or other fixes?
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-211758161

What's the problem with the runtime mirror?
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12490#issuecomment-211757342

**[Test build #56201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56201/consoleFull)** for PR 12490 at commit [`942e145`](https://github.com/apache/spark/commit/942e145b03f2d31a21c90736b80fd380ebf25940).
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12490#issuecomment-211756345

**[Test build #56200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56200/consoleFull)** for PR 12490 at commit [`c214204`](https://github.com/apache/spark/commit/c2142049cf9f4e577d9a0d1f57a21c869ae8486a).
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user jyshen15 commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211756190

I will handle the Python style issue.
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60178619

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala ---
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.linalg.udt
--- End diff --

Did you mean to move it to `org.apache.spark.ml.linalg.udt`?
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/9565
[GitHub] spark pull request: [SPARK-11593][SQL] Replace catalyst converter ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/9565#issuecomment-211755597

Closing this for now. Maybe I'll revisit it in the future.
[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/11926
[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12412#issuecomment-211755452

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12490#issuecomment-211755392

Test FAILed.
Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56199/
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12490#issuecomment-211755389

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12490#issuecomment-211755381

**[Test build #56199 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56199/consoleFull)** for PR 12490 at commit [`0630ea3`](https://github.com/apache/spark/commit/0630ea3e0a9c829760fe5cb470dec41c4c1bf677).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14111][SQL] Correct output nullability ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11926#issuecomment-211755298 Closing this to think of a better solution.
[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12412#issuecomment-211755456 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56192/ Test PASSed.
[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12412#issuecomment-211755041 **[Test build #56192 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56192/consoleFull)** for PR 12412 at commit [`08acf5c`](https://github.com/apache/spark/commit/08acf5c9a2638a94ce16df6fab124d3aeeea13d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14724] Use radix sort for shuffles and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12490#issuecomment-211754828 **[Test build #56199 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56199/consoleFull)** for PR 12490 at commit [`0630ea3`](https://github.com/apache/spark/commit/0630ea3e0a9c829760fe5cb470dec41c4c1bf677).
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60178343 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala --- @@ -0,0 +1,99 @@ +/* --- End diff -- Let's create `VectorUDTSuite.scala` and `MatrixUDTSuite.scala` for maintainability.
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60178107 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/udt/MatrixUDT.scala --- @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.linalg.udt + +import org.apache.spark.ml.linalg.{DenseMatrix, Matrix, SparseMatrix} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.GenericMutableRow +import org.apache.spark.sql.catalyst.util.GenericArrayData +import org.apache.spark.sql.types._ + +private[spark] class MatrixUDT extends UserDefinedType[Matrix] { --- End diff -- `private[ml]`
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60177562 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -242,6 +261,12 @@ object CaseWhen { } } +/** Factory methods for CaseWhenCodegen. */ +object CaseWhenCodegen { --- End diff -- we can remove this given the above comment
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12353#issuecomment-211751566 cc @cloud-fan this change actually makes your other thing easier, I think.
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750751 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56198/ Test FAILed.
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60177491 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeCodegenSuite.scala --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.SimpleCatalystConf +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.expressions.Literal._ +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules._ + + +class OptimizeCodegenSuite extends PlanTest { + + object Optimize extends RuleExecutor[LogicalPlan] { +val batches = Batch("OptimizeCodegen", Once, OptimizeCodegen(SimpleCatalystConf(true))) :: Nil + } + + protected def assertEquivalent(e1: Expression, e2: Expression): Unit = { +val correctAnswer = Project(Alias(e2, "out")() :: Nil, OneRowRelation).analyze +val actual = Optimize.execute(Project(Alias(e1, "out")() :: Nil, OneRowRelation).analyze) +comparePlans(actual, correctAnswer) + } + + test("Codegen only when the number of branches is small.") { --- End diff -- Can you add a few more test cases: one with a nested CaseWhen, one with multiple CaseWhens in one operator, and one with multiple CaseWhens in different operators?
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750733 **[Test build #56198 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56198/consoleFull)** for PR 11812 at commit [`ecde52c`](https://github.com/apache/spark/commit/ecde52c3d0e73c5210940c743e135f68e8d1386a). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750648 Thanks. I have updated the PR.
[GitHub] spark pull request: [SPARK-14678][SQL]Add a file sink log to suppo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12435#issuecomment-211750061 **[Test build #56197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56197/consoleFull)** for PR 12435 at commit [`e8c14d6`](https://github.com/apache/spark/commit/e8c14d60deb1c068f770d7ff3fc9bef000aff899).
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-211750023 **[Test build #56198 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56198/consoleFull)** for PR 11812 at commit [`ecde52c`](https://github.com/apache/spark/commit/ecde52c3d0e73c5210940c743e135f68e8d1386a).
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60177331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -142,16 +139,54 @@ case class CaseWhen(branches: Seq[(Expression, Expression)], elseValue: Option[E } } - def shouldCodegen: Boolean = { -branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN + override def toString: String = { +val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" }.mkString +val elseCase = elseValue.map(" ELSE " + _).getOrElse("") +"CASE" + cases + elseCase + " END" } + override def sql: String = { +val cases = branches.map { case (c, v) => s" WHEN ${c.sql} THEN ${v.sql}" }.mkString +val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("") +"CASE" + cases + elseCase + " END" + } +} + + +/** + * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END". + * When a = true, returns b; when c = true, returns d; else returns e. + * + * @param branches seq of (branch condition, branch value) + * @param elseValue optional value for the else branch + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END - When a = true, returns b; when c = true, return d; else return e.") +// scalastyle:on line.size.limit +case class CaseWhen( +val branches: Seq[(Expression, Expression)], +val elseValue: Option[Expression] = None) + extends CaseWhenBase(branches, elseValue) with CodegenFallback with Serializable { --- End diff -- Maybe just add a `toCodegen` method that creates a `CaseWhenCodegen`? Then we can remove `object CaseWhenCodegen`.
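A minimal sketch of the `toCodegen` suggestion, using hypothetical, heavily simplified expression classes (`Expr`, `Lit` and a two-variant `CaseWhenBase`) instead of Catalyst's real ones. The rule name `optimizeCodegen` and its `maxCaseBranchesForCodegen` parameter are assumptions for illustration; the branch-count test reuses the `<` comparison from `shouldCodegen` in the diff. Because the rewrite runs bottom-up, a nested CASE WHEN is decided independently of its parent, which is the situation the nested-CaseWhen test requested earlier in this thread would exercise.

```scala
// Hypothetical, heavily simplified stand-ins; these are NOT Spark's Catalyst
// classes, only enough structure to illustrate the suggested refactoring.
object CaseWhenSketch {
  sealed trait Expr {
    def children: Seq[Expr]
    def withChildren(newChildren: Seq[Expr]): Expr

    // Bottom-up rewrite, loosely modelled on Catalyst's transformUp:
    // children are rewritten first, then the rule is tried on this node.
    def transformUp(rule: PartialFunction[Expr, Expr]): Expr = {
      val rewritten = withChildren(children.map(_.transformUp(rule)))
      rule.applyOrElse(rewritten, identity[Expr])
    }
  }

  case class Lit(value: Int) extends Expr {
    def children: Seq[Expr] = Nil
    def withChildren(newChildren: Seq[Expr]): Expr = this
  }

  // Shared shape of both variants, in the spirit of CaseWhenBase in the diff.
  sealed trait CaseWhenBase extends Expr {
    def branches: Seq[(Expr, Expr)]
    def elseValue: Option[Expr]

    def children: Seq[Expr] =
      branches.flatMap { case (cond, value) => Seq(cond, value) } ++ elseValue

    // Regroup a flat child list into (condition, value) pairs plus the else value.
    protected def regroup(c: Seq[Expr]): (Seq[(Expr, Expr)], Option[Expr]) = {
      val n = branches.length
      val pairs = c.take(2 * n).grouped(2).map(p => (p(0), p(1))).toList
      (pairs, if (elseValue.isDefined) Some(c(2 * n)) else None)
    }
  }

  case class CaseWhen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr] = None)
      extends CaseWhenBase {
    def withChildren(newChildren: Seq[Expr]): Expr = {
      val (b, e) = regroup(newChildren)
      CaseWhen(b, e)
    }

    // The suggested helper: build the codegen variant from this node directly,
    // so no separate `object CaseWhenCodegen` factory is needed.
    def toCodegen: CaseWhenCodegen = CaseWhenCodegen(branches, elseValue)
  }

  case class CaseWhenCodegen(branches: Seq[(Expr, Expr)], elseValue: Option[Expr] = None)
      extends CaseWhenBase {
    def withChildren(newChildren: Seq[Expr]): Expr = {
      val (b, e) = regroup(newChildren)
      CaseWhenCodegen(b, e)
    }
  }

  // Hypothetical optimizer rule: switch to the codegen variant only when the
  // branch count is below the threshold.
  def optimizeCodegen(plan: Expr, maxCaseBranchesForCodegen: Int): Expr =
    plan.transformUp {
      case cw: CaseWhen if cw.branches.length < maxCaseBranchesForCodegen => cw.toCodegen
    }
}
```

With a threshold of 2, a nested one-branch CASE WHEN is converted while its two-branch parent stays on the non-codegen path.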
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12353#discussion_r60177186 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystConf.scala --- @@ -29,6 +29,7 @@ trait CatalystConf { def groupByOrdinal: Boolean def optimizerMaxIterations: Int + def maxCaseBranches: Int --- End diff -- maxCaseBranchesForCodegen?
[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12492#discussion_r60176644 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -48,8 +49,8 @@ case class Size(child: Expression) extends UnaryExpression with ExpectsInputType */ // scalastyle:off line.size.limit @ExpressionDescription( - usage = "_FUNC_(array(obj1, obj2,...)) - Sorts the input array in ascending order according to the natural ordering of the array elements.", - extended = " > SELECT _FUNC_(array('b', 'd', 'c', 'a'));\n 'a', 'b', 'c', 'd'") + usage = "_FUNC_(array(array, ascendingOrder)) - Sorts the input array in ascending order according to the natural ordering of the array elements.", --- End diff -- this is wrong?
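The revised `usage` string wraps its arguments in an extra `array(...)` call, which does not match how the function is called; the intent is presumably `_FUNC_(array[, ascendingOrder])`, an array plus an optional boolean, and the description would then also need to mention descending order. A hypothetical Scala helper (not Spark's `SortArray` implementation) showing what those two arguments mean:

```scala
// Hypothetical helper, not Spark's SortArray: it only illustrates the meaning
// of an (array, ascendingOrder) signature. Null elements, which Spark's
// function has to handle, are deliberately not modelled here.
object SortArraySketch {
  def sortArray[T](xs: Seq[T], ascendingOrder: Boolean = true)(implicit ord: Ordering[T]): Seq[T] =
    if (ascendingOrder) xs.sorted(ord) // natural ordering
    else xs.sorted(ord.reverse)        // reversed natural ordering
}
```

Called as `sortArray(Seq("b", "d", "c", "a"))` it yields `a, b, c, d`, matching the `extended` example removed in the diff; passing `ascendingOrder = false` yields `d, c, b, a`.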
[GitHub] spark pull request: [SPARK-14398] [SQL] Audit non-reserved keyword...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/12191#issuecomment-211741548 The compiler should emit a `tableswitch` instead of a `lookupswitch` when the nonReserved keywords are grouped together, which is a bit faster. I don't think the improvement is large enough to warrant another change and another PR, so let's merge this one and be done. LGTM
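A small Scala illustration of the dense-versus-sparse distinction, assuming the keyword check reduces to a match over integer token ids; the ids below are made up, and the real `nonReserved` check presumably lives in generated parser code rather than a hand-written match. Choosing between `tableswitch` and `lookupswitch` is a density heuristic inside the compiler, so the comments describe the typical outcome, not a guarantee; `@switch` only makes scalac warn when it cannot emit a switch at all.

```scala
import scala.annotation.switch

// Hypothetical token ids; in a real parser these come from the generated lexer.
object SwitchSketch {
  // Keywords grouped into one contiguous range (10..13): a dense label set,
  // the shape that compilers typically lower to a tableswitch (indexed jump).
  def isNonReserved(tokenType: Int): Boolean = (tokenType: @switch) match {
    case 10 | 11 | 12 | 13 => true
    case _                 => false
  }

  // The same number of keywords scattered across the id space: a sparse label
  // set, which typically ends up as a lookupswitch (search over sorted keys).
  def isNonReservedSparse(tokenType: Int): Boolean = (tokenType: @switch) match {
    case 10 | 200 | 3000 | 40000 => true
    case _                       => false
  }
}
```

Both functions return the same kind of answer; only the bytecode shape is expected to differ, which is why the gain from regrouping keywords is small.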
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12353#issuecomment-211740746 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56191/ Test PASSed.
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12353#issuecomment-211740744 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14577][SQL] Add spark.sql.codegen.maxCa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12353#issuecomment-211740602 **[Test build #56191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56191/consoleFull)** for PR 12353 at commit [`a9294bd`](https://github.com/apache/spark/commit/a9294bdd01c125dcc7a7b232a7b14b476678e731). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP] Refactor MemoryManager internals to simp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12381#issuecomment-211740371 **[Test build #56189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56189/consoleFull)** for PR 12381 at commit [`5290476`](https://github.com/apache/spark/commit/5290476d7ca6af010fb539f3ae7c69b7fea0c852). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [WIP] Refactor MemoryManager internals to simp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12381#issuecomment-211740457 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56189/ Test FAILed.
[GitHub] spark pull request: [WIP] Refactor MemoryManager internals to simp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12381#issuecomment-211740456 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12492#issuecomment-211740393 **[Test build #56196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56196/consoleFull)** for PR 12492 at commit [`9238c41`](https://github.com/apache/spark/commit/9238c4186f4ccde0b240ede692598d60bf6bbcfb).
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12472#issuecomment-211740321 We should be able to remove almost all the methods in InternalAccumulators.scala, shouldn't we? That includes `create`, `createAll`, `createShuffleReadAccums`, ...
[GitHub] spark pull request: [SPARK-14127][SQL][WIP] Describe table
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/12460#discussion_r60175987 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -254,6 +251,21 @@ class SparkSqlAstBuilder extends AstBuilder { } } + /** +* A column path can be specified as an parameter to describe command. It is a dot separated +* elements where the last element can be a String. +* TODO - check with Herman --- End diff -- It is a bit more complicated than I thought. We allow strings here because Hive allows us to use the `'$elem$'`, `'$key$'` and `'$value$'` 'keywords'. That is why I added strings to the rule. I am not sure if we should support this. What do you guys think? This is what I found in the Hive manual: ```SQL DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name[ col_name ( [.field_name] | [.'$elem$'] | [.'$key$'] | [.'$value$'] )* ]; ``` See also: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe
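A minimal sketch of what accepting those quoted pseudo-fields could look like, assuming the column path arrives as one dot-separated string; the `PathElem` types and `parse` function are hypothetical and do not correspond to anything in `SparkSqlAstBuilder`. It recognises only the three spellings from the Hive grammar quoted in this comment and rejects any other quoted element.

```scala
// Hypothetical model of a DESCRIBE column path; not Spark's parser code.
object ColumnPathSketch {
  sealed trait PathElem
  case class Field(name: String) extends PathElem // ordinary struct field
  case object Elem extends PathElem               // '$elem$': array element type
  case object Key extends PathElem                // '$key$': map key type
  case object Value extends PathElem              // '$value$': map value type

  // Splits on '.' and maps the Hive pseudo-fields to dedicated elements.
  // Assumption: identifiers never contain dots; empty segments are not rejected.
  def parse(path: String): Seq[PathElem] =
    path.split('.').toList.map {
      case "'$elem$'"  => Elem
      case "'$key$'"   => Key
      case "'$value$'" => Value
      case other if other.startsWith("'") =>
        throw new IllegalArgumentException(s"Unsupported quoted path element: $other")
      case name => Field(name)
    }
}
```

Modelling the three pseudo-fields as dedicated cases, rather than accepting arbitrary strings, is one way to keep the grammar permissive only where Hive compatibility requires it.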
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60175966 --- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala --- @@ -36,7 +36,7 @@ class StageInfo( val rddInfos: Seq[RDDInfo], val parentIds: Seq[Int], val details: String, -val internalAccumulators: Seq[Accumulator[_]] = Seq.empty, +val taskMetrics: TaskMetrics = null, --- End diff -- agree
[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12492#issuecomment-211739747 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60175851 --- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala --- @@ -36,7 +36,7 @@ class StageInfo( val rddInfos: Seq[RDDInfo], val parentIds: Seq[Int], val details: String, -val internalAccumulators: Seq[Accumulator[_]] = Seq.empty, +val taskMetrics: TaskMetrics = null, --- End diff -- is this only null in tests?
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60175871 --- Diff: core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala --- @@ -36,7 +36,7 @@ class StageInfo( val rddInfos: Seq[RDDInfo], val parentIds: Seq[Int], val details: String, -val internalAccumulators: Seq[Accumulator[_]] = Seq.empty, +val taskMetrics: TaskMetrics = null, --- End diff -- If that's the case, it might be better to always create a `TaskMetrics` rather than leave it null.
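A cut-down sketch of the "always create one" alternative, using hypothetical stand-ins for `TaskMetrics` and `StageInfo` (including an assumed `TaskMetrics.empty` factory). The point is only that a non-null default removes null checks from callers and tests, because Scala evaluates a default argument on every call.

```scala
// Hypothetical, cut-down stand-ins for Spark's TaskMetrics and StageInfo.
object StageInfoSketch {
  final class TaskMetrics {
    private var _executorRunTime = 0L
    def executorRunTime: Long = _executorRunTime
    def incExecutorRunTime(ms: Long): Unit = _executorRunTime += ms
  }

  object TaskMetrics {
    // Assumed factory returning a fresh, all-zero metrics object.
    def empty: TaskMetrics = new TaskMetrics
  }

  // The default expression runs on every constructor call, so each StageInfo
  // gets its own empty TaskMetrics instead of a shared instance or null.
  class StageInfo(val stageId: Int, val taskMetrics: TaskMetrics = TaskMetrics.empty)
}
```

Tests that do not care about metrics can then simply omit the argument instead of passing `null`.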
[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12492#issuecomment-211739754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56195/ Test FAILed.
[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12492#issuecomment-211739740 **[Test build #56195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56195/consoleFull)** for PR 12492 at commit [`67fb4f0`](https://github.com/apache/spark/commit/67fb4f022a1e12dec9d9f467c6fa26f38abbb040). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60175818 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -217,21 +170,45 @@ class TaskMetrics private[spark] (initialAccums: Seq[Accumulator[_]]) extends Se */ private[spark] def mergeShuffleReadMetrics(): Unit = synchronized { if (tempShuffleReadMetrics.nonEmpty) { - _shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics) + shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics) } } - /** - * Metrics related to shuffle write, defined only in shuffle map stages. - */ - def shuffleWriteMetrics: ShuffleWriteMetrics = _shuffleWriteMetrics + // Only used for test + private[spark] val testAccum = +sys.props.get("spark.testing").map(_ => TaskMetrics.createAccum[Long](TEST_ACCUM)) + + @transient private[spark] lazy val internalAccums: Seq[Accumulable[_, _]] = { --- End diff -- We collect these internal accumulators together so that it's easier to: 1. register all of them in the scheduler; 2. extract the internal accumulator info from the given accumulator updates in `TaskMetrics.fromAccumulatorUpdates`.
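The two reasons given for gathering the internal accumulators into one collection (register them all at once, and pick them out of a batch of updates) can be sketched as follows. Everything here is hypothetical and simplified: `NamedAccum`, `MetricsSketch`, the accumulator names, and the string "registry" are illustrative stand-ins, not Spark's `Accumulable`, `TaskMetrics`, or scheduler APIs.

```scala
import scala.collection.mutable

// Hypothetical minimal accumulator: a name plus a mutable Long value.
final class NamedAccum(val name: String) {
  var value: Long = 0L
  def add(v: Long): Unit = value += v
}

// Hypothetical stand-in for TaskMetrics holding several internal accumulators.
class MetricsSketch {
  val executorRunTime = new NamedAccum("internal.executorRunTime")
  val resultSize = new NamedAccum("internal.resultSize")
  val bytesRead = new NamedAccum("internal.input.bytesRead")

  // One lazily built collection of every internal accumulator, so callers can
  // iterate over all of them without naming each field.
  lazy val internalAccums: Seq[NamedAccum] = Seq(executorRunTime, resultSize, bytesRead)
  lazy val nameToAccum: Map[String, NamedAccum] = internalAccums.map(a => a.name -> a).toMap
}

object MetricsSketch {
  // Use 1: register every internal accumulator in one pass (hypothetical registry).
  def registerAll(m: MetricsSketch, registry: mutable.Buffer[String]): Unit =
    m.internalAccums.foreach(a => registry += a.name)

  // Use 2: rebuild metrics from (name, value) updates, keeping only the
  // internal accumulators and ignoring any user-defined ones.
  def fromUpdates(updates: Seq[(String, Long)]): MetricsSketch = {
    val m = new MetricsSketch
    updates.foreach { case (name, v) => m.nameToAccum.get(name).foreach(_.add(v)) }
    m
  }
}
```

Adding a new internal metric then means touching one field and the `internalAccums` list, rather than every place that registers or decodes metrics.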
[GitHub] spark pull request: [SPARK-12457] Fixed the Wrong Description and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12492#issuecomment-211739523 **[Test build #56195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56195/consoleFull)** for PR 12492 at commit [`67fb4f0`](https://github.com/apache/spark/commit/67fb4f022a1e12dec9d9f467c6fa26f38abbb040).
[GitHub] spark pull request: SPARK-8398 hadoop input/output format advanced...
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/6848#issuecomment-211739456 IMO, this is useful in that the Hadoop configuration need not be global state. We can have a default configuration that is used everywhere, and every Hadoop-related method then gives the user the option to override that default. Binary compatibility will definitely be broken, but source compatibility might not be, i.e. one might only need to recompile the project against the newer Spark version. As has already been asked, should this be okay for 2.0? @andrewor14 ping!
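The "shared default configuration, overridable per call" idea, and why it is source compatible but not binary compatible, can be sketched with a default parameter. This is a hypothetical illustration: `HadoopIO`, `Conf` (a plain `Map` standing in for a Hadoop `Configuration`), and `describeRead` are not Spark or Hadoop APIs.

```scala
object HadoopIO {
  // Hypothetical stand-in for a Hadoop Configuration: plain key/value settings.
  type Conf = Map[String, String]

  // A shared default used by every call that does not supply its own settings.
  val defaultConf: Conf = Map("fs.defaultFS" -> "file:///", "io.file.buffer.size" -> "4096")

  // Hypothetical reader method. Because `conf` has a default, an existing call
  // such as describeRead("in.txt") still compiles unchanged (source compatible).
  def describeRead(path: String, conf: Conf = defaultConf): String =
    s"$path via ${conf("fs.defaultFS")} (buffer=${conf("io.file.buffer.size")})"
}

// Per-call override without touching any global state.
val custom = HadoopIO.defaultConf + ("io.file.buffer.size" -> "65536")
```

Binary compatibility breaks because Scala does not generate an overload for the default argument: the compiled method now takes two parameters, so bytecode compiled against the old one-parameter signature would fail with `NoSuchMethodError` until it is recompiled.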
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60175666 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -268,23 +243,11 @@ private[spark] class ListenerTaskMetrics( } private[spark] object TaskMetrics extends Logging { + import InternalAccumulator._ def empty: TaskMetrics = new TaskMetrics - /** - * Get an accumulator from the given map by name, assuming it exists. - */ - def getAccum[T](accumMap: Map[String, Accumulator[_]], name: String): Accumulator[T] = { -require(accumMap.contains(name), s"metric '$name' is missing") -val accum = accumMap(name) -try { - // Note: we can't do pattern matching here because types are erased by compile time - accum.asInstanceOf[Accumulator[T]] -} catch { - case e: ClassCastException => -throw new SparkException(s"accumulator $name was of unexpected type", e) -} - } + def createAccum[T](name: String): Accumulator[T] = create(name).asInstanceOf[Accumulator[T]] --- End diff -- I'd move the creation of accumulators in here, rather than delegating to InternalAccumulators. Also maybe just have `createLongAccumulator` and `createCollectionAccumulator`; then it becomes obvious at the call site what's going on, and we also don't need conditional branches in `create`.
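The suggestion to replace one generic, casting `createAccum[T]` with one factory per value type can be sketched as below. `Accum` and `AccumFactory` are hypothetical simplifications, not Spark's `Accumulator` classes; the method names follow the ones proposed in the comment. Each factory fixes its return type statically, so no `asInstanceOf` and no branching on the metric name are needed.

```scala
// Hypothetical minimal accumulator parameterized by its value type.
final class Accum[T](val name: String, zero: T, merge: (T, T) => T) {
  private var current: T = zero
  def add(v: T): Unit = current = merge(current, v)
  def value: T = current
}

object AccumFactory {
  // Long-valued metrics (e.g. bytes read): sum the updates.
  def createLongAccumulator(name: String): Accum[Long] =
    new Accum[Long](name, 0L, _ + _)

  // Collection-valued metrics (e.g. updated blocks): concatenate the updates.
  def createCollectionAccumulator[T](name: String): Accum[List[T]] =
    new Accum[List[T]](name, Nil, _ ++ _)
}

val bytes = AccumFactory.createLongAccumulator("bytesRead")
bytes.add(10L)
bytes.add(5L)
```

A call site such as `createLongAccumulator("bytesRead")` documents the value type by its name, which is the readability gain the comment points to.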
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211739045 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56190/ Test PASSed.
[GitHub] spark pull request: [SPARK-12457] Fixed the Typos in Collection Fu...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/12492 [SPARK-12457] Fixed the Typos in Collection Functions What changes were proposed in this pull request? https://github.com/apache/spark/pull/12185 contains the original PR I submitted in https://github.com/apache/spark/pull/10418. However, it misses one of the extended examples, and it has a wrong description and a few typos in the collection functions. This PR fixes all these issues. How was this patch tested? The existing test cases already cover it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark expressionUpdate Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12492.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12492 commit 67fb4f022a1e12dec9d9f467c6fa26f38abbb040 Author: gatorsmile Date: 2016-04-19T05:36:49Z fixed a few typos in collection functions.
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211739043 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12259#issuecomment-211738829 **[Test build #56190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56190/consoleFull)** for PR 12259 at commit [`85d1df0`](https://github.com/apache/spark/commit/85d1df0acdca497cc63363783db07701eff93ba6). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VectorUDT extends UserDefinedType[Vector] ` * ` s\"Can not load in UserDefinedType $`
[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...
Github user hujy commented on a diff in the pull request: https://github.com/apache/spark/pull/12491#discussion_r60175227 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -159,7 +159,9 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas @Since("1.2.0") @Experimental class LogisticRegression @Since("1.2.0") ( -@Since("1.4.0") override val uid: String) +@Since("1.4.0") override val uid: String, +@Since("2.0.0") val numFeatures: Int = 0, --- End diff -- I think the values are passed in when the user creates the object, and they are returned when the user calls `toString`.
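The behavior described here (values supplied at construction are kept as fields and echoed by `toString`) can be illustrated with a minimal sketch. `LogisticModelSketch` and its output format are hypothetical; this is not Spark's `LogisticRegression` or `LogisticRegressionModel`, and it does not address whether such values are actually known before the model is fit.

```scala
// Hypothetical model class: constructor values become fields, and toString
// summarizes them in a fixed, human-readable format.
class LogisticModelSketch(val uid: String, val numClasses: Int, val numFeatures: Int) {
  override def toString: String =
    s"LogisticModelSketch: uid = $uid, numClasses = $numClasses, numFeatures = $numFeatures"
}

val model = new LogisticModelSketch("logreg_1", 2, 4)
```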
[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...
Github user hujy commented on a diff in the pull request: https://github.com/apache/spark/pull/12491#discussion_r60175230 --- Diff: python/pyspark/mllib/classification.py --- @@ -262,6 +262,8 @@ def load(cls, sc, path): model.setThreshold(threshold) return model +def __repr__(self): +return self._call_java("toString") --- End diff -- ok :)
[GitHub] spark pull request: [SPARK-12457] [SQL] Add ExpressionDescription ...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/10418
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60174901 --- Diff: core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala --- @@ -217,21 +170,45 @@ class TaskMetrics private[spark] (initialAccums: Seq[Accumulator[_]]) extends Se */ private[spark] def mergeShuffleReadMetrics(): Unit = synchronized { if (tempShuffleReadMetrics.nonEmpty) { - _shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics) + shuffleReadMetrics.setMergeValues(tempShuffleReadMetrics) } } - /** - * Metrics related to shuffle write, defined only in shuffle map stages. - */ - def shuffleWriteMetrics: ShuffleWriteMetrics = _shuffleWriteMetrics + // Only used for test + private[spark] val testAccum = +sys.props.get("spark.testing").map(_ => TaskMetrics.createAccum[Long](TEST_ACCUM)) + + @transient private[spark] lazy val internalAccums: Seq[Accumulable[_, _]] = { --- End diff -- is this here for a reason?
[GitHub] spark pull request: [SPARK-14719] WriteAheadLogBasedBlockHandler s...
Github user dibbhatt commented on the pull request: https://github.com/apache/spark/pull/12484#issuecomment-211734038 Hi @JoshRosen, isn't this fix somehow related to the issue discussed in https://github.com/apache/spark/pull/6990? See the final comments from @andrewor14 at https://github.com/apache/spark/pull/6990#issuecomment-120515683. The issue here is that if a block fails to unroll, the ReceivedBlockHandler never gets the block ID, so it never learns about the block and will not include it in any future computation. In other words, if a block can't be stored locally, the receiver thinks it has not been stored anywhere, even if it has been successfully written to the WAL. Isn't that the case?
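The failure mode described here (a local store failure hides a block that did reach the WAL) can be contrasted with an alternative policy in a small sketch. This is purely hypothetical: `WalHandlerSketch` and `StoreResult` are not Spark's `WriteAheadLogBasedBlockHandler`, the two writes run sequentially rather than in parallel for simplicity, and whether Spark should report such blocks is exactly the open question in the thread. The sketch reports the block whenever the WAL write succeeds and only treats a WAL failure as fatal.

```scala
import scala.util.Try

// Hypothetical outcome of storing one received block.
final case class StoreResult(blockId: String, inMemory: Boolean, inWal: Boolean)

object WalHandlerSketch {
  // Attempts both writes independently. A failed local (in-memory) store no
  // longer hides the block: it is still reported because the WAL copy can be
  // used for recovery. Only a failed WAL write makes the whole store fail.
  def store(blockId: String, storeLocally: () => Unit, writeToWal: () => Unit): Try[StoreResult] = {
    val local = Try(storeLocally())
    val wal = Try(writeToWal())
    wal.map(_ => StoreResult(blockId, inMemory = local.isSuccess, inWal = true))
  }
}
```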
[GitHub] spark pull request: [SPARK-14595][SQL] add input metrics for FileS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12352#issuecomment-211733351 **[Test build #56194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56194/consoleFull)** for PR 12352 at commit [`c265546`](https://github.com/apache/spark/commit/c26554639f4a2615907d7b46af3005ff3f335d08).
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user lresende commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211732862 Ok, I will work with @JoshRosen on the trigger part.
[GitHub] spark pull request: [SPARK-14369][SQL] Locality support for FileSc...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12153#discussion_r60174480 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -621,20 +621,40 @@ class HDFSFileCatalog( def getStatus(path: Path): Array[FileStatus] = leafDirToChildrenFiles(path) + private implicit class LocatedFileStatusIterator(iterator: RemoteIterator[LocatedFileStatus]) +extends Iterator[LocatedFileStatus] { + +override def hasNext: Boolean = iterator.hasNext + +override def next(): LocatedFileStatus = iterator.next() + } + private def listLeafFiles(paths: Seq[Path]): mutable.LinkedHashSet[FileStatus] = { if (paths.length >= sqlContext.conf.parallelPartitionDiscoveryThreshold) { HadoopFsRelation.listLeafFilesInParallel(paths, hadoopConf, sqlContext.sparkContext) --- End diff -- Let's also have a test.
[GitHub] spark pull request: [SPARK-14369][SQL] Locality support for FileSc...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/12153#discussion_r60174434 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -621,20 +621,40 @@ class HDFSFileCatalog( def getStatus(path: Path): Array[FileStatus] = leafDirToChildrenFiles(path) + private implicit class LocatedFileStatusIterator(iterator: RemoteIterator[LocatedFileStatus]) +extends Iterator[LocatedFileStatus] { + +override def hasNext: Boolean = iterator.hasNext + +override def next(): LocatedFileStatus = iterator.next() + } + private def listLeafFiles(paths: Seq[Path]): mutable.LinkedHashSet[FileStatus] = { if (paths.length >= sqlContext.conf.parallelPartitionDiscoveryThreshold) { HadoopFsRelation.listLeafFilesInParallel(paths, hadoopConf, sqlContext.sparkContext) --- End diff -- Seems we also need to update the `listLeafFiles` that is called by `listLeafFilesInParallel`.
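The `LocatedFileStatusIterator` in this diff adapts Hadoop's Java-style `RemoteIterator` into a Scala `Iterator` through an implicit class, which is what makes collection methods such as `filter` and `toSeq` available on it. The pattern can be sketched without Hadoop on the classpath; `RemoteIter` below is a hypothetical stand-in with the same two methods, and `InMemoryRemoteIter` exists only to make the usage example runnable.

```scala
// Hypothetical stand-in for Hadoop's RemoteIterator[E] (same two methods).
trait RemoteIter[E] {
  def hasNext(): Boolean
  def next(): E
}

object RemoteIterSyntax {
  // Same shape as the diff: wrap the Java-style iterator in a Scala Iterator,
  // so any Iterator method can be called on a RemoteIter once this is imported.
  implicit class RemoteIterOps[E](underlying: RemoteIter[E]) extends Iterator[E] {
    override def hasNext: Boolean = underlying.hasNext()
    override def next(): E = underlying.next()
  }
}

// Hypothetical in-memory RemoteIter used only for the usage example.
object InMemoryRemoteIter {
  def apply[E](xs: List[E]): RemoteIter[E] = new RemoteIter[E] {
    private var rest = xs
    def hasNext(): Boolean = rest.nonEmpty
    def next(): E = { val h = rest.head; rest = rest.tail; h }
  }
}
```

With `import RemoteIterSyntax._` in scope, an expression like `InMemoryRemoteIter(names).filterNot(_.startsWith("_")).toList` compiles, because `filterNot` is not a member of `RemoteIter` and the compiler inserts the conversion. Note the wrapper is single-pass, like the underlying iterator.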
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60174384 --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala --- @@ -36,15 +36,10 @@ private[spark] class TaskContextImpl( override val taskMemoryManager: TaskMemoryManager, localProperties: Properties, @transient private val metricsSystem: MetricsSystem, -initialAccumulators: Seq[Accumulator[_]] = InternalAccumulator.createAll()) +val taskMetrics: TaskMetrics) --- End diff -- add override
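Why "add override" is worth doing even though Scala accepts an abstract member being implemented without it: with `override`, the compiler checks that the parent really declares the member, so renaming or removing the parent member turns into a compile error ("overrides nothing") instead of a silently unrelated new field. A minimal sketch, assuming hypothetical stand-ins `MetricsBox`, `TaskCtx`, and `TaskCtxImpl` for `TaskMetrics`, `TaskContext`, and `TaskContextImpl`:

```scala
// Hypothetical stand-ins for TaskMetrics / TaskContext / TaskContextImpl.
class MetricsBox { var recordsRead: Long = 0L }

abstract class TaskCtx {
  def taskMetrics: MetricsBox
}

// The constructor val implements the abstract member; `override` makes the
// compiler verify that TaskCtx actually declares taskMetrics.
class TaskCtxImpl(override val taskMetrics: MetricsBox) extends TaskCtx

val ctx: TaskCtx = new TaskCtxImpl(new MetricsBox)
ctx.taskMetrics.recordsRead += 3
```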
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12472#issuecomment-211732260 **[Test build #56193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56193/consoleFull)** for PR 12472 at commit [`6226058`](https://github.com/apache/spark/commit/622605830643014aac5d0a2f5f30dab567530faf). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13811][SPARK-13836] [SQL] Removed IsNot...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/11649
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/12472#discussion_r60174365 --- Diff: core/src/main/scala/org/apache/spark/TaskContext.scala --- @@ -65,7 +65,7 @@ object TaskContext { * An empty task context that does not represent an actual task. --- End diff -- while you are at it, can you document that this is only used for testing?
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12472#issuecomment-211732276 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56193/ Test FAILed.
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12472#issuecomment-211732273 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/12472#issuecomment-211731808 ready for review :)
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211731818 We can run them via some trigger phrase though.
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211731762 They have been flaky and causing other pull requests to fail. That's why we shouldn't run them on Jenkins.
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user lresende commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211731480 @rxin Let me move them to a specific Docker profile. But I would still run them on Jenkins, as the infrastructure is already set up there.
[GitHub] spark pull request: [SPARK-14704][CORE] create accumulators in Tas...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12472#issuecomment-211731220 **[Test build #56193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56193/consoleFull)** for PR 12472 at commit [`6226058`](https://github.com/apache/spark/commit/622605830643014aac5d0a2f5f30dab567530faf).
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211730704 I don't even think they should run on pull requests. Tests that require extensive external setup (or downloading things) in general are flaky.
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user lresende commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211729901 @rxin, just trying to understand: is the Oracle test the only one failing? Or are you suggesting we move all the Docker-based tests to a separate profile that would only run on Jenkins?
[GitHub] spark pull request: [SPARK-14676] Wrap and re-throw Await.result e...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/12433#discussion_r60173470 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -260,7 +260,12 @@ private[spark] class BlockManager( def waitForAsyncReregister(): Unit = { val task = asyncReregisterTask if (task != null) { - Await.ready(task, Duration.Inf) + try { +Await.ready(task, Duration.Inf) --- End diff -- Unrelated to this PR, but waiting for an infinite time has a downside: if this (main) thread blocks, the running app will appear to hang, with no way to tell unless one inspects a thread dump. With a finite duration, a `TimeoutException` is thrown on timeout; with `Duration.Inf`, no exception is ever thrown. If I am correct about the above, I am not sure why it is used so widely. I am just asking so I understand whether there is some aspect of it that I am missing.
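The difference between a finite wait and `Duration.Inf` can be shown with the standard library alone. In this sketch `AwaitSketch.waitAtMost` is a hypothetical helper, not Spark code: `Await.ready` with a `FiniteDuration` throws `java.util.concurrent.TimeoutException` if the future has not completed in time, and the helper turns that into a `Left` so the caller can log or fail instead of hanging. A future backed by a never-completed `Promise` plays the role of a stuck task.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object AwaitSketch {
  // Waits at most `limit` for `f`. Returns Left on timeout instead of
  // blocking forever, which is what Await.ready(f, Duration.Inf) would do
  // for a future that never completes.
  def waitAtMost[T](f: Future[T], limit: FiniteDuration): Either[String, Future[T]] =
    try Right(Await.ready(f, limit))
    catch { case _: TimeoutException => Left(s"not completed within $limit") }
}

// A future that never completes, standing in for a stuck task.
val never: Future[Int] = Promise[Int]().future
```

As JoshRosen notes earlier in this thread, code inside the future may already have its own network/RPC timeouts, in which case `Duration.Inf` on the outer wait is less risky than it looks; a finite outer limit is mainly a safety net when that is not guaranteed.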
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211729460 I just had a local Spark build fail once because of the Docker Oracle test.
[GitHub] spark pull request: [SPARK-14609][SQL] Native support for LOAD DAT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12412#issuecomment-211729217 **[Test build #56192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56192/consoleFull)** for PR 12412 at commit [`08acf5c`](https://github.com/apache/spark/commit/08acf5c9a2638a94ce16df6fab124d3aeeea13d6).
[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/12491#issuecomment-211729265 Thanks for taking the initiative on this PR. At first glance it seems like this approach might not quite work, but it's easier to tell with some tests. Could you add a test case and run it locally? You can add your Scala test to LogisticRegressionSuite.scala. You may also find the linter tools ./dev/lint-scala and ./dev/lint-python useful :)
[GitHub] spark pull request: [SPARK-14504][SQL] Enable Oracle docker tests
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/12270#issuecomment-211728923 These tests are too flaky; I've already seen a few failures. We should exclude them from the normal test run and perhaps run them occasionally (via some trigger, or just once before each release).
[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12491#discussion_r60173022 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -159,7 +159,9 @@ private[classification] trait LogisticRegressionParams extends ProbabilisticClas @Since("1.2.0") @Experimental class LogisticRegression @Since("1.2.0") ( -@Since("1.4.0") override val uid: String) +@Since("1.4.0") override val uid: String, +@Since("2.0.0") val numFeatures: Int = 0, --- End diff -- Why are we adding these vals here? Where do they get set from? Are they needed?
[GitHub] spark pull request: [SPARK-14712][ML]spark.ml.LogisticRegressionMo...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/12491#discussion_r60172784 --- Diff: python/pyspark/mllib/classification.py --- @@ -262,6 +262,8 @@ def load(cls, sc, path): model.setThreshold(threshold) return model +def __repr__(self): +return self._call_java("toString") --- End diff -- I think Python style asks for two blank lines here (try running ./dev/lint-python locally :))
[GitHub] spark pull request: [SPARK-14487][SQL] User Defined Type registrat...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/12259#discussion_r60172565 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/linalg/udt/UDTSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.linalg.udt --- End diff -- ditto