[GitHub] spark pull request: [SPARK-14124] [SQL] Implement Database-related...
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/12009#discussion_r58168784

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala ---
@@ -45,46 +45,135 @@ abstract class NativeDDLCommand(val sql: String) extends RunnableCommand {
 }

+/**
+ * A command for users to create a new database.
+ *
+ * It will issue an error message when the database with the same name already exists,
+ * unless 'ifNotExists' is true.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *    CREATE DATABASE|SCHEMA [IF NOT EXISTS] database_name
+ * }}}
+ */
 case class CreateDatabase(
     databaseName: String,
     ifNotExists: Boolean,
     path: Option[String],
     comment: Option[String],
-    props: Map[String, String])(sql: String)
-  extends NativeDDLCommand(sql) with Logging
+    props: Map[String, String])
+  extends RunnableCommand {
+
+  override def run(sqlContext: SQLContext): Seq[Row] = {
+    val catalog = sqlContext.sessionState.catalog
+    catalog.createDatabase(
+      CatalogDatabase(
+        databaseName,
+        comment.getOrElse(""),
+        path.getOrElse(catalog.getDefaultDBPath(databaseName)),
+        props),
+      ifNotExists)
+    Seq.empty[Row]
+  }
--- End diff --

@yhuai I did try it. Actually, the code is done... However, if we create a directory before issuing Hive client API `createDatabase`, we will get the following error message from Hive:

```
Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Database db3 already exists;
```

Just feel free to let me know what I should do next. Thanks!

---

If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
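The `ifNotExists` behaviour described in the doc comment of `CreateDatabase` can be sketched in plain Scala. This is only an illustration under stated assumptions: `InMemoryCatalog` is a hypothetical stand-in, not Spark's session catalog or the Hive client, and the exception type is chosen for the sketch.

```scala
import scala.collection.mutable

object CreateDatabaseSketch {
  // Hypothetical in-memory catalog; it only tracks database names.
  final class InMemoryCatalog {
    private val databases = mutable.Set.empty[String]

    // Returns true if the database was created, false if it already
    // existed and ifNotExists was set. Throws if it already exists and
    // ifNotExists is false, mirroring the documented error case.
    def createDatabase(name: String, ifNotExists: Boolean): Boolean = {
      if (databases.contains(name)) {
        if (ifNotExists) false
        else throw new IllegalStateException(s"Database $name already exists")
      } else {
        databases += name
        true
      }
    }
  }
}
```

With this sketch, a second `createDatabase("db3", ifNotExists = true)` is a no-op, while the same call with `ifNotExists = false` fails, which is the distinction the doc comment draws.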
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/10355#issuecomment-204275787 One very minor comment on imports, and I think there are now merge conflicts with the latest master that need fixing. Subject to those, LGTM.
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/12106#discussion_r58168589

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -451,10 +451,10 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
       }
       Iterator((stat, gammaPart))
     }
-    val statsSum: BDM[Double] = stats.map(_._1).reduce(_ += _)
+    val statsSum: BDM[Double] = stats.map(_._1).treeReduce(_ += _)
     expElogbetaBc.unpersist()
     val gammat: BDM[Double] = breeze.linalg.DenseMatrix.vertcat(
-      stats.map(_._2).reduce(_ ++ _).map(_.toDenseMatrix): _*)
+      stats.map(_._2).treeReduce(_ ++ _).map(_.toDenseMatrix): _*)
--- End diff --

I don't think treeReduce helps here because the partial reduce doesn't reduce the size of the result.
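The size argument can be illustrated without Spark. The sketch below is plain Scala under stated assumptions: `mergeLevel` is a hypothetical helper that mimics one level of a tree-style merge by combining neighbouring partial results pairwise, and `totalSize` counts the elements that would still have to be moved afterwards.

```scala
object TreeMergeSketch {
  // One level of a tree-style merge: combine neighbouring partial results pairwise.
  def mergeLevel[A](parts: Seq[A])(combine: (A, A) => A): Vector[A] =
    parts.grouped(2).map(_.reduce(combine)).toVector

  // Total number of elements held by all partial results.
  def totalSize[A](parts: Seq[Seq[A]]): Int = parts.map(_.size).sum

  def main(args: Array[String]): Unit = {
    val parts = Seq(Vector(1.0, 1.0), Vector(2.0, 2.0), Vector(3.0, 3.0), Vector(4.0, 4.0))

    // Element-wise sum: each merged result has the same size as one input.
    val summed = mergeLevel(parts)((a, b) => a.zip(b).map { case (x, y) => x + y })

    // Concatenation: each merged result is as large as its inputs together.
    val concatenated = mergeLevel(parts)(_ ++ _)

    println(s"before: ${totalSize(parts)}")
    println(s"after sum: ${totalSize(summed)}")
    println(s"after concat: ${totalSize(concatenated)}")
  }
}
```

For a combiner such as `++`, the intermediate level carries as much data as the inputs, so the extra level adds work without shrinking what finally reaches the driver.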
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/12106#discussion_r58168561

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -451,10 +451,10 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
       }
       Iterator((stat, gammaPart))
     }
-    val statsSum: BDM[Double] = stats.map(_._1).reduce(_ += _)
--- End diff --

This doesn't seem right because the first argument is modified in place, which violates the `reduce` contract. It should be an `aggregate` (or `treeAggregate`) instead.
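The contract issue can be shown without Spark. The sketch below is plain Scala under stated assumptions: `addInPlace` is a hypothetical stand-in for an in-place add such as Breeze's `+=`, and each inner `Seq` plays the role of one partition. Folding into a fresh zero buffer, as `aggregate` and `treeAggregate` do with their zero value, means only buffers owned by the aggregation are mutated, never the input elements.

```scala
object AggregateSketch {
  // Hypothetical stand-in for an in-place add: mutates and returns `acc`.
  def addInPlace(acc: Array[Double], x: Array[Double]): Array[Double] = {
    var i = 0
    while (i < acc.length) {
      acc(i) += x(i)
      i += 1
    }
    acc
  }

  // Aggregate-style sum; each inner Seq stands for one partition.
  def aggregateLike(partitions: Seq[Seq[Array[Double]]], dim: Int): Array[Double] = {
    // seqOp: fold each partition into a fresh zero buffer, so no input is mutated.
    val partials = partitions.map { part =>
      part.foldLeft(new Array[Double](dim))(addInPlace)
    }
    // combOp: merge the per-partition buffers, which the aggregation owns.
    partials.foldLeft(new Array[Double](dim))(addInPlace)
  }
}
```

By contrast, a `reduce` whose function mutates its first argument would write into one of the input elements themselves, which is the in-place modification the comment objects to.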
[GitHub] spark pull request: [SPARK-14303] [ML] [SparkR] Define and use KMe...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/12039#issuecomment-204275158 Merged and updated the JIRA to make it more specific. Thanks!
[GitHub] spark pull request: [SPARK-14303] [ML] [SparkR] Define and use KMe...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12039
[GitHub] spark pull request: [SPARK-11262][ML] Unit test for gradient, loss...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9229
[GitHub] spark pull request: [SPARK-11262][ML] Unit test for gradient, loss...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/9229#issuecomment-204274118 LGTM. Merged into master. Thanks and sorry for the long delay in code review!
[GitHub] spark pull request: [SPARK-7425] [ML] spark.ml Predictor should su...
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/10355#discussion_r58168089

--- Diff: mllib/src/main/scala/org/apache/spark/ml/Predictor.scala ---
@@ -26,7 +26,7 @@ import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Row}
 import org.apache.spark.sql.functions._
-import org.apache.spark.sql.types.{DataType, DoubleType, StructType}
+import org.apache.spark.sql.types.{DataType, DoubleType, NumericType, StructType}
--- End diff --

minor, but I believe the `NumericType` import is no longer necessary
[GitHub] spark pull request: [SPARK-14295][SPARK-14274][SQL] Implements bui...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12088
[GitHub] spark pull request: [SPARK-13112]CoarsedExecutorBackend register t...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/12078#issuecomment-204273968 I think the current Spark code already takes care of this race condition, though not very elegantly. From the JIRA description, I think the broader problem is how to avoid assigning tasks to this slow machine, which might require a blacklist mechanism.
[GitHub] spark pull request: [SPARK-14295][SPARK-14274][SQL] Implements bui...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/12088#issuecomment-204273771 Merged into master. Thanks!
[GitHub] spark pull request: [SPARK-13995][SQL] Extract correct IsNotNull c...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11809#issuecomment-204269003 @sameeragarwal Thanks for reviewing. Waiting for @marmbrus to check this.
[GitHub] spark pull request: [SPARK-14191][SQL] Remove invalid Expand opera...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11995#issuecomment-204268904 ping @marmbrus
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/12099#discussion_r58166528

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
@@ -101,4 +101,19 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
       "Physical Plan should not contain Subquery since it's eliminated by optimizer")
     }
   }
+
+  test("SPARK-14251: EXPLAIN CODEGEN command") {
--- End diff --

Ok. Regarding the name of the test: I would typically only add the JIRA number to the test name if it fixes a bug (that needs some explanation).
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user hvanhovell commented on a diff in the pull request:
https://github.com/apache/spark/pull/12099#discussion_r58166375

--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -576,7 +576,7 @@ frameBound
 explainOption
-    : LOGICAL | FORMATTED | EXTENDED
+    : LOGICAL | FORMATTED | EXTENDED | CODEGEN
--- End diff --

Yeah it is fixed. He added it to the nonReserved rule (see below).
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-204267311 **[Test build #2721 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2721/consoleFull)** for PR 12105 at commit [`6fd07db`](https://github.com/apache/spark/commit/6fd07db11b5c9eed795dde11177f1c245a6fef16).
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12099#discussion_r58165643

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -237,15 +237,18 @@ case class ExplainCommand(
     logicalPlan: LogicalPlan,
     override val output: Seq[Attribute] =
       Seq(AttributeReference("plan", StringType, nullable = true)()),
-    extended: Boolean = false)
+    extended: Boolean = false,
+    codegen: Boolean = false)
--- End diff --

maybe we should have a separate command? otherwise we are going to add a lot of parameters.
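One way to read the "separate command" suggestion is sketched below in plain Scala. All names here (`Command`, `ExplainPlan`, `ExplainCodegen`, `describe`) are hypothetical and chosen for illustration; they are not Spark's classes and not necessarily what the pull request ended up with. The idea is that each output mode gets its own case class instead of one more boolean on a shared command.

```scala
object ExplainSketch {
  // Hypothetical command hierarchy; not Spark's RunnableCommand.
  sealed trait Command {
    def describe: String
  }

  // Plan explanation keeps only the flags that apply to plans.
  final case class ExplainPlan(query: String, extended: Boolean = false) extends Command {
    def describe: String = (if (extended) "extended " else "") + s"plan of [$query]"
  }

  // Codegen output is a separate command, so no `codegen` flag is needed.
  final case class ExplainCodegen(query: String) extends Command {
    def describe: String = s"generated code of [$query]"
  }
}
```

A caller then picks the command type rather than a flag combination, which also rules out meaningless combinations such as "extended" together with "codegen".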
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12099#discussion_r58165608

--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -576,7 +576,7 @@ frameBound
 explainOption
-    : LOGICAL | FORMATTED | EXTENDED
+    : LOGICAL | FORMATTED | EXTENDED | CODEGEN
--- End diff --

@hvanhovell does this fix the problem already? i wasn't sure what was before your comment vs after.
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12099#discussion_r58165593

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala ---
@@ -101,4 +101,19 @@ class HiveExplainSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
       "Physical Plan should not contain Subquery since it's eliminated by optimizer")
     }
   }
+
+  test("SPARK-14251: EXPLAIN CODEGEN command") {
--- End diff --

it's really long -- i think having the first couple of lines is good enough.
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/12099#discussion_r58165570

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala ---
@@ -81,29 +100,11 @@ package object debug {
      * WholeStageCodegen subtree).
      */
     def debugCodegen(): Unit = {
-      debugPrint(debugCodegenString())
+      debugPrint(codegenString(query.queryExecution.executedPlan))
     }

     /** Visible for testing. */
-    def debugCodegenString(): String = {
-      val plan = query.queryExecution.executedPlan
-      val codegenSubtrees = new collection.mutable.HashSet[WholeStageCodegen]()
-      plan transform {
-        case s: WholeStageCodegen =>
-          codegenSubtrees += s
-          s
-        case s => s
-      }
-      var output = s"Found ${codegenSubtrees.size} WholeStageCodegen subtrees.\n"
-      for ((s, i) <- codegenSubtrees.toSeq.zipWithIndex) {
-        output += s"== Subtree ${i + 1} / ${codegenSubtrees.size} ==\n"
-        output += s
-        output += "\nGenerated code:\n"
-        val (_, source) = s.doCodeGen()
-        output += s"${CodeFormatter.format(source)}\n"
-      }
-      output
-    }
+    def debugCodegenString(): String = codegenString(query.queryExecution.executedPlan)
--- End diff --

do we still need this function?
[GitHub] spark pull request: [SPARK-14070] [SQL] Use ORC data source for SQ...
Github user tejasapatil commented on the pull request: https://github.com/apache/spark/pull/11891#issuecomment-204259326 ping @liancheng
[GitHub] spark pull request: [SPARK-13432][SQL] add the source file name an...
Github user sarutak commented on the pull request:
https://github.com/apache/spark/pull/11301#issuecomment-204257179

@kiszk I'm inspecting this patch now and have noticed some defects so far.

1. I ran a job like the following.

```
sc.parallelize(1 to 1, 1).toDF.show
```

And then I got generated code like the following.

```
/* 006 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
/* 007 */
/* 008 */   private Object[] references;
/* 009 */   private MutableRow mutableRow;
/* 010 */   private org.apache.spark.sql.types.StructType schema;
/* 011 */
/* 012 */
/* 013 */   public SpecificSafeProjection(Object[] references) {
/* 014 */     this.references = references;
/* 015 */     mutableRow = (MutableRow) references[references.length - 1];
/* 016 */     this.schema = (org.apache.spark.sql.types.StructType) references[0];
/* 017 */   }
/* 018 */
/* 019 */   public java.lang.Object apply(java.lang.Object _i) {
/* 020 */     InternalRow i = (InternalRow) _i;
/* 021 */     /* createexternalrow(if (isnull(input[0, int])) null else input[0, int], StructField(value,IntegerType,false)) @ selectExpr at :27 */
/* 022 */     boolean isNull = false;
/* 023 */     final Object[] values = new Object[1];
```

The code above is the `SpecificSafeProjection` generated by `boundTEncoder.fromRow`, which is called in `Dataset#collect`. I think this code is not directly related to the query or DSL written by application developers, so it's odd that call sites appear in the comments.

2. When columns are described in DSL style like $"column", the call site in the comment can be `$`. You can reproduce it with the code below.

```
sc.parallelize(1 to 1, 1).toDF.select($"value").show
```

3. Call sites of multiple jobs can be scrambled. For example, you can reproduce it with the following code.

```
val df = sc.parallelize(1 to 1, 1).toDF
df.select($"value").show
df.selectExpr("value").show
```

After you run the second job above, you can see generated code as follows.

```
/* 006 */ class SpecificSafeProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
/* 007 */
/* 008 */   private Object[] references;
/* 009 */   private MutableRow mutableRow;
/* 010 */   private org.apache.spark.sql.types.StructType schema;
/* 011 */
/* 012 */
/* 013 */   public SpecificSafeProjection(Object[] references) {
/* 014 */     this.references = references;
/* 015 */     mutableRow = (MutableRow) references[references.length - 1];
/* 016 */     this.schema = (org.apache.spark.sql.types.StructType) references[0];
/* 017 */   }
/* 018 */
/* 019 */   public java.lang.Object apply(java.lang.Object _i) {
/* 020 */     InternalRow i = (InternalRow) _i;
/* 021 */     /* createexternalrow(if (isnull(input[0, int])) null else input[0, int], StructField(value,IntegerType,false)) @ selectExpr at :27 */
/* 022 */     boolean isNull = false;
/* 023 */     final Object[] values = new Object[1];
/* 024 */     /* if (isnull(input[0, int])) null else input[0, int] @ selectExpr at :27 */
```

As you can see, `select` and `selectExpr` appear in the comment above although `select` is not used in the second job. Another case is here.

```
val df = sc.parallelize(1 to 1, 1).toDF
df.select($"value" + 1).show
df.selectExpr("value + 2").show
...
/* 009 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
...
/* 034 */       /*** CONSUME: Project [(value#1 + 2) AS (value + 2)#7] */
/* 035 */
/* 036 */       /*** CONSUME: WholeStageCodegen */
/* 037 */
/* 038 */       /* (input[0, int] + 2) @ selectExpr at :27 */
/* 039 */       /* input[0, int] @ select at :27 */
/* 040 */       /* input[0, int] @ selectExpr at :27 */
/* 041 */       int inputadapter_value = inputadapter_row.getInt(0);
...
/* 050 */ }
```

4. As I mentioned before, could you make `TreeNode` serializable?

> Sorry, I said we don't need to make TreeNode serializable, but it is needed; otherwise origin is not serialized, and origin information such as the call site does not appear in the comments of the generated code when WholeStageCodegen is disabled.
[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-204255875 @rajeshbalamohan you need to clean `sc.hadoopRDD` too.
[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-204255888 **[Test build #54687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54687/consoleFull)** for PR 11978 at commit [`0c53ed2`](https://github.com/apache/spark/commit/0c53ed23e3fb24cc5d882272ddca629843005629).
[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-204255624 retest this please
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11517#issuecomment-204251175 **[Test build #54686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54686/consoleFull)** for PR 11517 at commit [`ef588db`](https://github.com/apache/spark/commit/ef588db4a8f8d9d742225c16dbad9d8cb17e2c71).
[GitHub] spark pull request: [SPARK-14320][SQL] Make ColumnarBatch.Row muta...
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/12103#discussion_r58162424

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java ---
@@ -232,6 +233,56 @@ public MapData getMap(int ordinal) {
     public Object get(int ordinal, DataType dataType) {
       throw new NotImplementedException();
     }
+
+    @Override
+    public void setNullAt(int ordinal) {
+      columns[ordinal].putNull(rowId);
--- End diff --

Would it be better to add a check that `ColumnVector.isConstant == false`? cc @nongli
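The kind of guard being asked about might look like the following plain-Scala sketch. `SketchColumn` is a hypothetical stand-in, not Spark's `ColumnVector`; only the `isConstant` name is taken from the comment, and the choice to fail with `require` is an assumption made for illustration.

```scala
object ConstantGuardSketch {
  // Hypothetical column that tracks nulls per row; not Spark's ColumnVector.
  final class SketchColumn(capacity: Int) {
    private val nulls = new Array[Boolean](capacity)

    // When true, the column is shared constant data and must not be mutated.
    var isConstant: Boolean = false

    def putNull(rowId: Int): Unit = {
      // Guard: refuse to mutate a constant column.
      require(!isConstant, "cannot mutate a constant column")
      nulls(rowId) = true
    }

    def isNullAt(rowId: Int): Boolean = nulls(rowId)
  }
}
```

Whether such a check belongs in the row setters or in the column itself, and whether its cost on a per-row path is acceptable, is the design question left open in the review.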
[GitHub] spark pull request: [SPARK-12864][YARN] initialize executorIdCount...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10794#issuecomment-204248244 **[Test build #54685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54685/consoleFull)** for PR 10794 at commit [`ebe3c7f`](https://github.com/apache/spark/commit/ebe3c7f290929588c822137b8bf27b18fe75393f).
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/11517#discussion_r58161386

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java ---
@@ -61,6 +63,14 @@ public long durationMs() {
   public abstract void init(Iterator iters[]);

   /**
+   * Initializes from array of iterators of InternalRow.
+   */
+  public void init(int index, Iterator iters[]) {
+    partitionIndex = index;
--- End diff --

Ah, OK. Thanks! I wasn't aware of that.
[GitHub] spark pull request: [SPARK-14303] [ML] [SparkR] Define and use KMe...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/12039#issuecomment-204241139 LGTM. Having some issues with the merge script. I will merge it later.
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12106#issuecomment-204240413 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54684/ Test PASSed.
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12106#issuecomment-204240409 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12106#issuecomment-204240344 **[Test build #54684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54684/consoleFull)** for PR 12106 at commit [`38cf0f3`](https://github.com/apache/spark/commit/38cf0f3cc90d09435f12ad86021469eff32985db). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58160147 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- You can get the partition id via TaskContext.partitionId
[GitHub] spark pull request: [SPARK-14255] [SQL] Streaming Aggregation
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/12048#discussion_r58160046 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/package.scala --- @@ -28,37 +28,36 @@ package object state { implicit class StateStoreOps[T: ClassTag](dataRDD: RDD[T]) { /** Map each partition of a RDD along with data in a [[StateStore]]. */ -def mapPartitionWithStateStore[U: ClassTag]( -storeUpdateFunction: (StateStore, Iterator[T]) => Iterator[U], +def mapPartitionsWithStateStore[U: ClassTag]( --- End diff -- nit: isn't it technically more correct to say `mapPartitionsWithStateStore*s*`?
[GitHub] spark pull request: [SPARK-14255] [SQL] Streaming Aggregation
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/12048#discussion_r58160004 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala --- @@ -0,0 +1,72 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.streaming + +import org.apache.spark.sql.SQLContext +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.{QueryExecution, SparkPlan, SparkPlanner, UnaryNode} + +/** + * A variant of [[QueryExecution]] that allows the execution of the given [[LogicalPlan]] + * plan incrementally. Possibly preserving state in between each execution. + */ +class IncrementalExecution( +ctx: SQLContext, +logicalPlan: LogicalPlan, +checkpointLocation: String, +currentBatchId: Long) extends QueryExecution(ctx, logicalPlan) { + + // TODO: make this always part of planning. + val stateStrategy = sqlContext.sessionState.planner.StatefulAggregationStrategy :: Nil + + // Modified planner with stateful operations. 
+ override def planner: SparkPlanner = +new SparkPlanner( + sqlContext.sparkContext, + sqlContext.conf, + stateStrategy) + + /** + * Records the current id for a given stateful operator in the query plan as the `state` + * preperation walks the query plan. + */ + private var operatorId = 0 + + /** Locates save/restore pairs surrounding aggregation. */ + val state = new Rule[SparkPlan] { +override def apply(plan: SparkPlan): SparkPlan = plan transform { + case StateStoreSave(keys, None, + UnaryNode(agg, + StateStoreRestore(keys2, None, child))) => +val stateId = OperatorStateId(checkpointLocation, operatorId, currentBatchId - 1) +operatorId += 1 + +StateStoreSave( + keys, + Some(stateId), + agg.withNewChildren( +StateStoreRestore( + keys, + Some(stateId), + child) :: Nil)) --- End diff -- nit: maybe it's me, but this nested tree is a little hard to read
[GitHub] spark pull request: [SPARK-14255] [SQL] Streaming Aggregation
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/12048#discussion_r58159893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/utils.scala --- @@ -261,4 +246,90 @@ object Utils { finalAndCompleteAggregate :: Nil } + + /** + * Plans a streaming aggregation using the following progression: + * - Partial Aggregation + * - Shuffle + * - Partial Merge (now there is at most 1 tuple per group) + * - StateStoreRestore (now there is 1 tuple from this batch + optionally one from the previous) + * - PartialMerge (now there is at most 1 tuple per group) + * - StateStoreSave (saves the tuple for the next batch) + * - Complete (output the current result of the aggregation) --- End diff -- is this Complete or Final?
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58159707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- That is, we would add a new API called `zipPartitionsWithIndex`.
[GitHub] spark pull request: [SPARK-14255] [SQL] Streaming Aggregation
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/12048#discussion_r58159654 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StreamTest.scala --- @@ -224,11 +239,8 @@ trait StreamTest extends QueryTest with Timeouts { """.stripMargin def verify(condition: => Boolean, message: String): Unit = { - try { -Assertions.assert(condition) - } catch { -case NonFatal(e) => - failTest(message, e) + if (!condition) { --- End diff -- I had written it this way so that any error thrown during the lazy evaluation of `condition` gets caught and the message is printed correctly.
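The concern in this comment can be illustrated outside ScalaTest. The following is a minimal sketch in plain Java, not the code from the PR: the `verify` helper takes the condition as a `BooleanSupplier` (standing in for Scala's by-name `condition: => Boolean`), and `TestFailure` is a hypothetical exception standing in for `failTest`. Because the condition is evaluated inside the `try` block, an exception thrown while evaluating it is reported together with the message instead of escaping on its own.

```java
import java.util.function.BooleanSupplier;

public class LazyVerifySketch {
    /** Hypothetical stand-in for the failure raised by failTest in the PR. */
    public static class TestFailure extends RuntimeException {
        public TestFailure(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Evaluates the condition inside the try block, so an exception thrown
     * while evaluating it is wrapped together with the message.
     */
    public static void verify(BooleanSupplier condition, String message) {
        boolean ok;
        try {
            ok = condition.getAsBoolean();
        } catch (RuntimeException e) {
            throw new TestFailure(message, e);
        }
        if (!ok) {
            throw new TestFailure(message, null);
        }
    }

    public static void main(String[] args) {
        // A condition that simply holds.
        verify(() -> 1 + 1 == 2, "arithmetic still works");

        // A condition whose evaluation itself throws.
        int[] empty = new int[0];
        try {
            verify(() -> empty[0] == 1, "first element should be 1");
        } catch (TestFailure e) {
            System.out.println(e.getMessage() + " (cause: "
                + e.getCause().getClass().getSimpleName() + ")");
        }
    }
}
```

With a plain `if (!condition)` check, the second call would surface only the raw exception from the condition, without the test's message.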
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58159480 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- We can slightly change `ZippedPartitionsRDD2` etc. to pass the partition index into the given closure, as in: `rdds.head.zipPartitions(rdds(1)) { (index, leftIter, rightIter) => ... }` Do you think it is doable and makes sense?
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58159371 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- One problem is that when we consume two RDDs with code like `rdds.head.zipPartitions(rdds(1)) { (leftIter, rightIter) => ... }`, we can't obtain and pass the partition index as we do in the single-RDD case.
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58158879 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- I got your point.
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12106#issuecomment-204229682 **[Test build #54684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54684/consoleFull)** for PR 12106 at commit [`38cf0f3`](https://github.com/apache/spark/commit/38cf0f3cc90d09435f12ad86021469eff32985db).
[GitHub] spark pull request: [SPARK-14322] [MLlib] Use treeReduce instead o...
GitHub user hhbyyh opened a pull request: https://github.com/apache/spark/pull/12106 [SPARK-14322] [MLlib] Use treeReduce instead of reduce in OnlineLDAOptimizer ## What changes were proposed in this pull request? OnlineLDAOptimizer uses RDD.reduce in two places where it could use treeReduce. This can cause scalability issues. This should be an easy fix. ## How was this patch tested? unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/hhbyyh/spark ldaTreeReduce Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12106.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12106 commit 38cf0f3cc90d09435f12ad86021469eff32985db Author: Yuhao Yang Date: 2016-04-01T03:17:43Z change reduce to treeReduce for lda
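For readers unfamiliar with the difference, the idea behind a tree-style reduce can be sketched without Spark. The following is plain Java and is not the `RDD.treeReduce` implementation: the `treeReduce` helper and the sample partial results are hypothetical. It only shows how partial results can be merged pairwise in rounds, so that no single step has to fold in every partial result one after another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class TreeReduceSketch {
    /** Combines partial results pairwise, round by round, until one value remains. */
    public static <T> T treeReduce(List<T> partials, BinaryOperator<T> op) {
        if (partials.isEmpty()) {
            throw new IllegalArgumentException("nothing to reduce");
        }
        List<T> level = new ArrayList<>(partials);
        while (level.size() > 1) {
            List<T> next = new ArrayList<>();
            for (int i = 0; i < level.size(); i += 2) {
                if (i + 1 < level.size()) {
                    // Merge two neighbouring partial results.
                    next.add(op.apply(level.get(i), level.get(i + 1)));
                } else {
                    // An unpaired last element is carried up unchanged.
                    next.add(level.get(i));
                }
            }
            level = next;
        }
        return level.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical per-partition partial sums.
        List<Integer> partials = Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6);
        System.out.println(treeReduce(partials, Integer::sum));
    }
}
```

The sketch assumes the combining function is associative, which is also what a distributed reduce needs in order to give the same result regardless of how the merges are grouped.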
[GitHub] spark pull request: [SPARK-14137] [SQL] Cleanup hash join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12102#issuecomment-204228061 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54683/ Test PASSed.
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-204228144 Jenkins, OK to test
[GitHub] spark pull request: [SPARK-14137] [SQL] Cleanup hash join
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12102#issuecomment-204228059 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid copy in com...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12038
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58157898 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- partition id is always -1
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58157878 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- No failure, but the seed is not set up as expected.
[GitHub] spark pull request: [SPARK-14137] [SQL] Cleanup hash join
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12102#issuecomment-204227840 **[Test build #54683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54683/consoleFull)** for PR 12102 at commit [`37724be`](https://github.com/apache/spark/commit/37724bef4c87186ac151fcbb87601880365c7113). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid copy in com...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-204227663 Merging to master. Thanks, @liyezhang556520 !
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12105#issuecomment-204227649 Can one of the admins verify this patch?
[GitHub] spark pull request: SPARK-14321. [SQL] Reduce date format cost and...
GitHub user rajeshbalamohan opened a pull request: https://github.com/apache/spark/pull/12105 SPARK-14321. [SQL] Reduce date format cost and string-to-date cost i…

## What changes were proposed in this pull request?

Here is the generated code snippet when executing date functions. SimpleDateFormat is fairly expensive and can show up as a bottleneck when processing millions of records. It would be better to instantiate it once.

```
UTF8String primitive5 = null;
if (!isNull4) {
  try {
    primitive5 = UTF8String.fromString(new java.text.SimpleDateFormat("-MM-dd HH:mm:ss").format(
      new java.util.Date(primitive7 * 1000L)));
  } catch (java.lang.Throwable e) {
    isNull4 = true;
  }
}
```

With the modified code, here is the generated code:

```
private java.text.SimpleDateFormat sdf2;
private UnsafeRow result13;
private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder bufferHolder14;
private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter15;
...
...
boolean isNull0 = isNull3;
UTF8String primitive1 = null;
if (!isNull0) {
  try {
    if (sdf2 == null) {
      sdf2 = new java.text.SimpleDateFormat("-MM-dd HH:mm:ss");
    }
    primitive1 = UTF8String.fromString(sdf2.format(
      new java.util.Date(primitive4 * 1000L)));
  } catch (java.lang.Throwable e) {
    isNull0 = true;
  }
}
```

Similarly, Calendar.getInstance was used in DateTimeUtils, where it can be lazily initialized.

## How was this patch tested?

org.apache.spark.sql.catalyst.expressions.DateExpressionsSuite, org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite

Also tried with a couple of sample SQL queries with a single executor (6GB), which showed good improvement with the fix.
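The change described above amounts to keeping the formatter in a field and creating it on first use. A minimal standalone sketch in Java follows; it is not the code Spark generates. The class name `CachedDateFormatter`, the field name `sdf`, the pattern `yyyy-MM-dd HH:mm:ss`, and the fixed UTC time zone are all assumptions made for illustration. `SimpleDateFormat` is not thread-safe, so the sketch also assumes each instance is used by a single thread.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class CachedDateFormatter {
    // Created on first use and reused for every subsequent record.
    private SimpleDateFormat sdf;

    /**
     * Formats seconds since the epoch. Returns null on failure,
     * loosely mirroring the isNull flag in the generated code.
     */
    public String format(long epochSeconds) {
        try {
            if (sdf == null) {
                // Assumed pattern and zone; a fixed zone keeps the sketch deterministic.
                sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
                sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
            }
            return sdf.format(new Date(epochSeconds * 1000L));
        } catch (RuntimeException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        CachedDateFormatter formatter = new CachedDateFormatter();
        // Both calls share the single SimpleDateFormat instance.
        System.out.println(formatter.format(0L));
        System.out.println(formatter.format(86400L));
    }
}
```

The saving comes from paying the pattern-parsing cost of the constructor once per formatter object rather than once per row.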
You can merge this pull request into a Git repository by running: $ git pull https://github.com/rajeshbalamohan/spark SPARK-14321 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12105.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12105 commit 6fd07db11b5c9eed795dde11177f1c245a6fef16 Author: Rajesh Balamohan Date: 2016-04-01T02:41:07Z SPARK-14321. [SQL] Reduce date format cost and string-to-date cost in date functions
[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...
Github user sethah commented on the pull request: https://github.com/apache/spark/pull/11989#issuecomment-204225070 @yongtang I believe only Spark committers can do that. Maybe @jkbradley could help?
[GitHub] spark pull request: [SPARK-13674][SQL] Add wholestage codegen supp...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/11517#discussion_r58157092 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/BufferedRowIterator.java --- @@ -61,6 +63,14 @@ public long durationMs() { public abstract void init(Iterator iters[]); /** + * Initializes from array of iterators of InternalRow. + */ + public void init(int index, Iterator iters[]) { +partitionIndex = index; --- End diff -- Not sure about this. I just ran a test like: test("sort merge join/sample") { val N = 2 << 20 runBenchmark("sort merge join", N) { val df1 = sqlContext.range(N) .selectExpr(s"(id * 15485863) % ${N*10} as k1") val df2 = sqlContext.range(N) .selectExpr(s"(id * 15485867) % ${N*10} as k2") df1.join(df2, col("k1") === col("k2")).sample(true, 0.2).count() } } I didn't see any failure.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204221481 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54682/ Test PASSed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204221480 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204221322 **[Test build #54682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54682/consoleFull)** for PR 11805 at commit [`a604078`](https://github.com/apache/spark/commit/a604078f339416f60c659ba39609d0ba830c8884). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204219686 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54681/ Test FAILed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204219684 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204219598 **[Test build #54681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54681/consoleFull)** for PR 11805 at commit [`fde020f`](https://github.com/apache/spark/commit/fde020f32d1d03b99cb58187f65f38e16b88b0fd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/12036#issuecomment-204217824 @yhuai OK. Please let me know if any changes are not clear to you. Thanks!
[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/12036#issuecomment-204208195 @viirya Thank you for working on it. I think the overall approach makes sense. It looks like we can adjust how we organize SessionState, SessionCatalog, and FunctionRegistry to improve the structure of the code (@andrewor14 also mentioned it). I'd like to play with it based on this PR. Would you mind holding off on further changes until I send out commits based on your current branch? Thanks!
[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12083#issuecomment-204207886

Hi @vanzin, the place where the memory copy happens was pointed out by @zsxwing. The call stack is as follows:

```
at java.nio.Bits.copyFromArray(Bits.java:754)
at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:371)
at java.nio.DirectByteBuffer.put(DirectByteBuffer.java:342)
at sun.nio.ch.IOUtil.write(IOUtil.java:60)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
- locked <0x7f8a8a28d400> (a java.lang.Object)
at org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:131)
at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:114)
```

The whole buffer is copied at line http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#60, but the buffer cannot be written in full if its size is larger than the space available in the underlying buffer; that write happens at line http://www.grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/sun/nio/ch/IOUtil.java#65. So each time we make a copy of the whole input `ByteBuf` and then write only part of it if the input is relatively large. This results in multiple unnecessary copies of the input `ByteBuf`.

The way this PR handles the issue is the same as in Hadoop; please refer to https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java#L2957
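For readers following the thread, the idea described above can be sketched in plain Java NIO. This is only an illustration under stated assumptions, not the code from the PR or from Hadoop: the class name `ChunkedWrite`, the helpers `writeChunk` and `writeFully`, and the 256 KB `CHUNK_LIMIT` are all hypothetical. The sketch temporarily lowers the buffer's limit so that each `write` call only sees a bounded slice of the input, which bounds how much the JDK would copy into its temporary direct buffer per call.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class ChunkedWrite {
    // Hypothetical cap on the bytes exposed to the channel per write call.
    static final int CHUNK_LIMIT = 256 * 1024;

    // Writes at most CHUNK_LIMIT bytes of src and returns the number of bytes written.
    static int writeChunk(WritableByteChannel target, ByteBuffer src) throws IOException {
        int originalLimit = src.limit();
        int chunk = Math.min(src.remaining(), CHUNK_LIMIT);
        // Shrink the visible window so the channel never sees more than one chunk.
        src.limit(src.position() + chunk);
        try {
            return target.write(src);
        } finally {
            // Restore the limit so the caller still sees the unwritten remainder.
            src.limit(originalLimit);
        }
    }

    // Drains src chunk by chunk. Intended for blocking channels only:
    // on a non-blocking channel this loop would spin while the socket is full.
    static long writeFully(WritableByteChannel target, ByteBuffer src) throws IOException {
        long total = 0;
        while (src.hasRemaining()) {
            total += writeChunk(target, src);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[CHUNK_LIMIT * 2 + 123];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = writeFully(Channels.newChannel(sink), ByteBuffer.wrap(payload));
        System.out.println(written + " bytes written");
    }
}
```

With a non-blocking socket, a caller would typically invoke something like `writeChunk` once per writability event instead of looping. The chunk size is a trade-off between the number of write calls and the size of each temporary copy, and the value used here is arbitrary.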
[GitHub] spark pull request: [SPARK-13825][CORE] Upgrade to Scala 2.11.8
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11681#issuecomment-204204935 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13825][CORE] Upgrade to Scala 2.11.8
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11681#issuecomment-204204937 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54679/ Test PASSed.
[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...
Github user yongtang commented on the pull request: https://github.com/apache/spark/pull/11989#issuecomment-204204858 Hi @sethah by the way, could you help start a Jenkins test if possible?
[GitHub] spark pull request: [SPARK-13825][CORE] Upgrade to Scala 2.11.8
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11681#issuecomment-204204607 **[Test build #54679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54679/consoleFull)** for PR 11681 at commit [`3e3356c`](https://github.com/apache/spark/commit/3e3356c84328aba025a98b9e3959c3b9bad9d185). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...
Github user yongtang commented on the pull request: https://github.com/apache/spark/pull/11989#issuecomment-204204359 Hi @sethah thanks for the review. There were some issues with the regex, but I managed to get it done. The test has also been moved to ML. Let me know if there are any other issues.
[GitHub] spark pull request: [SPARK-14137] [SQL] Cleanup hash join
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12102#issuecomment-204204056 **[Test build #54683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54683/consoleFull)** for PR 12102 at commit [`37724be`](https://github.com/apache/spark/commit/37724bef4c87186ac151fcbb87601880365c7113).
[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12036#issuecomment-204204179 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12036#issuecomment-204204182 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54680/ Test PASSed.
[GitHub] spark pull request: [SPARK-14123][SQL] Implement function related ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12036#issuecomment-204203643 **[Test build #54680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54680/consoleFull)** for PR 12036 at commit [`05709f0`](https://github.com/apache/spark/commit/05709f09cfa5fb171e926c839929135b1cda112e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-14113. Consider marking JobConf closure-...
Github user rajeshbalamohan commented on the pull request: https://github.com/apache/spark/pull/11978#issuecomment-204200260 Thanks @andrewor14. Addressed your review comments in the latest commit.
[GitHub] spark pull request: [SPARK-529] [sql] Modify SQLConf to use new co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11570#issuecomment-204199961 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54675/ Test PASSed.
[GitHub] spark pull request: [SPARK-529] [sql] Modify SQLConf to use new co...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11570#issuecomment-204199960 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14242][CORE][Network] avoid copy in com...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/12038#issuecomment-204199865 @zsxwing , I updated the commit description. Thank you @zsxwing and @vanzin for reviewing.
[GitHub] spark pull request: [SPARK-529] [sql] Modify SQLConf to use new co...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11570#issuecomment-204199841 **[Test build #54675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54675/consoleFull)** for PR 11570 at commit [`de594b6`](https://github.com/apache/spark/commit/de594b6aa6d1d28fd86b9ca9c335df7b0c6e2c49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14224] [SPARK-14223] [SQL] fix RowEncod...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12047#issuecomment-204197995 **[Test build #2720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2720/consoleFull)** for PR 12047 at commit [`872ecf5`](https://github.com/apache/spark/commit/872ecf5341138f075fcc5e8c0caedbd0d1d0e8fd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Backport [SPARK-11327] [MESOS] Dispatcher does...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12101#issuecomment-204197800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54672/ Test PASSed.
[GitHub] spark pull request: Backport [SPARK-11327] [MESOS] Dispatcher does...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12101#issuecomment-204197799 Merged build finished. Test PASSed.
[GitHub] spark pull request: Backport [SPARK-11327] [MESOS] Dispatcher does...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12101#issuecomment-204197711 **[Test build #54672 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54672/consoleFull)** for PR 12101 at commit [`8cd2a24`](https://github.com/apache/spark/commit/8cd2a247de3a8408bdc0b71a943ec977f3269ba1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14323] [SQL] fix the show functions by ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12104#issuecomment-204197089 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-14323] [SQL] fix the show functions by ...
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/12104

[SPARK-14323] [SQL] fix the show functions by using catalog listFunctions

## What changes were proposed in this pull request?

The syntax of "SHOW FUNCTIONS" can be found here: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ShowFunctions)

By leveraging catalog.listFunctions(), we can get the proper list of functions with proper regex.

## How was this patch tested?

I have added a test case for this issue.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bomeng/spark SPARK-14323

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/12104.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #12104

commit d6040d3c83a5fd2f10d5d139853b194157b44dd2
Author: bomeng
Date: 2016-04-01T01:05:02Z

fix the show functions
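As a rough illustration of what filtering a function list by a pattern can look like, here is a small self-contained Java sketch. It is not the code in this PR and does not call Spark's catalog: the class `FunctionNameFilter`, its `filter` method, and the pattern syntax (where `*` matches any run of characters and `|` separates alternatives) are assumptions made for the example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

final class FunctionNameFilter {
    // Returns the sorted names that match any alternative of the pattern.
    // Assumed syntax: '*' is a wildcard, '|' separates alternatives,
    // and every other character is matched literally.
    static List<String> filter(List<String> names, String pattern) {
        List<String> matched = new ArrayList<>();
        for (String alternative : pattern.trim().split("\\|")) {
            StringBuilder regex = new StringBuilder();
            for (char c : alternative.trim().toCharArray()) {
                if (c == '*') {
                    regex.append(".*");
                } else {
                    // Quote each literal character so regex metacharacters are inert.
                    regex.append(Pattern.quote(String.valueOf(c)));
                }
            }
            Pattern compiled = Pattern.compile(regex.toString());
            for (String name : names) {
                if (compiled.matcher(name).matches() && !matched.contains(name)) {
                    matched.add(name);
                }
            }
        }
        Collections.sort(matched);
        return matched;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("upper", "lower", "length", "abs");
        System.out.println(filter(names, "l*|abs")); // prints [abs, length, lower]
    }
}
```

Quoting every literal character keeps user-supplied patterns from being interpreted as arbitrary regular expressions; whether that is the desired behaviour for SHOW FUNCTIONS is a design decision for the PR itself.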
[GitHub] spark pull request: [SPARK-14294] [SQL] native support alter table...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/12086#issuecomment-204194142 Thanks @bomeng, I will post the PR shortly.
[GitHub] spark pull request: [SPARK-11262][ML] Unit test for gradient, loss...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9229#issuecomment-204193466 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54671/ Test PASSed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204193556 **[Test build #54682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54682/consoleFull)** for PR 11805 at commit [`a604078`](https://github.com/apache/spark/commit/a604078f339416f60c659ba39609d0ba830c8884).
[GitHub] spark pull request: [SPARK-11262][ML] Unit test for gradient, loss...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9229#issuecomment-204193464 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/12099#issuecomment-204193533 Hi, @rxin . Could you review this PR?
[GitHub] spark pull request: [SPARK-14320][SQL] Make ColumnarBatch.Row muta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12103#issuecomment-204193375 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14320][SQL] Make ColumnarBatch.Row muta...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12103#issuecomment-204193378 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54676/ Test PASSed.
[GitHub] spark pull request: [SPARK-11262][ML] Unit test for gradient, loss...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9229#issuecomment-204193319 **[Test build #54671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54671/consoleFull)** for PR 9229 at commit [`94dcec0`](https://github.com/apache/spark/commit/94dcec08b5bf7cb1af054e7e27b258ab0ce870a9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14320][SQL] Make ColumnarBatch.Row muta...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12103#issuecomment-204193208 **[Test build #54676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54676/consoleFull)** for PR 12103 at commit [`a1fe9f8`](https://github.com/apache/spark/commit/a1fe9f83c182ea02b6c9a8825ba90f10a5e6d638). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` public static final class Row extends MutableRow `
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204192959 **[Test build #54681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54681/consoleFull)** for PR 11805 at commit [`fde020f`](https://github.com/apache/spark/commit/fde020f32d1d03b99cb58187f65f38e16b88b0fd).
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204192677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54678/ Test FAILed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204192674 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13992] Add support for off-heap caching
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11805#issuecomment-204192578 **[Test build #54678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54678/consoleFull)** for PR 11805 at commit [`61920a9`](https://github.com/apache/spark/commit/61920a913ff6fc65d712f12ad565ed52bfb6769e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14251][SQL] Add SQL command for printin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12099#issuecomment-204191621 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54673/ Test PASSed.