[GitHub] spark pull request #16345: [SPARK-17755][Core]Use workerRef to send Register...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16345 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70584/ Test FAILed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test FAILed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70584/testReport)** for PR 13909 at commit [`69d5e33`](https://github.com/apache/spark/commit/69d5e33d2035fc5f6f4dfec65bde60c7dfc39548). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16345: [SPARK-17755][Core]Use workerRef to send RegisterWorkerR...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16345 Thanks. Merging to master. This seems risky to backport to other branches.
[GitHub] spark issue #16402: [SPARK-18999][SQL][minor] simplify Literal codegen
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16402 **[Test build #70585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70585/testReport)** for PR 16402 at commit [`6010857`](https://github.com/apache/spark/commit/60108571773d9d196cd491512a5cbcd01d878afa).
[GitHub] spark issue #16402: [SPARK-18999][SQL][minor] simplify Literal codegen
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16402 cc @gatorsmile
[GitHub] spark pull request #16402: [SPARK-18999][SQL][minor] simplify Literal codege...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/16402 [SPARK-18999][SQL][minor] simplify Literal codegen ## What changes were proposed in this pull request? `Literal` can use `CodegenContext.addReferenceObj` to implement codegen instead of `CodegenFallback`. This also simplifies the generated code a little: previously we generated `((Expression) references[1]).eval(null)`; now it's just `references[1]`. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark minor Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16402.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16402 commit 60108571773d9d196cd491512a5cbcd01d878afa Author: Wenchen Fan Date: 2016-12-26T07:22:32Z simplify Literal codegen
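The difference described above can be illustrated with a minimal, hypothetical sketch (these are not Spark's actual codegen classes): with a fallback, the generated code holds an `Expression` in the `references` array and calls `eval(null)` on every use, whereas storing the literal's value directly makes the generated code a plain array read.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralRefSketch {
    // Stand-in for a fallback expression that must be evaluated at runtime.
    interface Expression { Object eval(Object row); }

    public static void main(String[] args) {
        List<Object> references = new ArrayList<>();

        // Old shape: references[0] holds an Expression; generated code does
        // ((Expression) references[0]).eval(null) on each access.
        references.add((Expression) row -> 42);
        Object viaFallback = ((Expression) references.get(0)).eval(null);

        // New shape: references[1] holds the literal value itself; the
        // generated code is just the array access.
        references.add(42);
        Object direct = references.get(1);

        System.out.println(viaFallback.equals(direct));
    }
}
```

Both paths yield the same value; the second simply skips the per-row `eval` call.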
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70584/testReport)** for PR 13909 at commit [`69d5e33`](https://github.com/apache/spark/commit/69d5e33d2035fc5f6f4dfec65bde60c7dfc39548).
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93847467 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -133,49 +209,26 @@ case class CreateMap(children: Seq[Expression]) extends Expression { } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val arrayClass = classOf[GenericArrayData].getName val mapClass = classOf[ArrayBasedMapData].getName -val keyArray = ctx.freshName("keyArray") -val valueArray = ctx.freshName("valueArray") -ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;") -ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = null;") - -val keyData = s"new $arrayClass($keyArray)" -val valueData = s"new $arrayClass($valueArray)" -ev.copy(code = s""" - $keyArray = new Object[${keys.size}]; - $valueArray = new Object[${values.size}];""" + - ctx.splitExpressions( -ctx.INPUT_ROW, -keys.zipWithIndex.map { case (key, i) => - val eval = key.genCode(ctx) - s""" -${eval.code} -if (${eval.isNull}) { - throw new RuntimeException("Cannot use null as map key!"); -} else { - $keyArray[$i] = ${eval.value}; -} - """ -}) + - ctx.splitExpressions( -ctx.INPUT_ROW, -values.zipWithIndex.map { case (value, i) => - val eval = value.genCode(ctx) - s""" -${eval.code} -if (${eval.isNull}) { - $valueArray[$i] = null; -} else { - $valueArray[$i] = ${eval.value}; -} - """ -}) + +val MapType(keyDt, valueDt, _) = dataType +val evalKeys = keys.map(e => e.genCode(ctx)) +val evalValues = values.map(e => e.genCode(ctx)) +val (preprocessKeyData, assignKeys, postprocessKeyData, keyArrayData, keyArray) = + GenArrayData.genCodeToCreateArrayData(ctx, keyDt, evalKeys, false) +val (preprocessValueData, assignValues, postprocessValueData, valueArrayData, valueArray) = --- End diff -- Oh, good catch
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r93847101 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -112,7 +112,25 @@ object JdbcUtils extends Logging { */ def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect) : PreparedStatement = { -val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") +// Use database column names instead of RDD schema column names +val tableSchemaQuery = conn.prepareStatement(dialect.getSchemaQuery(table)) +var columns: String = "" +try { + val tableSchema = getSchema(tableSchemaQuery.executeQuery(), dialect) + val nameMap = tableSchema.fields.map(f => f.name -> f.name).toMap + val lowercaseNameMap = tableSchema.fields.map(f => f.name.toLowerCase -> f.name).toMap + columns = rddSchema.fields.map { x => +if (nameMap.isDefinedAt(x.name)) { + dialect.quoteIdentifier(x.name) +} else if (lowercaseNameMap.isDefinedAt(x.name.toLowerCase)) { + dialect.quoteIdentifier(lowercaseNameMap(x.name.toLowerCase)) +} else { + throw new SQLException(s"""Column "${x.name}" not found""") +} + }.mkString(",") +} finally { + tableSchemaQuery.close() +} val placeholders = rddSchema.fields.map(_ => "?").mkString(",") val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)" conn.prepareStatement(sql) --- End diff -- Can we build the INSERT SQL statement in `saveTable` based on the schema? No need to prepare the generated statement in `saveTable`.
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r93846787 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -112,7 +112,25 @@ object JdbcUtils extends Logging { */ def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect) : PreparedStatement = { -val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") +// Use database column names instead of RDD schema column names +val tableSchemaQuery = conn.prepareStatement(dialect.getSchemaQuery(table)) +var columns: String = "" +try { + val tableSchema = getSchema(tableSchemaQuery.executeQuery(), dialect) + val nameMap = tableSchema.fields.map(f => f.name -> f.name).toMap + val lowercaseNameMap = tableSchema.fields.map(f => f.name.toLowerCase -> f.name).toMap + columns = rddSchema.fields.map { x => +if (nameMap.isDefinedAt(x.name)) { + dialect.quoteIdentifier(x.name) +} else if (lowercaseNameMap.isDefinedAt(x.name.toLowerCase)) { + dialect.quoteIdentifier(lowercaseNameMap(x.name.toLowerCase)) +} else { + throw new SQLException(s"""Column "${x.name}" not found""") +} --- End diff -- The name resolution should still be controlled by `spark.sql.caseSensitive`, right?
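The resolution logic in the diff under discussion can be sketched in isolation. This is a hypothetical Java illustration (not the Spark code itself): prefer an exact match of the RDD column name against the database schema, fall back to a case-insensitive match, and fail when the column is absent.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ColumnResolver {
    // Resolve an RDD schema column name to the database's column name,
    // mirroring the exact-then-lowercase lookup in the quoted diff.
    static String resolve(List<String> tableColumns, String rddColumn) {
        Set<String> exact = new HashSet<>(tableColumns);
        Map<String, String> lower = new HashMap<>();
        for (String c : tableColumns) {
            lower.put(c.toLowerCase(), c);
        }

        if (exact.contains(rddColumn)) {
            return rddColumn;                       // exact match wins
        }
        String ci = lower.get(rddColumn.toLowerCase());
        if (ci != null) {
            return ci;                              // case-insensitive fallback
        }
        throw new IllegalArgumentException("Column \"" + rddColumn + "\" not found");
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("ID", "Name");
        System.out.println(resolve(cols, "ID"));
        System.out.println(resolve(cols, "name"));
    }
}
```

Note this sketch is always case-insensitive on fallback; as the review comment points out, the real resolution would presumably need to honor `spark.sql.caseSensitive`.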
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Merged build finished. Test PASSed.
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15819 https://github.com/apache/spark/pull/16399 has been merged; feel free to backport to 1.6 if you want.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70578/ Test PASSed.
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909 **[Test build #70578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70578/testReport)** for PR 13909 at commit [`293b344`](https://github.com/apache/spark/commit/293b344e761bc4b9c04891c02c702a374472345a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16399 thanks, merging to 2.0!
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16401#discussion_r93846292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -642,6 +642,13 @@ object SQLConf { .doubleConf .createWithDefault(0.05) + val CBO_ENABLED = +SQLConfigBuilder("spark.sql.cbo.enabled") + .internal() + .doc("Enables CBO for estimation of plan statistics when set true.") + .booleanConf + .createWithDefault(false) --- End diff -- shall we enable it by default? cc @hvanhovell @rxin
[GitHub] spark issue #16383: [SPARK-18980][SQL] implement Aggregator with TypedImpera...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16383 **[Test build #70583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70583/testReport)** for PR 16383 at commit [`0a73fe2`](https://github.com/apache/spark/commit/0a73fe208ed7e211daf75dd2268aec91868c7ee3).
[GitHub] spark issue #16383: [SPARK-18980][SQL] implement Aggregator with TypedImpera...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16383 retest this please
[GitHub] spark issue #16388: [SPARK-18989][SQL] DESC TABLE should not fail with forma...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16388 **[Test build #70581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70581/testReport)** for PR 16388 at commit [`1f277f4`](https://github.com/apache/spark/commit/1f277f4fcc1b19bb94e2a9debd1fe7f9786e7de4).
[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #70582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70582/testReport)** for PR 15505 at commit [`be912cb`](https://github.com/apache/spark/commit/be912cb2650364fcd12c45ad5a63a23f1a158779).
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16401 cc @rxin @cloud-fan @viirya
[GitHub] spark pull request #16391: [SPARK-18990][SQL] make DatasetBenchmark fairer f...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16391#discussion_r93844973 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala --- @@ -170,36 +176,39 @@ object DatasetBenchmark { val benchmark3 = aggregate(spark, numRows) /* -OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 3.10.0-327.18.2.el7.x86_64 -Intel Xeon E3-12xx v2 (Ivy Bridge) +Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.12.1 +Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz + back-to-back map: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative -RDD 3448 / 3646 29.0 34.5 1.0X -DataFrame 2647 / 3116 37.8 26.5 1.3X -Dataset 4781 / 5155 20.9 47.8 0.7X +RDD 3963 / 3976 25.2 39.6 1.0X +DataFrame 826 / 834 121.1 8.3 4.8X +Dataset 5178 / 5198 19.3 51.8 0.8X --- End diff -- ah, the scala compiler is smart! I think we can create a ticket to optimize this, i.e. call the primitive apply version, and update the benchmark result. For byte code analysis, let's discuss it in the ticket later.
[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15664#discussion_r93844970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala --- @@ -112,7 +112,25 @@ object JdbcUtils extends Logging { */ def insertStatement(conn: Connection, table: String, rddSchema: StructType, dialect: JdbcDialect) : PreparedStatement = { -val columns = rddSchema.fields.map(x => dialect.quoteIdentifier(x.name)).mkString(",") +// Use database column names instead of RDD schema column names +val tableSchemaQuery = conn.prepareStatement(dialect.getSchemaQuery(table)) --- End diff -- We can get the table schema [when checking whether the table exists](https://github.com/apache/spark/blob/fb07bbe575aabe68422fd3a31865101fb7fa1722/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala#L63).
[GitHub] spark issue #16401: [SPARK-18998] [SQL] Add a cbo conf to switch between def...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16401 **[Test build #70580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70580/testReport)** for PR 16401 at commit [`53c1b26`](https://github.com/apache/spark/commit/53c1b26e9fc7c253b1654145f910e7881db34de7).
[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...
GitHub user wzhfy opened a pull request: https://github.com/apache/spark/pull/16401 [SPARK-18998] [SQL] Add a cbo conf to switch between default statistics and estimated statistics ## What changes were proposed in this pull request? We add a cbo configuration to switch between default stats and estimated stats. We also define a new statistics method `planStats` in LogicalPlan with conf as its parameter, in order to pass the cbo switch and other estimation-related configurations in the future. `planStats` is used on the caller sides (i.e. Optimizer and Strategies) to make transformation decisions based on stats. ## How was this patch tested? Add a test case using a dummy LogicalPlan. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wzhfy/spark cboSwitch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16401.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16401 commit 53c1b26e9fc7c253b1654145f910e7881db34de7 Author: Zhenhua Wang Date: 2016-12-24T15:43:53Z add cbo switch
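The switch described in this PR can be sketched abstractly. The following is a hypothetical Java illustration (the names `planStats`, `defaultRowCount`, and `estimatedRowCount` are illustrative, not Spark's actual API): a `planStats`-style accessor consults the cbo flag from the configuration and returns estimated statistics only when the flag is on, falling back to the default size-based statistics otherwise.

```java
public class CboSwitchSketch {
    // Pick between default and estimated statistics based on a config flag,
    // mirroring the planStats(conf) idea in the PR description.
    static long planStats(boolean cboEnabled, long defaultRowCount, long estimatedRowCount) {
        return cboEnabled ? estimatedRowCount : defaultRowCount;
    }

    public static void main(String[] args) {
        // spark.sql.cbo.enabled defaults to false in the proposed change.
        System.out.println(planStats(false, 1000L, 42L)); // cbo off -> default stats
        System.out.println(planStats(true, 1000L, 42L));  // cbo on  -> estimated stats
    }
}
```

Threading the conf through the caller (Optimizer/Strategies) rather than into the plan node keeps the estimation mode a per-session decision.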
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93844718 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -133,49 +209,26 @@ case class CreateMap(children: Seq[Expression]) extends Expression { } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val arrayClass = classOf[GenericArrayData].getName val mapClass = classOf[ArrayBasedMapData].getName -val keyArray = ctx.freshName("keyArray") -val valueArray = ctx.freshName("valueArray") -ctx.addMutableState("Object[]", keyArray, s"this.$keyArray = null;") -ctx.addMutableState("Object[]", valueArray, s"this.$valueArray = null;") - -val keyData = s"new $arrayClass($keyArray)" -val valueData = s"new $arrayClass($valueArray)" -ev.copy(code = s""" - $keyArray = new Object[${keys.size}]; - $valueArray = new Object[${values.size}];""" + - ctx.splitExpressions( -ctx.INPUT_ROW, -keys.zipWithIndex.map { case (key, i) => - val eval = key.genCode(ctx) - s""" -${eval.code} -if (${eval.isNull}) { - throw new RuntimeException("Cannot use null as map key!"); -} else { - $keyArray[$i] = ${eval.value}; -} - """ -}) + - ctx.splitExpressions( -ctx.INPUT_ROW, -values.zipWithIndex.map { case (value, i) => - val eval = value.genCode(ctx) - s""" -${eval.code} -if (${eval.isNull}) { - $valueArray[$i] = null; -} else { - $valueArray[$i] = ${eval.value}; -} - """ -}) + +val MapType(keyDt, valueDt, _) = dataType +val evalKeys = keys.map(e => e.genCode(ctx)) +val evalValues = values.map(e => e.genCode(ctx)) +val (preprocessKeyData, assignKeys, postprocessKeyData, keyArrayData, keyArray) = + GenArrayData.genCodeToCreateArrayData(ctx, keyDt, evalKeys, false) +val (preprocessValueData, assignValues, postprocessValueData, valueArrayData, valueArray) = --- End diff -- are `keyArray` and `valueArray` used? 
I think we don't need to return the array name in `genCodeToCreateArrayData`
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93844582

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +57,108 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
   }

   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val arrayClass = classOf[GenericArrayData].getName
-    val values = ctx.freshName("values")
-    ctx.addMutableState("Object[]", values, s"this.$values = null;")
-
-    ev.copy(code = s"""
-      this.$values = new Object[${children.size}];""" +
-      ctx.splitExpressions(
-        ctx.INPUT_ROW,
-        children.zipWithIndex.map { case (e, i) =>
-          val eval = e.genCode(ctx)
-          eval.code + s"""
-            if (${eval.isNull}) {
-              $values[$i] = null;
-            } else {
-              $values[$i] = ${eval.value};
-            }
-          """
-        }) +
-      s"""
-        final ArrayData ${ev.value} = new $arrayClass($values);
-        this.$values = null;
-      """, isNull = "false")
+    val et = dataType.elementType
+    val evals = children.map(e => e.genCode(ctx))
+    val (preprocess, assigns, postprocess, arrayData, _) =
+      GenArrayData.genCodeToCreateArrayData(ctx, et, evals, true)
+    ev.copy(
+      code = preprocess + ctx.splitExpressions(ctx.INPUT_ROW, assigns) + postprocess,
+      value = arrayData,
+      isNull = "false")
   }

   override def prettyName: String = "array"
 }

+private [sql] object GenArrayData {
+  /**
+   * Return Java code pieces based on DataType and isPrimitive to allocate ArrayData class
+   *
+   * @param ctx a [[CodegenContext]]
+   * @param elementType data type of underlying array elements
+   * @param elementsCode a set of [[ExprCode]] for each element of an underlying array
+   * @param allowNull if to assign null value to an array element is allowed
+   * @return (code pre-assignments, assignments to each array elements, code post-assignments,
+   *         arrayData name, underlying array name)
+   */
+  def genCodeToCreateArrayData(
+      ctx: CodegenContext,
+      elementType: DataType,
+      elementsCode: Seq[ExprCode],
+      allowNull: Boolean): (String, Seq[String], String, String, String) = {
+    val arrayName = ctx.freshName("array")
+    val arrayDataName = ctx.freshName("arrayData")
+    val numElements = elementsCode.length
+
+    if (!ctx.isPrimitiveType(elementType)) {
+      val arrayClass = classOf[ArrayData].getName
+      val genericArrayClass = classOf[GenericArrayData].getName
+      ctx.addMutableState("Object[]", arrayName,
+        s"this.$arrayName = new Object[${numElements}];")
+
+      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+        val isNullAssignment = if (allowNull) {
+          s"$arrayName[$i] = null;"
+        } else {
+          "throw new RuntimeException(\"Cannot use null as map key!\");"
+        }
+        eval.code + s"""
+          if (${eval.isNull}) {
+            $isNullAssignment
+          } else {
+            $arrayName[$i] = ${eval.value};
+          }
+        """
+      }
+
+      /*
+        TODO: When we declare arrayDataName as GenericArrayData,
+        we have to solve the following exception
+        https://github.com/apache/spark/pull/13909/files#r93813725
+      */
+      ("",
+       assignments,
+       s"final $arrayClass $arrayDataName = new $genericArrayClass($arrayName);",
+       arrayDataName,
+       arrayName)
+    } else {
+      val unsafeArrayClass = classOf[UnsafeArrayData].getName
--- End diff --

this is not needed, `UnsafeArrayData` is imported by default.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93844573

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +57,108 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
...
+    if (!ctx.isPrimitiveType(elementType)) {
+      val arrayClass = classOf[ArrayData].getName
--- End diff --

this is not needed, `ArrayData` is imported by default.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93844552

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
...
+      /*
+        TODO: When we declare arrayDataName as GenericArrayData,
+        we have to solve the following exception
+        https://github.com/apache/spark/pull/13909/files#r93813725
+      */
--- End diff --

it's a very minor problem, let's not bother about it and remove this todo.
[GitHub] spark issue #9759: [SPARK-11753][SQL][test-hadoop2.2] Make allowNonNumericNu...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/9759

@srowen @rxin @zsxwing The Option JSON serialization issue (https://github.com/FasterXML/jackson-module-scala/issues/240) looks like it is fixed now. Do you think it is OK if I try to upgrade Jackson now?
[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16400

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70579/
[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16400

Merged build finished. Test PASSed.
[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16400

**[Test build #70579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70579/testReport)** for PR 16400 at commit [`3ea7860`](https://github.com/apache/spark/commit/3ea7860a3c030ba40ffda40d6e5c586ecce078c3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16355: [SPARK-16473][MLLIB] Fix BisectingKMeans Algorithm faili...
Github user alokob commented on the issue: https://github.com/apache/spark/pull/16355

@imatiach-msft Did you find the dataset suitable? Is anything else needed from my side?
[GitHub] spark issue #14473: [SPARK-16495] [MLlib]Add ADMM optimizer in mllib package
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/14473

ADMM is already available as a breeze solver (BFGS, OWLQN, NonlinearMinimizer) which is integrated with ml/mllib. It would be great if you could look into it; let me know if you need pointers on running comparisons with OWLQN: https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala This is implemented based on the paper you cited.
[GitHub] spark issue #16400: [SPARK-18941][SQL][DOC] Add a new behavior document on `...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16400

**[Test build #70579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70579/testReport)** for PR 16400 at commit [`3ea7860`](https://github.com/apache/spark/commit/3ea7860a3c030ba40ffda40d6e5c586ecce078c3).
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16400

[SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE` with `LOCATION`

## What changes were proposed in this pull request?

This PR adds a new behavior-change description on `CREATE TABLE ... LOCATION` to `sql-programming-guide.md`, clearly under `Upgrading From Spark SQL 1.6 to 2.0`. This change was introduced in Apache Spark 2.0.0 as [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).

## How was this patch tested?

```
SKIP_API=1 jekyll build
```

**Newly Added Description** (screenshot): https://cloud.githubusercontent.com/assets/9700541/21475905/d55c3e1e-cae6-11e6-8651-9bf2be53b6dd.png

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-18941

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16400.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16400

commit 3ea7860a3c030ba40ffda40d6e5c586ecce078c3
Author: Dongjoon Hyun
Date: 2016-12-26T05:06:12Z

[SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE` with `LOCATION`
[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14452

@davies It is true that pushing down different predicates results in different CTE logical/physical plans. I spent some LOC in this PR to address those cases, i.e., preparing a disjunctive predicate for a duplicated CTE with different predicates. For Q64, a disjunctive predicate will be pushed down too. I am not sure what the problem you mentioned is. Let me try to get and show the pushed-down predicate.
[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574

Can we close it? It looks like SPARK-18235 opened up recommendForAll.
[GitHub] spark issue #12574: [SPARK-13857][ML][WIP] Add "recommend all" functionality...
Github user debasish83 commented on the issue: https://github.com/apache/spark/pull/12574

test
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15664

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70577/
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15664

Merged build finished. Test PASSed.
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15664

**[Test build #70577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70577/testReport)** for PR 15664 at commit [`11f5874`](https://github.com/apache/spark/commit/11f587465c257ba194a157b57244f53ff5eb47fd).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16320

Merged build finished. Test PASSed.
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16320

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70576/
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320

**[Test build #70576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70576/testReport)** for PR 16320 at commit [`308de12`](https://github.com/apache/spark/commit/308de12950599a6900766a76a0ea39ac72aba59f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16255: [SPARK-18609][SQL]Fix when CTE with Join between ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16255#discussion_r93841751

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -200,6 +200,8 @@ object RemoveAliasOnlyProject extends Rule[LogicalPlan] {
       case plan: Project if plan eq proj => plan.child
--- End diff --

what do you mean?
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13909

**[Test build #70578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70578/testReport)** for PR 13909 at commit [`293b344`](https://github.com/apache/spark/commit/293b344e761bc4b9c04891c02c702a374472345a).
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93841229

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
...
+    } else {
+      val unsafeArrayClass = classOf[UnsafeArrayData].getName
+      val unsafeArraySizeInBytes =
+        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
+        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
+      val baseOffset = Platform.BYTE_ARRAY_OFFSET
+      ctx.addMutableState(unsafeArrayClass, arrayDataName, "");
+
+      val primitiveValueTypeName = ctx.primitiveTypeName(elementType)
+      val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
+        val isNullAssignment = if (allowNull) {
+          s"$arrayDataName.setNullAt($i);"
+        } else {
+          "throw new RuntimeException(\"Cannot use null as map key!\");"
+        }
+        eval.code + s"""
+          if (${eval.isNull}) {
+            $isNullAssignment
+          } else {
+            $arrayDataName.set$primitiveValueTypeName($i, ${eval.value});
+          }
+        """
+      }
+
+      (s"""
+        byte[] $arrayName = new
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93841112

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -56,33 +57,107 @@ case class CreateArray(children: Seq[Expression]) extends Expression {
...
+    } else {
+      val unsafeArrayClass = classOf[UnsafeArrayData].getName
+      val unsafeArraySizeInBytes =
+        UnsafeArrayData.calculateHeaderPortionInBytes(numElements) +
+        ByteArrayMethods.roundNumberOfBytesToNearestWord(elementType.defaultSize * numElements)
...
+      (s"""
+        byte[] $arrayName = new
[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/13909 LGTM except for a few minor comments. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839843 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...] +val (preprocess, assigns, postprocess, arrayData, array) = --- End diff -- Ah, I noticed that `_` can be used for the return value. It seems better.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839797 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...]
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...] + /* +TODO: When we declare arrayDataName as GenericArrayData, --- End diff -- I think that we could have optimization opportunities if we would update Janino. I am planning to submit a PR to Janino.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839620 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...] +val (preprocess, assigns, postprocess, arrayData, array) = --- End diff -- Why return `array`? I don't see it used later.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839535 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...] +TODO: When we declare arrayDataName as GenericArrayData, --- End diff -- Is this TODO still valid?
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839460 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...] + * @return (code pre-assignments, assignments to each array elements, code post-assignments, --- End diff -- No param doc for `allowNull`.
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r93839456 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- [...] + * @param elementType data type of an underlying array --- End diff -- data type of underlying array element?
[GitHub] spark pull request #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [B...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16399#discussion_r93838816 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala --- @@ -55,7 +55,10 @@ case class InsertIntoHiveTable( def output: Seq[Attribute] = Seq.empty - val stagingDir = sessionState.conf.getConfString("hive.exec.stagingdir", ".hive-staging") + val hadoopConf = sessionState.newHadoopConf() --- End diff -- https://github.com/apache/spark/pull/15744 needs to be backported too.
[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16399 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70575/ Test PASSed.
[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16399 Merged build finished. Test PASSed.
[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16399 cc @cloud-fan
[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16399 **[Test build #70575 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70575/consoleFull)** for PR 16399 at commit [`2482cdc`](https://github.com/apache/spark/commit/2482cdce5680ca5c9754fc759d18e4fefa3d8cd5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15664 **[Test build #70577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70577/testReport)** for PR 15664 at commit [`11f5874`](https://github.com/apache/spark/commit/11f587465c257ba194a157b57244f53ff5eb47fd).
[GitHub] spark issue #16320: [SPARK-18877][SQL] `CSVInferSchema.inferField` on Decima...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16320 **[Test build #70576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70576/testReport)** for PR 16320 at commit [`308de12`](https://github.com/apache/spark/commit/308de12950599a6900766a76a0ea39ac72aba59f).
[GitHub] spark issue #16370: [SPARK-18960][SQL][SS] Avoid double reading file which i...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16370 @zsxwing Is there any further feedback?
[GitHub] spark issue #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16399 **[Test build #70575 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70575/consoleFull)** for PR 16399 at commit [`2482cdc`](https://github.com/apache/spark/commit/2482cdce5680ca5c9754fc759d18e4fefa3d8cd5).
[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 I think it is a good time to close this when #13909 is closed.
[GitHub] spark pull request #16399: [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [B...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/16399 [SPARK-18237][SPARK-18703] [SPARK-18675] [SQL] [BACKPORT-2.1] CTAS for hive serde table should work for all hive versions AND Drop Staging Directories and Data Files ### What changes were proposed in this pull request? This PR is to backport https://github.com/apache/spark/pull/15744, https://github.com/apache/spark/pull/16104 and https://github.com/apache/spark/pull/16134. -- [[SPARK-18237][HIVE] hive.exec.stagingdir have no effect ](https://github.com/apache/spark/pull/15744) hive.exec.stagingdir has no effect in Spark 2.0.1; Hive confs in hive-site.xml are loaded into hadoopConf, so we should use hadoopConf in InsertIntoHiveTable instead of SessionState.conf -- [[SPARK-18675][SQL] CTAS for hive serde table should work for all hive versions](https://github.com/apache/spark/pull/16104) Before hive 1.1, when inserting into a table, hive will create the staging directory under a common scratch directory. After the writing is finished, hive will simply empty the table directory and move the staging directory to it. After hive 1.1, hive will create the staging directory under the table directory, and when moving the staging directory to the table directory, hive will still empty the table directory, but will exclude the staging directory there. In `InsertIntoHiveTable`, we simply copy the code from hive 1.2, which means we will always create the staging directory under the table directory, no matter what the hive version is. This causes problems if the hive version is prior to 1.1, because the staging directory will be removed by hive when hive is trying to empty the table directory. This PR copies the code from hive 0.13, so that we have 2 branches to create the staging directory. If the hive version is prior to 1.1, we'll go to the old-style branch (i.e. create the staging directory under a common scratch directory); else, go to the new-style branch (i.e.
create the staging directory under the table directory) -- [[SPARK-18703] [SQL] Drop Staging Directories and Data Files After each Insertion/CTAS of Hive serde Tables](https://github.com/apache/spark/pull/16134) Below are the files/directories generated for three inserts against a Hive table: ``` /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1 /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1 /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/._SUCCESS.crc /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/.part-0.crc /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/_SUCCESS /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/part-0 /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1 /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1 /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/._SUCCESS.crc
/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/.part-0.crc /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/_SUCCESS /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/part-0 /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1
[GitHub] spark issue #16309: [WIP][SPARK-18896][TESTS] Update to ScalaTest 3.0.1
Github user jaceklaskowski commented on the issue: https://github.com/apache/spark/pull/16309 For reference: [scala-xml releases](https://github.com/scala/scala-xml/releases) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16388: [SPARK-18989][SQL] DESC TABLE should not fail wit...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16388#discussion_r93832614

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
```diff
@@ -408,8 +408,8 @@ private[hive] class HiveClientImpl(
       lastAccessTime = h.getLastAccessTime.toLong * 1000,
       storage = CatalogStorageFormat(
         locationUri = shim.getDataLocation(h),
-        inputFormat = Option(h.getInputFormatClass).map(_.getName),
-        outputFormat = Option(h.getOutputFormatClass).map(_.getName),
+        inputFormat = Option(h.getTTable.getSd.getInputFormat),
+        outputFormat = Option(h.getTTable.getSd.getOutputFormat),
```
--- End diff --

After more reading, `getTTable.getSd.getInputFormat` and `getTTable.getSd.getOutputFormat` will be null for non-native Hive tables, e.g., JDBC tables, HBase tables and Cassandra tables. See the link for more details: https://cwiki.apache.org/confluence/display/Hive/StorageHandlers So far, this is OK. I am just afraid we might expand the usage of `getTableOption` in the future. Maybe at least document the restrictions?
[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16233 I'm working on the last option; I hope to finish it in one or two more days.
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15730 Merged build finished. Test PASSed.
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15730 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70574/ Test PASSed.
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15730 **[Test build #70574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70574/testReport)** for PR 15730 at commit [`13ccfff`](https://github.com/apache/spark/commit/13ccfff7b2bf85671511e23e807244d8abd3f9d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16389: [SPARK-18981][Core]The job hang problem when speculation...
Github user zhaorongsheng commented on the issue: https://github.com/apache/spark/pull/16389 @mridulm Please check it. Thanks~
[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16398 Merged build finished. Test PASSed.
[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70573/ Test PASSed.
[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16398 **[Test build #70573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70573/testReport)** for PR 16398 at commit [`ac87226`](https://github.com/apache/spark/commit/ac872264b59672a077aafc55f23a0705adf23f37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15730: [SPARK-18218][ML][MLLib] Optimize BlockMatrix multiplica...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15730 **[Test build #70574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70574/testReport)** for PR 15730 at commit [`13ccfff`](https://github.com/apache/spark/commit/13ccfff7b2bf85671511e23e807244d8abd3f9d4).
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16397 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70572/ Test FAILed.
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16397 **[Test build #70572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70572/testReport)** for PR 16397 at commit [`0322689`](https://github.com/apache/spark/commit/03226898cb67e6087bb72faef69d42d3a1a80201). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16397 Merged build finished. Test FAILed.
[GitHub] spark pull request #16397: [WIP][SPARK-18922][TESTS] Fix more path-related t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16397#discussion_r93829600

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala ---
```diff
@@ -221,7 +223,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     // file://path/to/data/files/employee.dat
     //
     // TODO: need a similar test for non-local mode.
-    if (local) {
+    if (local && !Utils.isWindows) {
```
--- End diff --

Let me fix this to test on Windows too after the test above is finished.
[GitHub] spark pull request #16397: [WIP][SPARK-18922][TESTS] Fix more path-related t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16397#discussion_r93829539

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala ---
```diff
@@ -221,7 +223,7 @@ class HiveCommandSuite extends QueryTest with SQLTestUtils with TestHiveSingleto
     // file://path/to/data/files/employee.dat
     //
     // TODO: need a similar test for non-local mode.
-    if (local) {
+    if (local && !Utils.isWindows) {
```
--- End diff --

This is being skipped because `incorrectUri` below becomes `file://path/to/data/files/employee.dat` (or `file://C:/path/to/data/files/employee.dat` on Windows). This seems to check whether the file exists via `uri.getPath`, judging from [tables.scala#L223-L248](https://github.com/apache/spark/blob/5572ccf86b084eb5938fe62fd5d9973ec14d555d/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L223-L248). On Linux/Mac, it seems `uri.getPath` becomes `/to/data/files/employee.dat`, whereas on Windows it becomes `/path/to/data/files/employee.dat`. The former path does not exist on Linux/Mac, but the latter seems fine on Windows because `C:` appears to be implicitly prepended, meaning both of these seem fine on Windows:

```scala
new File("/C:/a/b/c").exists
new File("/a/b/c").exists
```

Therefore, the test below:

```
intercept[AnalysisException] {
  sql(s"""LOAD DATA LOCAL INPATH "$incorrectUri" INTO TABLE non_part_table""")
}
```

does not throw an exception on Windows because `incorrectUri` resolves to a valid path there.
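The reason `file://path/...` loses its first segment is standard URI parsing: everything after `//` up to the next `/` is treated as the authority (host), not as part of the path. A standalone sketch with `java.net.URI` (the paths are illustrative, not from the test suite):

```java
import java.net.URI;

public class UriPathDemo {
    public static void main(String[] args) {
        // "file://path/to/..." parses "path" as the URI authority (host),
        // not as part of the path, so getPath() silently drops it.
        URI incorrect = URI.create("file://path/to/data/files/employee.dat");
        System.out.println(incorrect.getAuthority()); // path
        System.out.println(incorrect.getPath());      // /to/data/files/employee.dat

        // The triple-slash form (empty authority) keeps the whole path.
        URI correct = URI.create("file:///path/to/data/files/employee.dat");
        System.out.println(correct.getPath());        // /path/to/data/files/employee.dat
    }
}
```

This is why `uri.getPath` yields `/to/data/files/employee.dat` for the "incorrect" URI discussed above.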
[GitHub] spark issue #14936: [SPARK-7877][MESOS] Allow configuration of framework tim...
Github user philipphoffmann commented on the issue: https://github.com/apache/spark/pull/14936 will do ...
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Note to myself: there may be more failures, but I could not identify them all (though I believe these cover almost all of them), because some errors are suppressed and the many remaining test failures make finding more of them harder.
[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16398 cc @srowen, this is a problem similar to the last PR in this JIRA, which makes the test hang. I think this is the last one. Could I please ask you to take a look?
[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16398 Build started: [TESTS] `org.apache.spark.repl.ReplSuite` [![PR-16398](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=288FF0F4-F91E-4FE6-89F6-C8C8D9F47A5B=true)](https://ci.appveyor.com/project/spark-test/spark/branch/288FF0F4-F91E-4FE6-89F6-C8C8D9F47A5B) Diff: https://github.com/apache/spark/compare/master...spark-test:288FF0F4-F91E-4FE6-89F6-C8C8D9F47A5B
[GitHub] spark issue #16398: [SPARK-18842][TESTS] De-duplicate paths in classpaths in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16398 **[Test build #70573 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70573/testReport)** for PR 16398 at commit [`ac87226`](https://github.com/apache/spark/commit/ac872264b59672a077aafc55f23a0705adf23f37).
[GitHub] spark pull request #16398: [SPARK-18842][TESTS] De-duplicate paths in classp...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16398 [SPARK-18842][TESTS] De-duplicate paths in classpaths in processes for local-cluster mode in ReplSuite to work around the length limitation on Windows

## What changes were proposed in this pull request?

Tests in `ReplSuite` hang due to the length limitation on Windows, with the exception below:

```
Spark context available as 'sc' (master = local-cluster[1,1,1024], app id = app-20161223114000-).
Spark session available as 'spark'.
Exception in thread "ExecutorRunner for app-20161223114000-/26995" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOf(Arrays.java:3332)
	at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
	at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
	at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:622)
	at java.lang.StringBuilder.append(StringBuilder.java:202)
	at java.lang.ProcessImpl.createCommandLine(ProcessImpl.java:194)
	at java.lang.ProcessImpl.<init>(ProcessImpl.java:340)
	at java.lang.ProcessImpl.start(ProcessImpl.java:137)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	at org.apache.spark.deploy.worker.ExecutorRunner.org$apache$spark$deploy$worker$ExecutorRunner$$fetchAndRunExecutor(ExecutorRunner.scala:167)
	at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:73)
```

The reason is that the tests use the paths as URLs, whereas some paths added afterward are plain local paths. Because URLs and plain local paths are mixed, many entries are duplicated, and the resulting command line grows to roughly 40K characters, which hits the 32K length limit on Windows. The full command line built here is: https://gist.github.com/HyukjinKwon/46af7946c9a5fd4c6fc70a8a0aba1beb

## How was this patch tested?

Manually via AppVeyor.
**Before** https://ci.appveyor.com/project/spark-test/spark/build/395-find-path-issues

**After** https://ci.appveyor.com/project/spark-test/spark/build/398-find-path-issues

You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-18842-more Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16398.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16398

commit ac872264b59672a077aafc55f23a0705adf23f37 Author: hyukjinkwon Date: 2016-12-25T13:12:47Z De-duplicate paths in classpaths in processes for local-cluster mode in ReplSuite to work around the length limitation on Windows
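The de-duplication idea described in this PR can be sketched in a few lines. This is a hypothetical illustration, not the actual patch: the helper name and normalization strategy are assumptions. The point is that a URL-style entry and a plain-path entry for the same jar should collapse to one classpath entry:

```java
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

public class ClasspathDedup {

    // Hypothetical sketch: the child JVM's classpath mixed URL-style entries
    // ("file:/C:/a/b.jar") with plain local paths ("C:\a\b.jar"), so the same
    // jar could appear twice. Normalizing every entry to a canonical local
    // path and de-duplicating while preserving order shortens the command
    // line, which matters under the 32K limit on Windows.
    static List<String> dedup(List<String> entries) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String entry : entries) {
            String path = entry;
            if (entry.startsWith("file:")) {
                try {
                    path = new File(URI.create(entry)).getPath();
                } catch (IllegalArgumentException e) {
                    // keep malformed entries as-is
                }
            } else {
                path = new File(entry).getPath();
            }
            seen.add(path);
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> cp = Arrays.asList("file:///tmp/a.jar", "/tmp/a.jar", "/tmp/b.jar");
        System.out.println(dedup(cp)); // [/tmp/a.jar, /tmp/b.jar]
    }
}
```

With order preserved by `LinkedHashSet`, the first occurrence of each jar wins, so classpath precedence is unchanged.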
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16397 Build started: [TESTS] `ALL` [![PR-16397](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=54CDC9DA-B59F-48CA-B6A8-23262A166C91=true)](https://ci.appveyor.com/project/spark-test/spark/branch/54CDC9DA-B59F-48CA-B6A8-23262A166C91) Diff: https://github.com/apache/spark/compare/master...spark-test:54CDC9DA-B59F-48CA-B6A8-23262A166C91 (This will fail because there are other test failures and the build above runs all tests.)
[GitHub] spark issue #16397: [WIP][SPARK-18922][TESTS] Fix more path-related test fai...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16397 **[Test build #70572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70572/testReport)** for PR 16397 at commit [`0322689`](https://github.com/apache/spark/commit/03226898cb67e6087bb72faef69d42d3a1a80201).
[GitHub] spark pull request #16397: [WIP][SPARK-18922][TESTS] Fix more path-related t...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/16397 [WIP][SPARK-18922][TESTS] Fix more path-related test failures on Windows ## What changes were proposed in this pull request? This PR proposes to fix the test failures due to different format of paths on Windows. Failed tests are as below: ``` ColumnExpressionSuite: - input_file_name, input_file_block_start, input_file_block_length - FileScanRDD *** FAILED *** (187 milliseconds) "file:///C:/projects/spark/target/tmp/spark-0b21b963-6cfa-411c-8d6f-e6a5e1e73bce/part-1-c083a03a-e55e-4b05-9073-451de352d006.snappy.parquet" did not contain "C:\projects\spark\target\tmp\spark-0b21b963-6cfa-411c-8d6f-e6a5e1e73bce" (ColumnExpressionSuite.scala:545) - input_file_name, input_file_block_start, input_file_block_length - HadoopRDD *** FAILED *** (172 milliseconds) "file:/C:/projects/spark/target/tmp/spark-5d0afa94-7c2f-463b-9db9-2e8403e2bc5f/part-0-f6530138-9ad3-466d-ab46-0eeb6f85ed0b.txt" did not contain "C:\projects\spark\target\tmp\spark-5d0afa94-7c2f-463b-9db9-2e8403e2bc5f" (ColumnExpressionSuite.scala:569) - input_file_name, input_file_block_start, input_file_block_length - NewHadoopRDD *** FAILED *** (156 milliseconds) "file:/C:/projects/spark/target/tmp/spark-a894c7df-c74d-4d19-82a2-a04744cb3766/part-0-29674e3f-3fcf-4327-9b04-4dab1d46338d.txt" did not contain "C:\projects\spark\target\tmp\spark-a894c7df-c74d-4d19-82a2-a04744cb3766" (ColumnExpressionSuite.scala:598) ``` ``` DataStreamReaderWriterSuite: - source metadataPath *** FAILED *** (62 milliseconds) org.mockito.exceptions.verification.junit.ArgumentsAreDifferent: Argument(s) are different! 
Wanted: streamSourceProvider.createSource( org.apache.spark.sql.SQLContext@3b04133b, "C:\projects\spark\target\tmp\streaming.metadata-b05db6ae-c8dc-4ce4-b0d9-1eb8c84876c0/sources/0", None, "org.apache.spark.sql.streaming.test", Map() ); -> at org.apache.spark.sql.streaming.test.DataStreamReaderWriterSuite$$anonfun$12.apply$mcV$sp(DataStreamReaderWriterSuite.scala:374) Actual invocation has different arguments: streamSourceProvider.createSource( org.apache.spark.sql.SQLContext@3b04133b, "/C:/projects/spark/target/tmp/streaming.metadata-b05db6ae-c8dc-4ce4-b0d9-1eb8c84876c0/sources/0", None, "org.apache.spark.sql.streaming.test", Map() ); ``` ``` GlobalTempViewSuite: - CREATE GLOBAL TEMP VIEW USING *** FAILED *** (110 milliseconds) org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark arget mpspark-960398ba-a0a1-45f6-a59a-d98533f9f519; ``` ``` CreateTableAsSelectSuite: - CREATE TABLE USING AS SELECT *** FAILED *** (0 milliseconds) java.lang.IllegalArgumentException: Can not create a Path from an empty string - create a table, drop it and create another one with the same name *** FAILED *** (16 milliseconds) java.lang.IllegalArgumentException: Can not create a Path from an empty string - create table using as select - with partitioned by *** FAILED *** (0 milliseconds) java.lang.IllegalArgumentException: Can not create a Path from an empty string - create table using as select - with non-zero buckets *** FAILED *** (0 milliseconds) java.lang.IllegalArgumentException: Can not create a Path from an empty string ``` ``` HiveMetadataCacheSuite: - partitioned table is cached when partition pruning is true *** FAILED *** (532 milliseconds) org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string); - partitioned table is cached when partition pruning is false *** FAILED *** (297 milliseconds) 
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string); ``` ``` MultiDatabaseSuite: - createExternalTable() to non-default database - with USE *** FAILED *** (954 milliseconds) org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark arget mpspark-0839d9a7-5e29-467a-9e3e-3e4cd618ee09; - createExternalTable() to non-default database - without USE *** FAILED *** (500 milliseconds) org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark arget mpspark-c7e24d73-1d8f-45e8-ab7d-53a83087aec3; - invalid database name and table names *** FAILED *** (31 milliseconds) "Path does not exist: file:/C:projectsspark arget mpspark-15a2a494-3483-4876-80e5-ec396e704b77;" did not contain "`t:a` is not a valid name
[GitHub] spark pull request #16388: [SPARK-18989][SQL] DESC TABLE should not fail wit...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16388#discussion_r93827087

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
```diff
@@ -408,8 +408,8 @@ private[hive] class HiveClientImpl(
       lastAccessTime = h.getLastAccessTime.toLong * 1000,
       storage = CatalogStorageFormat(
         locationUri = shim.getDataLocation(h),
-        inputFormat = Option(h.getInputFormatClass).map(_.getName),
-        outputFormat = Option(h.getOutputFormatClass).map(_.getName),
+        inputFormat = Option(h.getTTable.getSd.getInputFormat),
+        outputFormat = Option(h.getTTable.getSd.getOutputFormat),
```
--- End diff --

When we actually read the Hive table, we still use `getInputFormatClass`. So this will only affect `DESC TABLE`, and should be OK?
[GitHub] spark pull request #16391: [SPARK-18990][SQL] make DatasetBenchmark fairer f...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/16391#discussion_r93826236

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
```diff
@@ -170,36 +176,39 @@ object DatasetBenchmark {
     val benchmark3 = aggregate(spark, numRows)

     /*
-    OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 3.10.0-327.18.2.el7.x86_64
-    Intel Xeon E3-12xx v2 (Ivy Bridge)
+    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.12.1
+    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz

     back-to-back map:    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    RDD                        3448 / 3646         29.0          34.5       1.0X
-    DataFrame                  2647 / 3116         37.8          26.5       1.3X
-    Dataset                    4781 / 5155         20.9          47.8       0.7X
+    RDD                        3963 / 3976         25.2          39.6       1.0X
+    DataFrame                    826 /  834       121.1           8.3       4.8X
+    Dataset                    5178 / 5198         19.3          51.8       0.8X
```
--- End diff --

I noticed that the Scala compiler automatically generates a primitive version. Current Spark eventually calls the primitive version through the generic version `Object apply(Object)`. Here is a simple example: when we compile the following sample, we can see the class that scalac generates. Scalac automatically generates a primitive version `int apply$mcII$sp(int)` that is called by `int apply(int)`. We could infer this signature in Catalyst for simple cases. Of course, I totally agree that the best solution is to analyze the bytecode and turn it into an expression. [This](https://issues.apache.org/jira/browse/SPARK-14083) was already prototyped. Do you think it is a good time to make this prototype more robust now?
```scala
test("ds") {
  val ds = sparkContext.parallelize((1 to 10), 1).toDS
  ds.map(i => i * 7).show
}
```

```
$ javap -c Test\$\$anonfun\$5\$\$anonfun\$apply\$mcV\$sp\$1.class
Compiled from "Test.scala"
public final class org.apache.spark.sql.Test$$anonfun$5$$anonfun$apply$mcV$sp$1 extends scala.runtime.AbstractFunction1$mcII$sp implements scala.Serializable {
  public static final long serialVersionUID;

  public final int apply(int);
    Code:
       0: aload_0
       1: iload_1
       2: invokevirtual #18    // Method apply$mcII$sp:(I)I
       5: ireturn

  public int apply$mcII$sp(int);
    Code:
       0: iload_1
       1: bipush        7
       3: imul
       4: ireturn

  public final java.lang.Object apply(java.lang.Object);
    Code:
       0: aload_0
       1: aload_1
       2: invokestatic  #29    // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
       5: invokevirtual #31    // Method apply:(I)I
       8: invokestatic  #35    // Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
      11: areturn

  public org.apache.spark.sql.Test$$anonfun$5$$anonfun$apply$mcV$sp$1(org.apache.spark.sql.Test$$anonfun$5);
    Code:
       0: aload_0
       1: invokespecial #42    // Method scala/runtime/AbstractFunction1$mcII$sp."<init>":()V
       4: return
}
```
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16296 Hive does not allow using a CTAS statement to create a partitioned table, but we allow it in the CREATE data source table syntax.
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16296 `CREATE TEMPORARY TABLE` is not supported for any type of Hive serde table. However, `CREATE TEMPORARY TABLE` is allowed for creating data source tables if `AS query` is not specified.