[GitHub] spark pull request #18561: [WIP][TEST][test-maven] createSparkSession should...
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/18561#discussion_r126080296

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala ---
@@ -61,8 +61,10 @@ trait SharedSQLContext extends SQLTestUtils with BeforeAndAfterEach with Eventua
      */
    protected override def beforeAll(): Unit = {
      SparkSession.sqlListener.set(null)
-     if (_spark == null) {
-       _spark = createSparkSession
+     synchronized {
--- End diff --

`beforeAll` ought to be called by one thread in the test framework before any tests start. I doubt this is the reason, but maybe I don't have the full context here.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
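The pattern under discussion is classic guarded lazy initialization. A minimal standalone sketch (using a hypothetical `FakeSession` placeholder instead of a real `SparkSession`, and a `getOrCreate` name that is not in the diff) of why the `synchronized` block matters:

```scala
// Sketch of the guarded-initialization pattern the diff adds.
// `FakeSession` and `getOrCreate` are illustrative names, not Spark APIs.
object SessionHolder {
  final class FakeSession

  private var _spark: FakeSession = null

  // Without `synchronized`, two threads could both observe `_spark == null`
  // and each create a session; with it, initialization happens once.
  def getOrCreate(): FakeSession = synchronized {
    if (_spark == null) {
      _spark = new FakeSession
    }
    _spark
  }
}
```

Repeated calls return the same instance, which is the invariant the test trait relies on.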
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18307 Merged build finished. Test PASSed.
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18307 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79313/
[GitHub] spark issue #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18023 Left a comment: https://github.com/apache/spark/pull/18023#discussion_r126079978
[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparision should respect case-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18460 **[Test build #79319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79319/testReport)** for PR 18460 at commit [`268367e`](https://github.com/apache/spark/commit/268367e6b0b23808b79145951a640927190afe66).
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18307

**[Test build #79313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79313/testReport)** for PR 18307 at commit [`3b548cc`](https://github.com/apache/spark/commit/3b548cc3d5ad8928785fe644db9ea788dfb8fad2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18023#discussion_r126079978

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -847,6 +847,12 @@ object SQLConf {
    .intConf
    .createWithDefault(UnsafeExternalSorter.DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD.toInt)

+  val SUPPORT_QUOTED_REGEX_COLUMN_NAME = buildConf("spark.sql.parser.quotedRegexColumnNames")
+    .doc("When true, quoted Identifiers (using backticks) in SELECT statement are interpreted" +
--- End diff --

I agree. It only makes sense when we use it in a SELECT statement. However, our parser allows quoted identifiers (using backticks) in any part of a SQL statement; below is just one example. If we turn on this conf flag, will it cause problems for other users who have quoted identifiers anywhere in the query other than the Project/SELECT list?
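To make the semantics concrete: the conf is about treating a backtick-quoted identifier as a regular expression matched against column names. A standalone illustration of that matching idea (the helper name `selectByRegex` is hypothetical; Spark's real behavior lives in its parser and analyzer, not in a function like this):

```scala
// Illustration only: interpret a quoted identifier as a regex over column names.
// `selectByRegex` is a made-up helper, not a Spark API.
def selectByRegex(columns: Seq[String], quoted: String): Seq[String] =
  columns.filter(_.matches(quoted))  // keep columns whose full name matches the pattern
```

For example, `selectByRegex(Seq("col_a", "col_b", "id"), "col_.*")` would keep only the two `col_` columns.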
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18559 updated, I also added a `release-note` label to the JIRA ticket.
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17633#discussion_r126078601

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
        col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
      .map(col => col.getName).toSet

-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+    object ExtractableLiteral {
+      def unapply(expr: Expression): Option[String] = expr match {
+        case Literal(value, _: IntegralType) => Some(value.toString)
+        case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+        case _ => None
+      }
+    }
+
+    object ExtractableLiterals {
+      def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+        exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+          case (Some(accum), Some(value)) => Some(accum :+ value)
+          case _ => None
+        }
+      }
+    }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, ExtractableLiterals(values)) if !varcharKeys.contains(a.name) =>
--- End diff --

Shall we also consider `InSet`?
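The core of the `ExtractableLiterals` extractor in the diff is the `foldLeft` that turns a `Seq[Option[String]]` into an `Option[Seq[String]]`: the result is `Some` only if every element extracted successfully. A self-contained sketch of that all-or-nothing fold (the `sequence` name is ours, not from the diff):

```scala
// All-or-nothing fold, as used by ExtractableLiterals above:
// Some(values) only when every input Option is defined, otherwise None.
def sequence(opts: Seq[Option[String]]): Option[Seq[String]] =
  opts.foldLeft(Option(Seq.empty[String])) {
    case (Some(accum), Some(value)) => Some(accum :+ value)  // keep accumulating
    case _ => None                                           // one failure poisons all
  }
```

This is why a single non-literal element in an `In` list makes the whole predicate non-convertible to a Hive partition filter.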
[GitHub] spark issue #18561: [WIP][TEST][test-maven] createSparkSession should be syn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18561 **[Test build #79318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79318/testReport)** for PR 18561 at commit [`86e47ce`](https://github.com/apache/spark/commit/86e47ce3469929b8086923fe3d201f9db2b2da83).
[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparision should respec...
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r126077717

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -144,6 +145,12 @@ object TypeCoercion {
      .orElse((t1, t2) match {
        case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
          findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
+       case (st1 @ StructType(fields1), st2 @ StructType(fields2)) if st1.sameType(st2) =>
+         Some(StructType(fields1.zip(fields2).map { case (sf1, sf2) =>
+           val name = if (sf1.name == sf2.name) sf1.name else sf1.name.toLowerCase(Locale.ROOT)
+           val dataType = findWiderTypeForTwo(sf1.dataType, sf2.dataType).get
--- End diff --

Sure!
[GitHub] spark pull request #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a...
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18503#discussion_r126077681

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/FixedLengthRowBasedKeyValueBatch.java ---
@@ -62,7 +62,7 @@ public UnsafeRow appendRow(Object kbase, long koff, int klen,
    keyRowId = numRows;
    keyRow.pointTo(base, recordOffset, klen);
-   valueRow.pointTo(base, recordOffset + klen, vlen + 4);
--- End diff --

@ooq thank you for pointing out that interesting discussion. It seems to make sense for page management. The question from @cloud-fan and me is whether `valueRow` should use only `vlen`; I think the `+4` is for page management.
[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparision should respec...
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r126077576

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -144,6 +145,12 @@ object TypeCoercion {
      .orElse((t1, t2) match {
        case (ArrayType(et1, containsNull1), ArrayType(et2, containsNull2)) =>
          findWiderTypeForTwo(et1, et2).map(ArrayType(_, containsNull1 || containsNull2))
+       case (st1 @ StructType(fields1), st2 @ StructType(fields2)) if st1.sameType(st2) =>
+         Some(StructType(fields1.zip(fields2).map { case (sf1, sf2) =>
+           val name = if (sf1.name == sf2.name) sf1.name else sf1.name.toLowerCase(Locale.ROOT)
+           val dataType = findWiderTypeForTwo(sf1.dataType, sf2.dataType).get
--- End diff --

Shall we add the comment here?
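The field-name handling in this diff is a small policy worth isolating: when two struct fields have names that differ only in case (which `sameType` tolerates under case-insensitive resolution), the widened struct falls back to the lower-cased name. A standalone sketch of just that rule (the `mergeFieldName` helper is our illustrative name, not Spark's):

```scala
import java.util.Locale

// Illustrative helper (not a Spark API): pick the merged field name the way
// the TypeCoercion diff does - keep the name if both sides agree exactly,
// otherwise neutralize the case difference with a ROOT-locale lower-casing.
def mergeFieldName(n1: String, n2: String): String =
  if (n1 == n2) n1 else n1.toLowerCase(Locale.ROOT)
```

`Locale.ROOT` matters here: locale-sensitive lower-casing (e.g. the Turkish dotless i) could otherwise produce different names on different JVMs.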
[GitHub] spark issue #18561: [WIP][TEST][test-maven] createSparkSession should be syn...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18561 Retest this please
[GitHub] spark issue #18561: [WIP][TEST][TEST-MAVEN] createSparkSession should be syn...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18561 **[Test build #79317 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79317/testReport)** for PR 18561 at commit [`86e47ce`](https://github.com/apache/spark/commit/86e47ce3469929b8086923fe3d201f9db2b2da83).
[GitHub] spark pull request #18561: [WIP][TEST][TEST-MAVEN] createSparkSession should...
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/18561

[WIP][TEST][TEST-MAVEN] createSparkSession should be synchronized

This is a test for recent consecutive failures on the Spark master branch.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-MAVEN

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18561.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18561

commit 86e47ce3469929b8086923fe3d201f9db2b2da83
Author: Dongjoon Hyun
Date: 2017-07-06T06:10:20Z

    [TEST][TEST-MAVEN] createSparkSession should be synchronized
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18559 It'd be important to document which syntaxes are no longer allowed in the JIRA ticket (and PR description), and we should also highlight that in the release notes.
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r126074994

--- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql ---
@@ -0,0 +1,23 @@
+-- Catch case-sensitive name duplication
+SET spark.sql.caseSensitive=true;
+
+CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet;
--- End diff --

ok, will update
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16697 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79310/
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16697 Merged build finished. Test PASSed.
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16697

**[Test build #79310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79310/testReport)** for PR 16697 at commit [`554cd39`](https://github.com/apache/spark/commit/554cd391b3ddb5fb3f7c52950610e832ad40047b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18560: [SPARK-21336]Revise rand comparison in BatchEvalPythonEx...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18560 **[Test build #79315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79315/testReport)** for PR 18560 at commit [`6784143`](https://github.com/apache/spark/commit/67841437eb2cffec5686fafd07cb1233a1e5072a).
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79316 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79316/testReport)** for PR 18559 at commit [`479e53c`](https://github.com/apache/spark/commit/479e53c20303d55a04fd5e98440275332ebb3e5e).
[GitHub] spark pull request #18503: [SPARK-21271][SQL] Ensure Unsafe.sizeInBytes is a...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18503#discussion_r126074568

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala ---
@@ -350,20 +350,24 @@ private[state] class HDFSBackedStateStoreProvider extends StateStoreProvider wit
          throw new IOException(
            s"Error reading delta file $fileToRead of $this: key size cannot be $keySize")
        } else {
-         val keyRowBuffer = new Array[Byte](keySize)
+         // If key size in an existing file is not a multiple of 8, round it to multiple of 8
+         val keyAllocationSize = ((keySize + 7) / 8) * 8
+         val keyRowBuffer = new Array[Byte](keyAllocationSize)
--- End diff --

OK, we need to figure out what's going on; it seems there are other places where we may have a wrong size in UnsafeRow.
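The expression `((keySize + 7) / 8) * 8` in the diff is the standard integer trick for rounding a size up to the next multiple of 8 (UnsafeRow data is 8-byte word-aligned). A quick standalone check of the arithmetic (the `roundUpTo8` name is ours):

```scala
// Round a non-negative size up to the next multiple of 8, as the diff does.
// Integer division truncates, so adding 7 first makes any remainder carry
// over into the next 8-byte word.
def roundUpTo8(size: Int): Int = ((size + 7) / 8) * 8
```

So a 9-byte key allocates a 16-byte buffer, while an already-aligned 8-byte key stays at 8.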
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79312/
[GitHub] spark pull request #18560: Revise rand comparison in BatchEvalPythonExecSuit...
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/18560

Revise rand comparison in BatchEvalPythonExecSuite

## What changes were proposed in this pull request?

In BatchEvalPythonExecSuite, two test cases use the predicate "rand() > 3". rand() generates a random value in [0, 1), so it is weird to compare it with 3; use 0.3 instead.

## How was this patch tested?

unit test

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark revise_BatchEvalPythonExecSuite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18560.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18560

commit 67841437eb2cffec5686fafd07cb1233a1e5072a
Author: Wang Gengliang
Date: 2017-07-07T05:50:24Z

    revise BatchEvalPythonExecSuite
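The point of the change is easy to demonstrate outside Spark: a draw from [0, 1) is never greater than 3, so `rand() > 3` is a constantly-false predicate, while `rand() > 0.3` is true about 70% of the time. A quick sketch using `scala.util.Random` as a stand-in for SQL's rand():

```scala
import scala.util.Random

// Random.nextDouble() draws from [0, 1), like SQL's rand().
val samples = Seq.fill(1000)(Random.nextDouble())

val gt3  = samples.count(_ > 3)    // always 0: the predicate can never fire
val gt03 = samples.count(_ > 0.3)  // roughly 700 of 1000 samples
```

A predicate that is always false makes a test vacuous, which is why the suite switches to 0.3.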
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r126074450

--- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql ---
@@ -0,0 +1,23 @@
+-- Catch case-sensitive name duplication
+SET spark.sql.caseSensitive=true;
+
+CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet;
--- End diff --

We should keep them in one place. For now I think we still need to put them in `DDLSuite` because we need to run it with and without Hive support. Can we pick some typical test cases here and move them to `DDLSuite`?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Merged build finished. Test FAILed.
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559

**[Test build #79312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79312/testReport)** for PR 18559 at commit [`7279262`](https://github.com/apache/spark/commit/72792627d76e0e3452f84af1322a35e3f0d82580).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r126074238

--- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql ---
@@ -0,0 +1,23 @@
+-- Catch case-sensitive name duplication
+SET spark.sql.caseSensitive=true;
+
+CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet;
--- End diff --

In `DDLSuite`, we already have simple tests for duplicate columns. Should we move these tests there?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Merged build finished. Test FAILed.
[GitHub] spark pull request #17758: [SPARK-20460][SQL] Make it more consistent to han...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17758#discussion_r126073404

--- Diff: sql/core/src/test/resources/sql-tests/inputs/create.sql ---
@@ -0,0 +1,23 @@
+-- Catch case-sensitive name duplication
+SET spark.sql.caseSensitive=true;
+
+CREATE TABLE t(c0 STRING, c1 INT, c1 DOUBLE, c0 INT) USING parquet;
--- End diff --

We didn't have test cases for create table before?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79311/
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79311/testReport)** for PR 18559 at commit [`4d99c11`](https://github.com/apache/spark/commit/4d99c11802efa2d6ee5c36de5941226bf12e1a55). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18444 Thanks for asking @ueshin. Sounds OK to me too. I currently have some pending review comments for minor nits. Let me finish mine within today.
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072754 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- yea that seems wrong ...
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072760 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- It's supported since 2.0.x, so there are definitely existing user queries and apps. I'm agreeing with this PR and want to understand the scope of changes. It looks good to me.

```scala
scala> sc.version
res0: String = 2.0.2

scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")

scala> sql("SELECT v.i from (SELECT i FROM v)").show
+---+
|  i|
+---+
|  1|
+---+
```
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126072118 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- we may have, but this is definitely wrong IMO. BTW at least we don't have this usage in our tests, so I think it's probably fine. also cc @rxin
[GitHub] spark pull request #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector shoul...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/18557
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 Yep, it's officially internal. What I meant by `performance issue` is that a 3rd party can still use it, and there might be a performance gap between `float` and `double`. I'll close this PR. Thank you again.
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126071406 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Do we have such usage in existing queries?
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18444 LGTM, pending Jenkins. @HyukjinKwon, @holdenk, Do you have any other concerns?
[GitHub] spark issue #609: SPARK-1691: Support quoted arguments inside of spark-submi...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/609 @ganeshm25 It seems to work in newer Spark versions; I haven't tried it in Spark 1.4.2. However, it's still very tricky to get right, and I would prefer a simpler solution.
[GitHub] spark pull request #18462: [SPARK-21333][Docs] Removed invalid joinTypes fro...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18462#discussion_r126071195 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1007,6 +1007,10 @@ class Dataset[T] private[sql]( JoinType(joinType), Some(condition.expr))).analyzed.asInstanceOf[Join] +if (joined.joinType == LeftSemi || joined.joinType == LeftAnti) { + throw new AnalysisException("Invalid join type in joinWith: " + joined.joinType) --- End diff -- Nit: `joined.joinType.sql`?
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126071223 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2638,4 +2638,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext { } } } + + test("SPARK-21335: support un-aliased subquery") { +withTempView("v") { + Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v") + checkAnswer(sql("SELECT i from (SELECT i FROM v)"), Row(1)) + + val e = intercept[AnalysisException](sql("SELECT v.i from (SELECT i FROM v)")) + assert(e.message == +"cannot resolve '`v.i`' given input columns: [_auto_generated_subquery_name.i]") --- End diff -- Then the scope of the breaking change is reduced to this kind of query?

```scala
scala> sc.version
res0: String = 2.1.1

scala> Seq(1 -> "a").toDF("i", "j").createOrReplaceTempView("v")

scala> sql("SELECT v.i from (SELECT i FROM v)").show
+---+
|  i|
+---+
|  1|
+---+
```
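The behavior change debated in this thread can be illustrated without a Spark session. Once the analyzer wraps an un-aliased subquery in an auto-generated alias, the original view name is no longer a valid qualifier for the subquery's output. The toy resolver below is my own sketch (not Spark's analyzer; `Column` and `resolve` are hypothetical names), showing why `i` still resolves while `v.i` does not:

```scala
// Toy model of qualified-name resolution against a subquery's output columns.
case class Column(qualifier: String, name: String)

def resolve(ref: String, cols: Seq[Column]): Option[Column] = ref.split('.') match {
  case Array(q, n) => cols.find(c => c.qualifier == q && c.name == n) // qualified ref
  case Array(n)    => cols.find(_.name == n)                          // bare ref
  case _           => None
}

// After SPARK-21335, the subquery output is qualified by the generated alias,
// not by the inner view name `v`.
val subqueryOutput = Seq(Column("_auto_generated_subquery_name", "i"))

val bare      = resolve("i", subqueryOutput)   // resolves
val qualified = resolve("v.i", subqueryOutput) // fails: qualifier `v` is gone
```

Under this model, `SELECT i FROM (...)` keeps working while `SELECT v.i FROM (...)` becomes the analysis error quoted in the test.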
[GitHub] spark pull request #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics w...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18558
[GitHub] spark issue #609: SPARK-1691: Support quoted arguments inside of spark-submi...
Github user ganeshm25 commented on the issue: https://github.com/apache/spark/pull/609 @koertkuipers I am trying to pass multiple driver-java-options with Spark 1.4.2 inside a bash script. Is there a solution you found for this?
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18444 **[Test build #79314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79314/testReport)** for PR 18444 at commit [`f2774c6`](https://github.com/apache/spark/commit/f2774c639fdf653ec7d48127b529124dbbb9b60b).
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18558 thanks, merging to master!
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 We didn't change `spark.shuffle.io.numConnectionsPerPeer`. Our biggest cluster has 6000 `NodeManager`s. There are 50 executors running on the same host at the same time.
[GitHub] spark pull request #18425: [SPARK-21217][SQL] Support ColumnVector.Array.to<...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18425
[GitHub] spark issue #18425: [SPARK-21217][SQL] Support ColumnVector.Array.toAr...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18425 thanks, merging to master!
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 @cloud-fan To be honest, it's a little bit tricky to reject "open blocks" by closing the connection; the subsequent reconnection will surely have extra cost. In the current change we are relying on the retry mechanism of `RetryingBlockFetcher`. `spark.shuffle.io.maxRetries` and `spark.shuffle.io.retryWait` should also be tuned; with this change their meanings may become different, and users should know this. This is the sacrifice for compatibility. It occurs to me: could we add back `OpenBlocksFailed` and add a flag (default false)? If users want to turn it on, we can tell them they should upgrade the client.
[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18444 Jenkins, retest this please.
[GitHub] spark pull request #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor shou...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18553
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18557 `ColumnVector` is totally internal in Spark 2.2, so there won't be a 3rd-party Spark library issue.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18558 Merged build finished. Test PASSed.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79309/ Test PASSed.
[GitHub] spark issue #18553: [SPARK-21327][SQL][PYSPARK] ArrayConstructor should hand...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18553 Thanks for reviewing! merging to master.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18558 **[Test build #79309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79309/testReport)** for PR 18558 at commit [`dedafd9`](https://github.com/apache/spark/commit/dedafd95835ddd65118825d74c4592f35b73b3d8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18388 > there are 200K+ connections and 3.5M blocks(FileSegmentManagedBuffer) being fetched. Did you use a large `spark.shuffle.io.numConnectionsPerPeer`? If not, the number of connections seems too large since each ShuffleClient should have only one connection to one shuffle service. How large is your cluster and how many applications are running at the same time?
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 BTW, thank you for swift reviews and feedbacks on my PR. :)
[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 Analyzing the heap dump, there are 200K+ connections and 3.5M blocks (`FileSegmentManagedBuffer`) being fetched. Yes, flow control is a good idea, but I still think it makes much sense to control the concurrency. By rejecting some "open blocks" requests, we can keep sufficient bandwidth for the existing connections and finish the reduce tasks as soon as possible. Simple flow control (slowing down connections under pressure) can help avoid OOM, but it seems more reduce tasks will run longer.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 I know that 'there is no usage of this API internally in Spark 2.2', but that's only for 2.2.0. My reason was that a 3rd-party Spark library cannot use `ColumnVector` for the `float` type in Spark 2.2.1+. Anyway, @cloud-fan changed the ticket type. If that means backporting is not allowed for this patch, I have no objection to the community decision. So, @kiszk and @cloud-fan, given that, may I close this PR?
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18307 **[Test build #79313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79313/testReport)** for PR 18307 at commit [`3b548cc`](https://github.com/apache/spark/commit/3b548cc3d5ad8928785fe644db9ea788dfb8fad2).
[GitHub] spark issue #18307: [SPARK-21100][SQL] Add summary method as alternative to ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18307 retest this please
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18557 I've changed the ticket type from `bug` to `improvement`, adding a new API is not fixing a bug.
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126068180

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
       .map(col => col.getName).toSet

-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+    object ExtractableLiteral {
+      def unapply(expr: Expression): Option[String] = expr match {
+        case Literal(value, _: IntegralType) => Some(value.toString)
+        case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+        case _ => None
+      }
+    }
+
+    object ExtractableLiterals {
+      def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+        exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+          case (Some(accum), Some(value)) => Some(accum :+ value)
+          case _ => None
+        }
+      }
+    }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, ExtractableLiterals(values)) if !varcharKeys.contains(a.name) =>
+        val or = values
+          .map(value => s"${a.name} = $value")
+          .reduce(_ + " or " + _)
+        "(" + or + ")"
+      case op @ BinaryComparison(a: Attribute, ExtractableLiteral(value))
           if !varcharKeys.contains(a.name) =>
-        s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-      case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+        s"${a.name} ${op.symbol} $value"
+      case op @ BinaryComparison(ExtractableLiteral(value), a: Attribute)
           if !varcharKeys.contains(a.name) =>
-        s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-    }.mkString(" and ")
+        s"$value ${op.symbol} ${a.name}"
+      case op @ And(expr1, expr2) =>
+        s"(${convert(expr1)} and ${convert(expr2)})"
+      case op @ Or(expr1, expr2) =>
+        s"(${convert(expr1)} or ${convert(expr2)})"
+    }
+
+    filters.flatMap(f => Try(convert(f)).toOption).mkString(" and ")
--- End diff --

I do think we should follow `InMemoryTableScanExec.buildFilters`. For example, if the left side of an `And` is not supported but the right side is, we can still push down the right side. But here, we simply catch the exception and push nothing.
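The difference the reviewer is pointing at can be sketched as follows. This is a hypothetical Python analogue, not Spark code: expressions are plain tuples, and `convert` returns the supported side of an `And` instead of failing the whole filter, mirroring the `InMemoryTableScanExec.buildFilters` behavior the reviewer suggests.

```python
def convert(expr):
    """Convert a toy filter expression tree to a pruning string, or None if unsupported."""
    kind = expr[0]
    if kind == "eq":                      # ("eq", attr_name, literal)
        _, attr, value = expr
        return f"{attr} = {value}"
    if kind == "and":                     # ("and", left, right)
        left, right = convert(expr[1]), convert(expr[2])
        if left and right:
            return f"({left} and {right})"
        # Partial pushdown: an And is still a valid (weaker) filter
        # if only one side can be converted.
        return left or right
    if kind == "or":                      # ("or", left, right)
        left, right = convert(expr[1]), convert(expr[2])
        # An Or can only be pushed down when BOTH sides convert.
        return f"({left} or {right})" if left and right else None
    return None                           # unsupported expression kind

# Right side survives even though the left side is unsupported:
print(convert(("and", ("unsupported",), ("eq", "p", 1))))  # p = 1
```

With the exception-catching approach in the diff, the same `And` would convert nothing, so Hive would prune no partitions for that filter.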
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18557 We have not seen any failures in the test suites, and [there is no usage of this API](https://github.com/apache/spark/pull/17836#discussion_r114488839) in Spark 2.2. Does the missing support cause any failure in a test or an application program? If so, it would be good to include a sample program in this PR.
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126067892

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
       .map(col => col.getName).toSet

-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+    object ExtractableLiteral {
+      def unapply(expr: Expression): Option[String] = expr match {
+        case Literal(value, _: IntegralType) => Some(value.toString)
+        case Literal(value, _: StringType) => Some(quoteStringLiteral(value.toString))
+        case _ => None
+      }
+    }
+
+    object ExtractableLiterals {
+      def unapply(exprs: Seq[Expression]): Option[Seq[String]] = {
+        exprs.map(ExtractableLiteral.unapply).foldLeft(Option(Seq.empty[String])) {
+          case (Some(accum), Some(value)) => Some(accum :+ value)
+          case _ => None
+        }
+      }
+    }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, ExtractableLiterals(values)) if !varcharKeys.contains(a.name) =>
--- End diff --

cc @gatorsmile, any concerns about not doing it?
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16697 LGTM, pending tests
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18556 Thank you @cloud-fan!
[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126067471

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
         col.getType.startsWith(serdeConstants.CHAR_TYPE_NAME))
       .map(col => col.getName).toSet

-    filters.collect {
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
-        s"${a.name} ${op.symbol} $v"
-      case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
-        s"$v ${op.symbol} ${a.name}"
-      case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
-          if !varcharKeys.contains(a.name) =>
-        s"""${a.name} ${op.symbol} ${quoteStringLiteral(v.toString)}"""
-      case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
-          if !varcharKeys.contains(a.name) =>
-        s"""${quoteStringLiteral(v.toString)} ${op.symbol} ${a.name}"""
-    }.mkString(" and ")
+    def isExtractable(expr: Expression): Boolean =
+      expr match {
+        case Literal(_, _: IntegralType) | Literal(_, _: StringType) => true
+        case _ => false
+      }
+
+    def extractValue(expr: Expression): String =
+      expr match {
+        case Literal(v, _: IntegralType) => v.toString
+        case Literal(v, _: StringType) => quoteStringLiteral(v.toString)
+      }
+
+    lazy val convert: PartialFunction[Expression, String] = {
+      case In(a: Attribute, exprs)
+          if !varcharKeys.contains(a.name) && exprs.forall(isExtractable) =>
+        val or = exprs
+          .map(expr => s"${a.name} = ${extractValue(expr)}")
+          .reduce(_ + " or " + _)
+        "(" + or + ")"
+      case op @ BinaryComparison(a: Attribute, expr2)
--- End diff --

how about `ExtractLiteralToString`?
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79312 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79312/testReport)** for PR 18559 at commit [`7279262`](https://github.com/apache/spark/commit/72792627d76e0e3452f84af1322a35e3f0d82580).
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18556
[GitHub] spark pull request #18288: [SPARK-21066][ML] LibSVM load just one input file
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18288
[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should cache weightCol if ne...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18554 I'm not familiar with R, but grepping for "OneVsRest" returns nothing, so it seems nothing needs to be done on the R side.
[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...
Github user facaiy commented on the issue: https://github.com/apache/spark/pull/18523 @SparkQA test again, please.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 Hi, @kiszk . I think this is a bug fix of `ColumnVector` as described in [SPARK-20566](https://issues.apache.org/jira/browse/SPARK-20566).
[GitHub] spark issue #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat in Lib...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18556 LGTM, merging to master!
[GitHub] spark pull request #18556: [SPARK-21326][SPARK-21066][ML] Use TextFileFormat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18556#discussion_r126066952

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala ---
@@ -102,6 +104,25 @@ object MLUtils extends Logging {
       .map(parseLibSVMRecord)
   }

+  private[spark] def parseLibSVMFile(
+      sparkSession: SparkSession, paths: Seq[String]): RDD[(Double, Array[Int], Array[Double])] = {
+    val lines = sparkSession.baseRelationToDataFrame(
+      DataSource.apply(
+        sparkSession,
+        paths = paths,
+        className = classOf[TextFileFormat].getName
+      ).resolveRelation(checkFilesExist = false))
+      .select("value")
--- End diff --

is this needed? I think the text format is known to have only one column.
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18559 LGTM
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066595

--- Diff: sql/core/src/test/resources/sql-tests/results/string-functions.sql.out ---
@@ -30,20 +30,20 @@ abc
 -- !query 3
 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col
-FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t
+FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10))
 -- !query 3 schema
 struct
 -- !query 3 output
 == Parsed Logical Plan ==
 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x]
-+- 'SubqueryAlias t
++- 'SubqueryAlias _auto_generated_subquery_name
--- End diff --

I think it's ok, as the name makes it quite clear that it's auto-generated. And I think it's hard to hide it.
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18558 LGTM pending jenkins, also cc @rxin
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18559 **[Test build #79311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79311/testReport)** for PR 18559 at commit [`4d99c11`](https://github.com/apache/spark/commit/4d99c11802efa2d6ee5c36de5941226bf12e1a55).
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066489

--- Diff: sql/core/src/test/resources/sql-tests/results/string-functions.sql.out ---
@@ -30,20 +30,20 @@ abc
 -- !query 3
 EXPLAIN EXTENDED SELECT (col1 || col2 || col3 || col4) col
-FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10)) t
+FROM (SELECT id col1, id col2, id col3, id col4 FROM range(10))
 -- !query 3 schema
 struct
 -- !query 3 output
 == Parsed Logical Plan ==
 'Project [concat(concat(concat('col1, 'col2), 'col3), 'col4) AS col#x]
-+- 'SubqueryAlias t
++- 'SubqueryAlias _auto_generated_subquery_name
--- End diff --

Do we want to show the internal subquery name?
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18559#discussion_r126066311

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala ---
@@ -751,15 +751,17 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with Logging {
    * hooks.
    */
   override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = withOrigin(ctx) {
-    // The unaliased subqueries in the FROM clause are disallowed. Instead of rejecting it in
-    // parser rules, we handle it here in order to provide better error message.
-    if (ctx.strictIdentifier == null) {
-      throw new ParseException("The unaliased subqueries in the FROM clause are not supported.",
-        ctx)
+    val alias = if (ctx.strictIdentifier == null) {
+      // For un-aliased subqueries, ues a default alias name that is not likely to conflict with
--- End diff --

nit: typo `ues`.
[GitHub] spark pull request #18559: [SPARK-21335][SQL] support un-aliased subquery
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/18559

[SPARK-21335][SQL] support un-aliased subquery

## What changes were proposed in this pull request?

Un-aliased subqueries have been supported by Spark SQL for a long time. Their semantics were not well defined and had confusing behaviors, and since un-aliased subqueries are not standard SQL syntax, we disallowed them in https://issues.apache.org/jira/browse/SPARK-20690 . However, that was a breaking change, and we do have existing queries that use un-aliased subqueries. We should add the support back and fix its semantics. This PR does so by assigning un-aliased subqueries a default alias name.

## How was this patch tested?

new regression test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark sub-query

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18559.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #18559

commit 4d99c11802efa2d6ee5c36de5941226bf12e1a55
Author: Wenchen Fan
Date: 2017-07-07T04:03:34Z

    support un-aliased subquery
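The fix the PR describes, assigning a default alias when the user provides none, can be sketched in a few lines. This is a hypothetical Python sketch of the parser-side decision, not the actual `AstBuilder` code; the default name is the one that appears in this PR's plan output.

```python
# Fixed fallback alias; chosen to be unlikely to collide with a real table name.
DEFAULT_ALIAS = "_auto_generated_subquery_name"

def subquery_alias(user_alias):
    """Return the alias to attach to a FROM-clause subquery.

    Before SPARK-21335, a missing alias raised a parse error; the PR
    instead falls back to a default alias so the query still resolves.
    """
    return user_alias if user_alias is not None else DEFAULT_ALIAS

print(subquery_alias("t"))   # t
print(subquery_alias(None))  # _auto_generated_subquery_name
```

This is why the `string-functions.sql.out` golden file in the diffs above now shows `'SubqueryAlias _auto_generated_subquery_name` where the explicit alias `t` used to appear.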
[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18559 cc @rxin @viirya
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18557 @dongjoon-hyun Is there any reason to backport this to previous versions? I ask because we had [a discussion](https://github.com/apache/spark/pull/17836#pullrequestreview-35957231) about this. Obviously, it makes sense to support it in the latest version.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18557 Hi, @cloud-fan . This is the backport for #17836 .
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18557 Merged build finished. Test PASSed.
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18557 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79306/
[GitHub] spark issue #18557: [SPARK-20566][SQL][BRANCH-2.2] ColumnVector should suppo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18557 **[Test build #79306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79306/testReport)** for PR 18557 at commit [`39839bf`](https://github.com/apache/spark/commit/39839bf5b70aab603e538d424cda00ec7cde1402). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16697 **[Test build #79310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79310/testReport)** for PR 16697 at commit [`554cd39`](https://github.com/apache/spark/commit/554cd391b3ddb5fb3f7c52950610e832ad40047b).
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79308/
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18465 Merged build finished. Test PASSed.
[GitHub] spark issue #18465: [SPARK-21093][R] Terminate R's worker processes in the p...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18465 **[Test build #79308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79308/testReport)** for PR 18465 at commit [`c08ccd5`](https://github.com/apache/spark/commit/c08ccd59f438fce1f841aa70f760ffb9dc24cf50). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16697: [SPARK-19358][CORE] LiveListenerBus shall log the event ...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16697 retest this please
[GitHub] spark issue #18558: [SPARK-20703][SQL][FOLLOW-UP] Associate metrics with dat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18558 **[Test build #79309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79309/testReport)** for PR 18558 at commit [`dedafd9`](https://github.com/apache/spark/commit/dedafd95835ddd65118825d74c4592f35b73b3d8).