[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16720 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72103/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16720 Merged build finished. Test PASSed.
[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16720 **[Test build #72103 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72103/testReport)** for PR 16720 at commit [`f51f504`](https://github.com/apache/spark/commit/f51f504acb3a64da27bf0bddbb156c68d62d89bb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16720 Sure, I've simplified it. Good point on the ordering - digging into it, it looks like it's just file system search order, which really is not reliable. We could certainly add a test util - though some tests are different; for example, test_context doesn't need a SparkSession.
[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16720 **[Test build #72103 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72103/testReport)** for PR 16720 at commit [`f51f504`](https://github.com/apache/spark/commit/f51f504acb3a64da27bf0bddbb156c68d62d89bb).
[GitHub] spark pull request #16721: [SPARKR][DOCS] update R API doc for subset/extrac...
GitHub user felixcheung reopened a pull request: https://github.com/apache/spark/pull/16721 [SPARKR][DOCS] update R API doc for subset/extract

## What changes were proposed in this pull request?

With extract `[[` or replace `[[<-`, the parameter `i` is a column index, which needs to be corrected in the doc. Also a few minor updates: examples, links.

## How was this patch tested?

manual

You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rsubsetdoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16721.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16721

commit 2c1f67353b6049e7679947d9c6c1e9901d7e1c9f Author: Felix Cheung Date: 2017-01-27T22:50:09Z update doc
commit bff1e56af55fdbf5e216d49a5e673cee6085cc13 Author: Felix Cheung Date: 2017-01-27T22:58:00Z vignettes error
commit de56852daf03de33fbc6dfa0280e1ea5f5f32cc7 Author: Felix Cheung Date: 2017-01-27T23:00:19Z do not link to rename
[GitHub] spark pull request #16721: [SPARKR][DOCS] update R API doc for subset/extrac...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/16721
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde Tables a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16636 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72101/ Test PASSed.
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde Tables a...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16636 Merged build finished. Test PASSed.
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde Tables a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16636 **[Test build #72101 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72101/testReport)** for PR 16636 at commit [`c5cfa1a`](https://github.com/apache/spark/commit/c5cfa1afe3896ab92b34b833ba6c18cfce88b224). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16726: [SPARK-19390][SQL] Replace the unnecessary usages of hiv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16726 **[Test build #72102 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72102/testReport)** for PR 16726 at commit [`e9e7486`](https://github.com/apache/spark/commit/e9e748601da00e594f73b68fe31f3b80385a0bac).
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16724 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72100/ Test PASSed.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16724 Merged build finished. Test PASSed.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16724 **[Test build #72100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72100/testReport)** for PR 16724 at commit [`aaa3c3d`](https://github.com/apache/spark/commit/aaa3c3dd42a9e1b79d30d31790560464be6df6c1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16726: [SPARK-19390][SQL] Replace the unnecessary usages...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/16726 [SPARK-19390][SQL] Replace the unnecessary usages of hiveQlTable

### What changes were proposed in this pull request?

`catalogTable` is the native table metadata structure for Spark SQL. Thus, we should avoid using Hive's table metadata structure `Table` in our code base. This PR is to replace it.

### How was this patch tested?

The existing test cases.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark cleanupMetastoreRelation Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16726.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16726

commit e9e748601da00e594f73b68fe31f3b80385a0bac Author: gatorsmile Date: 2017-01-28T06:11:12Z replace hiveQlTable by CatalogTable
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98325754

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf (i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, given a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
+      // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath; then
+      // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
+      // still exists, which we also need to delete
+      val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

So far, the partition rename DDL we support is for a single pair of partition specs. That is, `ALTER TABLE table PARTITION spec1 RENAME TO PARTITION spec2`. This is not an issue for end users. Thus, your concern looks reasonable, but I think we should not support multi-partition renaming in the SessionCatalog and ExternalCatalog. It just makes the error-handling code more complex. Let me remove it.
[GitHub] spark issue #13072: [SPARK-15288] [Mesos] Mesos dispatcher should handle gra...
Github user devaraj-kavali commented on the issue: https://github.com/apache/spark/pull/13072 MesosClusterDispatcher, like Executor, also has multiple threads; when any one thread terminates in the MesosClusterDispatcher process due to some error/exception, the process keeps running without performing that thread's functionality. I think we need to handle uncaught exceptions from the MesosClusterDispatcher's threads using an UncaughtExceptionHandler and take action, instead of leaving the MesosClusterDispatcher running without that functionality and without notifying the user.
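The idea above can be sketched with a process-wide uncaught exception handler. This is a minimal, self-contained illustration, not Spark's actual dispatcher code; `UncaughtHandlerSketch` and the simulated failure are hypothetical names for this example only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: install a default UncaughtExceptionHandler so that a background
// thread dying on an unexpected exception is observed by the process
// instead of being silently swallowed.
public class UncaughtHandlerSketch {
    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean sawFailure = new AtomicBoolean(false);

        // Applies to every thread that has no per-thread handler installed.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            // A real dispatcher could log the error and exit to fail fast.
            sawFailure.set(true);
        });

        Thread worker = new Thread(() -> {
            throw new RuntimeException("simulated dispatcher thread failure");
        });
        worker.start();
        worker.join(); // the handler runs in the dying thread before it terminates

        System.out.println("handler invoked: " + sawFailure.get());
    }
}
```

In a real dispatcher the handler body would notify the operator or terminate the JVM rather than just setting a flag.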
[GitHub] spark issue #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should have th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16725 Can one of the admins verify this patch?
[GitHub] spark pull request #16725: [SPARK-19377] [WEBUI] [CORE] Killed tasks should ...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/16725 [SPARK-19377] [WEBUI] [CORE] Killed tasks should have the status as KILLED

## What changes were proposed in this pull request?

Copying of the killed status was missed when building the newTaskInfo object, which drops unnecessary details to reduce memory usage. This patch copies the killed status into the newTaskInfo object, so the Web UI shows the correct KILLED status instead of a wrong one.

## How was this patch tested?

Current behaviour of displaying tasks in the stage UI page:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | SUCCESS | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | SUCCESS | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

Web UI display after applying the patch:

| Index | ID | Attempt | Status | Locality Level | Executor ID / Host | Launch Time | Duration | GC Time | Input Size / Records | Write Time | Shuffle Write Size / Records | Errors |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 143 | 10 | 0 | KILLED | NODE_LOCAL | 6 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |
| 156 | 11 | 0 | KILLED | NODE_LOCAL | 5 / x.xx.x.x stdout stderr | 2017/01/25 07:49:27 | 0 ms | | 0.0 B / 0 | | 0.0 B / 0 | TaskKilled (killed intentionally) |

You can merge this pull request into a Git repository by running: $ git pull https://github.com/devaraj-kavali/spark SPARK-19377 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16725.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16725

commit 6206d109b646e55223a4b162a37e70f42f4570a1 Author: Devaraj K Date: 2017-01-28T05:53:21Z [SPARK-19377] [WEBUI] [CORE] Killed tasks should have the status as KILLED
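The bug described above can be illustrated with a toy model. `UITaskInfo` below is a deliberately simplified stand-in, not Spark's real `TaskInfo` class: when a clone drops the `killed` flag, the derived status degrades from KILLED to SUCCESS.

```java
// Toy model of the UI status bug: cloning a task info to shed heavy fields
// while forgetting to copy the 'killed' flag makes a killed task show SUCCESS.
public class KilledStatusSketch {
    static final class UITaskInfo {
        final long taskId;
        final boolean finished;
        final boolean failed;
        final boolean killed;

        UITaskInfo(long taskId, boolean finished, boolean failed, boolean killed) {
            this.taskId = taskId;
            this.finished = finished;
            this.failed = failed;
            this.killed = killed;
        }

        // Status is derived from the flags, so a dropped flag changes it.
        String status() {
            if (killed) return "KILLED";
            if (failed) return "FAILED";
            return finished ? "SUCCESS" : "RUNNING";
        }
    }

    public static void main(String[] args) {
        UITaskInfo original = new UITaskInfo(10, true, false, true);
        // Buggy clone: 'killed' is not carried over.
        UITaskInfo buggy = new UITaskInfo(original.taskId, original.finished, original.failed, false);
        // Fixed clone: 'killed' is copied, so KILLED is preserved.
        UITaskInfo fixed = new UITaskInfo(original.taskId, original.finished, original.failed, original.killed);
        System.out.println("buggy=" + buggy.status() + " fixed=" + fixed.status());
    }
}
```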
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98325213

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf (i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, given a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
+      // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath; then
+      // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
+      // still exists, which we also need to delete
+      val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

`client.renamePartitions` is called at the beginning of `renamePartitions` for all specs at once. It creates the directories `a=1`, `a=1/b=2` and `a=1/b=3`. When you iterate over the specs and rename the directories with FileSystem.rename, in the first iteration `a=1/b=2` is renamed, and `a=1` is deleted by this change, so `a=1/b=3` will be deleted too. Thus, in the next iteration, the renaming of `a=1/b=3` to `A=1/B=3` will fail.
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98325110

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf (i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, given a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
+      // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath; then
+      // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
+      // still exists, which we also need to delete
+      val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(
--- End diff --

The path `a=1` was created when you call `client.renamePartitions`, right? Based on my understanding, when you rename `A=1/B=3`, Hive will create the directories `a=1` and `a=1/b=3`. Thus, the rename will not fail. Have you given it a try?
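The behaviour both reviewers are debating — a rename of only the leaf partition directory leaving the wrongly-cased parent behind — can be reproduced on a local filesystem. This sketch uses `java.nio.file` instead of Hadoop's `FileSystem`, so it only illustrates the directory-move semantics, not HiveExternalCatalog itself.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Moving 'a=1/b=2' to 'A=1/B=2' relocates only the leaf directory;
// the now-empty parent 'a=1' is left behind and must be cleaned up
// separately, which is what the patch under review does.
public class LeafRenameSketch {
    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("part-rename");
        Path wrongLeaf = root.resolve("a=1").resolve("b=2");
        Files.createDirectories(wrongLeaf);

        Path rightLeaf = root.resolve("A=1").resolve("B=2");
        Files.createDirectories(rightLeaf.getParent()); // create 'A=1'
        Files.move(wrongLeaf, rightLeaf);               // moves only the leaf

        // The parent of the old location is not removed by the move.
        System.out.println("stale parent still exists: "
                + Files.exists(root.resolve("a=1")));
    }
}
```

Hadoop's `FileSystem.rename` has its own contract, but the leftover-parent effect shown here matches the scenario described in the quoted comment.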
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72099/ Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Merged build finished. Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72099/testReport)** for PR 16722 at commit [`2112720`](https://github.com/apache/spark/commit/21127206db1c42710e63174904267663c9d92790). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Infer Schema for Hive Serde T...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16636#discussion_r98324289

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -455,4 +462,133 @@ private[spark] object HiveUtils extends Logging {
     case (decimal, DecimalType()) => decimal.toString
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
+
+  /** Converts the native StructField to Hive's FieldSchema. */
+  private def toHiveColumn(c: StructField): FieldSchema = {
+    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
+      c.metadata.getString(HiveUtils.hiveTypeString)
+    } else {
+      c.dataType.catalogString
+    }
+    new FieldSchema(c.name, typeString, c.getComment.orNull)
+  }
+
+  /** Builds the native StructField from Hive's FieldSchema. */
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+    val columnType = try {
+      CatalystSqlParser.parseDataType(hc.getType)
+    } catch {
+      case e: ParseException =>
+        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
+    }
+
+    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
+    val field = StructField(
+      name = hc.getName,
+      dataType = columnType,
+      nullable = true,
+      metadata = metadata)
+    Option(hc.getComment).map(field.withComment).getOrElse(field)
+  }
+
+  // TODO: merge this with HiveClientImpl#toHiveTable
--- End diff --

So far, it is a little bit tricky to merge them, because our execution uses Hive 1.2.1, but the Hive metadata APIs support versions from 0.12 to 1.2. Thus, it does not make sense to do it. So far, schema inference is not using the metadata Hive client. I checked the code; the changes between 0.12 and 1.2 look fine to me. Schema inference should work correctly. I think I need to add a test case to VersionSuite.scala.
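The round-trip in the quoted diff hinges on one trick: the original Hive type string is stashed in the column's metadata so converting back to a Hive column reproduces it exactly, even if Spark's parser normalizes the type. The classes below are a toy model with hypothetical names (`FieldSchemaLike`, `StructFieldLike`), not the real Hive/Spark classes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of toHiveColumn/fromHiveColumn: preserve the Hive-side type
// string in column metadata under a well-known key.
public class HiveColumnSketch {
    static final String HIVE_TYPE_STRING = "HIVE_TYPE_STRING";

    static final class FieldSchemaLike {
        final String name;
        final String hiveType;
        FieldSchemaLike(String name, String hiveType) {
            this.name = name;
            this.hiveType = hiveType;
        }
    }

    static final class StructFieldLike {
        final String name;
        final String dataType;
        final Map<String, String> metadata;
        StructFieldLike(String name, String dataType, Map<String, String> metadata) {
            this.name = name;
            this.dataType = dataType;
            this.metadata = metadata;
        }
    }

    static StructFieldLike fromHiveColumn(FieldSchemaLike hc) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(HIVE_TYPE_STRING, hc.hiveType); // remember the original string
        // Stand-in for the SQL parser normalizing the parsed type.
        return new StructFieldLike(hc.name, hc.hiveType.toLowerCase(), metadata);
    }

    static FieldSchemaLike toHiveColumn(StructFieldLike c) {
        // Prefer the remembered Hive type string; fall back to the native type.
        String typeString = c.metadata.getOrDefault(HIVE_TYPE_STRING, c.dataType);
        return new FieldSchemaLike(c.name, typeString);
    }

    public static void main(String[] args) {
        FieldSchemaLike hc = new FieldSchemaLike("amount", "DECIMAL(10,2)");
        FieldSchemaLike roundTripped = toHiveColumn(fromHiveColumn(hc));
        System.out.println(roundTripped.hiveType); // original string survives
    }
}
```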
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16724 Merged build finished. Test PASSed.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16724 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72097/ Test PASSed.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16724 **[Test build #72097 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72097/testReport)** for PR 16724 at commit [`b84c08b`](https://github.com/apache/spark/commit/b84c08bd6f1d66e09cafa9026b7da48b3f67ece4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16636#discussion_r98324106

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -455,4 +462,133 @@ private[spark] object HiveUtils extends Logging {
     case (decimal, DecimalType()) => decimal.toString
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
+
+  /** Converts the native StructField to Hive's FieldSchema. */
+  private def toHiveColumn(c: StructField): FieldSchema = {
+    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
+      c.metadata.getString(HiveUtils.hiveTypeString)
+    } else {
+      c.dataType.catalogString
+    }
+    new FieldSchema(c.name, typeString, c.getComment.orNull)
+  }
+
+  /** Builds the native StructField from Hive's FieldSchema. */
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+    val columnType = try {
+      CatalystSqlParser.parseDataType(hc.getType)
+    } catch {
+      case e: ParseException =>
+        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
+    }
+
+    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
+    val field = StructField(
+      name = hc.getName,
+      dataType = columnType,
+      nullable = true,
+      metadata = metadata)
+    Option(hc.getComment).map(field.withComment).getOrElse(field)
+  }
+
+  // TODO: merge this with HiveClientImpl#toHiveTable
+  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
+  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
+    // We start by constructing an API table as Hive performs several important transformations
+    // internally when converting an API table to a QL table.
+    val tTable = new org.apache.hadoop.hive.metastore.api.Table()
+    tTable.setTableName(catalogTable.identifier.table)
+    tTable.setDbName(catalogTable.database)
+
+    val tableParameters = new java.util.HashMap[String, String]()
+    tTable.setParameters(tableParameters)
+    catalogTable.properties.foreach { case (k, v) => tableParameters.put(k, v) }
+
+    tTable.setTableType(catalogTable.tableType match {
+      case CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE.toString
+      case CatalogTableType.MANAGED => HiveTableType.MANAGED_TABLE.toString
+      case CatalogTableType.VIEW => HiveTableType.VIRTUAL_VIEW.toString
+    })
+
+    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
+    tTable.setSd(sd)
+
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val (partCols, schema) = catalogTable.schema.map(toHiveColumn).partition { c =>
+      catalogTable.partitionColumnNames.contains(c.getName)
+    }
+    sd.setCols(schema.asJava)
+    tTable.setPartitionKeys(partCols.asJava)
+
+    catalogTable.storage.locationUri.foreach(sd.setLocation)
+    catalogTable.storage.inputFormat.foreach(sd.setInputFormat)
+    catalogTable.storage.outputFormat.foreach(sd.setOutputFormat)
+
+    val serdeInfo = new org.apache.hadoop.hive.metastore.api.SerDeInfo
+    catalogTable.storage.serde.foreach(serdeInfo.setSerializationLib)
+    sd.setSerdeInfo(serdeInfo)
+
+    val serdeParameters = new java.util.HashMap[String, String]()
+    catalogTable.storage.properties.foreach { case (k, v) => serdeParameters.put(k, v) }
+    serdeInfo.setParameters(serdeParameters)
+
+    new HiveTable(tTable)
+  }
+
+  /**
+   * Converts the native partition metadata representation format CatalogTablePartition to
+   * Hive's Partition.
+   */
+  def toHivePartition(
+      catalogTable: CatalogTable,
+      hiveTable: HiveTable,
+      partition: CatalogTablePartition): HivePartition = {
+    val tPartition = new org.apache.hadoop.hive.metastore.api.Partition
+    tPartition.setDbName(catalogTable.database)
+    tPartition.setTableName(catalogTable.identifier.table)
+    tPartition.setValues(catalogTable.partitionColumnNames.map(partition.spec(_)).asJava)
+
+    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
+    tPartition.setSd(sd)
+
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
+      !catalogTable.partitionColumnNames.contains(c.getName)
+    }
+    sd.setCols(schema.asJava)
+
+    partition.storage.locationUri.foreach(sd.setLocation)
+    partition.storage.inputFormat.foreach(sd.setInputFormat)
+
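The disjoint split between data columns and partition columns that `toHiveTable` performs can be sketched with plain Scala collections. This is an illustration only: the `Column` type and the `SchemaSplit` helper below are toy stand-ins for Spark's `StructField`/Hive's `FieldSchema`, not part of either API.

```scala
// Toy stand-in for StructField / Hive's FieldSchema (illustrative only).
case class Column(name: String, dataType: String)

object SchemaSplit {
  // Mirrors the `partition` call in toHiveTable: every column goes to exactly
  // one side, preserving Hive's invariant that the schema and the partition
  // columns are disjoint sets.
  def split(
      schema: Seq[Column],
      partitionColumnNames: Set[String]): (Seq[Column], Seq[Column]) =
    schema.partition(c => partitionColumnNames.contains(c.name))
}

object SchemaSplitDemo extends App {
  val schema =
    Seq(Column("id", "int"), Column("dt", "string"), Column("v", "double"))
  val (partCols, dataCols) = SchemaSplit.split(schema, Set("dt"))
  println(partCols.map(_.name).mkString(","))  // dt
  println(dataCols.map(_.name).mkString(","))  // id,v
}
```

The first element of the pair becomes `setPartitionKeys`, the second `setCols`, which is why no column can appear in both.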
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16636 **[Test build #72101 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72101/testReport)** for PR 16636 at commit [`c5cfa1a`](https://github.com/apache/spark/commit/c5cfa1afe3896ab92b34b833ba6c18cfce88b224).
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16724 **[Test build #72100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72100/testReport)** for PR 16724 at commit [`aaa3c3d`](https://github.com/apache/spark/commit/aaa3c3dd42a9e1b79d30d31790560464be6df6c1).
[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16636#discussion_r98323927

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -455,4 +462,133 @@ private[spark] object HiveUtils extends Logging {
     case (decimal, DecimalType()) => decimal.toString
     case (other, tpe) if primitiveTypes contains tpe => other.toString
   }
+
+  /** Converts the native StructField to Hive's FieldSchema. */
+  private def toHiveColumn(c: StructField): FieldSchema = {
+    val typeString = if (c.metadata.contains(HiveUtils.hiveTypeString)) {
+      c.metadata.getString(HiveUtils.hiveTypeString)
+    } else {
+      c.dataType.catalogString
+    }
+    new FieldSchema(c.name, typeString, c.getComment.orNull)
+  }
+
+  /** Builds the native StructField from Hive's FieldSchema. */
+  private def fromHiveColumn(hc: FieldSchema): StructField = {
+    val columnType = try {
+      CatalystSqlParser.parseDataType(hc.getType)
+    } catch {
+      case e: ParseException =>
+        throw new SparkException("Cannot recognize hive type string: " + hc.getType, e)
+    }
+
+    val metadata = new MetadataBuilder().putString(HiveUtils.hiveTypeString, hc.getType).build()
+    val field = StructField(
+      name = hc.getName,
+      dataType = columnType,
+      nullable = true,
+      metadata = metadata)
+    Option(hc.getComment).map(field.withComment).getOrElse(field)
+  }
+
+  // TODO: merge this with HiveClientImpl#toHiveTable
+  /** Converts the native table metadata representation format CatalogTable to Hive's Table. */
+  def toHiveTable(catalogTable: CatalogTable): HiveTable = {
+    // We start by constructing an API table as Hive performs several important transformations
+    // internally when converting an API table to a QL table.
+    val tTable = new org.apache.hadoop.hive.metastore.api.Table()
+    tTable.setTableName(catalogTable.identifier.table)
+    tTable.setDbName(catalogTable.database)
+
+    val tableParameters = new java.util.HashMap[String, String]()
+    tTable.setParameters(tableParameters)
+    catalogTable.properties.foreach { case (k, v) => tableParameters.put(k, v) }
+
+    tTable.setTableType(catalogTable.tableType match {
+      case CatalogTableType.EXTERNAL => HiveTableType.EXTERNAL_TABLE.toString
+      case CatalogTableType.MANAGED => HiveTableType.MANAGED_TABLE.toString
+      case CatalogTableType.VIEW => HiveTableType.VIRTUAL_VIEW.toString
+    })
+
+    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
+    tTable.setSd(sd)
+
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val (partCols, schema) = catalogTable.schema.map(toHiveColumn).partition { c =>
+      catalogTable.partitionColumnNames.contains(c.getName)
+    }
+    sd.setCols(schema.asJava)
+    tTable.setPartitionKeys(partCols.asJava)
+
+    catalogTable.storage.locationUri.foreach(sd.setLocation)
+    catalogTable.storage.inputFormat.foreach(sd.setInputFormat)
+    catalogTable.storage.outputFormat.foreach(sd.setOutputFormat)
+
+    val serdeInfo = new org.apache.hadoop.hive.metastore.api.SerDeInfo
+    catalogTable.storage.serde.foreach(serdeInfo.setSerializationLib)
+    sd.setSerdeInfo(serdeInfo)
+
+    val serdeParameters = new java.util.HashMap[String, String]()
+    catalogTable.storage.properties.foreach { case (k, v) => serdeParameters.put(k, v) }
+    serdeInfo.setParameters(serdeParameters)
+
+    new HiveTable(tTable)
+  }
+
+  /**
+   * Converts the native partition metadata representation format CatalogTablePartition to
+   * Hive's Partition.
+   */
+  def toHivePartition(
+      catalogTable: CatalogTable,
+      hiveTable: HiveTable,
+      partition: CatalogTablePartition): HivePartition = {
+    val tPartition = new org.apache.hadoop.hive.metastore.api.Partition
+    tPartition.setDbName(catalogTable.database)
+    tPartition.setTableName(catalogTable.identifier.table)
+    tPartition.setValues(catalogTable.partitionColumnNames.map(partition.spec(_)).asJava)
+
+    val sd = new org.apache.hadoop.hive.metastore.api.StorageDescriptor()
+    tPartition.setSd(sd)
+
+    // Note: In Hive the schema and partition columns must be disjoint sets
+    val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
+      !catalogTable.partitionColumnNames.contains(c.getName)
+    }
+    sd.setCols(schema.asJava)
+
+    partition.storage.locationUri.foreach(sd.setLocation)
+    partition.storage.inputFormat.foreach(sd.setInputFormat)
+
[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98322977

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
@@ -78,14 +78,18 @@ object Canonicalize extends {
     case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
     case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
-    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
-    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
-    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
-    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
-    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
-    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
-    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
+    case Not(GreaterThan(l, r)) =>
+      assert(l.hashCode() <= r.hashCode())

--- End diff --

Thanks! Maybe an alternative is to add a comment saying it is guaranteed that `l.hashCode <= r.hashCode`; otherwise people might wonder at first glance why there is no `case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode()`.
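The rewrite discussed above can be illustrated with a toy expression tree. This is a sketch of the idea only, not Catalyst's actual classes: operands are first reordered so the smaller-hash side ends up on the left, after which each `Not(comparison)` collapses into the complementary comparison, relying on the invariant that `l.hashCode <= r.hashCode` already holds on the canonicalized child.

```scala
// Toy expression ADT standing in for Catalyst expressions (illustrative only).
sealed trait Expr
case class Leaf(h: Int) extends Expr { override def hashCode: Int = h }
case class GreaterThan(l: Expr, r: Expr) extends Expr
case class LessThan(l: Expr, r: Expr) extends Expr
case class GreaterThanOrEqual(l: Expr, r: Expr) extends Expr
case class LessThanOrEqual(l: Expr, r: Expr) extends Expr
case class Not(child: Expr) extends Expr

object Canon {
  def canonicalize(e: Expr): Expr = e match {
    // Step 1: reorder so the smaller-hash operand is on the left,
    // flipping the comparison to keep the meaning unchanged.
    case GreaterThan(l, r) if l.hashCode > r.hashCode => LessThan(r, l)
    case LessThan(l, r) if l.hashCode > r.hashCode => GreaterThan(r, l)
    case GreaterThanOrEqual(l, r) if l.hashCode > r.hashCode => LessThanOrEqual(r, l)
    case LessThanOrEqual(l, r) if l.hashCode > r.hashCode => GreaterThanOrEqual(r, l)
    // Step 2: eliminate Not over an already-canonicalized comparison;
    // at this point l.hashCode <= r.hashCode is guaranteed (the invariant
    // the review comment suggests documenting).
    case Not(child) =>
      canonicalize(child) match {
        case GreaterThan(l, r)        => LessThanOrEqual(l, r)
        case LessThan(l, r)           => GreaterThanOrEqual(l, r)
        case GreaterThanOrEqual(l, r) => LessThan(l, r)
        case LessThanOrEqual(l, r)    => GreaterThan(l, r)
        case other                    => Not(other)
      }
    case other => other
  }
}
```

For example, `Not(GreaterThan(Leaf(5), Leaf(2)))` first canonicalizes its child to `LessThan(Leaf(2), Leaf(5))`, then collapses to `GreaterThanOrEqual(Leaf(2), Leaf(5))`, so the no-guard `Not` cases in the real patch are safe.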
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16724 Merged build finished. Test PASSed.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16724 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72096/ Test PASSed.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16724 **[Test build #72096 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72096/testReport)** for PR 16724 at commit [`93d3806`](https://github.com/apache/spark/commit/93d380620c411dc33c14a4787f2ceee28e9c155c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...
Github user lw-lin commented on a diff in the pull request: https://github.com/apache/spark/pull/16719#discussion_r98322886

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
@@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
     setTest(1, aUpper >= bUpper, bUpper <= aUpper)

     // `Not` canonicalization
-    setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper)
-    setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper)
-    setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
-    setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)
+    setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash)
+    setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash)
+    setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash)
+    setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash)
+    setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash)
+    setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash)
+    setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash)
+    setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash)

--- End diff --

Yeah, sure, they were covered correctly even prior to this patch's changes! The previous `aUpper`'s hashCode is either greater than or less than `1`'s hashCode, but cannot be both, while this change aims to test both cases -- but I'm quite open to reverting the changes if they are considered unnecessary.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72099/testReport)** for PR 16722 at commit [`2112720`](https://github.com/apache/spark/commit/21127206db1c42710e63174904267663c9d92790).
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98322377

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth of partition, FileSystem.rename just
+      // deletes the leaf (i.e. wrongPath), so we should check whether wrongPath's parents
+      // need to be deleted. For example, given a newSpec 'A=1/B=2', after calling Hive's
+      // client.renamePartitions, the location path in the FileSystem is changed to
+      // 'a=1/b=2', which is wrongPath; then, although we renamed it to 'A=1/B=2' and
+      // 'a=1/b=2' is deleted from the FileSystem, 'a=1' still exists and also needs to
+      // be deleted.
+      val delHivePartPathAfterRename = getExtraPartPathCreatedByHive(

--- End diff --

Hmmm, could there possibly be multiple specs sharing the same parent directory, e.g. 'A=1/B=2', 'A=1/B=3', ...? If so, when you delete the path 'a=1' here, I think the rename will fail while processing the next spec 'A=1/B=3'.
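The failure mode being discussed can be sketched with plain path manipulation. The helper below is hypothetical (it is not Spark's `getExtraPartPathCreatedByHive`): for a multi-depth spec, only the leaf directory gets renamed, so the lower-cased parent directories Hive created are left behind.

```scala
object PartitionPaths {
  // Builds a partition path such as "A=1/B=2" from an ordered spec.
  def partitionPath(spec: Seq[(String, String)]): String =
    spec.map { case (k, v) => s"$k=$v" }.mkString("/")

  // Lower-cased parent directories that Hive would have created and that
  // remain after only the leaf directory is renamed. For spec A=1/B=2 the
  // leaf "a=1/b=2" is renamed away, but "a=1" is left behind.
  def leftoverParents(spec: Seq[(String, String)]): Seq[String] =
    spec.inits.toSeq
      .drop(1)        // drop the full spec (the renamed leaf itself)
      .dropRight(1)   // drop the empty prefix
      .map(prefix => partitionPath(prefix.map { case (k, v) => (k.toLowerCase, v) }))
}
```

With spec `A=1/B=2/C=3` this yields `a=1/b=2` and `a=1`; a shared parent such as `a=1` is exactly what must not be deleted while sibling specs like `A=1/B=3` are still pending rename, which is the conflict viirya points out.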
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72098/ Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Merged build finished. Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72098/testReport)** for PR 16722 at commit [`8278724`](https://github.com/apache/spark/commit/827872489194e46421263c28231beb9cb6646dfe).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72098/testReport)** for PR 16722 at commit [`8278724`](https://github.com/apache/spark/commit/827872489194e46421263c28231beb9cb6646dfe).
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14725 Merged build finished. Test PASSed.
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14725 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72092/ Test PASSed.
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14725 **[Test build #72092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72092/testReport)** for PR 14725 at commit [`8b401ec`](https://github.com/apache/spark/commit/8b401ecc814a31f600da3fe28c4ff393bd4b3269).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16724 **[Test build #72097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72097/testReport)** for PR 16724 at commit [`b84c08b`](https://github.com/apache/spark/commit/b84c08bd6f1d66e09cafa9026b7da48b3f67ece4).
[GitHub] spark issue #16700: [SPARK-19359][SQL]clear useless path after rename a part...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16700 Thanks! Merging it to master. You can fix the minor comments in your other PRs.
[GitHub] spark pull request #16716: [SPARK-19378][SS] Ensure continuity of stateOpera...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16716#discussion_r98320764

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala ---
@@ -171,6 +174,42 @@ class StreamingQueryStatusAndProgressSuite extends StreamTest {
       query.stop()
     }
   }
+
+  test("SPARK-19378: Continue reporting stateOp metrics even if there is no active trigger") {
+    import testImplicits._
+
+    withSQLConf(SQLConf.STREAMING_NO_DATA_PROGRESS_EVENT_INTERVAL.key -> "10") {
+      val inputData = MemoryStream[Int]
+
+      val query = inputData.toDS().toDF("value")
+        .select('value)
+        .groupBy($"value")
+        .agg(count("*"))
+        .writeStream
+        .queryName("metric_continuity")
+        .format("memory")
+        .outputMode("complete")
+        .start()
+      try {
+        inputData.addData(1, 2)
+        query.processAllAvailable()
+
+        val progress = query.lastProgress
+        assert(progress.stateOperators.length > 0)
+        // Should emit new progresses every 10 ms, but we could be facing a slow Jenkins
+        eventually(timeout(1 minute)) {
+          val nextProgress = query.lastProgress
+          assert(nextProgress.timestamp !== progress.timestamp)
+          assert(nextProgress.numInputRows === 0)
+          assert(nextProgress.stateOperators.head.numRowsTotal === 2)
+          assert(nextProgress.stateOperators.head.numRowsTotal === 2)

--- End diff --

why is this line twice?
[GitHub] spark issue #16720: [SPARK-19387][SPARKR] Tests do not run with SparkR sourc...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16720 I am not sure tests are ever meant to run on a cluster (see the number of uses of LocalSparkContext in core/src/test/scala). The main reason I don't want to introduce the 'first test' approach is that we would then be relying too much on test names not clashing or getting in front of each other, which seems fragile. Another thing that might be good is to create a test util function like `initializeTestSparkContext` and put both the session start and the install stuff inside it.
[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16699 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72095/ Test PASSed.
[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16699 Merged build finished. Test PASSed.
[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72094/ Test PASSed.
[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16723 **[Test build #72094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72094/testReport)** for PR 16723 at commit [`fa522e6`](https://github.com/apache/spark/commit/fa522e6fba43faf481935f1204ddf9c13d82227f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16699 **[Test build #72095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72095/testReport)** for PR 16699 at commit [`52bc32b`](https://github.com/apache/spark/commit/52bc32b2d86b2cd5ce092f86ee61f8fe9aebec5d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16723 Merged build finished. Test PASSed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72088/ Test PASSed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16043 Merged build finished. Test PASSed.
[GitHub] spark issue #16043: [SPARK-18601][SQL] Simplify Create/Get complex expressio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16043 **[Test build #72088 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72088/testReport)** for PR 16043 at commit [`91001fc`](https://github.com/apache/spark/commit/91001fc97af2e80272ef29f90038cc99283ca258). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Merged build finished. Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16650 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72087/ Test PASSed.
[GitHub] spark issue #16650: [SPARK-16554][CORE] Automatically Kill Executors and Nod...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16650 **[Test build #72087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72087/testReport)** for PR 16650 at commit [`eed4112`](https://github.com/apache/spark/commit/eed4112c092d49b4eafab363f3b0a16d83ec7c9d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16700
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16724 cc @cloud-fan @rxin @hvanhovell
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98317878
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
+      // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
+      // although we renamed it to 'A=1/B=2', 'a=1/b=2' in FileSystem is deleted, but 'a=1'
--- End diff --
Either `although` or `but` needs to be deleted.
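The comment quoted in this diff describes a directory cleanup step: after `FileSystem.rename` moves the wrongly-cased leaf (e.g. `a=1/b=2`) to the correctly-cased path, empty lowercase ancestor directories (e.g. `a=1`) can be left behind and need removing. A minimal sketch of that walk-up-and-delete logic, using `java.nio.file` instead of Hadoop's `FileSystem` API so it is self-contained; the helper name `deleteEmptyParents` is hypothetical and not the PR's actual code:

```scala
import java.nio.file.{Files, Path}

// Hypothetical sketch: after the wrongly-cased leaf has been renamed away,
// walk up from its parent and delete each now-empty directory, stopping
// at (and never deleting) the table root.
def deleteEmptyParents(wrongLeaf: Path, tableRoot: Path): Unit = {
  var current = wrongLeaf.getParent
  while (current != null && current != tableRoot && Files.exists(current)) {
    val stream = Files.list(current)
    val isEmpty = try !stream.findAny().isPresent finally stream.close()
    if (!isEmpty) return          // stop at the first non-empty ancestor
    Files.delete(current)         // delete only succeeds on empty directories
    current = current.getParent
  }
}
```

In the PR itself the equivalent walk would presumably use `org.apache.hadoop.fs.Path` with `FileSystem.listStatus` and `FileSystem.delete`.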
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98317808
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
+      // the location path in FileSystem is changed to 'a=1/b=2', which is wrongPath, then
--- End diff --
`, then` -> `. Then`
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98317719
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -899,6 +919,21 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       spec, partitionColumnNames, tablePath)
     try {
       tablePath.getFileSystem(hadoopConf).rename(wrongPath, rightPath)
+
+      // If the newSpec contains more than one depth partition, FileSystem.rename just deletes
+      // the leaf(i.e. wrongPath), we should check if wrongPath's parents need to be deleted.
+      // For example, give a newSpec 'A=1/B=2', after calling Hive's client.renamePartitions,
--- End diff --
`give` -> `given`
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98317671
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
   }
+
+  /**
+   * The partition path created by Hive is in lowercase, while Spark SQL will
+   * rename it with the partition name in partitionColumnNames, and this function
+   * returns the extra lowercase path created by Hive, and then we can delete it.
--- End diff --
Nit: all of them are commas. You need to use periods. : )
[GitHub] spark pull request #16700: [SPARK-19359][SQL]clear useless path after rename...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16700#discussion_r98317696
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -839,6 +839,26 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
     spec.map { case (k, v) => partCols.find(_.equalsIgnoreCase(k)).get -> v }
   }
+
+  /**
+   * The partition path created by Hive is in lowercase, while Spark SQL will
+   * rename it with the partition name in partitionColumnNames, and this function
+   * returns the extra lowercase path created by Hive, and then we can delete it.
+   * e.g. /path/A=1/B=2/C=3 is changed to /path/A=4/B=5/C=6, this function returns
--- End diff --
The same issue here.
[GitHub] spark issue #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows after ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16724 **[Test build #72096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72096/testReport)** for PR 16724 at commit [`93d3806`](https://github.com/apache/spark/commit/93d380620c411dc33c14a4787f2ceee28e9c155c).
[GitHub] spark pull request #16724: [SPARK-19352][WIP][SQL] Keep sort order of rows a...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/16724
[SPARK-19352][WIP][SQL] Keep sort order of rows after external sorter when writing
## What changes were proposed in this pull request?
WIP
## How was this patch tested?
Will add test case later.
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/viirya/spark-1 keep-sort-order-after-external-sorter
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/16724.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #16724
commit 93d380620c411dc33c14a4787f2ceee28e9c155c
Author: Liang-Chi Hsieh
Date: 2017-01-28T00:45:11Z
    Keep sort order of rows after external sorter when writing.
[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16699 **[Test build #72095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72095/testReport)** for PR 16699 at commit [`52bc32b`](https://github.com/apache/spark/commit/52bc32b2d86b2cd5ce092f86ee61f8fe9aebec5d).
[GitHub] spark issue #16699: [SPARK-18710][ML] Add offset in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16699 @zhengruifeng Thanks for the suggestions. Added casting and instrumentation. @imatiach-msft Thanks for the clarification! It is probably worth another PR to clean up all tests in GLM. Let me know if there are any additional comments!
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72093/ Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Merged build finished. Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72093/testReport)** for PR 16722 at commit [`2729a63`](https://github.com/apache/spark/commit/2729a63a8f5eab7f0eb88a9b995175bda1f82a1e). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16723 **[Test build #72094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72094/testReport)** for PR 16723 at commit [`fa522e6`](https://github.com/apache/spark/commit/fa522e6fba43faf481935f1204ddf9c13d82227f).
[GitHub] spark issue #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Py...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16723 @wangmiao1981 Would you mind checking this? It has small fixes I noticed when reviewing your PR for Python LinearSVC.
[GitHub] spark pull request #16723: [SPARK-19389][ML][PYTHON][DOC] Minor doc fixes fo...
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/16723
[SPARK-19389][ML][PYTHON][DOC] Minor doc fixes for ML Python Params and LinearSVC
## What changes were proposed in this pull request?
* Removed Since tags in Python Params since they are inherited by other classes
* Fixed doc links for LinearSVC
## How was this patch tested?
* doc tests
* generating docs locally and checking manually
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/jkbradley/spark pyparam-fix-doc
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/16723.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #16723
commit fa522e6fba43faf481935f1204ddf9c13d82227f
Author: Joseph K. Bradley
Date: 2017-01-28T00:28:59Z
    removed Since tags in Python Params since they are inherited by other classes. fixed doc links for LinearSVC
[GitHub] spark pull request #16636: [SPARK-19279] [SQL] Block Creating a Hive Table W...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16636#discussion_r98315149
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---
@@ -1527,6 +1527,21 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
     }
   }
+
+  test("create a data source table without schema") {
+    import testImplicits._
+    withTempPath { tempDir =>
+      withTable("tab1", "tab2") {
+        (("a", "b") :: Nil).toDF().write.json(tempDir.getCanonicalPath)
+
+        val e = intercept[AnalysisException] { sql("CREATE TABLE tab1 USING json") }.getMessage
--- End diff --
This error message is not from the code added by this PR. It is from [the original logic](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L343-L345). The error message is right if our file-based data sources are unable to infer the schema. Sure, I will add a test case for `LibSVM`.
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16636 : ) Done. Found a solution to infer the schema of Hive Serde tables. Let me clean the code now.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72093/testReport)** for PR 16722 at commit [`2729a63`](https://github.com/apache/spark/commit/2729a63a8f5eab7f0eb88a9b995175bda1f82a1e).
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72091/ Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72091/testReport)** for PR 16722 at commit [`7dc1437`](https://github.com/apache/spark/commit/7dc1437df21999554e42d35d1d544839074414cf). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16694
[GitHub] spark issue #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16694 LGTM, thank you! Merging with master
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16722 Merged build finished. Test FAILed.
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16722 **[Test build #72091 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72091/testReport)** for PR 16722 at commit [`7dc1437`](https://github.com/apache/spark/commit/7dc1437df21999554e42d35d1d544839074414cf).
[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14725 **[Test build #72092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72092/testReport)** for PR 14725 at commit [`8b401ec`](https://github.com/apache/spark/commit/8b401ecc814a31f600da3fe28c4ff393bd4b3269).
[GitHub] spark issue #16722: [SPARK-9478][ML][MLlib] Add sample weights to decision t...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16722 ping @jkbradley @imatiach-msft
[GitHub] spark pull request #16722: [SPARK-9478][ML][MLlib] Add sample weights to dec...
GitHub user sethah opened a pull request: https://github.com/apache/spark/pull/16722 [SPARK-9478][ML][MLlib] Add sample weights to decision trees

## What changes were proposed in this pull request?

This patch adds support for sample weights to `DecisionTreeRegressor` and `DecisionTreeClassifier`.

*Note:* This patch does not add support for sample weights to `RandomForest`. As discussed in the JIRA, we would like to add sample weights into the bagging process. This patch is large enough as is, and there are additional considerations to be made for random forests. Since the machinery introduced here is needed regardless, I have opted to leave random forests for a follow-up PR.

## How was this patch tested?

The algorithms are tested to ensure that:

1. Arbitrary scaling of constant weights has no effect
2. Outliers with small weights do not affect the learned model
3. Oversampling and weighting are equivalent

Unit tests are also added for the other, smaller components.

## Summary of changes

* Impurity aggregators now store weighted sufficient statistics. They also store the raw, unweighted count, since it is needed to enforce `minInstancesPerNode`.
* This patch maintains the meaning of `minInstancesPerNode`: the parameter still corresponds to raw, unweighted counts. It also adds a new parameter `minWeightFractionPerNode`, which requires that each node contain at least `minWeightFractionPerNode * weightedNumExamples` total weight.
* This patch modifies `findSplitsForContinuousFeatures` to use weighted sums. Unit tests are added.
* `TreePoint` is modified to hold a sample weight.
* `BaggedPoint` is modified from:

```scala
private[spark] class BaggedPoint[Datum](
    val datum: Datum,
    val subsampleWeights: Array[Double]) extends Serializable
```

to

```scala
private[spark] class BaggedPoint[Datum](
    val datum: Datum,
    val subsampleCounts: Array[Int],
    val sampleWeight: Double) extends Serializable
```

We do not simply multiply the counts by the weight and store that, because we need both the raw counts and the weight in order to use both `minInstancesPerNode` and `minWeightFractionPerNode`.

*Note:* many of the changed files are due simply to using `Instance` instead of `LabeledPoint`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark SPARK-9478-tree

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16722.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16722

commit 2d86cea640634a205e378bddee0b01780d019ea2
Author: sethah
Date: 2017-01-27T16:38:36Z

    add weights to dt

commit 7dc1437df21999554e42d35d1d544839074414cf
Author: sethah
Date: 2017-01-27T20:34:24Z

    dt tests passing
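The test properties described in the PR (weight-scaling invariance and the equivalence of oversampling and weighting) can be illustrated with a small self-contained sketch. This is plain Python, not Spark code; `weighted_gini` is a hypothetical helper written for illustration, not the PR's actual impurity aggregator:

```python
from collections import defaultdict

def weighted_gini(samples):
    """Gini impurity 1 - sum_k p_k^2 over (label, weight) pairs,
    where p_k is label k's share of the total weight."""
    totals = defaultdict(float)
    for label, weight in samples:
        totals[label] += weight
    total_weight = sum(totals.values())
    return 1.0 - sum((w / total_weight) ** 2 for w in totals.values())

data = [(1, 2.0), (1, 2.0), (0, 1.0)]

# Property 1: scaling every weight by the same constant leaves the
# impurity (and hence any impurity-based split choice) unchanged.
scaled = [(label, 10.0 * w) for label, w in data]
assert abs(weighted_gini(data) - weighted_gini(scaled)) < 1e-12

# Property 3: duplicating a sample (oversampling) is equivalent to
# doubling its weight.
duplicated = [(1, 1.0), (1, 1.0), (0, 1.0)]
reweighted = [(1, 2.0), (0, 1.0)]
assert abs(weighted_gini(duplicated) - weighted_gini(reweighted)) < 1e-12
```

The same invariances would hold for any impurity defined purely in terms of weight proportions, which is why the PR's tests can check them without reference to a particular tree structure.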
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user julienledem commented on the issue: https://github.com/apache/spark/pull/16281 FYI: Parquet 1.8.2 vote thread passed: https://mail-archives.apache.org/mod_mbox/parquet-dev/201701.mbox/%3CCAO4re1mHLT%2BLYn8s1RTEDZK8-9WSVugY8-HQqAN%2BtU%3DBOi1L9w%40mail.gmail.com%3E
[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72090/ Test PASSed.
[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16721 Merged build finished. Test PASSed.
[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16721 **[Test build #72090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72090/testReport)** for PR 16721 at commit [`de56852`](https://github.com/apache/spark/commit/de56852daf03de33fbc6dfa0280e1ea5f5f32cc7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16721 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72089/ Test PASSed.
[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16721 Merged build finished. Test PASSed.
[GitHub] spark issue #16721: [SPARKR][DOCS] update R API doc for subset/extract
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16721 **[Test build #72089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72089/testReport)** for PR 16721 at commit [`2c1f673`](https://github.com/apache/spark/commit/2c1f67353b6049e7679947d9c6c1e9901d7e1c9f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15768: [SPARK-18080][ML][PySpark] Locality Sensitive Hashing (L...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15768 Btw, @yanboliang and @Yunni did you sync? I'm fine with the takeover, but don't want to stomp on toes. Both can be listed as authors when this gets merged. Should we close this issue with the other taking its place?
[GitHub] spark pull request #16694: [SPARK-19336][ML][Pyspark]: LinearSVC Python API
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/16694#discussion_r98309772

--- Diff: python/pyspark/ml/classification.py ---

```diff
@@ -60,6 +61,137 @@ def numClasses(self):

 @inherit_doc
+class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasMaxIter,
+                HasRegParam, HasTol, HasRawPredictionCol, HasFitIntercept, HasStandardization,
+                HasThreshold, HasWeightCol, HasAggregationDepth, JavaMLWritable, JavaMLReadable):
+    """
+    Linear SVM Classifier (https://en.wikipedia.org/wiki/Support_vector_machine#Linear_SVM)
+    This binary classifier optimizes the Hinge Loss using the OWLQN optimizer.
+
+    >>> from pyspark.sql import Row
+    >>> from pyspark.ml.linalg import Vectors
+    >>> bdf = sc.parallelize([
+    ...     Row(label=1.0, weight=2.0, features=Vectors.dense(1.0)),
+    ...     Row(label=0.0, weight=2.0, features=Vectors.sparse(1, [], []))]).toDF()
+    >>> svm = LinearSVC(maxIter=5, regParam=0.01, weightCol="weight")
+    >>> model = svm.fit(bdf)
+    >>> model.coefficients
+    DenseVector([1.909])
+    >>> model.intercept
+    -1.0045358384178
+    >>> model.numClasses
+    2
+    >>> model.numFeatures
+    1
+    >>> test0 = sc.parallelize([Row(features=Vectors.dense(-1.0))]).toDF()
+    >>> result = model.transform(test0).head()
+    >>> result.prediction
+    0.0
+    >>> result.rawPrediction
+    DenseVector([2.9135, -2.9135])
+    >>> test1 = sc.parallelize([Row(features=Vectors.sparse(1, [0], [1.0]))]).toDF()
+    >>> model.transform(test1).head().prediction
+    1.0
+    >>> svm.setParams("vector")
```

--- End diff --

I know, there are some not great examples to follow. It'd be nice to clean those out sometime...