[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219268076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58610/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219268075 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219268048 **[Test build #58610 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58610/consoleFull)** for PR 13081 at commit [`ac371dc`](https://github.com/apache/spark/commit/ac371dc988aeaf37c88162b346f304bf7b01639f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219265969 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58609/
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219265968 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219265948 **[Test build #58609 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58609/consoleFull)** for PR 13121 at commit [`41efcb0`](https://github.com/apache/spark/commit/41efcb038358ad14c57212d7110fa88e355238c6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15297] [SQL] Fix Set -V Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13081#issuecomment-219265793 **[Test build #58610 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58610/consoleFull)** for PR 13081 at commit [`ac371dc`](https://github.com/apache/spark/commit/ac371dc988aeaf37c88162b346f304bf7b01639f).
[GitHub] spark pull request: [SPARK-14130] [SQL] Throw exceptions for ALTER...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12714#discussion_r63285806

--- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ---
@@ -179,6 +173,11 @@ unsupportedHiveNativeCommands
     | kw1=ALTER kw2=TABLE tableIdentifier kw3=TOUCH
     | kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=COMPACT
     | kw1=ALTER kw2=TABLE tableIdentifier partitionSpec? kw3=CONCATENATE
+    | kw1=START kw2=TRANSACTION
+    | kw1=COMMIT
+    | kw1=ROLLBACK
+    | kw1=DFS
--- End diff --

We still need to ban the related CLI commands in the CLI Driver. Let me fix them.
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13121#issuecomment-219263713 **[Test build #58609 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58609/consoleFull)** for PR 13121 at commit [`41efcb0`](https://github.com/apache/spark/commit/41efcb038358ad14c57212d7110fa88e355238c6).
[GitHub] spark pull request: [SPARK-15330] [SQL] Implement Reset Command
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/13121 [SPARK-15330] [SQL] Implement Reset Command

## What changes were proposed in this pull request?

Like the `Set` command, `Reset` is also supported by Hive. See the link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli Below is the related Hive JIRA: https://issues.apache.org/jira/browse/HIVE-3202

This PR implements such a command for resetting the SQL-related configuration to its default values. One of the use cases shown in HIVE-3202 is listed below:

> For the purpose of optimization we set various configs per query. It's worthy but all those configs should be reset every time for next query.

## How was this patch tested?

Added a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark resetCommand

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13121.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13121

commit 657ff6d57ebe9e72521d8b4c2393414e0c7a386c
Author: gatorsmile
Date: 2016-05-15T02:08:34Z
implement reset

commit 3e93b62b1aadb76e2d178adfa6655db3edded7e8
Author: gatorsmile
Date: 2016-05-15T02:58:46Z
fix spark-sql cli

commit 41efcb038358ad14c57212d7110fa88e355238c6
Author: gatorsmile
Date: 2016-05-15T03:05:38Z
improve the comments.
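The semantics the PR above describes — `SET` overrides a config per session, `RESET` drops every override and restores the registered defaults — can be sketched as follows. This is an illustrative Python sketch, not Spark's Scala implementation; the class name and structure are hypothetical.

```python
# Minimal sketch of SET/RESET semantics: a config store where session
# overrides shadow immutable defaults, and reset() clears all overrides.
class SQLConfSketch:
    def __init__(self, defaults):
        self._defaults = dict(defaults)   # baseline, never mutated
        self._session = {}                # per-session SET overrides

    def set(self, key, value):
        self._session[key] = value

    def get(self, key):
        # session override wins; fall back to the default
        return self._session.get(key, self._defaults[key])

    def reset(self):
        # RESET: wipe all session overrides at once
        self._session.clear()

conf = SQLConfSketch({"spark.sql.shuffle.partitions": "200"})
conf.set("spark.sql.shuffle.partitions", "5")   # tuned for one query
conf.reset()                                    # back to defaults for the next query
print(conf.get("spark.sql.shuffle.partitions"))  # → 200
```

This mirrors the HIVE-3202 use case quoted above: per-query tuning followed by a single command that restores a clean slate.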
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63284859

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -129,6 +129,23 @@ final class Decimal extends Ordered[Decimal] with Serializable {
   }

   /**
+   * Set this Decimal to the given BigInteger value. Will have precision 38 and scale 0.
+   */
+  def set(BigIntVal: BigInteger): Decimal = {
--- End diff --

I will change it.
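The diff above adds a setter that stores a `java.math.BigInteger` into a `Decimal` with precision 38 and scale 0. A hypothetical sketch of that contract in Python (names illustrative, not Spark's; Spark's actual overflow handling may differ):

```python
# Sketch: represent the value as an (unscaled value, precision, scale)
# triple with precision 38 and scale 0, rejecting integers that need
# more than 38 digits.
MAX_PRECISION = 38

def set_from_big_integer(value: int):
    digits = len(str(abs(value))) if value != 0 else 1
    if digits > MAX_PRECISION:
        raise ValueError(f"{value} needs {digits} digits; max is {MAX_PRECISION}")
    return (value, MAX_PRECISION, 0)   # unscaled value, precision, scale

print(set_from_big_integer(12345))  # → (12345, 38, 0)
```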
[GitHub] spark pull request: [SPARK-15318][ML][Example]:spark.ml Collaborat...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/13110#discussion_r63284035

--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala ---
@@ -28,7 +28,7 @@ object ALSExample {
   // $example on$
   case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)
-  object Rating {
+  object RatingUtil {
--- End diff --

I can move it into the main. I think it is not necessary. I will make the change and test it. Thanks!
[GitHub] spark pull request: [SPARK-15269][SQL] Set provided path to Catalo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13120#issuecomment-219253785 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15269][SQL] Set provided path to Catalo...
Github user xwu0226 commented on the pull request: https://github.com/apache/spark/pull/13120#issuecomment-219253654 cc @liancheng @yhuai @gatorsmile Thanks!
[GitHub] spark pull request: [SPARK-15269][SQL] Set provided path to Catalo...
GitHub user xwu0226 opened a pull request: https://github.com/apache/spark/pull/13120 [SPARK-15269][SQL] Set provided path to CatalogTable.storage.locationURI when creating external non-hive compatible table

## What changes were proposed in this pull request?

### Symptom

```
scala> spark.range(1).write.json("/home/xwu0226/spark-test/data/spark-15269")
Datasource.write -> Path: file:/home/xwu0226/spark-test/data/spark-15269

scala> spark.sql("create table spark_15269 using json options(PATH '/home/xwu0226/spark-test/data/spark-15269')")
16/05/11 14:51:00 WARN CreateDataSourceTableUtils: Couldn't find corresponding Hive SerDe for data source provider json. Persisting data source relation `spark_15269` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
going through newSparkSQLSpecificMetastoreTable()
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("drop table spark_15269")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("create table spark_15269 using json as select 1 as a")
org.apache.spark.sql.AnalysisException: path file:/user/hive/warehouse/spark_15269 already exists.;
  at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:88)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:62)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:60)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
  ...
```

The 2nd creation of the table fails, complaining that the path already exists.

### Root cause

When the first table is created as an external table with the data source path, but as json, `createDataSourceTables` considers it a non-hive-compatible table because `json` is not a Hive SerDe. Then `newSparkSQLSpecificMetastoreTable` is invoked to create the `CatalogTable` before asking HiveClient to create the metastore table. In this call, `locationURI` is not set, so when we convert the `CatalogTable` to a HiveTable before passing it to the Hive metastore, the Hive table's data location is not set. The Hive metastore then implicitly creates a data location of warehouse dir + table name, which is `file:/user/hive/warehouse/spark_15269` in the above case. When the table is dropped, Hive does not delete this implicitly created path because the table is external. When we create the 2nd table with a select and without a path, the table is created as a managed table and is given a default path in the options, as follows:

```
val optionsWithPath = if (!new CaseInsensitiveMap(options).contains("path")) {
  isExternal = false
  options + ("path" -> sessionState.catalog.defaultTablePath(tableIdent))
} else {
  options
}
```

This default path happens to be Hive's warehouse directory + the table name, which is the same path the Hive metastore implicitly created earlier for the 1st table. So when `InsertIntoHadoopFsRelation` tries to write the provided data to this data source table, it complains that the path already exists, since the SaveMode is `SaveMode.ErrorIfExists`.

### Solution

When creating an external datasource table that is non-hive compatible, make sure we set the provided path to `CatalogTable.storage.locationURI`, so the Hive metastore does not implicitly create a data location for the table.

## How was this patch tested?

A testcase is added, and regtests were run.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xwu0226/spark SPARK-15269

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13120.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13120

commit 21d188321284a86176927445fd1703353e0add09
Author: xin Wu
Date: 2016-05-08T07:06:36Z
spark-15206 add testcases for distinct aggregate in having clause following up PR12974

commit e43d56ab260633d7c2af54a6960cec7eadff34c4
Author: xin Wu
Date: 2016-05-08T07:09:44Z
Revert "spark-15206 add testcases for distinct aggregate in having clause following up PR12974"
This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.

commit f9f1f1f36f3759eecfb6070b2372462ee454b700
Author: xin Wu
Date: 2016-05-13T00:39:45Z
SPARK-15269: set locationUFI to the non-hive compatible metastore table

commit 58ad82db21f90b571d70371ff25c167ecda17720
Author: xin Wu
Date: 2016-05-14T20:16:11Z
SPARK-15269: only for
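The path collision in the SPARK-15269 description above can be sketched compactly. This is an illustrative Python sketch (function names and the warehouse path are hypothetical, echoing the report), not Spark's code: when the external table's `locationURI` is never recorded, the metastore falls back to `<warehouse>/<table>`, and a later managed table with a defaulted path lands on the same directory.

```python
WAREHOUSE = "file:/user/hive/warehouse"

def default_table_path(table):
    # managed tables default to warehouse dir + table name
    return f"{WAREHOUSE}/{table}"

def metastore_location(table, location_uri):
    # the metastore falls back to an implicit warehouse path
    # when the table's locationURI is unset (the bug)
    return location_uri if location_uri is not None else default_table_path(table)

# 1st table: external, but its provided path was never propagated
first = metastore_location("spark_15269", None)
# 2nd table: managed, path defaulted to warehouse dir + table name
second = default_table_path("spark_15269")
print(first == second)  # → True: same directory, so ErrorIfExists fires
```

Setting the provided path on the first table (a non-None `location_uri` here) breaks the collision, which is exactly the proposed fix.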
[GitHub] spark pull request: [SPARK-15328][MLLIB][ML] Word2Vec import for o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13119#issuecomment-219237391 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15328][MLLIB][ML] Word2Vec import for o...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/13119 [SPARK-15328][MLLIB][ML] Word2Vec import for original binary format

## What changes were proposed in this pull request?

Add a `loadGoogleModel()` function to import the original word2vec binary format.

## How was this patch tested?

`mllib.feature.Word2VecSuite` and `ml.feature.Word2VecSuite`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13119.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13119

commit 17f98089b2371033c5c88933123f070c7ad4c145
Author: Yuming Wang
Date: 2016-05-14T18:44:15Z
Load Google word2vec model
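For context on what the PR above imports: the original Google word2vec binary format is a text header `"<vocab_size> <vector_size>\n"`, followed, for each word, by the token bytes, a space, and `vector_size` little-endian float32s. A minimal Python round-trip sketch (illustrative only; real loaders such as the PR's `loadGoogleModel()` or gensim handle more edge cases, e.g. an optional trailing newline per row):

```python
import io
import struct

def write_w2v(buf, vectors):
    # header: "vocab_size vector_size\n", then "word " + packed float32s
    dim = len(next(iter(vectors.values())))
    buf.write(f"{len(vectors)} {dim}\n".encode("utf-8"))
    for word, vec in vectors.items():
        buf.write(word.encode("utf-8") + b" ")
        buf.write(struct.pack(f"<{dim}f", *vec))

def read_w2v(buf):
    vocab_size, dim = map(int, buf.readline().split())
    out = {}
    for _ in range(vocab_size):
        token = bytearray()
        while (c := buf.read(1)) != b" ":   # token runs until the space
            token.extend(c)
        out[token.decode("utf-8")] = struct.unpack(f"<{dim}f", buf.read(4 * dim))
    return out

buf = io.BytesIO()
write_w2v(buf, {"king": [1.0, 2.0], "queen": [3.0, 4.0]})
buf.seek(0)
print(read_w2v(buf)["queen"])  # → (3.0, 4.0)
```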
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219230521 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219230523 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58608/
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219230470 **[Test build #58608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58608/consoleFull)** for PR 13117 at commit [`1ff05ba`](https://github.com/apache/spark/commit/1ff05ba66f2595c850357ccf2150d6b9a3f61bfd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15304] [SPARK-15305] [SPARK-15306] [SQL...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12812#discussion_r63278723

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -64,6 +65,19 @@ class SparkSession private(
   | Session-related state |
   * --- */

+  {
+    val defaultWarehousePath =
+      SQLConf.WAREHOUSE_PATH
+        .defaultValueString
+        .replace("${system:user.dir}", System.getProperty("user.dir"))
+    val warehousePath = sparkContext.conf.get(
+      SQLConf.WAREHOUSE_PATH.key,
+      defaultWarehousePath)
+    sparkContext.conf.set(SQLConf.WAREHOUSE_PATH.key, warehousePath)
+    sparkContext.conf.set("hive.metastore.warehouse.dir", warehousePath)
--- End diff --

Currently, the `Set` command does not work if the property is `hive.x.y.z`. Will try to submit a PR for resolving that tonight. Thanks!
[GitHub] spark pull request: [SPARK-15304] [SPARK-15305] [SPARK-15306] [SQL...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/12812#discussion_r63277350

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -64,6 +65,19 @@ class SparkSession private(
   | Session-related state |
   * --- */

+  {
+    val defaultWarehousePath =
+      SQLConf.WAREHOUSE_PATH
+        .defaultValueString
+        .replace("${system:user.dir}", System.getProperty("user.dir"))
+    val warehousePath = sparkContext.conf.get(
+      SQLConf.WAREHOUSE_PATH.key,
+      defaultWarehousePath)
+    sparkContext.conf.set(SQLConf.WAREHOUSE_PATH.key, warehousePath)
+    sparkContext.conf.set("hive.metastore.warehouse.dir", warehousePath)
--- End diff --

At runtime, if users change the value of `SQLConf.WAREHOUSE_PATH.key` by using the `Set` command, we still need to set `hive.metastore.warehouse.dir`. Right? In addition, I think we should disallow users to change the value of `hive.metastore.warehouse.dir` by using the `Set` command.
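The mirroring discussed above — keep `hive.metastore.warehouse.dir` in sync whenever the Spark key changes, and reject direct writes to the Hive key — can be sketched as follows. An illustrative Python sketch of the proposed policy (the two config keys are the real ones from the diff; the class and enforcement mechanism are hypothetical, not Spark's):

```python
SQL_KEY = "spark.sql.warehouse.dir"
HIVE_KEY = "hive.metastore.warehouse.dir"

class ConfSketch:
    def __init__(self):
        self._kv = {}

    def set(self, key, value):
        if key == HIVE_KEY:
            # disallow setting the Hive key directly, per the review comment
            raise ValueError(f"set {SQL_KEY} instead of {HIVE_KEY}")
        self._kv[key] = value
        if key == SQL_KEY:
            self._kv[HIVE_KEY] = value  # keep Hive's view consistent

    def get(self, key):
        return self._kv[key]

conf = ConfSketch()
conf.set(SQL_KEY, "/data/warehouse")
print(conf.get(HIVE_KEY))  # → /data/warehouse
```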
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219224748 **[Test build #58608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58608/consoleFull)** for PR 13117 at commit [`1ff05ba`](https://github.com/apache/spark/commit/1ff05ba66f2595c850357ccf2150d6b9a3f61bfd).
[GitHub] spark pull request: update from orign
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13118#issuecomment-219223888 Can one of the admins verify this patch?
[GitHub] spark pull request: update from orign
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13118#issuecomment-219223860 @zhaorongsheng it seems this was opened by mistake. I guess this might have to be closed.
[GitHub] spark pull request: update from orign
GitHub user zhaorongsheng opened a pull request: https://github.com/apache/spark/pull/13118 update from orign

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhaorongsheng/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13118.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13118

commit 00a39d9c05c55b5ffcd4f49aadc91cedf227669a
Author: Patrick Wendell
Date: 2015-12-15T23:09:57Z
Preparing Spark release v1.6.0-rc3

commit 08aa3b47e6a295a8297e741effa14cd0d834aea8
Author: Patrick Wendell
Date: 2015-12-15T23:10:04Z
Preparing development version 1.6.0-SNAPSHOT

commit 9e4ac56452710ddd8efb695e69c8de49317e3f28
Author: tedyu
Date: 2015-12-16T02:15:10Z
[SPARK-12056][CORE] Part 2 Create a TaskAttemptContext only after calling setConf
This is continuation of SPARK-12056 where change is applied to SqlNewHadoopRDD.scala andrewor14 FYI
Author: tedyu
Closes #10164 from tedyu/master.
(cherry picked from commit f725b2ec1ab0d89e35b5e2d3ddeddb79fec85f6d)
Signed-off-by: Andrew Or

commit 2c324d35a698b353c2193e2f9bd8ba08c741c548
Author: Timothy Chen
Date: 2015-12-16T02:20:00Z
[SPARK-12351][MESOS] Add documentation about submitting Spark with mesos cluster mode.
Adding more documentation about submitting jobs with mesos cluster mode.
Author: Timothy Chen
Closes #10086 from tnachen/mesos_supervise_docs.
(cherry picked from commit c2de99a7c3a52b0da96517c7056d2733ef45495f)
Signed-off-by: Andrew Or

commit 8e9a600313f3047139d3cebef85acc782903123b
Author: Naveen
Date: 2015-12-16T02:25:22Z
[SPARK-9886][CORE] Fix to use ShutdownHookManager in ExternalBlockStore.scala
Author: Naveen
Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
(cherry picked from commit 8a215d2338c6286253e20122640592f9d69896c8)
Signed-off-by: Andrew Or

commit 93095eb29a1e59dbdbf6220bfa732b502330e6ae
Author: Bryan Cutler
Date: 2015-12-16T02:28:16Z
[SPARK-12062][CORE] Change Master to asyc rebuild UI when application completes
This change builds the event history of completed apps asynchronously so the RPC thread will not be blocked and allow new workers to register/remove if the event log history is very large and takes a long time to rebuild.
Author: Bryan Cutler
Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
(cherry picked from commit c5b6b398d5e368626e589feede80355fb74c2bd8)
Signed-off-by: Andrew Or

commit fb08f7b784bc8b5e0cd110f315f72c7d9fc65e08
Author: Wenchen Fan
Date: 2015-12-16T02:29:19Z
[SPARK-10477][SQL] using DSL in ColumnPruningSuite to improve readability
Author: Wenchen Fan
Closes #8645 from cloud-fan/test.
(cherry picked from commit a89e8b6122ee5a1517fbcf405b1686619db56696)
Signed-off-by: Andrew Or

commit a2d584ed9ab3c073df057bed5314bdf877a47616
Author: Timothy Hunter
Date: 2015-12-16T18:12:33Z
[SPARK-12324][MLLIB][DOC] Fixes the sidebar in the ML documentation
This fixes the sidebar, using a pure CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent.
Default view: https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png
When collapsed manually by the user: https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png
Disappears when column is too narrow: https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png
Can still be opened by the user if necessary: https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png
Author: Timothy Hunter
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223504 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223502 **[Test build #58607 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58607/consoleFull)** for PR 13117 at commit [`bc1720e`](https://github.com/apache/spark/commit/bc1720e60bb49165dca71691a3e0dfd2c23641b5). * This patch **fails build dependency tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223505 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58607/ Test FAILed.
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13117#issuecomment-219223421 **[Test build #58607 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58607/consoleFull)** for PR 13117 at commit [`bc1720e`](https://github.com/apache/spark/commit/bc1720e60bb49165dca71691a3e0dfd2c23641b5).
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/13117#discussion_r63276920 --- Diff: sql/hive-thriftserver/pom.xml --- @@ -106,12 +111,6 @@ - --- End diff -- This isn't technically related, but is a simple fix for a build warning, and was editing the file anyway
[GitHub] spark pull request: [SPARK-12972] [CORE] Update org.apache.httpcom...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/13117 [SPARK-12972] [CORE] Update org.apache.httpcomponents.httpclient ## What changes were proposed in this pull request? (Retry of https://github.com/apache/spark/pull/13049) - update to httpclient 4.5 / httpcore 4.4 - remove some defunct exclusions - manage httpmime version to match - update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used) ## How was this patch tested? Jenkins tests. Also, locally running the same test command of one Jenkins profile that failed: `mvn -Phadoop-2.6 -Pyarn -Phive -Phive-thriftserver -Pkinesis-asl ...` You can merge this pull request into a Git repository by running: $ git pull https://github.com/srowen/spark SPARK-12972.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13117.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13117 commit bc1720e60bb49165dca71691a3e0dfd2c23641b5 Author: Sean Owen Date: 2016-05-14T14:26:39Z Update to httpclient 4.5 / httpcore 4.4. Remove some defunct exclusions; manage httpmime version to match. Update selenium / httpunit to support 4.5 (possible now that Jetty 9 is used)
[GitHub] spark pull request: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc lay...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/13109#issuecomment-219222873 This looks better. but the roxygen style is a little bit deviated. The previous is like: #' function name #' description Current is like: #' function name - description We may need a consistent roxygen style documentation. At least for two styles: one function for one RD multiple functions for one RD And also if you type '?corr' in R, only corr() for Column functions is displayed. Since R is function oriented, I think two corr() descriptions better to be displayed together in one page?
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63276327 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala --- @@ -321,11 +323,13 @@ object CatalystTypeConverters { } private class DecimalConverter(dataType: DecimalType) -extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] { + extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] { --- End diff -- Why change this? I think we should use encoders most of the time.
[GitHub] spark pull request: [SPARK-11827] [SQL] Adding java.math.BigIntege...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10125#discussion_r63276311 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala --- @@ -129,6 +129,23 @@ final class Decimal extends Ordered[Decimal] with Serializable { } /** + * Set this Decimal to the given BigInteger value. Will have precision 38 and scale 0. + */ + def set(BigIntVal: BigInteger): Decimal = { --- End diff -- lower case the variable name
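The fix the reviewer asks for is just renaming the parameter to lower camel case. A minimal sketch of the corrected setter, using a simplified stand-in class (not Spark's actual `Decimal` in `org.apache.spark.sql.types`, whose internals are more involved):

```scala
import java.math.BigInteger

// Simplified stand-in for Spark's Decimal; fields and behavior here are
// illustrative only.
final class SimpleDecimal {
  private var longVal: Long = 0L
  private var precision: Int = 1
  private var scale: Int = 0

  // Parameter name lower-cased, as the review asks (was `BigIntVal`).
  def set(bigintval: BigInteger): SimpleDecimal = {
    // longValueExact throws ArithmeticException on overflow rather than
    // silently truncating.
    this.longVal = bigintval.longValueExact()
    this.precision = 38
    this.scale = 0
    this
  }

  def toLong: Long = longVal
}

val d = (new SimpleDecimal).set(BigInteger.valueOf(12345L))
assert(d.toLong == 12345L)
```

Scala convention (and Spark's style guide) reserves upper camel case for types and constants, so `BigIntVal` reads as a type name rather than a parameter.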
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219216499 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58606/ Test PASSed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219216498 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219216467 **[Test build #58606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58606/consoleFull)** for PR 13113 at commit [`d4b41c5`](https://github.com/apache/spark/commit/d4b41c596fa9d95282633694508c4a910418a4ba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213787 (I think it would be nicer if the PR description is fill up.)
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274249 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with SharedSQLContext { 3, 17, 27, 58, 62) } + test("takeSample") { +val n = 1000 +val data = sparkContext.parallelize(1 to n, 2).toDS() +for (num <- List(0, 5, 20, 100)) { + val sample = data.takeSample(withReplacement = false, num = num) + assert(sample.count === num) // Got exactly num elements + assert(sample.distinct.count === num) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 20, seed) + assert(sample.count() === 20) // Got exactly 20 elements + assert(sample.distinct.count === 20) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 100, seed) + assert(sample.count === 100) // Got only 100 elements + assert(sample.distinct.count === 100) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, 20, seed) + assert(sample.count === 20) // Got exactly 20 elements + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +{ + val sample = data.takeSample(withReplacement = true, num = 20) + assert(sample.count === 20) // Got exactly 100 elements +val sampleDisCount = sample.distinct.count + assert(sampleDisCount <= 20, "sampling with replacement returned all distinct elements") + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 
<= x && x <= n), s"element not in [1, $n]") +} +{ + val sample = data.takeSample(withReplacement = true, num = n) + assert(sample.count === n) // Got exactly 100 elements + // Chance of getting all distinct elements is astronomically low, so test we got < 100 + assert(sample.distinct.count < n, "sampling with replacement returned all distinct elements") + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, n, seed) + assert(sample.count === n) // Got exactly 100 elements + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, 2 * n, seed) + assert(sample.count === 2 * n) // Got exactly 200 elements + // Chance of getting all distinct elements is still quite low, so test we got < 100 + assert(sample.distinct.count < n, "sampling with replacement returned all distinct elements") +} +{ + val emptySet = sparkContext.parallelize(Seq.empty[Int], 2) + val sample = emptySet.takeSample(false, 20, 1) + assert(sample.length === 0) +} --- End diff -- (I think we might not need a extra closure here and below)
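For context, the invariants these test cases assert — no duplicates when sampling without replacement, near-certain duplicates when drawing n of n with replacement — can be sketched on plain collections with a hypothetical `takeSample` helper (not the Dataset API under review):

```scala
import scala.util.Random

// Hypothetical helper mirroring takeSample semantics on plain collections;
// not the actual Spark API.
def takeSample[T](data: IndexedSeq[T], withReplacement: Boolean,
                  num: Int, seed: Long): Seq[T] = {
  val rng = new Random(seed)
  if (withReplacement) {
    Seq.fill(num)(data(rng.nextInt(data.length))) // duplicates allowed
  } else {
    rng.shuffle(data).take(num) // each element picked at most once
  }
}

val n = 1000
val data = (1 to n).toVector

val noRep = takeSample(data, withReplacement = false, 20, seed = 42L)
assert(noRep.length == 20 && noRep.distinct.length == 20) // all distinct

// n draws with replacement from n elements almost surely collide
// (probability of all-distinct is n!/n^n, astronomically small).
val withRep = takeSample(data, withReplacement = true, n, seed = 42L)
assert(withRep.length == n && withRep.distinct.length < n)
```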
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274238 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -402,6 +402,76 @@ class DatasetSuite extends QueryTest with SharedSQLContext { 3, 17, 27, 58, 62) } + test("takeSample") { +val n = 1000 +val data = sparkContext.parallelize(1 to n, 2).toDS() +for (num <- List(0, 5, 20, 100)) { + val sample = data.takeSample(withReplacement = false, num = num) + assert(sample.count === num) // Got exactly num elements + assert(sample.distinct.count === num) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 20, seed) + assert(sample.count() === 20) // Got exactly 20 elements + assert(sample.distinct.count === 20) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = false, 100, seed) + assert(sample.count === 100) // Got only 100 elements + assert(sample.distinct.count === 100) // Elements are distinct + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +for (seed <- 1 to 5) { + val sample = data.takeSample(withReplacement = true, 20, seed) + assert(sample.count === 20) // Got exactly 20 elements + val sampleData = sample.collect() + assert(sampleData.forall(x => 1 <= x && x <= n), s"element not in [1, $n]") +} +{ + val sample = data.takeSample(withReplacement = true, num = 20) + assert(sample.count === 20) // Got exactly 100 elements +val sampleDisCount = sample.distinct.count --- End diff -- (It seems indentation is not consistent here.) 
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13116#discussion_r63274228 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -18,6 +18,9 @@ package org.apache.spark.sql import java.io.CharArrayWriter +import java.util.Random + +import org.apache.spark.util.random.SamplingUtils --- End diff -- (it seems we need to reorder imports, https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide#SparkCodeStyleGuide-Imports)
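Per the linked style guide, imports are grouped as java/javax, then scala, then third-party, then org.apache.spark, with a blank line between groups. A standalone sketch of the suggested reordering (the Spark import is shown as a comment so the snippet compiles on its own):

```scala
// Import groups per the Spark style guide: 1) java/javax, 2) scala,
// 3) third-party, 4) org.apache.spark — blank line between each group.
import java.io.CharArrayWriter
import java.util.Random

import scala.collection.mutable

// third-party imports would go here, then:
// import org.apache.spark.util.random.SamplingUtils

val writer = new CharArrayWriter()
val rng = new Random(0L)
val buf = mutable.ArrayBuffer(1, 2, 3)
assert(writer.toString.isEmpty && buf.sum == 6)
```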
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13116#issuecomment-219213307 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15315][SQL] Adding error check to the C...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13105#discussion_r63274194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala --- @@ -172,4 +173,13 @@ class DefaultSource extends FileFormat with DataSourceRegister { .mapPartitions(_.map(pair => new String(pair._2.getBytes, 0, pair._2.getLength, charset))) } } + + private def verifySchema(schema: StructType): Unit = { +schema.foreach(field => field.dataType match { --- End diff -- (Maybe starting with `{` for a multiple-line closure, `foreach { field =>`)
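The brace form the reviewer suggests might look like the following sketch, using simplified stand-ins for `StructType`/`StructField` (the actual `verifySchema` in `DefaultSource.scala` may reject other types as well):

```scala
// Simplified stand-ins for Spark's DataType/StructField; names illustrative.
sealed trait DataType
case object StringType extends DataType
case object BinaryType extends DataType
final case class Field(name: String, dataType: DataType)

// Brace form suggested in the review — `foreach { field =>` — instead of
// `foreach(field => field.dataType match { ... })` for a multi-line closure.
def verifySchema(schema: Seq[Field]): Unit = {
  schema.foreach { field =>
    field.dataType match {
      case BinaryType =>
        throw new UnsupportedOperationException(
          s"CSV data source does not support ${field.dataType} type for field ${field.name}.")
      case _ => // supported type, nothing to do
    }
  }
}

verifySchema(Seq(Field("a", StringType))) // passes silently
assert(scala.util.Try(verifySchema(Seq(Field("b", BinaryType)))).isFailure)
```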
[GitHub] spark pull request: [SPARK-15324] [SQL] Add the takeSample functio...
GitHub user burness opened a pull request: https://github.com/apache/spark/pull/13116 [SPARK-15324] [SQL] Add the takeSample function to the Dataset ## What changes were proposed in this pull request? In this pr, I add the takeSample function with the Dataset which is to sampling with the specify num instead of the fraction in sample function. ## How was this patch tested? add a test in `DatasetSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/burness/spark takeSample Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13116.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13116 commit c003f24cf402bcf80c0d920d4291f3753cb76ed1 Author: burness Date: 2016-05-14T10:16:56Z add takeSample in Dataset commit 9874686563de7a5cf2bf312481910126f3dc0f12 Author: burness Date: 2016-05-14T10:20:31Z modify the format of the comment
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213070 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213071 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58605/ Test FAILed.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219213068 **[Test build #58605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58605/consoleFull)** for PR 13115 at commit [`3b5eb9b`](https://github.com/apache/spark/commit/3b5eb9bb6bdf7377144b9bdd6c97a9cd5f39d088). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13115#issuecomment-219212931 **[Test build #58605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58605/consoleFull)** for PR 13115 at commit [`3b5eb9b`](https://github.com/apache/spark/commit/3b5eb9bb6bdf7377144b9bdd6c97a9cd5f39d088).
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212927 **[Test build #58606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58606/consoleFull)** for PR 13113 at commit [`d4b41c5`](https://github.com/apache/spark/commit/d4b41c596fa9d95282633694508c4a910418a4ba).
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user KaiXinXiaoLei commented on the pull request: https://github.com/apache/spark/pull/10900#issuecomment-219212923 @andrewor14 See https://github.com/apache/spark/pull/13115,
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
Github user KaiXinXiaoLei closed the pull request at: https://github.com/apache/spark/pull/10900
[GitHub] spark pull request: [SPARK-12492] Using spark-sql commond to run q...
GitHub user KaiXinXiaoLei opened a pull request: https://github.com/apache/spark/pull/13115 [SPARK-12492] Using spark-sql commond to run query, write the event of SparkListenerJobStart See https://github.com/apache/spark/pull/10900 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/KaiXinXiaoLei/spark sqlPage2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13115.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13115 commit 3b5eb9bb6bdf7377144b9bdd6c97a9cd5f39d088 Author: KaiXinXiaoLei Date: 2016-05-14T10:17:30Z sql page
[GitHub] spark pull request: Branch 1.4
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/13114#issuecomment-219212846 Close this PR @GuoNing89
[GitHub] spark pull request: Branch 1.4
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13114#issuecomment-219212714 Can one of the admins verify this patch?
[GitHub] spark pull request: Branch 1.4
GitHub user GuoNing89 opened a pull request: https://github.com/apache/spark/pull/13114 Branch 1.4 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-1.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13114.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13114 commit 4634be5a7db4f2fd82cfb5c602b79129d1d9e246 Author: Josh Rosen Date: 2015-06-14T16:34:35Z [SPARK-8354] [SQL] Fix off-by-factor-of-8 error when allocating scratch space in UnsafeFixedWidthAggregationMap UnsafeFixedWidthAggregationMap contains an off-by-factor-of-8 error when allocating row conversion scratch space: we take a size requirement, measured in bytes, then allocate a long array of that size. This means that we end up allocating 8x too much conversion space. This patch fixes this by allocating a `byte[]` array instead. This doesn't impose any new limitations on the maximum sizes of UnsafeRows, since UnsafeRowConverter already used integers when calculating the size requirements for rows.
Author: Josh Rosen Closes #6809 from JoshRosen/sql-bytes-vs-words-fix and squashes the following commits: 6520339 [Josh Rosen] Updates to reflect fact that UnsafeRow max size is constrained by max byte[] size (cherry picked from commit ea7fd2ff6454e8d819a39bf49901074e49b5714e) Signed-off-by: Josh Rosen commit 2805d145e30e4cabd11a7d33c4f80edbc54cc54a Author: Michael Armbrust Date: 2015-06-14T18:21:42Z [SPARK-8358] [SQL] Wait for child resolution when resolving generators Author: Michael Armbrust Closes #6811 from marmbrus/aliasExplodeStar and squashes the following commits: fbd2065 [Michael Armbrust] more style 806a373 [Michael Armbrust] fix style 7cbb530 [Michael Armbrust] [SPARK-8358][SQL] Wait for child resolution when resolving generatorsa (cherry picked from commit 9073a426e444e4bc6efa8608e54e0a986f38a270) Signed-off-by: Michael Armbrust commit 0ffbf085190b9d4dc13a8b6545e4e1022083bd35 Author: Peter Hoffmann Date: 2015-06-14T18:41:16Z fix read/write mixup Author: Peter Hoffmann Closes #6815 from hoffmann/patch-1 and squashes the following commits: 2abb6da [Peter Hoffmann] fix read/write mixup (cherry picked from commit f3f2a4397da164f0ddfa5d60bf441099296c4346) Signed-off-by: Reynold Xin commit fff8d7ee6c7e88ed96c29260480e8228e7fb1435 Author: tedyu Date: 2015-06-16T00:00:38Z SPARK-8336 Fix NullPointerException with functions.rand() This PR fixes the problem reported by Justin Yip in the thread 'NullPointerException with functions.rand()' Tested using spark-shell and verified that the following works: sqlContext.createDataFrame(Seq((1,2), (3, 100))).withColumn("index", rand(30)).show() Author: tedyu Closes #6793 from tedyu/master and squashes the following commits: 62fd97b [tedyu] Create RandomSuite 750f92c [tedyu] Add test for Rand() with seed a1d66c5 [tedyu] Fix NullPointerException with functions.rand() (cherry picked from commit 1a62d61696a0481508d83a07d19ab3701245ac20) Signed-off-by: Reynold Xin commit f287f7ea141fa7a3e9f8b7d3a2180b63cd77088d Author: 
huangzhaowei Date: 2015-06-16T06:16:09Z [SPARK-8367] [STREAMING] Add a limit for 'spark.streaming.blockInterval' since a data loss bug. The bug was reported in the JIRA [SPARK-8367](https://issues.apache.org/jira/browse/SPARK-8367). The resolution is limiting the configuration `spark.streaming.blockInterval` to a positive number. Author: huangzhaowei Author: huangzhaowei Closes #6818 from SaintBacchus/SPARK-8367 and squashes the following commits: c9d1927 [huangzhaowei] Update BlockGenerator.scala bd3f71a [huangzhaowei] Use requre instead of if 3d17796 [huangzhaowei] [SPARK_8367][Streaming]Add a limit for 'spark.streaming.blockInterval' since a data loss bug. (cherry picked from commit
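The off-by-factor-of-8 fix in SPARK-8354 above can be sketched outside Spark. This is a hypothetical, self-contained illustration (the object and method names are assumptions, not Spark's actual code) of why using a byte count as a `long[]` length over-allocates 8x, and why a `byte[]` of the same length is the fix:

```scala
object ScratchSpaceSketch {
  // Bug: a size requirement measured in BYTES is used as the length of a
  // long[] array, so each "byte" of the requirement costs 8 actual bytes.
  def buggyAllocatedBytes(requiredBytes: Int): Long =
    new Array[Long](requiredBytes).length * java.lang.Long.BYTES.toLong

  // Fix: a byte[] of that length allocates exactly the requested bytes.
  def fixedAllocatedBytes(requiredBytes: Int): Long =
    new Array[Byte](requiredBytes).length.toLong

  def main(args: Array[String]): Unit = {
    println(buggyAllocatedBytes(64)) // 512 -> 8x too much scratch space
    println(fixedAllocatedBytes(64)) // 64  -> exactly what was requested
  }
}
```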
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212380 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212377 **[Test build #58604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58604/consoleFull)** for PR 13113 at commit [`f854382`](https://github.com/apache/spark/commit/f85438276a40a13b260cb3b96d4dc0cea4113412). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212381 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58604/ Test FAILed.
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13113#issuecomment-219212325 **[Test build #58604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58604/consoleFull)** for PR 13113 at commit [`f854382`](https://github.com/apache/spark/commit/f85438276a40a13b260cb3b96d4dc0cea4113412).
[GitHub] spark pull request: [SPARK-15325][SQL] Replace the usage of deprec...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/13113 [SPARK-15325][SQL] Replace the usage of deprecated DataSet API in tests (Scala/Java) ## What changes were proposed in this pull request? `unionAll(other: Dataset[T])` and `registerTempTable(tableName: String)` are deprecated but still being used across Spark tests. In Scala/Java, only `registerTempTable(tableName: String)` is being used. This PR replaces `registerTempTable(tableName: String)` with `createOrReplaceTempView(viewName: String)`. ## How was this patch tested? Jenkins tests. Existing tests should cover this. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HyukjinKwon/spark SPARK-15325 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13113.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13113 commit f85438276a40a13b260cb3b96d4dc0cea4113412 Author: hyukjinkwon Date: 2016-05-14T09:38:52Z Replace the usage of registerTempTable to createOrReplaceTempView
[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12969
[GitHub] spark pull request: [SPARK-15197][Docs] Added Scaladoc for countAp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/12955
[GitHub] spark pull request: [SPARK-15096][ML]:LogisticRegression MultiClas...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12969#issuecomment-219208737 Merged to master/2.0
[GitHub] spark pull request: [SPARK-15197][Docs] Added Scaladoc for countAp...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/12955#issuecomment-219208684 Merged to master/2.0
[GitHub] spark pull request: [SPARK-15263][Core] Make shuffle service dir c...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/13042#issuecomment-219208629 This seems OK to me. There is actually another delete-recursively method in TestShuffleDataContext in network-shuffle which should be able to use this method rather than define it again. It seems like this could be usefully implemented in the main `Utils.deleteRecursively` as well, rather than have two differing implementations. I'm trying to figure out whether that's a win or poses any risks; it probably speeds up some big deletes but does mean spawning a process, in many cases at JVM shutdown. If anyone's supportive of that we could try it here, but it's not essential
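The comment above contrasts a pure-JVM recursive delete with spawning an external process. For context, a minimal JVM-side recursive delete (a hedged sketch only; Spark's actual `Utils.deleteRecursively` additionally handles symlinks, retries, and shutdown-hook bookkeeping) looks like:

```scala
import java.io.File
import java.nio.file.Files

object DeleteSketch {
  // Minimal recursive delete: remove children first, then the entry itself.
  def deleteRecursively(file: File): Unit = {
    if (file.isDirectory) {
      // listFiles() returns null on I/O error; treat that as "no children".
      Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    file.delete()
  }

  def main(args: Array[String]): Unit = {
    // Build a small temp tree, delete it, and confirm it is gone.
    val dir = Files.createTempDirectory("delete-sketch").toFile
    new File(dir, "child.txt").createNewFile()
    deleteRecursively(dir)
    println(dir.exists()) // false
  }
}
```

Spawning `rm -rf` instead can be faster for very large trees, but as noted above it costs a process per delete, which matters when many deletes run at JVM shutdown.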
[GitHub] spark pull request: [SPARK-15318][ML][Example]:spark.ml Collaborat...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/13110#discussion_r63273137 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ALSExample.scala --- @@ -28,7 +28,7 @@ object ALSExample { // $example on$ case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long) - object Rating { + object RatingUtil { --- End diff -- Is this object even needed? there's no reason this couldn't just be defined in main?
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219208034 Rather than change this in just a couple places, can you update all internal usages of the old accumulator API?
[GitHub] spark pull request: [SPARK-15323] Fix reading of partitioned forma...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207979 Let me cc @liancheng
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13112#discussion_r63272945 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/stopwatches.scala --- @@ -19,7 +19,8 @@ package org.apache.spark.ml.util import scala.collection.mutable -import org.apache.spark.{Accumulator, SparkContext} +import org.apache.spark.{SparkContext} +import org.apache.spark.util.LongAccumulator; --- End diff -- (The imports might have to be cleaned up as below:)

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator
```
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219207501 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58603/ Test PASSed.
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219207485 **[Test build #58603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58603/consoleFull)** for PR 13112 at commit [`2761dff`](https://github.com/apache/spark/commit/2761dff513eb2da87464735722807e3ea0ea7676). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219207500 Merged build finished. Test PASSed.
[GitHub] spark pull request: Fix reading of partitioned format=text dataset...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207482 I'll create a JIRA just to be sure, thanks!
[GitHub] spark pull request: Fix reading of partitioned format=text dataset...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207342 Oh, I just meant that it changes the code to support partitioned tables for the text data source, which seems to be disabled in Spark 2.0. It seems the guide says a JIRA is unnecessary only when behaviour stays the same regardless of the PR.
[GitHub] spark pull request: Fix reading of partitioned format=text dataset...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/13104#issuecomment-219207031 @HyukjinKwon It's related to https://issues.apache.org/jira/browse/SPARK-14463. Or should I create a new JIRA? And how is this changing existing behaviour? It was working perfectly fine in Spark 1.6.1.
[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...
Github user JeremyNixon commented on the pull request: https://github.com/apache/spark/pull/13000#issuecomment-219206838 As SparkR grows at some point it will make sense to split the docs into different files to separate out different parts of the library - do you think that it's worth splitting off the SQL/core examples from the machine learning examples at this point?
[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...
Github user JeremyNixon commented on a diff in the pull request: https://github.com/apache/spark/pull/13000#discussion_r63272747 --- Diff: examples/src/main/r/ml.R --- @@ -25,30 +25,102 @@ library(SparkR) sc <- sparkR.init(appName="SparkR-ML-example") sqlContext <- sparkRSQL.init(sc) -# Train GLM of family 'gaussian' + spark.glm and glm ## + +# Fit a generalized linear model with spark.glm training1 <- suppressWarnings(createDataFrame(sqlContext, iris)) test1 <- training1 -model1 <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian") +model1 <- spark.glm(training1, Sepal_Length ~ Sepal_Width + Species, family = "gaussian") # Model summary summary(model1) # Prediction predictions1 <- predict(model1, test1) -head(select(predictions1, "Sepal_Length", "prediction")) +showDF(predictions1) + +# Fit a generalized linear model with glm (R-compliant) +sameModel <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian") +summary(sameModel) + + spark.survreg ## + +# Use the ovarian dataset available in R survival package +library(survival) -# Train GLM of family 'binomial' -training2 <- filter(training1, training1$Species != "setosa") +# Fit an accelerated failure time (AFT) survival regression model with spark.survreg +training2 <- suppressWarnings(createDataFrame(sqlContext, ovarian)) test2 <- training2 -model2 <- glm(Species ~ Sepal_Length + Sepal_Width, data = training2, family = "binomial") --- End diff -- It may be worth keeping in the classification example for glm - users who come to the docs to see what's possible and who aren't familiar with link functions or don't assume that a binomial link function exists may not realize that it's possible to do classification with the algorithm.
[GitHub] spark pull request: [SPARK-15222] [SparkR] [ML] SparkR ML examples...
Github user JeremyNixon commented on a diff in the pull request: https://github.com/apache/spark/pull/13000#discussion_r63272738 --- Diff: examples/src/main/r/ml.R --- @@ -25,30 +25,102 @@ library(SparkR) sc <- sparkR.init(appName="SparkR-ML-example") sqlContext <- sparkRSQL.init(sc) -# Train GLM of family 'gaussian' + spark.glm and glm ## + +# Fit a generalized linear model with spark.glm training1 <- suppressWarnings(createDataFrame(sqlContext, iris)) test1 <- training1 -model1 <- glm(Sepal_Length ~ Sepal_Width + Species, training1, family = "gaussian") +model1 <- spark.glm(training1, Sepal_Length ~ Sepal_Width + Species, family = "gaussian") # Model summary summary(model1) --- End diff -- For user readability, it would be great if the models were given names that aligned with their algorithm - something like glmModel, naiveBayesModel, that makes it clear which model corresponds to which algorithm. For the predictions the same change may be helpful for knowing at a glance which variables correspond to their outputs. In the MLlib docs the examples are cleanly separated from one another so that there's no ambiguity, but as these are in a large contiguous file it may make sense to disambiguate things.
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219206065 **[Test build #58603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58603/consoleFull)** for PR 13112 at commit [`2761dff`](https://github.com/apache/spark/commit/2761dff513eb2da87464735722807e3ea0ea7676).
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219206010 Jenkins add to whitelist
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219204283 This looks good. ping @mengxr @jkbradley @MLnick Could you help to add @WeichenXu123 to whitelist? Thanks.
[GitHub] spark pull request: [SPARK-15320] [SQL] Spark-SQL Cli Ignores Para...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13111#issuecomment-219204110 In the PR https://github.com/apache/spark/pull/12812, we mention we will not use `hive.metastore.warehouse.dir` to set the location. Thus, should we issue an exception if users try to set it as a CLI parameter, or just issue a warning LOG message?
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219204019 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...
GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13112 [SPARK-15322][mllib]update deprecate accumulator usage into accumulatorV2 in mllib ## What changes were proposed in this pull request? MLlib uses the deprecated sc.accumulator method in two places; this updates them to the new API: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala line 282 mllib/src/main/scala/org/apache/spark/ml/util/stopwatches.scala line 106 ## How was this patch tested? Re-ran the build and tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/WeichenXu123/spark update_accuV2_in_mllib Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13112.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13112 commit 2761dff513eb2da87464735722807e3ea0ea7676 Author: WeichenXu Date: 2016-05-14T06:10:34Z update deprecate accumulator usage into accumulatorV2 in mllib
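The migration described above moves from the deprecated `sc.accumulator` to the AccumulatorV2 API. As a standalone illustration only (a simplified mock, not Spark's actual `LongAccumulator`; the real class also handles serialization, copying, and driver-side registration), the V2-style contract the new API is built around looks roughly like:

```scala
// Simplified stand-in for the AccumulatorV2-style contract
// (add / merge / reset / isZero / value).
class SimpleLongAccumulator {
  private var _sum: Long = 0L
  def add(v: Long): Unit = _sum += v
  def merge(other: SimpleLongAccumulator): Unit = _sum += other.value
  def reset(): Unit = _sum = 0L
  def isZero: Boolean = _sum == 0L
  def value: Long = _sum
}

object AccumulatorSketch {
  def main(args: Array[String]): Unit = {
    val acc = new SimpleLongAccumulator
    acc.add(2L)
    acc.add(3L)
    // merge simulates combining a task-local accumulator back into the total
    val other = new SimpleLongAccumulator
    other.add(5L)
    acc.merge(other)
    println(acc.value) // 10
  }
}
```

In real Spark code the equivalent change is replacing `sc.accumulator(0L)` with `sc.longAccumulator`, which returns an instance implementing this kind of interface.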
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203483 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58602/ Test PASSed.
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203482 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203467 **[Test build #58602 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58602/consoleFull)** for PR 12754 at commit [`8b9f33a`](https://github.com/apache/spark/commit/8b9f33a0a5991959743e29e9f61175a20ce14a87). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15320] [SQL] Spark-SQL Cli Ignores Para...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13111#issuecomment-219203243 @yhuai @andrewor14 @rxin @liancheng **Question**: This PR is to set `spark.sql.warehouse.dir` by using the user-specified value of `hive.metastore.warehouse.dir` in the CLI command line. Another option is to issue an exception and force users to use `spark.sql.warehouse.dir`. Let me know which one is preferable. Thanks!
[GitHub] spark pull request: [SPARK-14979] [ML] [PySpark] Add examples for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12754#issuecomment-219203197 **[Test build #58602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58602/consoleFull)** for PR 12754 at commit [`8b9f33a`](https://github.com/apache/spark/commit/8b9f33a0a5991959743e29e9f61175a20ce14a87).