[GitHub] spark pull request #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix f...
Github user ueshin closed the pull request at: https://github.com/apache/spark/pull/19704 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/19704 Thanks for reviewing! merging to branch-2.2.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 @MLnick Thanks for reviewing the code. I have made the changes as suggested. Please proceed if it's good to go. Thanks
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r149886343

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
--- End diff --

done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r149886323

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala ---
@@ -173,6 +178,10 @@ object GBTRegressor extends DefaultParamsReadable[GBTRegressor] {
   @Since("2.0.0")
   override def load(path: String): GBTRegressor = super.load(path)
+
+  @Since("2.3.0")
--- End diff --

done
[GitHub] spark pull request #18118: [SPARK-20199][ML] : Provided featureSubsetStrateg...
Github user pralabhkumar commented on a diff in the pull request: https://github.com/apache/spark/pull/18118#discussion_r149886357

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala ---
@@ -166,6 +166,40 @@ class GBTRegressorSuite extends SparkFunSuite with MLlibTestSparkContext
   }
 
   /
+  // Tests of feature subset strategy
+  /
+  test("Tests of feature subset strategy") {
+    val numClasses = 2
+    val gbt = new GBTRegressor()
+      .setMaxDepth(3)
+      .setMaxIter(5)
+      .setSubsamplingRate(1.0)
+      .setStepSize(0.5)
+      .setSeed(123)
+      .setFeatureSubsetStrategy("all")
+
+    // In this data, feature 1 is very important.
+    val data: RDD[LabeledPoint] = TreeTests.featureImportanceData(sc)
+    val categoricalFeatures = Map.empty[Int, Int]
+    val df: DataFrame = TreeTests.setMetadata(data, categoricalFeatures, numClasses)
+
+    val importances = gbt.fit(df).featureImportances
+    val mostImportantFeature = importances.argmax
+    assert(mostImportantFeature === 1)
+    assert(importances.toArray.sum === 1.0)
+    assert(importances.toArray.forall(_ >= 0.0))
+
+    // GBT with different featureSubsetStrategy
+    val gbtWithFeatureSubset = gbt.setFeatureSubsetStrategy("1")
+    val importanceFeatures = gbtWithFeatureSubset.fit(df).featureImportances
+    val mostIF = importanceFeatures.argmax
+    assert(!(mostImportantFeature === mostIF))
+    assert(importanceFeatures.toArray.sum === 1.0)
+    assert(importanceFeatures.toArray.forall(_ >= 0.0))
+    assert(!(importanceFeatures.toDense.values.deep === importances.toDense.values.deep))
--- End diff --

done
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149886093

--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
 
 
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --

@ueshin this ended up having no effect, so I took it out. For Timestamps, the timezone conversions will make a copy regardless. For ints that get promoted to floats, the promotion means the column has null values and needs a call to `fillna(0)`, which makes a copy anyway. So it seems this only makes copies when necessary.
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 I'm going over this again. But I think we still need other eyes on this too.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83635/testReport)** for PR 19459 at commit [`421d0be`](https://github.com/apache/spark/commit/421d0beafe0aeff8e689fa05af0505a4c8b1c556).
[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19657
[GitHub] spark pull request #19657: [SPARK-22344][SPARKR] clean up install dir if run...
GitHub user felixcheung reopened a pull request: https://github.com/apache/spark/pull/19657

[SPARK-22344][SPARKR] clean up install dir if running test as source package

## What changes were proposed in this pull request?

remove spark if spark downloaded & installed

## How was this patch tested?

manually by building package Jenkins, AppVeyor

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rinstalldir

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19657.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19657

commit d4433e13565e9e3d41928e1d2262696204476341
Author: Felix Cheung
Date: 2017-11-04T08:14:33Z

    add flag to cleanup

commit 0ea7c9b1c26c604296c35bc1588a6a5606a10cb2
Author: Felix Cheung
Date: 2017-11-05T03:21:26Z

    no get0

commit d0064ca24339143aeac9f1ef78b924361f908248
Author: Felix Cheung
Date: 2017-11-07T10:27:13Z

    make into function

commit 31f3bd06cc7d2b7bf482eddfe2f2738244cfbca7
Author: Felix Cheung
Date: 2017-11-07T10:50:55Z

    fix lint

commit ca5349bfc0dae03c2402b104e51c78a841541b09
Author: Felix Cheung
Date: 2017-11-07T10:55:27Z

    comment

commit f2aa5b7e12ed36e7b56610e695615260643f952f
Author: Felix Cheung
Date: 2017-11-07T17:31:16Z

    fix windows

commit 90d36c9ee3b0aed60ac9343e05b44366d1d2bf43
Author: Felix Cheung
Date: 2017-11-07T17:38:12Z

    more test

commit f21a90bef2a08c9d4cfdcc6588fb2da64679b4ec
Author: Felix Cheung
Date: 2017-11-07T17:39:05Z

    fix

commit 18e238a62d53de5a73283a741c1a9bb8230f4484
Author: Felix Cheung
Date: 2017-11-08T04:54:53Z

    fix 2
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19657 ok thanks, in that case, would you mind cherry-picking these changes into your account to run under AppVeyor? Fixing the test run is lower priority than getting this merged to kick off 2.2.1... :) thanks
[GitHub] spark pull request #19697: [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redun...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19697
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83626/ Test PASSed.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479 **[Test build #83626 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83626/testReport)** for PR 19479 at commit [`72c46f8`](https://github.com/apache/spark/commit/72c46f844967039ec2009de6cd93b9733ab1e8b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18018: [SPARK-12686][SQL] Support aggregation push down ...
Github user kisimple closed the pull request at: https://github.com/apache/spark/pull/18018
[GitHub] spark issue #19697: [SPARK-22222][CORE][TEST][FOLLOW-UP] Remove redundant an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19697 Merged to master.
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19701 Thanks! Merged to master.
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19701 Retest this please.
[GitHub] spark issue #19706: [SPARK-22476][R] Add dayofweek function to R
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19706 **[Test build #83634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83634/testReport)** for PR 19706 at commit [`d24a89b`](https://github.com/apache/spark/commit/d24a89b6a756457c651d0c208ccbe59b979e9ecc).
[GitHub] spark pull request #19706: [SPARK-22476][R] Add dayofweek function to R
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/19706

[SPARK-22476][R] Add dayofweek function to R

## What changes were proposed in this pull request?

This PR adds `dayofweek` to R API:

```r
data <- list(list(d = as.Date("2012-12-13")),
             list(d = as.Date("2013-12-14")),
             list(d = as.Date("2014-12-15")))
df <- createDataFrame(data)
collect(select(df, dayofweek(df$d)))
```

```
  dayofweek(d)
1            5
2            7
3            2
```

## How was this patch tested?

Manual tests and unit tests in `R/pkg/tests/fulltests/test_sparkSQL.R`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark add-dayofweek

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19706.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19706

commit d24a89b6a756457c651d0c208ccbe59b979e9ecc
Author: hyukjinkwon
Date: 2017-11-08T11:31:35Z

    Add support for dayofweek function in R
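For readers checking the three example dates by hand: Spark SQL's `dayofweek` numbers days from 1 (Sunday) to 7 (Saturday). The following is a minimal sketch in plain Python, independent of Spark and SparkR, that applies the same numbering to the dates from the PR description. The helper name `spark_style_dayofweek` is hypothetical and exists only for this illustration.

```python
from datetime import date


def spark_style_dayofweek(d):
    """Hypothetical helper: day of week with 1 = Sunday ... 7 = Saturday."""
    # date.isoweekday() uses 1 = Monday ... 7 = Sunday, so rotate by one day.
    return d.isoweekday() % 7 + 1


# The three dates used in the PR description.
dates = [date(2012, 12, 13), date(2013, 12, 14), date(2014, 12, 15)]
results = [spark_style_dayofweek(d) for d in dates]
print(results)
```

The dates fall on a Thursday, a Saturday and a Monday, so the sketch yields 5, 7 and 2 respectively.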
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19695 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83625/ Test PASSed.
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19695 Merged build finished. Test PASSed.
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19695 **[Test build #83625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83625/testReport)** for PR 19695 at commit [`0dcc12b`](https://github.com/apache/spark/commit/0dcc12b9b0035d56013429322ac52e67844a1704). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19468 ping @jiangxb1987
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19657 I actually took a look at decreasing the build time (as you know) and am currently away from it. If I remember correctly, what I observed was that a single particular test(?) takes 20ish(?) mins. It was related to ML in R. Let me try to take a look again first, and I will leave some comments about what I investigated in SPARK-21693 if I can't deal with it by myself (probably because of my limited ML knowledge). If it turns out not to be that simple, then let me ask them to increase the timeout to 2 hours (like my own account). AppVeyor actually seems to recommend separating the build, as I proposed in the JIRA, or reducing the time ..
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83631/ Test PASSed.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18118 Merged build finished. Test PASSed.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18118 **[Test build #83631 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83631/testReport)** for PR 18118 at commit [`ea03683`](https://github.com/apache/spark/commit/ea03683a4c388eaee70bf66fc41fd89a3a81a6a3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83632/ Test PASSed.
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19663 Merged build finished. Test PASSed.
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19663 **[Test build #83632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83632/testReport)** for PR 19663 at commit [`61b342c`](https://github.com/apache/spark/commit/61b342c9d7e4145052c2d7edd835bd36f401087e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19705 **[Test build #83633 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83633/testReport)** for PR 19705 at commit [`565c598`](https://github.com/apache/spark/commit/565c598e89299b8c1473d76249ab732abebdb661). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19705 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83633/ Test FAILed.
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19705 Merged build finished. Test FAILed.
[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19657 @HyukjinKwon hey I think the appveyor test pass is just timing out after 1 hr 30 min - is there a way to up the timeout?
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19705 **[Test build #83633 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83633/testReport)** for PR 19705 at commit [`565c598`](https://github.com/apache/spark/commit/565c598e89299b8c1473d76249ab732abebdb661).
[GitHub] spark issue #19705: [SPARK-22308][test-maven] Support alternative unit testi...
Github user nkronenfeld commented on the issue: https://github.com/apache/spark/pull/19705 @gatorsmile @srowen I think this is set now.
[GitHub] spark pull request #19705: [SPARK-22308][test-maven] Support alternative uni...
GitHub user nkronenfeld opened a pull request: https://github.com/apache/spark/pull/19705

[SPARK-22308][test-maven] Support alternative unit testing styles in external applications

Continuation of PR #19529 (https://github.com/apache/spark/pull/19529#issuecomment-340252119)

The problem with the Maven build in the previous PR was in the new tests: creating a Spark session outside the tests meant there was more than one Spark session around at a time. I was using the Spark session outside the tests so that the tests could share data; I've changed it so that each test creates the data anew.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nkronenfeld/spark alternative-style-tests-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19705.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19705

commit b9d41cd79f05f6c420d070ad07cdfa8f853fd461
Author: Nathan Kronenfeld
Date: 2017-10-15T03:04:16Z

    Separate out the portion of SharedSQLContext that requires a FunSuite from the part that works with just any old test suite.

commit 0d4bd97247a2d083c7de55663703b38a34298c9c
Author: Nathan Kronenfeld
Date: 2017-10-15T15:57:09Z

    Fix typo in trait name

commit 83c44f1c24619e906af48180d0aace38587aa88d
Author: Nathan Kronenfeld
Date: 2017-10-15T15:57:42Z

    Add simple tests for each non-FunSuite test style

commit e460612ec6f36e62d8d21d88c2344378ecba581a
Author: Nathan Kronenfeld
Date: 2017-10-15T16:20:44Z

    Document testing possibilities

commit 0ee2aadf29b681b23bed356b14038525574204a5
Author: Nathan Kronenfeld
Date: 2017-10-18T23:46:44Z

    Better documentation of testing procedures

commit 802a958b640067b99fda0b2c8587dea5b8000495
Author: Nathan Kronenfeld
Date: 2017-10-18T23:46:58Z

    Same initialization issue in SharedSparkContext as is in SharedSparkSession

commit 4218b86d5a8ff2321232ff38ed3e1b217ff7db2a
Author: Nathan Kronenfeld
Date: 2017-10-23T03:49:39Z

    Remove documentation of testing

commit 2d927e94f627919ac1546b47072276b23d3e8da2
Author: Nathan Kronenfeld
Date: 2017-10-24T04:37:48Z

    Move base versions of PlanTest and SQLTestUtils into the same file as where they came from, in an attempt to make diffs simpler

commit 38a83c081b2f9e28bea6321994fc1a0a0c43f252
Author: Nathan Kronenfeld
Date: 2017-10-25T14:42:15Z

    Comment line length should be 100

commit 241459a8a4c554877e381fe8306d086ab5b1b152
Author: Nathan Kronenfeld
Date: 2017-10-25T14:43:51Z

    Move SQLTestUtils object to the end of the file

commit 24fc4a324008b2acfcf5a2617eb7cc320565e83c
Author: Nathan Kronenfeld
Date: 2017-10-25T15:00:07Z

    fix scalastyle error (whitespace at end of line)

commit e4763d977cffbe7ef362a859c229b74b3cdf4ef3
Author: Nathan Kronenfeld
Date: 2017-10-26T02:27:07Z

    Remove extraneous curly brackets around empty PlanTest body

commit 6c0b0d569ae1d779fd9253da0c7e97d12634063c
Author: Nathan Kronenfeld
Date: 2017-10-26T03:24:31Z

    Remove extraneous beforeAll and brackets from SharedSQLContext

commit 565c598e89299b8c1473d76249ab732abebdb661
Author: Nathan Kronenfeld
Date: 2017-11-09T06:39:30Z

    Make sure no spark sessions are active outside tests
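The fix described in this PR (create the data inside each test instead of sharing it through a session that lives outside the tests) can be illustrated with a small sketch. This is not the Spark test code, which is Scala; it is a plain Python `unittest` analogy, and `FakeSession`, `PerTestDataSuite` and `run_suite` are hypothetical names invented for the illustration.

```python
import unittest


class FakeSession:
    """Hypothetical stand-in for a Spark session; counts live instances."""

    active = 0

    def __init__(self):
        FakeSession.active += 1

    def make_data(self):
        return [1, 2, 3]

    def stop(self):
        FakeSession.active -= 1


class PerTestDataSuite(unittest.TestCase):
    def setUp(self):
        # The session and the data are created inside the test lifecycle ...
        self.session = FakeSession()
        self.data = self.session.make_data()

    def tearDown(self):
        # ... and released before the next test starts.
        self.session.stop()

    def test_sum(self):
        self.assertEqual(FakeSession.active, 1)
        self.assertEqual(sum(self.data), 6)

    def test_length(self):
        self.assertEqual(FakeSession.active, 1)
        self.assertEqual(len(self.data), 3)


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PerTestDataSuite)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test pays the cost of rebuilding its data, but at most one session exists at any time and none survives after the suite finishes, which is the property the last commit message asks for.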
[GitHub] spark issue #19663: [SPARK-22463][YARN][SQL][Hive]add hadoop/hive/hbase/etc ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19663 **[Test build #83632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83632/testReport)** for PR 19663 at commit [`61b342c`](https://github.com/apache/spark/commit/61b342c9d7e4145052c2d7edd835bd36f401087e).
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19704 I think we can merge this first to branch-2.2, and then re-run the test in `19701` ..
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Merged build finished. Test PASSed.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19479 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83623/ Test PASSed.
[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r149873461

--- Diff: python/pyspark/sql/udf.py ---
@@ -0,0 +1,136 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+User-defined function related classes and functions
+"""
+import functools
+
+from pyspark import SparkContext
+from pyspark.rdd import _prepare_for_python_RDD, PythonEvalType
+from pyspark.sql.column import Column, _to_java_column, _to_seq
+from pyspark.sql.types import StringType, DataType, _parse_datatype_string
+
+
+def _wrap_function(sc, func, returnType):
+    command = (func, returnType)
+    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
+    return sc._jvm.PythonFunction(bytearray(pickled_command), env, includes, sc.pythonExec,
+                                  sc.pythonVer, broadcast_vars, sc._javaAccumulator)
+
+
+def _create_udf(f, *, returnType, udfType):
+    if udfType in (PythonEvalType.PANDAS_SCALAR_UDF, PythonEvalType.PANDAS_GROUP_FLATMAP_UDF):
+        import inspect
+        argspec = inspect.getargspec(f)
+        if len(argspec.args) == 0 and argspec.varargs is None:
+            raise ValueError(
+                "0-arg pandas_udfs are not supported. "
+                "Instead, create a 1-arg pandas_udf and ignore the arg in your function."
+            )
+    udf_obj = UserDefinedFunction(f, returnType=returnType, name=None, udfType=udfType)
+    return udf_obj._wrapped()
+
+
+class UserDefinedFunction(object):
+    """
+    User defined function in Python
+
+    .. versionadded:: 1.3
+    """
+    def __init__(self, func,
+                 returnType=StringType(), name=None,
+                 udfType=PythonEvalType.SQL_BATCHED_UDF):
+        if not callable(func):
+            raise TypeError(
+                "Not a function or callable (__call__ is not defined): "
+                "{0}".format(type(func)))
+
+        self.func = func
+        self._returnType = returnType
+        # Stores UserDefinedPythonFunctions jobj, once initialized
+        self._returnType_placeholder = None
+        self._judf_placeholder = None
+        self._name = name or (
+            func.__name__ if hasattr(func, '__name__')
+            else func.__class__.__name__)
+        self.udfType = udfType
+
+
+    @property
+    def returnType(self):
+        # This makes sure this is called after SparkContext is initialized.
+        # ``_parse_datatype_string`` accesses to JVM for parsing a DDL formatted string.
+        if self._returnType_placeholder is None:
+            if isinstance(self._returnType, DataType):
+                self._returnType_placeholder = self._returnType
+            else:
+                self._returnType_placeholder = _parse_datatype_string(self._returnType)
+        return self._returnType_placeholder
+
+    @property
+    def _judf(self):
+        # It is possible that concurrent access, to newly created UDF,
+        # will initialize multiple UserDefinedPythonFunctions.
+        # This is unlikely, doesn't affect correctness,
+        # and should have a minimal performance impact.
+        if self._judf_placeholder is None:
+            self._judf_placeholder = self._create_judf()
+        return self._judf_placeholder
+
+    def _create_judf(self):
+        from pyspark.sql import SparkSession
+
+        spark = SparkSession.builder.getOrCreate()
+        sc = spark.sparkContext
+
+        wrapped_func = _wrap_function(sc, self.func, self.returnType)
+        jdt = spark._jsparkSession.parseDataType(self.returnType.json())
+        judf = sc._jvm.org.apache.spark.sql.execution.python.UserDefinedPythonFunction(
+            self._name, wrapped_func, jdt, self.udfType)
+        return judf
+
+    def __call__(self, *cols
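The `returnType` and `_judf` properties in the quoted diff defer their work until first access and cache the result in a `*_placeholder` attribute. The following is a minimal sketch of that lazy-initialization pattern in plain Python; it is not PySpark code, and `LazyValue`, `expensive_factory` and `calls` are hypothetical names used only for this illustration.

```python
class LazyValue:
    """Hypothetical holder that builds an expensive object on first access."""

    def __init__(self, factory):
        self._factory = factory
        self._placeholder = None  # filled in on first access

    @property
    def value(self):
        # No lock is taken here: two threads racing on the first access
        # could each call the factory once, and the last assignment wins.
        # That is acceptable only when building the object twice is harmless,
        # which is the trade-off the comment in the quoted diff describes.
        if self._placeholder is None:
            self._placeholder = self._factory()
        return self._placeholder


calls = []


def expensive_factory():
    # Record each invocation so the caching behaviour is observable.
    calls.append(1)
    return {"created": True}


lazy = LazyValue(expensive_factory)
built_at_construction = len(calls)  # the factory has not run yet
first = lazy.value   # triggers the factory
second = lazy.value  # served from the cached placeholder
```

In single-threaded use the factory runs once and both accesses return the same object; a lock around the check would be needed only if duplicate construction had to be ruled out.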
[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r149873412
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -23,14 +23,15 @@ import scala.collection.JavaConverters._
 import scala.language.implicitConversions
 import org.apache.spark.annotation.InterfaceStability
+import org.apache.spark.api.python.PythonEvalType
 import org.apache.spark.broadcast.Broadcast
 import org.apache.spark.sql.catalyst.analysis.{Star, UnresolvedAlias, UnresolvedAttribute, UnresolvedFunction}
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.aggregate._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.execution.aggregate.TypedAggregateExpression
-import org.apache.spark.sql.execution.python.{PythonUDF, PythonUdfType}
+import org.apache.spark.sql.execution.python.{PythonUDF}
--- End diff --
We can remove the braces here.
[GitHub] spark issue #19479: [SPARK-17074] [SQL] Generate equi-height histogram in co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19479
**[Test build #83623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83623/testReport)** for PR 19479 at commit [`a96169e`](https://github.com/apache/spark/commit/a96169eac41db1ba2db9d9211d0c301012c4c409).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Histogram(height: Double, bins: Array[HistogramBin])`
  * `case class HistogramBin(lo: Double, hi: Double, ndv: Long)`
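For readers unfamiliar with the two classes listed above, here is a rough Python sketch of what an equi-height histogram holds. The field names mirror the case classes in the build report; the construction logic in `equi_height_histogram` is an assumption made for illustration and is not taken from the pull request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistogramBin:
    lo: float  # smallest value in the bin
    hi: float  # largest value in the bin
    ndv: int   # number of distinct values in the bin


@dataclass
class Histogram:
    height: float  # number of rows per bin, identical for every bin
    bins: List[HistogramBin]


def equi_height_histogram(values, num_bins):
    """Split the sorted values into num_bins bins with equally many rows."""
    if num_bins <= 0 or not values or len(values) % num_bins != 0:
        # Simplification for this sketch: require an exact, non-empty split.
        raise ValueError("need a non-empty input whose size is a multiple of num_bins")
    ordered = sorted(values)
    height = len(ordered) // num_bins
    bins = []
    for start in range(0, len(ordered), height):
        chunk = ordered[start:start + height]
        bins.append(HistogramBin(lo=chunk[0], hi=chunk[-1], ndv=len(set(chunk))))
    return Histogram(height=float(height), bins=bins)


hist = equi_height_histogram([1, 1, 2, 3, 5, 8, 8, 9], num_bins=4)
```

Because every bin holds the same number of rows, skewed data shows up as narrow bins (small `hi - lo`) or low `ndv` rather than as tall bars.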
[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19690 Merged build finished. Test PASSed.
[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19664 Merged build finished. Test PASSed.
[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83621/ Test PASSed.
[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19664 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83622/ Test PASSed.
[GitHub] spark issue #19690: [SPARK-22467]Added a switch to support whether `stdout_s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19690
**[Test build #83621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83621/testReport)** for PR 19690 at commit [`7b67148`](https://github.com/apache/spark/commit/7b671485e46a7e7c4fbce57b7f9e8fa66adcd82a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19664: [SPARK-22442][SQL] ScalaReflection should produce correc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19664
**[Test build #83622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83622/testReport)** for PR 19664 at commit [`10db6b4`](https://github.com/apache/spark/commit/10db6b4ba2ea099554743a2ebcfcb19c46ed264e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149874063
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --
Yeah, we don't want to end up double copying if `copy=True`. Let me try something, and if it ends up making things too complicated we can remove the copy flag altogether and just rely on `fillna(0)` to always make a copy. Not ideal, but simpler.
[GitHub] spark pull request #19673: [SPARK-21640][SQL][PYTHON][R][FOLLOWUP] Add error...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19673
[GitHub] spark issue #19673: [SPARK-21640][SQL][PYTHON][R][FOLLOWUP] Add errorifexist...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19673 Merged to master.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Merged build finished. Test PASSed.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83620/ Test PASSed.
[GitHub] spark issue #19681: [SPARK-20652][sql] Store SQL UI data in the new app stat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19681
**[Test build #83620 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83620/testReport)** for PR 19681 at commit [`bb7388b`](https://github.com/apache/spark/commit/bb7388b86d7adf8bbf209cf7748c319c4b8c0c77).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18118 **[Test build #83631 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83631/testReport)** for PR 18118 at commit [`ea03683`](https://github.com/apache/spark/commit/ea03683a4c388eaee70bf66fc41fd89a3a81a6a3).
[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19679 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83619/ Test PASSed.
[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19679 Merged build finished. Test PASSed.
[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r149871432
--- Diff: python/pyspark/serializers.py ---
@@ -213,7 +213,15 @@ def __repr__(self):
         return "ArrowSerializer"
-def _create_batch(series):
+def _create_batch(series, copy=False):
--- End diff --
Hmm, I guess it depends. With this method, we can reduce the number of copies if `s` doesn't include null values, but it might also increase the number if `s` includes null values and `copy=True`.
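The copy-count trade-off discussed in this thread can be sketched with plain Python lists standing in for a pandas Series. `prepare` is a hypothetical helper, not the `_create_batch` from the pull request, and the rule it follows (filling nulls already counts as the copy) is just one assumed way to avoid copying twice.

```python
def prepare(values, copy=False):
    """Return (result, copies_made) for a list that may contain None."""
    copies = 0
    result = values
    if any(v is None for v in values):
        # Filling nulls builds a new list, so the caller's data is already
        # safe from mutation and a second copy would be wasted work.
        result = [0 if v is None else v for v in values]
        copies += 1
    elif copy:
        # No nulls present: copy only because the caller asked for it.
        result = list(values)
        copies += 1
    return result, copies


with_nulls = [1, None, 3]
without_nulls = [1, 2, 3]

filled, filled_copies = prepare(with_nulls, copy=True)
same, same_copies = prepare(without_nulls, copy=False)
copied, copied_copies = prepare(without_nulls, copy=True)
```

In this sketch every path makes at most one copy. A naive version that fills nulls and then copies again whenever `copy=True` would make two copies for inputs containing nulls, which is the case the comment above warns about.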
[GitHub] spark pull request #19672: [SPARK-22456][SQL] Add support for dayofweek func...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19672
[GitHub] spark issue #19679: [SPARK-20647][core] Port StorageTab to the new UI backen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19679
**[Test build #83619 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83619/testReport)** for PR 19679 at commit [`fd59a24`](https://github.com/apache/spark/commit/fd59a24ee89ced2b74b52d702806547aa0c578e8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83627/ Test PASSed.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18118 Merged build finished. Test PASSed.
[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18118
**[Test build #83627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83627/testReport)** for PR 18118 at commit [`af01cc4`](https://github.com/apache/spark/commit/af01cc4ea2f9756d2a3405969c3d2bb5abb6be13).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19672: [SPARK-22456][SQL] Add support for dayofweek function
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19672 Merged to master
[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19688 Merged to master.
[GitHub] spark issue #19687: [SPARK-19644][SQL]Clean up Scala reflection garbage afte...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19687 @ManchesterUnited16 I ran your code and didn't see `NotSerializableException`. How did you patch Spark with my PR?
[GitHub] spark pull request #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19688
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19704 Merged build finished. Test PASSed.
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83630/ Test PASSed.
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19704
**[Test build #83630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83630/testReport)** for PR 19704 at commit [`b79885a`](https://github.com/apache/spark/commit/b79885ab4ac5c64421f600eaed65ad477ed3183e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19701 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83624/ Test FAILed.
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19701 Merged build finished. Test FAILed.
[GitHub] spark issue #19701: [SPARK-22211][SQL][FOLLOWUP] Fix bad merge for tests
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19701
**[Test build #83624 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83624/testReport)** for PR 19701 at commit [`890f608`](https://github.com/apache/spark/commit/890f60895789234c96764b8ff917a7bc4faed93b).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83618/ Test PASSed.
[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19688 Merged build finished. Test PASSed.
[GitHub] spark issue #19688: [SPARK-22466][Spark Submit]export SPARK_CONF_DIR while c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19688
**[Test build #83618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83618/testReport)** for PR 19688 at commit [`36ac736`](https://github.com/apache/spark/commit/36ac736856f70e4e9b7589017460bef19c01ce8c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19704 LGTM, thanks @ueshin !
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19704 **[Test build #83630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83630/testReport)** for PR 19704 at commit [`b79885a`](https://github.com/apache/spark/commit/b79885ab4ac5c64421f600eaed65ad477ed3183e).
[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r149867086
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --
Ah, in that case, maybe we need to revert one of the two original patches and fix them one by one, or merge the two follow-ups into one as a hot-fix PR. cc @gatorsmile @cloud-fan
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19704 Merged build finished. Test FAILed.
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19704 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83628/ Test FAILed.
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19704
**[Test build #83628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83628/testReport)** for PR 19704 at commit [`dfdb5fe`](https://github.com/apache/spark/commit/dfdb5fea15499c7893d8c42dfd0307a3e4e274fa).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for crea...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19704 **[Test build #83628 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83628/testReport)** for PR 19704 at commit [`dfdb5fe`](https://github.com/apache/spark/commit/dfdb5fea15499c7893d8c42dfd0307a3e4e274fa).
[GitHub] spark issue #19695: [SPARK-22377][BUILD] Use /usr/sbin/lsof if lsof does not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19695 **[Test build #83629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83629/testReport)** for PR 19695 at commit [`a6642fa`](https://github.com/apache/spark/commit/a6642fa41795cff82ec30c38e3c909d8025f358f).
[GitHub] spark pull request #19704: [SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix f...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/19704
[SPARK-22417][PYTHON][FOLLOWUP][BRANCH-2.2] Fix for createDataFrame from pandas.DataFrame with timestamp
## What changes were proposed in this pull request?
This is a follow-up of #19646 for branch-2.2. The original PR breaks branch-2.2 because the cherry-picked patch doesn't include some code that exists in master.
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-22417_2.2/fup1
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/19704.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #19704
commit 37eb04c5e8b4e2dfd6db87439ff5a9f6b3ab8039
Author: Takuya UESHIN
Date: 2017-11-09T04:37:57Z
    Add missing code.
commit dfdb5fea15499c7893d8c42dfd0307a3e4e274fa
Author: Takuya UESHIN
Date: 2017-11-09T04:38:55Z
    Modify a test to avoid DDL format type string.
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83617/ Test PASSed.
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19649 Merged build finished. Test PASSed.
[GitHub] spark issue #19649: [SPARK-22405][SQL] Add new alter table and alter databas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19649
**[Test build #83617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83617/testReport)** for PR 19649 at commit [`6b4fcff`](https://github.com/apache/spark/commit/6b4fcff9288ab3942f026dbdb053c69a0fdb31b7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r149866007
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --
BTW, @ueshin, `branch-2.2` Jenkins will fail due to #19701. Could you merge #19701 first?
[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r149865875
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --
Great, @ueshin ! :)
[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r149865798
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --
Thank you, @BryanCutler !
[GitHub] spark pull request #19646: [SPARK-22417][PYTHON] Fix for createDataFrame fro...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r149865739
--- Diff: python/pyspark/sql/tests.py ---
@@ -2592,6 +2592,21 @@ def test_create_dataframe_from_array_of_long(self):
         df = self.spark.createDataFrame(data)
         self.assertEqual(df.first(), Row(longarray=[-9223372036854775808, 0, 9223372036854775807]))
+    @unittest.skipIf(not _have_pandas, "Pandas not installed")
--- End diff --
I can take it over. I'll submit a PR soon.
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19702 Merged build finished. Test PASSed.
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83616/ Test PASSed.
[GitHub] spark issue #19702: [SPARK-10365][SQL] Support Parquet logical type TIMESTAM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19702
**[Test build #83616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83616/testReport)** for PR 19702 at commit [`5ca8bb5`](https://github.com/apache/spark/commit/5ca8bb5904ec85c3c7bb73ab91b1004de5763627).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.