GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/14859
[SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate building and testing on Windows (currently SparkR only)

## What changes were proposed in this pull request?

This PR adds build automation on Windows with the [AppVeyor](https://www.appveyor.com/) CI tool. Currently it only runs the tests for SparkR, as we have been having issues with testing Windows-specific PRs (e.g. https://github.com/apache/spark/pull/14743 and https://github.com/apache/spark/pull/13165) and a hard time verifying them.

One concern is that this build depends on [steveloughran/winutils](https://github.com/steveloughran/winutils) (maintained by a Hadoop PMC member) for the pre-built Hadoop bin package.

## How was this patch tested?

Manually: https://ci.appveyor.com/project/HyukjinKwon/spark/build/8-SPARK-17200-build

Some tests are already failing, as noted in https://github.com/apache/spark/pull/14743#issuecomment-241405287; they are currently as below:

```
Skipped ------------------------------------------------------------------------
1. create DataFrame from RDD (@test_sparkSQL.R#200) - Hive is not build with SparkSQL, skipped
2. test HiveContext (@test_sparkSQL.R#1041) - Hive is not build with SparkSQL, skipped
3. read/write ORC files (@test_sparkSQL.R#1748) - Hive is not build with SparkSQL, skipped
4. enableHiveSupport on SparkSession (@test_sparkSQL.R#2480) - Hive is not build with SparkSQL, skipped

Warnings -----------------------------------------------------------------------
1. infer types and check types (@test_sparkSQL.R#109) - unable to identify current timezone 'C': please set environment variable 'TZ'

Failed -------------------------------------------------------------------------
1. Error: union on two RDDs (@test_binary_function.R#38) -----------------------
1: textFile(sc, fileName) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_binary_function.R:38
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))

2. Error: zipPartitions() on RDDs (@test_binary_function.R#84) -----------------
1: textFile(sc, fileName, 1) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_binary_function.R:84
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))

3. Error: saveAsObjectFile()/objectFile() following textFile() works (@test_binaryFile.R#31)
1: textFile(sc, fileName1, 1) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:31
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))

4. Error: saveAsObjectFile()/objectFile() works on a parallelized list (@test_binaryFile.R#46)
1: objectFile(sc, fileName) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:46
2: callJMethod(sc, "objectFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))

5. Error: saveAsObjectFile()/objectFile() following RDD transformations works (@test_binaryFile.R#57)
1: textFile(sc, fileName1) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:57
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))

6. Error: saveAsObjectFile()/objectFile() works with multiple paths (@test_binaryFile.R#85)
1: objectFile(sc, c(fileName1, fileName2)) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:85
2: callJMethod(sc, "objectFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))

7. Error: spark.glm save/load (@test_mllib.R#162) ------------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:162
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

8. Error: glm save/load (@test_mllib.R#292) ------------------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:292
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

9. Error: spark.kmeans (@test_mllib.R#340) -------------------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:340
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

10. Error: spark.mlp (@test_mllib.R#371) ---------------------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:371
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

11. Error: spark.naiveBayes (@test_mllib.R#439) --------------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:439
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

12. Error: spark.survreg (@test_mllib.R#496) -----------------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:496
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

13. Error: spark.isotonicRegression (@test_mllib.R#541) ------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:541
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

14. Error: spark.gaussianMixture (@test_mllib.R#603) ---------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:603
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

15. Error: spark.lda with libsvm (@test_mllib.R#636) ---------------------------
1: read.ml(modelPath) at C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:636
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))

DONE ==========================================================================
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17200-build

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14859.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14859

----

commit cf9f56c8571f947b5e89e741a2b5f2f5477843b2
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T07:24:18Z

    Appveyor SparkR Windows test draft

commit b25eabaa88c4ada63e2d5a1bb802a48662f60d04
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T07:26:18Z

    Fix script path for R installation

commit 2772002e832570ccb7ac2120a2775a2daf724832
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T07:27:49Z

    Fix the name of script for R installation

commit 6cb4416a5afe66c624602b6ff36c5d23dc32b62f
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T07:31:10Z

    Upgrade maven version to 3.3.9

commit a2852a0f1b40d715eaf00947acdb076727e2d42e
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T08:08:11Z

    Clean up and fix the path for Hadoop bin package

commit 5aca1045ccfd7eed35bd8fa7721867c363bd88a0
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T08:18:55Z

    Merged dependecies installation

commit fbcfe135db8e6515f3dadccce7c2d72f6dec9b91
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T08:19:46Z

    Remove R installation script

commit f3eb1636d1d9ec6568e8b048d41b5928698d244c
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T08:26:56Z

    Clean up the dependencies installation script

commit fe95491bf7ef28f5ee0d7edb8ec5b14529815bcb
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T08:41:34Z

    Fix comment

commit a8e74fc531bb83ceb14d930ca6f03c799fbde384
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-29T08:43:01Z

    Uppercase for Maven in the comment

----
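The winutils dependency called out in the PR description above is the part most likely to trip up anyone reproducing this build locally. As a minimal sketch (not code from this PR; the path is a placeholder), a pre-flight check in plain R can confirm that the pre-built Hadoop bin package is wired up the way Spark expects on Windows:

```r
# Hypothetical pre-flight check, not part of this PR: Spark on Windows needs
# winutils.exe under %HADOOP_HOME%\bin, which is what the pre-built packages at
# https://github.com/steveloughran/winutils provide. The path below is a placeholder.
Sys.setenv(HADOOP_HOME = "C:/tools/hadoop")  # adjust to your local winutils checkout

winutils <- file.path(Sys.getenv("HADOOP_HOME"), "bin", "winutils.exe")
if (!file.exists(winutils)) {
  stop("winutils.exe not found; set HADOOP_HOME to a pre-built Hadoop bin package ",
       "(e.g. from steveloughran/winutils) before running the SparkR tests.")
}
message("Using winutils at: ", winutils)
```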
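For rerunning the failing suite outside AppVeyor, here is a rough local sketch, assuming SPARK_HOME points at a built Spark checkout whose SparkR package has been installed into R/lib (the same location the stack traces above reference); the TZ setting is intended to address the timezone warning in the log. None of this is taken from the PR's scripts.

```r
# Rough local sketch under stated assumptions: SPARK_HOME is a built Spark checkout
# and SparkR was installed into R/lib (matching C:/projects/spark/R/lib in the log).
Sys.setenv(SPARK_HOME = "C:/projects/spark")  # placeholder path
Sys.setenv(TZ = "UTC")                        # addresses the "please set environment variable 'TZ'" warning

sparkr_lib <- file.path(Sys.getenv("SPARK_HOME"), "R", "lib")
library(SparkR, lib.loc = sparkr_lib)
library(testthat)

# Point testthat at the installed package's test directory, i.e. the same files the
# AppVeyor run exercised (test_binary_function.R, test_binaryFile.R, test_mllib.R, ...).
test_dir(file.path(sparkr_lib, "SparkR", "tests", "testthat"))
```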