GitHub user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13217#discussion_r64032005
  
    --- Diff: R/WINDOWS.md ---
    @@ -11,3 +11,19 @@ include Rtools and R in `PATH`.
     directory in Maven in `PATH`.
     4. Set `MAVEN_OPTS` as described in [Building 
Spark](http://spark.apache.org/docs/latest/building-spark.html).
     5. Open a command shell (`cmd`) in the Spark directory and run `mvn 
-DskipTests -Psparkr package`
    +
    +## Unit tests
    +
    +To run the existing SparkR unit tests on Windows, the following steps are required (the steps below assume you are in the Spark root directory):
    +
    +1. Set `HADOOP_HOME`.
    +2. Download `winutils.exe` and place it in `$HADOOP_HOME/bin`.
    +
    +    Installing Hadoop itself does not appear to be necessary; only `winutils.exe` is needed. It is not included in the official Hadoop binary releases, so it must be built from source, but prebuilt binaries can be downloaded from the community (e.g. [steveloughran/winutils](https://github.com/steveloughran/winutils)).
    --- End diff --
    
    I wouldn't recommend putting it under the root of the project, as that only complicates the source tree and path cleanup; an adjacent directory works. And I think you may find that `hadoop.dll` is needed in places, as there are some JNI calls related to local file access and permissions/ACLs.
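    
    A rough sketch of what I mean (the `C:\work` paths are purely illustrative):
    
    ```cmd
    REM Hypothetical layout: the hadoop directory sits next to the Spark
    REM checkout, e.g. C:\work\spark and C:\work\hadoop, rather than inside
    REM the source tree.
    set HADOOP_HOME=C:\work\hadoop
    
    REM hadoop.dll is loaded over JNI, so hadoop\bin should also be on PATH,
    REM not just referenced by HADOOP_HOME.
    set PATH=%HADOOP_HOME%\bin;%PATH%
    ```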
    
    I'd suggest the following text:
    
    ----
    
    To run the SparkR unit tests on Windows, the following steps are required, assuming you are in the Spark root directory and do not have Apache Hadoop installed already:
    
    1. `cd ..`
    1. `mkdir hadoop`
    1. Download the relevant Hadoop bin package from 
[steveloughran/winutils](https://github.com/steveloughran/winutils). While 
these are not official ASF artifacts, they are built from the ASF release git 
hashes by a Hadoop PMC member on a dedicated Windows VM.
    1. Install the files into `hadoop\bin`; make sure that `winutils.exe` and 
`hadoop.dll` are present.
    1. Set the environment variable `HADOOP_HOME` to the full path of the newly created `hadoop` directory (see the sketch after this list).
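    
    Put together, a `cmd` session might look like the sketch below; the local paths are placeholders, and the final `winutils.exe ls` call is only a quick smoke test that the binary runs at all:
    
    ```cmd
    REM Run from the Spark root directory; paths are illustrative.
    cd ..
    mkdir hadoop\bin
    
    REM Copy the downloaded files (winutils.exe, hadoop.dll, ...) into hadoop\bin,
    REM then point HADOOP_HOME at the new directory.
    set HADOOP_HOME=%CD%\hadoop
    
    REM Smoke test: winutils should be able to list a directory.
    %HADOOP_HOME%\bin\winutils.exe ls %HADOOP_HOME%
    ```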
    
    ----
    