HyukjinKwon commented on a change in pull request #29703:
URL: https://github.com/apache/spark/pull/29703#discussion_r492425304



##########
File path: python/pyspark/find_spark_home.py
##########
@@ -42,7 +42,11 @@ def is_spark_home(path):
     import_error_raised = False
     from importlib.util import find_spec
     try:
+        # Spark distribution can be downloaded when HADOOP_VERSION environment variable is set.
+        # We should look up this directory first, see also SPARK-32017.
+        spark_dist_dir = "spark-distribution"
         module_home = os.path.dirname(find_spec("pyspark").origin)
+        paths.append(os.path.join(module_home, spark_dist_dir))
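
For context, a minimal sketch of the candidate-path lookup this diff feeds into. `is_spark_home` below is a simplified stand-in for the real check in `find_spark_home.py`, and the fallback candidates are illustrative, not Spark's actual list:

```python
import os
from importlib.util import find_spec

def is_spark_home(path):
    # Simplified stand-in: treat a directory as a plausible SPARK_HOME
    # if it contains bin/spark-submit and a jars directory.
    return (os.path.isfile(os.path.join(path, "bin", "spark-submit"))
            and os.path.isdir(os.path.join(path, "jars")))

paths = []  # candidate SPARK_HOME locations, probed in order

spec = find_spec("pyspark")
if spec is not None:
    # With SPARK-32017, `pip install pyspark` can download a full Spark
    # distribution into <pyspark package dir>/spark-distribution when
    # HADOOP_VERSION is set, so that directory is probed first.
    module_home = os.path.dirname(spec.origin)
    paths.append(os.path.join(module_home, "spark-distribution"))
    paths.append(module_home)

# The first candidate that looks like a Spark home wins.
spark_home = next((p for p in paths if is_spark_home(p)), None)
```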

Review comment:
       `os.path.dirname(os.path.realpath(__file__))` usually points to a scripts directory, because `find_spark_home.py` is included as a script in `setup.py`: https://github.com/apache/spark/blob/058e61ae143a3619fc13fb0c316044a75f7bc15f/python/setup.py#L187
   
   For example, it will be the user's `bin` directory.
   
   I thought `os.path.dirname(os.path.realpath(__file__))` was more just for dev purposes or for a non-pip-installed PySpark. I checked that it correctly points to the newly installed Spark home.
   
   However, sure, why not be safer :-) I will update.
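
To illustrate the distinction above, as run from inside a script such as `find_spark_home.py` (the example paths assume a pip install into a virtualenv and are purely illustrative):

```python
import os
from importlib.util import find_spec

# For a pip-installed PySpark, find_spark_home.py is installed as a
# script, so the file itself lives in the environment's bin directory,
# e.g. /home/user/venv/bin/find_spark_home.py.
script_dir = os.path.dirname(os.path.realpath(__file__))

# The pyspark package, by contrast, lives under site-packages, and the
# downloaded distribution sits below it, e.g.
# /home/user/venv/lib/python3.8/site-packages/pyspark/spark-distribution.
module_home = os.path.dirname(find_spec("pyspark").origin)
```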

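The `setup.py` mechanism referenced above is the standard setuptools `scripts` argument; a hypothetical minimal example (not Spark's actual `setup.py`):

```python
from setuptools import setup

# Files listed in `scripts` are copied into the environment's bin/
# directory at install time, which is why realpath(__file__) inside
# find_spark_home.py resolves to a bin/ path after a pip install.
setup(
    name="example-package",
    version="0.0.1",
    scripts=["pyspark/find_spark_home.py"],
)
```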



----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
