[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21267


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r188144573
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,22 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --

Likewise, I checked the warning manually:

```
.../pyspark/context.py:229: RuntimeWarning: Failed to add file [/home/spark/tmp.py] specified in 'spark.submit.pyFiles' to Python path:

...
  /usr/lib64/python27.zip
  /usr/lib64/python2.7
... 
```


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r188037259
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception as e:
+                    from pyspark import util
+                    warnings.warn(
+                        "Python file [%s] specified in 'spark.submit.pyFiles' failed "
--- End diff --

Simplify this message?

"Failed to add file [%s] speficied in 'spark.submit.pyFiles' to Python 
path:\n  %s"


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187274825
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --

Yup, they're only missing on the driver side in this mode specifically. Yarn doesn't add them since `spark.files` is not set, if I understood correctly. They are handled specially in the submit case, but the shell case seems to be missing.

I described this a bit in the PR description too.

> In the case of Yarn client and cluster mode with submit, these are handled manually. In particular, #6360 added most of the logic. In this case, the Python path appears to be set manually via, for example, deploy.PythonRunner. We don't use spark.files here.





---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187274278
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --

Are 'spark.submit.pyFiles' files only missing on the driver side? I mean, if they are not added by `SparkContext.addFile`, shouldn't they also be missing on the executors?


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187264822
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --

I don't think so, but that's already being done in the other cluster/client modes. The copies are made via addFile in those modes; they're just not being copied in this case specifically. I think it's better to copy consistently.


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187259682
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
--- End diff --

Oh, I see.


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187259493
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --

For file types in `PACKAGE_EXTENSIONS`, do we need to copy?


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187216038
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --

That's the initial approach I tried. The thing is, for a .py file in the configuration, we would need to add its parent directory (not the .py file itself) to the path, and that would also add any other .py files that happen to be in that directory.
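
Concretely, a tiny sketch of what skipping the copy would imply (paths are hypothetical):

```python
import os
import sys

# Hypothetical location held in 'spark.submit.pyFiles' after spark-submit
# copied the file into its temp directory:
path = "/tmp/spark-abc123/tmp.py"

# sys.path entries are directories (or archives), so making 'import tmp'
# work without a copy means adding the parent directory ...
sys.path.insert(1, os.path.dirname(path))

# ... which also exposes every other .py file that happens to live in
# /tmp/spark-abc123, not only the file the user passed via --py-files.
```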


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r187133079
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
--- End diff --

Is this copy necessary? Couldn't you just add `path` to `sys.path` (instead 
of adding `filepath`) and that would solve the problem?


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r186920316
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception as e:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --

BTW, this should now be safer in any case, since we no longer add non-existent files to the path and we print warnings instead.


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r186673789
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
--- End diff --

The root directory is added to the path above. A .py file needs its parent directory on the path, not the file itself.
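
In other words, roughly (paths are hypothetical):

```python
import sys

root = "/tmp/spark-files-root"  # stands in for SparkFiles.getRootDirectory()

# The root directory itself is on sys.path, so a plain .py file copied into
# it becomes importable with no extra step: 'import tmp' finds root/tmp.py.
sys.path.insert(1, root)

# Archives matching PACKAGE_EXTENSIONS ('.zip', '.egg', '.jar') differ:
# the archive file itself must be a sys.path entry before its contents can
# be imported, hence the per-file insert for those extensions only.
sys.path.insert(1, root + "/deps.zip")
```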


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r186670486
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
--- End diff --

Am I missing anything? Looks like `PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')`, so `.py` doesn't seem to be in it?


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-08 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21267#discussion_r186650331
  
--- Diff: python/pyspark/context.py ---
@@ -211,9 +211,23 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         for path in self._conf.get("spark.submit.pyFiles", "").split(","):
             if path != "":
                 (dirname, filename) = os.path.split(path)
-                if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
-                    self._python_includes.append(filename)
-                    sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
+                try:
+                    filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
+                    if not os.path.exists(filepath):
+                        # In case of YARN with shell mode, 'spark.submit.pyFiles' files are
+                        # not added via SparkContext.addFile. Here we check if the file exists,
+                        # try to copy and then add it to the path. See SPARK-21945.
+                        shutil.copyfile(path, filepath)
+                    if filename[-4:].lower() in self.PACKAGE_EXTENSIONS:
+                        self._python_includes.append(filename)
+                        sys.path.insert(1, filepath)
+                except Exception as e:
+                    from pyspark import util
+                    warnings.warn(
--- End diff --

Log was also tested manually:

```
.../python/pyspark/context.py:230: RuntimeWarning: Python file [/home/spark/tmp.py] specified in 'spark.submit.pyFiles' failed to be added in the Python path, excluding this in the Python path.
  : ...
```


---




[GitHub] spark pull request #21267: [SPARK-21945][YARN][PYTHON] Make --py-files work ...

2018-05-08 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/21267

[SPARK-21945][YARN][PYTHON] Make --py-files work in PySpark shell in Yarn client mode

## What changes were proposed in this pull request?

### Problem

When we run the _PySpark shell in Yarn client mode_, files specified via `--py-files` are not recognised on the _driver side_.

Here are the steps I took to check:

```bash
$ cat /home/spark/tmp.py
def testtest():
return 1
```

```bash
$ ./bin/pyspark --master yarn --deploy-mode client --py-files /home/spark/tmp.py
```

```python
>>> def test():
... import tmp
... return tmp.testtest()
...
>>> spark.range(1).rdd.map(lambda _: test()).collect()  # executor side
[1]
>>> test()  # driver side
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 2, in test
ImportError: No module named tmp
```

### How did it happen?

Unlike Yarn cluster mode and Yarn client mode with spark-submit, in Yarn client mode with the PySpark shell specifically:

1. It first runs the Python shell via:


https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L158
 as pointed out by @tgravescs in the JIRA.

2. This triggers shell.py and submits another application to launch a Py4J gateway:


https://github.com/apache/spark/blob/209b9361ac8a4410ff797cff1115e1888e2f7e66/python/pyspark/java_gateway.py#L45-L60

3. This runs a Py4J gateway:


https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L425

4. It copies the --py-files into a local temp directory:


https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L365-L376

and the resulting paths are then set in `spark.submit.pyFiles`.

5. The Py4J JVM is launched and then the Python paths are set via:


https://github.com/apache/spark/blob/7013eea11cb32b1e0038dc751c485da5c94a484b/python/pyspark/context.py#L209-L216

However, these paths are not actually set, because the files were copied into a temp directory in step 4, whereas this code path looks under `SparkFiles.getRootDirectory()`, where files are stored only when `SparkContext.addFile()` is called.
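
A minimal illustration of the mismatch, assuming a running PySpark shell and hypothetical paths:

```python
import os
from pyspark import SparkFiles

# Step 4 left the file in spark-submit's temp directory; this is what
# 'spark.submit.pyFiles' contains (hypothetical path):
submitted = "/tmp/spark-abc123/tmp.py"

# Step 5 resolves the bare filename against SparkFiles' root directory,
# which only gets populated when SparkContext.addFile() is called:
resolved = os.path.join(SparkFiles.getRootDirectory(),
                        os.path.basename(submitted))

os.path.exists(submitted)  # True: the copy made by spark-submit exists
os.path.exists(resolved)   # False: addFile() was never called, so the
                           # path being put on sys.path points at nothing
```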

In the other cluster modes, `spark.files` is set via:


https://github.com/apache/spark/blob/3cb82047f2f51af553df09b9323796af507d36f8/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L554-L555

and those files are explicitly added via:


https://github.com/apache/spark/blob/ecb8b383af1cf1b67f3111c148229e00c9c17c40/core/src/main/scala/org/apache/spark/SparkContext.scala#L395

So we are fine in other modes.

In the case of Yarn client and cluster mode with _submit_, these are handled manually. In particular, https://github.com/apache/spark/pull/6360 added most of the logic. In this case, the Python path appears to be set manually via, for example, `deploy.PythonRunner`. We don't use `spark.files` here.

### How does the PR fix the problem?

I tried to keep the approach as isolated as possible: simply copy the .py or .zip files into `SparkFiles.getRootDirectory()` on the driver side if they do not already exist there. Another possible way is to set `spark.files`, but that does unnecessary extra work as well and seems a bit invasive. A condensed sketch of the resulting logic follows.
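
Since the diff quotes above are each truncated at the reviewed line, here is a condensed, self-contained sketch of the whole resulting loop. The standalone function wrapper and the simplified warning text are mine, and `SparkFiles.getRootDirectory()` assumes an active SparkContext:

```python
import os
import shutil
import sys
import warnings

from pyspark import SparkFiles


def add_submitted_py_files(py_files_conf, package_extensions, python_includes):
    # Condensed sketch of the loop added to SparkContext._do_init.
    for path in py_files_conf.split(","):
        if path != "":
            (dirname, filename) = os.path.split(path)
            try:
                filepath = os.path.join(SparkFiles.getRootDirectory(), filename)
                if not os.path.exists(filepath):
                    # Yarn client mode with the shell never called addFile()
                    # for these files, so copy them into place ourselves.
                    shutil.copyfile(path, filepath)
                if filename[-4:].lower() in package_extensions:
                    # Archives must themselves be sys.path entries; plain .py
                    # files are covered by the root directory already on it.
                    python_includes.append(filename)
                    sys.path.insert(1, filepath)
            except Exception:
                warnings.warn(
                    "Failed to add file [%s] specified in "
                    "'spark.submit.pyFiles' to Python path." % path,
                    RuntimeWarning)
```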

### Before

```python
>>> def test():
... import tmp
... return tmp.testtest()
...
>>> spark.range(1).rdd.map(lambda _: test()).collect()
[1]
>>> test()
Traceback (most recent call last):
  File "", line 1, in 
  File "", line 2, in test
ImportError: No module named tmp
```

### After

```python
>>> def test():
... import tmp
... return tmp.testtest()
...
>>> spark.range(1).rdd.map(lambda _: test()).collect()
[1]
>>> test()
1
```

## How was this patch tested?

I manually tested with the PySpark shell in standalone and Yarn cluster. .zip and .py files were also tested with steps similar to the above. It's difficult to add a test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-21945

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21267


commit 68be3baef22d8b7aa58a432cb5bd12437c07feb7
Author: hyukjinkwon 
Date:   2018-05-08T07:36:31Z

Make --py-files work in PySpark shell in