Repository: spark
Updated Branches:
  refs/heads/branch-2.0 380b099fc -> 3487b0203
[SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths

## What changes were proposed in this pull request?

If given a list of paths, `pyspark.sql.readwriter.text` will attempt to use an undefined variable `paths`. This change checks whether the param `paths` is a basestring and, if so, converts it to a list, so that the same variable `paths` can be used in both cases.

## How was this patch tested?

Added a unit test for reading a list of files.

Author: Bryan Cutler <cutl...@gmail.com>

Closes #15379 from BryanCutler/sql-readtext-paths-SPARK-17805.

(cherry picked from commit bcaa799cb01289f73e9f48526e94653a07628983)
Signed-off-by: Reynold Xin <r...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3487b020
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3487b020
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3487b020

Branch: refs/heads/branch-2.0
Commit: 3487b020354988a91181f23b1c6711bfcdb4c529
Parents: 380b099
Author: Bryan Cutler <cutl...@gmail.com>
Authored: Fri Oct 7 00:27:55 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Oct 7 00:28:02 2016 -0700

----------------------------------------------------------------------
 python/pyspark/sql/readwriter.py | 4 ++--
 python/pyspark/sql/tests.py      | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3487b020/python/pyspark/sql/readwriter.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index dc13a81..e62f483 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -287,8 +287,8 @@ class DataFrameReader(OptionUtils):
         [Row(value=u'hello'), Row(value=u'this')]
         """
         if isinstance(paths, basestring):
-            path = [paths]
-        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+            paths = [paths]
+        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
 
     @since(2.0)
     def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,

http://git-wip-us.apache.org/repos/asf/spark/blob/3487b020/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 1ec40ce..3343bd7 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1697,6 +1697,12 @@ class SQLTests(ReusedPySparkTestCase):
             "does_not_exist", lambda: spark.catalog.uncacheTable("does_not_exist"))
 
+    def test_read_text_file_list(self):
+        df = self.spark.read.text(['python/test_support/sql/text-test.txt',
+                                   'python/test_support/sql/text-test.txt'])
+        count = df.count()
+        self.assertEquals(count, 4)
+
 
 class HiveSparkSubmitTests(SparkSubmitTests):
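The argument-normalization pattern the fix relies on can be sketched in plain Python. This is a minimal illustration, not Spark code: `normalize_paths` is a hypothetical helper name, and Python 3's `str` stands in for the `basestring` check used in the Python 2-era source.

```python
def normalize_paths(paths):
    """Accept a single path or a list of paths; always return a list.

    Sketch of the pattern behind the SPARK-17805 fix (hypothetical helper;
    `str` stands in for Python 2's `basestring`).
    """
    if isinstance(paths, str):
        # Rebind the SAME name, so code after the branch always sees a list.
        paths = [paths]
    return paths

# The pre-fix bug: the branch assigned to a NEW name (`path = [paths]`),
# so when a list was passed the branch was skipped and the later reference
# to `path` raised a NameError.
```

With this shape, `normalize_paths("a.txt")` and `normalize_paths(["a.txt", "b.txt"])` both yield a list, which is why the patched `text()` can hand `paths` straight to `PythonUtils.toSeq` in either case.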