Repository: spark
Updated Branches:
  refs/heads/branch-2.0 380b099fc -> 3487b0203
[SPARK-17805][PYSPARK] Fix in sqlContext.read.text when pass in list of paths

## What changes were proposed in this pull request?

If given a list of paths, `pyspark.sql.readwriter.text` will attempt to use an undefined variable `paths`. This change checks whether the param `paths` is a basestring and, if so, converts it to a list, so that the same variable `paths` can be used in both cases.

## How was this patch tested?

Added a unit test for reading a list of files.

Author: Bryan Cutler <cutl...@gmail.com>

Closes #15379 from BryanCutler/sql-readtext-paths-SPARK-17805.

(cherry picked from commit bcaa799cb01289f73e9f48526e94653a07628983)
Signed-off-by: Reynold Xin <r...@databricks.com>

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3487b020
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3487b020
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3487b020

Branch: refs/heads/branch-2.0
Commit: 3487b020354988a91181f23b1c6711bfcdb4c529
Parents: 380b099
Author: Bryan Cutler <cutl...@gmail.com>
Authored: Fri Oct 7 00:27:55 2016 -0700
Committer: Reynold Xin <r...@databricks.com>
Committed: Fri Oct 7 00:28:02 2016 -0700

----------------------------------------------------------------------
 python/pyspark/sql/readwriter.py | 4 ++--
 python/pyspark/sql/tests.py      | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/3487b020/python/pyspark/sql/readwriter.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/readwriter.py b/python/pyspark/sql/readwriter.py
index dc13a81..e62f483 100644
--- a/python/pyspark/sql/readwriter.py
+++ b/python/pyspark/sql/readwriter.py
@@ -287,8 +287,8 @@ class DataFrameReader(OptionUtils):
         [Row(value=u'hello'), Row(value=u'this')]
         """
         if isinstance(paths, basestring):
-            path = [paths]
-        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(path)))
+            paths = [paths]
+        return self._df(self._jreader.text(self._spark._sc._jvm.PythonUtils.toSeq(paths)))
 
     @since(2.0)
     def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=None,

http://git-wip-us.apache.org/repos/asf/spark/blob/3487b020/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 1ec40ce..3343bd7 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1697,6 +1697,12 @@ class SQLTests(ReusedPySparkTestCase):
             "does_not_exist", lambda: spark.catalog.uncacheTable("does_not_exist"))
 
+    def test_read_text_file_list(self):
+        df = self.spark.read.text(['python/test_support/sql/text-test.txt',
+                                   'python/test_support/sql/text-test.txt'])
+        count = df.count()
+        self.assertEquals(count, 4)
+
 
 class HiveSparkSubmitTests(SparkSubmitTests):
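The argument-normalization pattern the fix relies on can be sketched in plain Python. This is a minimal illustration, not Spark code: `normalize_paths` is a hypothetical helper name, and Python 3's `str` stands in for the `basestring` check used in the Python 2-era source.

```python
def normalize_paths(paths):
    """Accept a single path or a list of paths; always return a list.

    Sketch of the pattern behind the SPARK-17805 fix (hypothetical helper;
    `str` stands in for Python 2's `basestring`).
    """
    if isinstance(paths, str):
        # Rebind the SAME name, so code after the branch always sees a list.
        paths = [paths]
    return paths

# The pre-fix bug: the branch assigned to a NEW name (`path = [paths]`),
# so when a list was passed the branch was skipped and the later reference
# to `path` raised a NameError.
```

With this shape, `normalize_paths("a.txt")` and `normalize_paths(["a.txt", "b.txt"])` both yield a list, which is why the patched `text()` can hand `paths` straight to `PythonUtils.toSeq` in either case.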