Pyspark dataframe read
Hello everyone.

It seems pyspark dataframe read is broken for reading multiple files.

sql.read.json("file1,file2") fails with java.io.IOException: No input paths specified in job.

This used to work in Spark 1.4 and still works with sc.textFile.

Blaž
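[Editorial note: until the regression is fixed, one common workaround is to read each file separately and union the results. The sketch below shows the pattern in plain, runnable Python; read_one stands in for reading a single file (e.g. sqlContext.read.json(path) on one path) and union stands in for DataFrame.unionAll, so it illustrates the shape of the workaround rather than Spark's actual API.]

```python
from functools import reduce

def read_one(path):
    # Stand-in for reading a single input file
    # (e.g. sqlContext.read.json(path) in PySpark).
    # Here it just wraps the path in a list so the pattern is runnable.
    return [path]

def union(a, b):
    # Stand-in for DataFrame.unionAll: combine two per-file results.
    return a + b

paths = ["file1.json", "file2.json", "file3.json"]

# Read each path individually, then fold the results together.
combined = reduce(union, (read_one(p) for p in paths))
assert combined == ["file1.json", "file2.json", "file3.json"]
```

Because each path is handed to the reader as its own literal string, a comma inside a file name can never be re-interpreted as a delimiter.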
Re: Pyspark dataframe read
i ran into the same thing in the scala api. we depend heavily on comma-separated paths, and it no longer works.

On Tue, Oct 6, 2015 at 3:02 AM, Blaž Šnuderl wrote:
Re: Pyspark dataframe read
Could someone please file a JIRA to track this?
https://issues.apache.org/jira/browse/SPARK

On Tue, Oct 6, 2015 at 1:21 AM, Koert Kuipers wrote:
Re: Pyspark dataframe read
I think the problem is that comma is actually a legitimate character in a file name, and as a result ...

On Tuesday, October 6, 2015, Josh Rosen wrote:
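[Editorial note: the comma ambiguity described above can be sketched in plain Python. This is a toy model of comma-delimited input-path handling, not Spark's or Hadoop's actual path-resolution code.]

```python
def split_paths(path_spec):
    """Toy model of comma-separated input-path handling:
    treat every comma as a path delimiter."""
    return [p for p in path_spec.split(",") if p]

# Two distinct files, as the comma-separated convention intends:
assert split_paths("file1.json,file2.json") == ["file1.json", "file2.json"]

# But a single file whose name legitimately contains a comma
# is wrongly split into two (likely non-existent) paths:
assert split_paths("sales,2015.json") == ["sales", "2015.json"]
```

The two readings cannot be distinguished from the string alone, which is exactly why supporting one breaks the other.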
Re: Pyspark dataframe read
i personally find the comma-separated paths feature much more important than commas in paths (which one could argue you should avoid). but assuming people want to keep commas as legitimate characters in paths:

https://issues.apache.org/jira/browse/SPARK-10185
https://github.com/apache/spark/pull/8416

On Tue, Oct 6, 2015 at 4:31 AM, Reynold Xin wrote:
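[Editorial note: one way to remove the ambiguity, in the direction the linked JIRA and PR pursue, is for readers to accept an explicit list of paths instead of a single delimited string. The sketch below uses a hypothetical read_paths function to illustrate the API shape; it is not Spark's actual reader.]

```python
def read_paths(paths):
    """Hypothetical reader entry point: accept an explicit list of
    paths so commas inside individual file names are never
    re-interpreted as delimiters."""
    if isinstance(paths, str):
        # A lone string is one literal path, commas and all.
        paths = [paths]
    # Stand-in for actually opening each path.
    return list(paths)

# Multiple inputs are expressed structurally, not via a delimiter:
assert read_paths(["file1.json", "file2.json"]) == ["file1.json", "file2.json"]

# A single path containing a comma stays intact:
assert read_paths("sales,2015.json") == ["sales,2015.json"]
```

With a list-based signature, both use cases coexist: multiple paths and comma-bearing file names no longer compete for the same syntax.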