Pyspark dataframe read

2015-10-06 Thread Blaž Šnuderl
Hello everyone.

It seems PySpark DataFrame read is broken for reading multiple files.

sql.read.json("file1,file2") fails with java.io.IOException: No input
paths specified in job.

This used to work in Spark 1.4 and still works with sc.textFile.

Blaž
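For readers unfamiliar with the convention being relied on here: Hadoop-style input APIs such as sc.textFile treat a comma-separated string as shorthand for several input paths. A toy illustration in plain Python — this only mimics the splitting convention, not Spark's actual path resolution:

```python
import os
import tempfile

def expand_comma_paths(spec):
    """Mimic the Hadoop-style convention: split the spec on commas
    and treat each non-empty piece as a separate input path."""
    return [p for p in spec.split(",") if p]

# Two small input files, as in the report's "file1,file2" example.
tmp = tempfile.mkdtemp()
for name, text in [("file1", "a"), ("file2", "b")]:
    with open(os.path.join(tmp, name), "w") as f:
        f.write(text)

spec = ",".join([os.path.join(tmp, "file1"), os.path.join(tmp, "file2")])
paths = expand_comma_paths(spec)
print(len(paths))                              # 2
print(all(os.path.exists(p) for p in paths))   # True
```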


Re: Pyspark dataframe read

2015-10-06 Thread Koert Kuipers
I ran into the same thing in the Scala API. We depend heavily on
comma-separated paths, and it no longer works.




Re: Pyspark dataframe read

2015-10-06 Thread Reynold Xin
I think the problem is that a comma is actually a legitimate character in a
file name, and as a result ...
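To make the ambiguity concrete: a comma is a legal byte in a POSIX file name, so a splitter cannot tell one comma-containing name from two separate paths. A plain-Python sketch (the file name here is made up):

```python
import os
import tempfile

# A single, perfectly legal file whose name contains a comma.
tmp = tempfile.mkdtemp()
tricky = os.path.join(tmp, "sales,2015.json")
with open(tricky, "w") as f:
    f.write("{}")

# Naive comma splitting turns the one real path into two bogus ones.
pieces = tricky.split(",")
print(len(pieces))                              # 2
print(any(os.path.exists(p) for p in pieces))   # False: neither half exists
print(os.path.exists(tricky))                   # True: the original file does
```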



Re: Pyspark dataframe read

2015-10-06 Thread Koert Kuipers
I personally find the comma-separated-paths feature much more important
than commas in paths (which one could argue you should avoid anyway).

But assuming people want to keep commas as legitimate characters in paths:
https://issues.apache.org/jira/browse/SPARK-10185
https://github.com/apache/spark/pull/8416
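Whatever the pull request above settles on, one way to keep both behaviors is to accept an explicit sequence of paths, so no string splitting is ever needed. A hypothetical sketch — the read_paths name is made up for illustration and is not Spark's API:

```python
def read_paths(paths):
    """Hypothetical reader entry point: a str is one path, taken
    literally (commas and all); a list or tuple is several paths.
    Returns the normalized list of input paths."""
    if isinstance(paths, str):
        return [paths]
    return list(paths)

# Unambiguous: one comma-containing file vs. two separate files.
print(read_paths("sales,2015.json"))    # ['sales,2015.json']
print(read_paths(["file1", "file2"]))   # ['file1', 'file2']
```

Callers who want the comma-separated shorthand can still split the string themselves before calling, so the ambiguity is resolved at the call site rather than inside the reader.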





Re: Pyspark dataframe read

2015-10-06 Thread Josh Rosen
Could someone please file a JIRA to track this?
https://issues.apache.org/jira/browse/SPARK
