Pyspark dataframe read

2015-10-06 Thread Blaž Šnuderl
Hello everyone.

It seems the PySpark DataFrame read is broken for reading multiple files.

sql.read.json("file1,file2") fails with java.io.IOException: No input
paths specified in job.

This used to work in Spark 1.4, and it still works with sc.textFile.

Blaž
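
The difference in behavior is easy to show without Spark at all. Hadoop-style inputs (what sc.textFile uses) split a comma-separated path string into individual paths, while the Spark 1.5 DataFrame reader appears to treat the whole string as one literal path. A minimal plain-Python sketch of the two interpretations (the function names here are illustrative, not Spark APIs):

```python
# Sketch of the two path interpretations (plain Python, no Spark needed).

def textfile_style_paths(path_string):
    """Hadoop convention: a comma-separated string is a list of paths."""
    return [p.strip() for p in path_string.split(",") if p.strip()]

def dataframe_reader_style_paths(path_string):
    """Spark 1.5 DataFrame reader behavior: one literal path."""
    return [path_string]

print(textfile_style_paths("file1,file2"))          # ['file1', 'file2']
print(dataframe_reader_style_paths("file1,file2"))  # ['file1,file2']
```

Under the second interpretation no file named literally "file1,file2" exists, which would explain the "No input paths specified in job" error.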


Re: Pyspark dataframe read

2015-10-06 Thread Koert Kuipers
I ran into the same thing in the Scala API. We depend heavily on
comma-separated paths, and it no longer works.




Re: Pyspark dataframe read

2015-10-06 Thread Josh Rosen
Could someone please file a JIRA to track this?
https://issues.apache.org/jira/browse/SPARK



Re: Pyspark dataframe read

2015-10-06 Thread Reynold Xin
I think the problem is that a comma is actually a legitimate character in a
file name, and as a result ...
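
The ambiguity is concrete: a comma is a legal byte in a filename, so a string like "sales,2015.json" can name one file or two. A small sketch (plain Python, hypothetical filenames) showing that splitting on commas breaks a real file whose name contains one:

```python
import os
import tempfile

# Create one real file whose name contains a comma.
tmpdir = tempfile.mkdtemp()
single = os.path.join(tmpdir, "sales,2015.json")
with open(single, "w") as f:
    f.write('{"x": 1}\n')

# Treating the string as one path finds the file...
print(os.path.exists(single))  # True

# ...but splitting on commas yields two paths, neither of which exists.
for part in single.split(","):
    print(part, os.path.exists(part))
```

So any API that always splits on commas cannot address files like this one, which is the tension the thread is circling.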



Re: Pyspark dataframe read

2015-10-06 Thread Koert Kuipers
I personally find the comma-separated paths feature much more important
than commas in paths (which one could argue you should avoid).

But assuming people want to keep commas as legitimate characters in paths:
https://issues.apache.org/jira/browse/SPARK-10185
https://github.com/apache/spark/pull/8416
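
One way to satisfy both camps, sketched here as an illustration of the design space rather than as a description of what the linked PR does, is to accept an explicit list of paths so that no string is ever split: a lone string is taken literally (commas and all), and callers who want multiple inputs pass a list.

```python
def resolve_input_paths(paths):
    """Illustrative helper: accept one path string or a list of paths.

    A single string is taken literally, so commas in a filename are safe;
    multiple inputs are passed as a list, so no splitting is needed.
    """
    if isinstance(paths, str):
        return [paths]
    return list(paths)

print(resolve_input_paths("sales,2015.json"))   # ['sales,2015.json']
print(resolve_input_paths(["file1", "file2"]))  # ['file1', 'file2']
```

This keeps comma-in-filename support and multi-file reads unambiguous at the cost of a slightly wider accepted input type.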


