Also, even if the issue was not caused by the string length this time
(that limit is still real and may bite you later), it may be due to
other 2G indexing limitations that are currently being worked on here:
https://issues.apache.org/jira/browse/SPARK-6235
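
For reference, a minimal sketch of the textFile workaround mentioned
below; the file path, app name, and the line count at the end are
hypothetical and only illustrate the difference from wholeTextFiles:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object TextFileWorkaround {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("textFile-workaround")
        val sc = new SparkContext(conf)

        // wholeTextFiles materializes each file as a single String, which
        // cannot exceed ~2 billion characters; textFile instead returns an
        // RDD[String] with one element per line, so no single String has
        // to hold the whole file.
        val lines = sc.textFile("/path/to/large/file.txt")

        println(s"line count: ${lines.count()}")
        sc.stop()
      }
    }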

On Mon, Dec 12, 2016 at 8:18 PM, Jakob Odersky <ja...@odersky.com> wrote:
> Hi Pradeep,
>
> I'm afraid you're running into a hard Java issue. Strings are indexed
> with signed integers and can therefore not be longer than
> approximately 2 billion characters. Could you use `textFile` as a
> workaround? It will give you an RDD of the files' lines instead.
>
> In general, this guide http://spark.apache.org/contributing.html explains
> how to contribute to Spark, including instructions on how to file bug
> reports (although that does not apply here, since this isn't a bug in
> Spark).
>
> regards,
> --Jakob
>
> On Mon, Dec 12, 2016 at 7:34 PM, Pradeep <pradeep.mi...@mail.com> wrote:
>> Hi,
>>
>> Why is there a restriction on the maximum file size that can be read by
>> the wholeTextFile() method?
>>
>> I can read a 1.5 GB file, but I get an out-of-memory error for a 2 GB file.
>>
>> Also, how can I raise this as a defect in the Spark JIRA? Can someone
>> please guide me?
>>
>> Thanks,
>> Pradeep
>>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
