Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Reynold Xin
You can write the data to local HDFS (or local disk) and just load it from there.

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Jegan
I am sorry, I didn't understand it completely. Are you suggesting copying the files from S3 to HDFS? Actually, that is what I am doing: I am reading the files using Spark and persisting them locally. Or did you actually mean that the producer should write the files directly to HDFS instead of S3?

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Reynold Xin
I meant to say: just copy everything to a local HDFS, and then don't use caching ...
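A minimal sketch of that suggestion (the paths are hypothetical; the idea is to materialize the S3 data on HDFS once and then read it back without persist/cache, so Spark never has to hold a whole partition as a single cached block backed by a 2GB-limited ByteBuffer):

    // One-time copy: read the raw lines from S3 and write them to HDFS.
    sc.textFile("s3n://bucket/input/*")
      .saveAsTextFile("hdfs:///data/input-copy")

    // Subsequent reads go against HDFS directly, with no .cache()/.persist(),
    // so the block store's Integer.MAX_VALUE size limit is never hit.
    val lines = sc.textFile("hdfs:///data/input-copy")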

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Ted Yu
As a workaround, can you set the number of partitions higher in the sc.textFile method?

Cheers

On Mon, Oct 5, 2015 at 3:31 PM, Jegan wrote:
> Hi All,
>
> I am facing the below exception when the size of the file being read in a
> partition is above 2GB.
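For reference, a minimal sketch of that workaround (the path and the partition count are placeholders; pick a count so that total input size divided by partitions stays comfortably under 2GB):

    // The second argument to sc.textFile is a minimum-partitions hint:
    // more, smaller partitions keep each one below the 2GB block limit.
    val rdd = sc.textFile("s3n://bucket/input/*", 5000)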

Re: IllegalArgumentException: Size exceeds Integer.MAX_VALUE

2015-10-05 Thread Jegan
Thanks for your suggestion, Ted.

Unfortunately, at this point in time I cannot go beyond 1000 partitions. I am writing this data to BigQuery, and it has a limit of 1000 load jobs per day per table (they have some limits on this). I currently create 1 load job per partition. Is there any other way?
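One possible way to decouple the two limits, sketched here as an assumption rather than something proposed in the thread: read with enough partitions that none exceeds 2GB, then coalesce down to at most 1000 just before the export, since one load job is created per partition. Note this only helps if the coalesced output is streamed straight to BigQuery rather than persisted.

    // Hypothetical path and counts: read small, for the 2GB block limit ...
    val rdd = sc.textFile("s3n://bucket/input/*", 5000)

    // ... then shrink to at most 1000 partitions only for the BigQuery
    // load jobs. coalesce avoids a shuffle; do not cache() the coalesced
    // RDD, or the large merged partitions would hit the 2GB limit again.
    val forExport = rdd.coalesce(1000)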