You can write the data to local HDFS (or local disk) and just load it from
there.
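For example, roughly along these lines in the Scala shell (bucket and paths
are made up, not from this thread):

  // One pass to copy the S3 input into HDFS
  val fromS3 = sc.textFile("s3a://my-bucket/input/")
  fromS3.saveAsTextFile("hdfs:///staging/input-copy")

  // Later stages read the HDFS copy instead of going back to S3
  val data = sc.textFile("hdfs:///staging/input-copy")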
On Mon, Oct 5, 2015 at 4:37 PM, Jegan wrote:
> Thanks for your suggestion, Ted.
>
> Unfortunately, at this point in time I cannot go beyond 1000 partitions. I
> am writing this data to BigQuery
I am sorry, I didn't understand it completely. Are you suggesting copying
the files from S3 to HDFS? Actually, that is what I am doing: I am reading
the files using Spark and persisting them locally.
Or did you actually mean to ask the producer to write the files directly to
HDFS instead of S3?
I meant to say just copy everything to local HDFS, and then don't use
caching ...
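In other words, something like this against the HDFS copy, with no
.cache()/.persist() call (the path is illustrative):

  // Re-read the HDFS copy on each pass; without caching, Spark never has to
  // materialize a single block larger than 2GB in the block manager.
  val data = sc.textFile("hdfs:///staging/input-copy")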
On Mon, Oct 5, 2015 at 4:52 PM, Jegan wrote:
> I am sorry, I didn't understand it completely. Are you suggesting copying
> the files from S3 to HDFS? Actually, that is what I am
As a workaround, can you set the number of partitions higher in the
sc.textFile method?
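For instance (the path and the count are only examples; the second argument
to sc.textFile is the minimum number of partitions):

  // Splitting the input more finely keeps each partition well under 2GB
  val lines = sc.textFile("s3a://my-bucket/input/", minPartitions = 5000)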
Cheers
On Mon, Oct 5, 2015 at 3:31 PM, Jegan wrote:
> Hi All,
>
> I am facing the below exception when the size of the file being read in a
> partition is above 2GB. This is apparently
Thanks for your suggestion, Ted.
Unfortunately, at this point in time I cannot go beyond 1000 partitions. I
am writing this data to BigQuery, and it has a limit of 1000 load jobs per
day for a table (they have some limits on this). I currently create 1 load
job per partition. Is there any other
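To put the numbers together: with one load job per partition and a quota of
1000 jobs per table per day, the write side is capped at 1000 partitions,
even though the read side may need many more to keep each piece under 2GB.
One way that tension is sometimes handled, sketched here purely as an
illustration (paths and counts are made up, not from this thread), is to read
finely and coalesce just before the load step:

  // Read with enough partitions that none exceeds 2GB ...
  val raw = sc.textFile("s3a://my-bucket/input/", minPartitions = 5000)
  // ... then merge down to the BigQuery job budget right before writing.
  // Note: the merged partitions can grow past 2GB again, which is why caching
  // them (as discussed earlier in the thread) runs back into the same limit.
  raw.coalesce(1000).saveAsTextFile("hdfs:///staging/for-bigquery")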