Re: How to resolve the SparkException : Size exceeds Integer.MAX_VALUE

2016-08-15 Thread Ewan Leith
I think this is more suited to the user mailing list than the dev one, but this almost always means you need to repartition your data into smaller partitions, as one of the partitions is over 2GB. When you create your dataset, put something like .repartition(1000) at the end of the command.
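To illustrate the arithmetic behind this advice: Spark stores each block in structures indexed by a Java int, so a single partition larger than Integer.MAX_VALUE bytes (2^31 - 1, roughly 2 GiB) triggers this exception, and spreading the same data over more partitions shrinks each one below the limit. The sketch below is plain Python (no Spark needed), and the 50 GiB dataset size is a made-up example:

```python
# Java's Integer.MAX_VALUE: the per-block byte limit that causes
# "Size exceeds Integer.MAX_VALUE" when a single partition exceeds it.
INTEGER_MAX_VALUE = 2**31 - 1

def fits_in_block(total_bytes: int, num_partitions: int) -> bool:
    """True if an evenly split partition stays under the ~2 GiB block limit."""
    per_partition = -(-total_bytes // num_partitions)  # ceiling division
    return per_partition <= INTEGER_MAX_VALUE

dataset_bytes = 50 * 1024**3  # hypothetical 50 GiB dataset

print(fits_in_block(dataset_bytes, 8))     # ~6.25 GiB per partition -> False
print(fits_in_block(dataset_bytes, 1000))  # ~51 MiB per partition -> True
```

In Spark itself the equivalent fix is just chaining `.repartition(1000)` (or whatever count keeps each partition under 2 GiB) onto the dataset creation, as suggested above.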

How to resolve the SparkException : Size exceeds Integer.MAX_VALUE

2016-08-15 Thread Minudika Malshan
Hi all, I am trying to create and train a model for a Kaggle competition dataset using Apache Spark. The dataset has more than 10 million rows of data. But when training the model, I get an exception "*Size exceeds Integer.MAX_VALUE*". I found the same question has been raised on Stack Overflow.