Hi,

Here are the commands that were used.

-----

> spark.default.parallelism=1000
> sparkR.session()
Java ref type org.apache.spark.sql.SparkSession id 1
> sql("use test")
SparkDataFrame[]
> mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn from test_temp2 where vdr = 'TEST31X'")
> nrow(mydata)
[1] 544140
> lat_model <- spark.randomForest(mydata, avt ~ c1 + p1 + rt1 + c2 + p2 + rt2, maxDepth = 30)
[Stage 10:==================================================> (7 + 1) / 8]
17/09/18 10:50:30 WARN TaskSetManager: Lost task 0.0 in stage 10.0 (TID 66, node1.test, executor 1): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

-----
On Sat, Sep 16, 2017 at 8:54 PM, Akhil Das <ak...@hacked.work> wrote:

> What are the parameters you passed to the classifier, and what is the size
> of your train data? You are hitting that issue because one of the block
> sizes is over 2G; repartitioning the data will help.
>
> On Fri, Sep 15, 2017 at 7:55 PM, rpulluru <ranjith.pull...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the SparkR randomForest function and running into a
>> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE issue.
>> It looks like I am hitting
>> https://issues.apache.org/jira/browse/SPARK-1476. I used
>> spark.default.parallelism=1000 but am still facing the same issue.
>>
>> Thanks
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
> --
> Cheers!
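For what it's worth, the suggested fix can be sketched directly in SparkR. Note that `spark.default.parallelism` mainly affects RDD operations, so setting it may not change the partitioning of a SparkDataFrame produced by `sql()`; calling `repartition()` on the DataFrame itself is the more direct route. The partition count of 200 below is an illustrative value, not a recommendation; pick it so that each partition (and hence each cached block) stays well under 2 GB, since a single block larger than Integer.MAX_VALUE bytes triggers this exception.

```r
library(SparkR)
sparkR.session()
sql("use test")

mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn
               from test_temp2 where vdr = 'TEST31X'")

# Spread the data over more partitions so no single block exceeds 2 GB.
# 200 is a hypothetical value; size it from your data volume.
mydata <- repartition(mydata, numPartitions = 200)

lat_model <- spark.randomForest(mydata,
                                avt ~ c1 + p1 + rt1 + c2 + p2 + rt2,
                                maxDepth = 30)
```

If the error persists, it may also be worth trying a smaller `maxDepth`, since very deep trees inflate the size of intermediate structures the fit has to materialize.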