Hi,

Here are the commands that were used.

-----

> spark.default.parallelism=1000
> sparkR.session()
Java ref type org.apache.spark.sql.SparkSession id 1
> sql("use test")
SparkDataFrame[]
> mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn from test_temp2 where vdr = 'TEST31X'")
> nrow(mydata)
[1] 544140
> lat_model <- spark.randomForest(mydata, avt ~ c1 + p1 + rt1 + c2 + p2 + rt2, maxDepth = 30)
[Stage 10:==================================================> (7 + 1) / 8]
17/09/18 10:50:30 WARN TaskSetManager: Lost task 0.0 in stage 10.0 (TID 66, node1.test, executor 1): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

-----
On Sat, Sep 16, 2017 at 8:54 PM, Akhil Das <ak...@hacked.work> wrote:

> What are the parameters you passed to the classifier, and what is the size
> of your train data? You are hitting that issue because one of the block
> sizes is over 2G; repartitioning the data will help.
>
> On Fri, Sep 15, 2017 at 7:55 PM, rpulluru <ranjith.pull...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using the SparkR randomForest function and running into a
>> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE issue.
>> It looks like I am hitting
>> https://issues.apache.org/jira/browse/SPARK-1476. I used
>> spark.default.parallelism=1000 but am still facing the same issue.
>>
>> Thanks
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
> --
> Cheers!
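For what it's worth, the suggested fix can be sketched directly in SparkR. Note that `spark.default.parallelism` mainly affects RDD operations, so setting it may not change the partitioning of a SparkDataFrame produced by `sql()`; calling `repartition()` on the DataFrame itself is the more direct route. The partition count of 200 below is an illustrative value, not a recommendation; pick it so that each partition (and hence each cached block) stays well under 2 GB, since a single block larger than Integer.MAX_VALUE bytes triggers this exception.

```r
library(SparkR)
sparkR.session()
sql("use test")

mydata <- sql("select c1, p1, rt1, c2, p2, rt2, avt, avn
               from test_temp2 where vdr = 'TEST31X'")

# Spread the data over more partitions so no single block exceeds 2 GB.
# 200 is a hypothetical value; size it from your data volume.
mydata <- repartition(mydata, numPartitions = 200)

lat_model <- spark.randomForest(mydata,
                                avt ~ c1 + p1 + rt1 + c2 + p2 + rt2,
                                maxDepth = 30)
```

If the error persists, it may also be worth trying a smaller `maxDepth`, since very deep trees inflate the size of intermediate structures the fit has to materialize.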