Re: Random forest binary classification H20 difference Spark

2016-08-11 Thread Bedrytski Aliaksandr
Hi Samir,

either use the *dataframe.na.fill()* method or the *nvl()* function when selecting features:

val train = sqlContext.sql("SELECT ... nvl(Field, 1.0) AS Field ... FROM test")

--
Bedrytski Aliaksandr
sp...@bedryt.ski

On Wed, Aug 10, 2016, at 11:19, Yanbo Liang wrote:
> Hi Samir,
>
> Did
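
For illustration, here is a minimal sketch of the na.fill() approach suggested above. It assumes Spark 2.x with a local SparkSession. The column names (f1, f2, label), the sample rows, and the fill value 1.0 are hypothetical and not taken from the original poster's data.

```scala
import org.apache.spark.sql.SparkSession

object FillNullsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x session, used only for this example.
    val spark = SparkSession.builder()
      .appName("fill-nulls-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data: Option[Double] fields become nullable columns,
    // so None is stored as NULL.
    val raw = Seq[(Option[Double], Option[Double], Double)](
      (Some(1.0), Some(2.0), 0.0),
      (None,      Some(3.0), 1.0),
      (Some(4.0), None,      0.0)
    ).toDF("f1", "f2", "label")

    // Replace NULLs in the listed feature columns with a constant.
    val filled = raw.na.fill(1.0, Seq("f1", "f2"))

    filled.show()
    spark.stop()
  }
}
```

Whether a constant fill is appropriate depends on the feature. The value 1.0 above only mirrors the SQL snippet in the message, and the SQL route with nvl() (or coalesce()) would yield the same kind of result.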

Re: Random forest binary classification H20 difference Spark

2016-08-10 Thread Yanbo Liang
Hi Samir,

Did you use VectorAssembler to assemble some columns into the feature column? If there are NULLs in your dataset, VectorAssembler will throw this exception. You can use DataFrame.na.drop() or DataFrame.na.fill() to drop or substitute NULL values.

Thanks
Yanbo

2016-08-07 19:51 GMT-07:00
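
As a rough sketch of the drop variant described above, the following removes rows with NULL features before they reach VectorAssembler. It assumes Spark 2.x. The column names and sample rows are hypothetical, and dropping rows is only one of the two options mentioned.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object DropNullsSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local Spark 2.x session, used only for this example.
    val spark = SparkSession.builder()
      .appName("drop-nulls-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data with NULLs in the feature columns.
    val raw = Seq[(Option[Double], Option[Double], Double)](
      (Some(1.0), Some(2.0), 0.0),
      (None,      Some(3.0), 1.0),
      (Some(4.0), None,      0.0)
    ).toDF("f1", "f2", "label")

    // Keep only rows where every listed feature column is non-null.
    val cleaned = raw.na.drop(Seq("f1", "f2"))

    // With the NULL rows removed, assembling should no longer fail with
    // "Values to assemble cannot be null".
    val assembler = new VectorAssembler()
      .setInputCols(Array("f1", "f2"))
      .setOutputCol("features")

    assembler.transform(cleaned).show()
    spark.stop()
  }
}
```

Dropping rows discards training data, so filling may be preferable when NULLs are frequent.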

Random forest binary classification H20 difference Spark

2016-08-07 Thread Javier Rey
Hi everybody.

I ran RF on H2O and had no trouble with null values, but in Spark, using DataFrames and the ML library, I get the error below. I know my dataframe contains nulls, but I understand that Random Forest supports null values:

"Values to assemble cannot be null"

Any