[ML] RandomForestRegressor training set size for each tree

2018-03-05 Thread OBones
We are using RandomForestRegressor from Spark 2.1.1 to train a model. To make sure we have the appropriate parameters, we start with a very small dataset, one that has 6024 lines. The regressor is created with this code: val rf = new RandomForestRegressor() .setLabelCol("MyLabel")

Getting multiple regression metrics at once

2017-12-18 Thread OBones
Hello, I'm working with the ML package for regression purposes and I get good results on my data. I'm now trying to get multiple metrics at once, as right now, I'm doing what is suggested by the examples here: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html Basically
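The snippet above is cut off, so the following is not the code from the thread. It is only a minimal sketch, in plain Scala without Spark, of what "multiple metrics at once" can mean: RMSE, MAE and R² computed from (prediction, label) pairs in a single pass. The names Metrics and regressionMetrics are hypothetical. Within Spark, org.apache.spark.mllib.evaluation.RegressionMetrics exposes comparable values from an RDD of (prediction, observation) pairs.

```scala
// Hypothetical helper, not part of Spark: several regression metrics in one pass.
final case class Metrics(rmse: Double, mae: Double, r2: Double)

def regressionMetrics(pairs: Seq[(Double, Double)]): Metrics = {
  require(pairs.nonEmpty, "need at least one (prediction, label) pair")
  val n = pairs.size.toDouble
  // Accumulate squared error, absolute error, label sum and label sum of squares together.
  val (sse, sae, sumY, sumY2) = pairs.foldLeft((0.0, 0.0, 0.0, 0.0)) {
    case ((se, ae, sy, sy2), (pred, label)) =>
      val err = label - pred
      (se + err * err, ae + math.abs(err), sy + label, sy2 + label * label)
  }
  // Total sum of squares around the label mean: sum(y^2) - (sum y)^2 / n.
  val sst = sumY2 - sumY * sumY / n
  // R^2 is undefined when all labels are equal; NaN is returned in that case.
  val r2 = if (sst == 0.0) Double.NaN else 1.0 - sse / sst
  Metrics(math.sqrt(sse / n), sae / n, r2)
}
```

Calling regressionMetrics once on the collected pairs returns all three values together, instead of running a separate evaluation per metric. The top-level def assumes Scala 3 or a script/REPL session; in Scala 2 it would need to sit inside an object.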

[ML] Performance issues with GBTRegressor

2017-07-12 Thread OBones
Hello all, I'm using Spark for regression analysis on medium to large datasets, and its performance is very good when using random forest or decision trees. Continuing my experimentation, I started using GBTRegressor and am finding it extremely slow compared to R, while both other methods

Re: [ML] Stop conditions for RandomForest

2017-06-28 Thread OBones
max(X, Y). Hence, are they different? On Tue, Jun 27, 2017 at 11:07 PM, OBones <obo...@free.fr> wrote: Hello, Reading around on the theory behind tree based regression, I concluded that there are various reasons to stop exploring the tree

[ML] Stop conditions for RandomForest

2017-06-27 Thread OBones
Hello, Reading around on the theory behind tree based regression, I concluded that there are various reasons to stop exploring the tree when a given node has been reached. Among these, I have those two: 1. When starting to process a node, if its size (row count) is less than X then consider

Re: [How-To] Migrating from mllib.tree.DecisionTree to ml.regression.DecisionTreeRegressor

2017-06-15 Thread OBones
OBones wrote: So, I tried to rewrite my sample code using the ml package and it is very much easier to use, no need for the LabeledPoint transformation. Here is the code I came up with: val dt = new DecisionTreeRegressor() .setPredictionCol("Y") .setImpurity

[How-To] Migrating from mllib.tree.DecisionTree to ml.regression.DecisionTreeRegressor

2017-06-15 Thread OBones
Hello, I have written the following Scala code to train a regression tree, based on mllib: val conf = new SparkConf().setAppName("DecisionTreeRegressionExample") val sc = new SparkContext(conf) val spark = new SparkSession.Builder().getOrCreate() val sourceData =

Re: [How-To] Custom file format as source

2017-06-15 Thread OBones
Thanks to both of you, this should get me started.

[How-To] Custom file format as source

2017-06-12 Thread OBones
Hello, I have an application here that generates data files in a custom binary format that provides the following information: a column list, where each column has a data type (64 bit integer, 32 bit string index, 64 bit IEEE float, 1 byte boolean); catalogs that give modalities for some columns (i.e.,