Joseph K. Bradley created SPARK-3728:
----------------------------------------

             Summary: RandomForest: Learn models too large to store in memory
                 Key: SPARK-3728
                 URL: https://issues.apache.org/jira/browse/SPARK-3728
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Joseph K. Bradley


Proposal: Write trees to disk as they are learned.

RandomForest currently uses a FIFO queue of nodes to split, so it trains all 
trees at once in breadth-first order.  Using a FILO queue (a stack) would 
encourage the code to finish one tree before moving on to the next, which 
would let it write each tree to disk as soon as that tree is learned.

Note: It would also be possible to write nodes to disk as they are learned 
using a FIFO queue, once the example-to-node mapping is cached [JIRA].  The 
[Sequoia Forest package]() does this.  However, learning trees progressively 
could still be useful, since it would support future functionality such as 
early stopping (training fewer trees than requested).
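Progressive training with early stopping could look roughly like the sketch below. Assumptions: `grow_tree` and `train_progressively` are hypothetical names, `grow_tree` builds a fixed-shape tree instead of learning from data, trees are written as JSON files to a temporary directory, and stopping after four trees stands in for a real early-stopping criterion. Because the generator is lazy, trees after the stopping point are never learned.

```python
import json
import os
import tempfile


def grow_tree(tree_id, max_depth):
    """Hypothetical stand-in for learning one tree: a nested dict of nodes."""
    def node(depth):
        if depth == max_depth:
            return {"leaf": True}
        return {"leaf": False, "left": node(depth + 1), "right": node(depth + 1)}
    return {"id": tree_id, "root": node(0)}


def train_progressively(num_trees, max_depth, out_dir):
    """Learn trees one at a time, write each to disk, and yield its path."""
    for tree_id in range(num_trees):
        tree = grow_tree(tree_id, max_depth)
        path = os.path.join(out_dir, f"tree-{tree_id}.json")
        with open(path, "w") as f:
            json.dump(tree, f)
        # `tree` is rebound on the next iteration, so one tree is held at a time.
        yield path


with tempfile.TemporaryDirectory() as out_dir:
    written = []
    for path in train_progressively(num_trees=10, max_depth=3, out_dir=out_dir):
        written.append(os.path.basename(path))
        if len(written) == 4:  # hypothetical early-stopping criterion
            break
    files = sorted(os.listdir(out_dir))

print(written)
```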




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
