[ https://issues.apache.org/jira/browse/SYSTEMML-1160 ]
Niketan Pansare updated SYSTEMML-1160:
--------------------------------------
    Affects Version/s: SystemML 1.0

> Enable Prefetching of Mini-Batches
> ----------------------------------
>
>                 Key: SYSTEMML-1160
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1160
>             Project: SystemML
>          Issue Type: New Feature
>    Affects Versions: SystemML 1.0
>            Reporter: Mike Dusenberry
>            Priority: Blocker
>
> For efficient training of large deep learning models, a mini-batch training
> approach is preferred. On SystemML with the Spark backend, this currently
> equates to grabbing a mini-batch from an RDD (via a PartitionPruning RDD --
> see SYSTEMML-951) and then using entirely single-node instructions for each
> mini-batch. While the fetching of partitions has been made efficient, we
> currently have to pause after each training step to grab the next partition.
> For large models, training time is already an issue, even for GPUs with
> saturated input pipelines. Thus, we need to enable prefetching of
> mini-batches that runs in parallel with the training loop. One possibility
> would be to create an input queue that is fed by a prefetch thread and that
> in turn feeds the training loop.
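To make the queue-based proposal concrete, below is a minimal, self-contained
sketch in Scala of the pattern described above: a bounded BlockingQueue fed by
a daemon thread, so that fetching the next partition overlaps with the current
training step. All names here (BatchPrefetcher, fetchPartition, trainStep,
MiniBatch) are hypothetical stand-ins, not SystemML or Spark APIs; the real
implementation would pull partitions via the PartitionPruning RDD from
SYSTEMML-951.

object PrefetchSketch {

  import java.util.concurrent.{ArrayBlockingQueue, BlockingQueue}

  type MiniBatch = Array[Double] // hypothetical stand-in for a mini-batch

  // Bounded queue fed by a background thread; the training loop consumes
  // from it, so the fetch of batch i+1 overlaps with the training step on
  // batch i.
  class BatchPrefetcher(fetch: Int => MiniBatch, numBatches: Int, capacity: Int = 2) {
    private val queue: BlockingQueue[MiniBatch] =
      new ArrayBlockingQueue[MiniBatch](capacity)

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var i = 0
        while (i < numBatches) {
          queue.put(fetch(i)) // blocks when the queue is full (backpressure)
          i += 1
        }
      }
    })
    worker.setDaemon(true)
    worker.start()

    // Blocks until the next mini-batch has been prefetched.
    def next(): MiniBatch = queue.take()
  }

  def main(args: Array[String]): Unit = {
    val numBatches = 10

    // Stand-in for grabbing a partition via the PartitionPruning RDD
    // (SYSTEMML-951); here it just fabricates data.
    def fetchPartition(i: Int): MiniBatch = Array.fill(4)(i.toDouble)

    val prefetcher = new BatchPrefetcher(fetchPartition, numBatches)
    for (i <- 0 until numBatches) {
      val batch = prefetcher.next()
      // trainStep(batch) would run here, overlapped with the next fetch.
      println(s"step $i: training on batch of size ${batch.length}")
    }
  }
}

Note that the bounded capacity matters: put() blocks when the queue is full,
so the prefetch thread cannot run arbitrarily far ahead of the training loop
and pin excessive memory.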