Ekta Khanna created MADLIB-1392:
-----------------------------------
Summary: DL: Preprocessor support for asymmetric segment
distribution
Key: MADLIB-1392
URL: https://issues.apache.org/jira/browse/MADLIB-1392
Project: Apache MADlib
Issue Type: New Feature
Components: Deep Learning
Reporter: Ekta Khanna
Fix For: v1.17
Add support for asymmetric segment distribution to the deep learning
preprocessor. This applies to both {{training_preprocessor_dl()}} and
{{validation_preprocessor_dl()}}.
{code:sql}
training_preprocessor_dl(source_table,
                         output_table,
                         dependent_varname,
                         independent_varname,
                         buffer_size,
                         normalizing_const,
                         num_classes,
                         distribution_rules  -- new optional param
                        )
{code}
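As an illustration only, a client script might assemble a call that uses the new argument as sketched below. The helper {{build_preprocessor_call}} and the table and column names ({{image_data}}, {{image_data_packed}}, {{species}}, {{rgb}}) are hypothetical, and the {{madlib}} schema prefix assumes a default install. The argument order follows the signature above.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal: NULL, a number, or quoted text."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    # Double any embedded single quotes so the text literal stays valid.
    return "'" + str(value).replace("'", "''") + "'"


def build_preprocessor_call(source_table, output_table, dependent_varname,
                            independent_varname, buffer_size=None,
                            normalizing_const=None, num_classes=None,
                            distribution_rules="all_segments"):
    """Build the SQL text for a training_preprocessor_dl() call (hypothetical helper)."""
    args = [source_table, output_table, dependent_varname, independent_varname,
            buffer_size, normalizing_const, num_classes, distribution_rules]
    rendered = ", ".join(sql_literal(a) for a in args)
    return "SELECT madlib.training_preprocessor_dl({});".format(rendered)


# Hypothetical table and column names; only the last argument is new.
call = build_preprocessor_call("image_data", "image_data_packed",
                               "species", "rgb",
                               normalizing_const=255,
                               distribution_rules="gpu_segments")
print(call)
```

Omitting {{distribution_rules}} in this sketch falls back to {{all_segments}}, mirroring the default described in this issue.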
The new optional parameter {{distribution_rules}} (TEXT, *default*:
{{all_segments}}) specifies how to distribute the {{output_table}} across
segments, which determines how the fit function uses resources on the cluster.
Possible values:
# {{all_segments}} (default): the {{output_table}} is distributed to all
segments in the database cluster.
# {{gpu_segments}}: the {{output_table}} is distributed to all segments that
are on hosts that have GPUs attached. This makes maximum use of GPU resources.
# The name of a resources table containing the segments to use for training.
This table is typically created and maintained by the database administrator.
It must contain a column called {{dbid}} that specifies the segment id from
the {{gp_segment_configuration}} table.
Sample {{segments_to_use}} table:
{code:java}
dbid | notes
-----|--------------
2 | comment here
3 | comment here
4 | comment here
5 | comment here
{code}
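A minimal sketch of how such a resources table could be sanity-checked before use, written against in-memory rows rather than a live database. The function {{check_segments_to_use}} is hypothetical, and the rule that only primary segments (role {{p}}, content >= 0) are acceptable is an assumption, not something this issue specifies. The {{dbid}}, {{content}} and {{role}} fields mirror columns of {{gp_segment_configuration}}.

```python
def check_segments_to_use(resource_rows, segment_config):
    """Return the sorted dbids listed in a resources table, or raise ValueError.

    resource_rows:  list of dicts, one per row of the resources table.
    segment_config: list of dicts mimicking gp_segment_configuration rows.
    """
    if not resource_rows:
        raise ValueError("resources table is empty")
    if any("dbid" not in row for row in resource_rows):
        raise ValueError("resources table must contain a 'dbid' column")

    # Assumption: only primary segments qualify; content = -1 is the master.
    valid = {s["dbid"] for s in segment_config
             if s["role"] == "p" and s["content"] >= 0}

    requested = sorted({row["dbid"] for row in resource_rows})
    unknown = [d for d in requested if d not in valid]
    if unknown:
        raise ValueError("dbid(s) not usable as segments: {}".format(unknown))
    return requested


# Stand-in for gp_segment_configuration: one master and four primary segments.
segment_config = [
    {"dbid": 1, "content": -1, "role": "p"},
    {"dbid": 2, "content": 0, "role": "p"},
    {"dbid": 3, "content": 1, "role": "p"},
    {"dbid": 4, "content": 2, "role": "p"},
    {"dbid": 5, "content": 3, "role": "p"},
]

# The sample segments_to_use table from this issue.
resource_rows = [{"dbid": d, "notes": "comment here"} for d in (2, 3, 4, 5)]

dbids = check_segments_to_use(resource_rows, segment_config)
print(dbids)
```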
The same parameter and values apply to {{validation_preprocessor_dl()}}.
This change also adds a new column, {{gpu_config}}, to the output summary
table. It contains one of the following values:
# If {{distribution_rules}} = {{all_segments}}: {{all_segments}}.
# If {{distribution_rules}} = {{gpu_segments}}: an array of the segment ids of
all segments that are on hosts that have GPUs attached.
# If {{distribution_rules}} is the name of a resources table: an array of the
segment ids listed in that table. For the sample {{segments_to_use}} table
above, this is [2,3,4,5].
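The mapping from the parameter value to the {{gpu_config}} column can be sketched as follows. This is not the MADlib implementation: {{resolve_gpu_config}} is a hypothetical helper, and the GPU segment ids and resources tables are passed in as plain Python values instead of being queried from the cluster.

```python
def resolve_gpu_config(distribution_rules, gpu_segment_dbids=(),
                       resource_tables=None):
    """Return the value the gpu_config column would hold (illustrative only).

    gpu_segment_dbids: dbids of segments on hosts with GPUs (assumed known).
    resource_tables:   dict mapping a resources table name to its dbid list.
    """
    if distribution_rules == "all_segments":
        return "all_segments"
    if distribution_rules == "gpu_segments":
        return sorted(gpu_segment_dbids)
    # Any other value is treated as the name of a resources table.
    tables = resource_tables or {}
    if distribution_rules not in tables:
        raise ValueError("unknown resources table: {}".format(distribution_rules))
    return sorted(tables[distribution_rules])


# Sample resources table from this issue; the GPU dbids are made up.
tables = {"segments_to_use": [2, 3, 4, 5]}

print(resolve_gpu_config("all_segments"))
print(resolve_gpu_config("gpu_segments", gpu_segment_dbids=[4, 2]))
print(resolve_gpu_config("segments_to_use", resource_tables=tables))
```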