[
https://issues.apache.org/jira/browse/MADLIB-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797632#comment-16797632
]
Domino Valdano edited comment on MADLIB-1308 at 3/21/19 2:27 AM:
-----------------------------------------------------------------
1.
a.) Yes, we should rename this to segments_per_host, and detect it
automatically. Also, because you can have a different number of segments on
each host, we either need to have the transition function do this detection
separately on each segment (so that it detects the number of segments on its
own host), or change it into an array (see the sketch below).
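A rough sketch of what that detection could look like, assuming we do it in
plpython with access to Greenplum's gp_segment_configuration catalog (the
helper name and the integration point are placeholders, not the final design):
{code:python}
# Sketch only: count primary segments per host from the catalog.
# content = -1 is the master; role = 'p' selects primary segments.
def get_segments_per_host(plpy):
    rows = plpy.execute("""
        SELECT hostname, count(*) AS segment_count
        FROM gp_segment_configuration
        WHERE role = 'p' AND content >= 0
        GROUP BY hostname
    """)
    # {hostname: number of primary segments on that host}
    return dict((r['hostname'], r['segment_count']) for r in rows)
{code}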
b.) No, it should never be anything but gpu0 or cpu0. The other gpus are
there, but we intentionally hide all but one from each segment. Every segment
must use either exactly 1 gpu, or the cpus (and cpus only show up as cpu0 no
matter how many there are; we only found this through testing, so we should
confirm it in the docs to be sure). The reason for this decision was that we
tested with more than 1 gpu per segment and found the performance was nearly
identical to a single gpu, so almost 1 full gpu is just wasted. And allowing
more than 1 segment to share a gpu makes the memory issues we were facing
worse and reduces the size of the dataset it can handle. Therefore, we
require that there be at most 1 gpu per segment. Any other gpus will be
ignored.
2.
This is the logic Omer and I came up with for assigning each segment a
unique gpu (see the sketch after this paragraph). The gpu number a segment
gets is just its segment id modulo the number of segments (gpus) per host.
In other words, if there are 4 segments on each host, then they will get
gpus 0, 1, 2, and 3 respectively. (To each, its own gpu will appear as
gpu0.) This logic works only if there are the same number of segments on
each host, and as long as there is at least 1 gpu per segment. Each segment
gets assigned its own gpu, and any extra gpus are ignored. The formula will
have to be modified if there are a different number of segments on each
host. I think this means we have to pass around an array holding the number
of segments on each host... or at the very least, the segment id of the
first segment on each host. Hopefully this is something that can just be
queried from a system table. For Postgres, the current logic should work
fine as long as we set gp_segment_id = 1 and segments_per_host = 1. But with
Postgres it doesn't matter anyway whether we hide gpus beyond gpu0 or not,
so even easier would be to just skip calling the device detection function
entirely.
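For concreteness, a minimal sketch of that assignment as it could run in
each segment's transition function (segments_per_host is a stand-in for
whatever detection we settle on in 1 a.):
{code:python}
import os

def mask_gpus(current_seg_id, segments_per_host):
    # Expose exactly one physical gpu to this segment: segment id
    # modulo segments (gpus) per host. After this, the chosen device
    # shows up to TensorFlow as '/gpu:0' on every segment.
    # Must run before TensorFlow initializes CUDA, or it has no effect.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(current_seg_id % segments_per_host)
{code}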
3.
a.) For gpus < segments, I think the best behavior would be for the segments
that don't have a gpu to fall back on using cpu (one possible shape for this
is sketched below). I think it might work as-is, but we should test it. If a
change is required, I suspect it will be minimal. The real question is, will
anyone want to run like this, knowing that the slowest segment is going to be
the one that determines the runtime? (i.e., they probably won't get
performance that's any better than if they ran with all cpus.)
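A hedged sketch of that fallback, assuming we detect visible devices via
TensorFlow's device enumeration (the function name here is hypothetical):
{code:python}
from tensorflow.python.client import device_lib

def choose_device():
    # After CUDA_VISIBLE_DEVICES masking, a segment whose assigned gpu
    # index doesn't exist on this host sees no gpus at all, so checking
    # for any visible gpu doubles as the "did I get a gpu?" test.
    has_gpu = any(d.device_type == 'GPU'
                  for d in device_lib.list_local_devices())
    return '/gpu:0' if has_gpu else '/cpu:0'
{code}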
b.) This should work fine as-is, no changes required. Extra gpus are ignored.
> Change GPU related hardcoded things in madlib_keras.py_in
> ---------------------------------------------------------
>
> Key: MADLIB-1308
> URL: https://issues.apache.org/jira/browse/MADLIB-1308
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Deep Learning
> Reporter: Nandish Jayaram
> Priority: Major
> Fix For: v1.16
>
>
> Based on the code in PR [https://github.com/apache/madlib/pull/355]:
> # Currently in madlib_keras.py_in, we hardcode the following things:
> ## gpus_per_host = 4
> ## device_name = '/cpu:0' or '/gpu:0' (can the device ever be named
> anything other than gpu0 or cpu0?)
> # Look into and document the usage of {{CUDA_VISIBLE_DEVICES}} when gpu_only
> is set to TRUE. Currently we set it to str(current_seg_id % gpus_per_host).
> How does this logic work, and will it always work? How would this logic
> change for Postgres, since it has no segment_id?
> # What happens if:
> ## no of gpus < no of segments
> ## no of gpus > no of segments