[
https://issues.apache.org/jira/browse/MADLIB-1308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797632#comment-16797632
]
Domino Valdano edited comment on MADLIB-1308 at 3/21/19 2:27 AM:
-----------------------------------------------------------------
1.
a.) Yes, we should rename this to segments_per_host, and detect it
automatically. Also, because you can have a different number of segments on
each host, we either need to have the transition function do this detection
separately on each segment (so that it detects the number of segments on its
own host), or change it into an array (see the sketch below).
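A rough sketch of what that detection could look like, assuming we do it in
plpython with access to Greenplum's gp_segment_configuration catalog (the
helper name and the integration point are placeholders, not the final design):
{code:python}
# Sketch only: count primary segments per host from the catalog.
# content = -1 is the master; role = 'p' selects primary segments.
def get_segments_per_host(plpy):
    rows = plpy.execute("""
        SELECT hostname, count(*) AS segment_count
        FROM gp_segment_configuration
        WHERE role = 'p' AND content >= 0
        GROUP BY hostname
    """)
    # {hostname: number of primary segments on that host}
    return dict((r['hostname'], r['segment_count']) for r in rows)
{code}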
b.) No, it should never be anything but gpu0 or cpu0. The other gpus are
there, but we intentionally hide all but one from each segment. Every segment
must use either exactly 1 gpu, or the cpus (and cpus only show up as cpu0 no
matter how many there are; we only found this through testing, so we should
confirm it in the docs to be sure). The reason for this decision was that we
tested with more than 1 gpu per segment and found the performance was nearly
identical to a single gpu, so almost 1 full gpu is just wasted. And allowing
more than 1 segment to share a gpu makes the memory issues we were facing
worse and reduces the size of the dataset it can handle. Therefore, we
require that there be at most 1 gpu per segment. Any other gpus will be
ignored.
2.
This is the logic Omer and I came up with for assigning each segment a
unique gpu (see the sketch after this paragraph). The gpu number a segment
gets is just its segment id modulo the number of segments (gpus) per host.
In other words, if there are 4 segments on each host, then they will get
gpus 0, 1, 2, and 3 respectively. (To each, its own gpu will appear as
gpu0.) This logic works only if there are the same number of segments on
each host, and as long as there is at least 1 gpu per segment. Each segment
gets assigned its own gpu, and any extra gpus are ignored. The formula will
have to be modified if there are a different number of segments on each
host. I think this means we have to pass around an array holding the number
of segments on each host... or at the very least, the segment id of the
first segment on each host. Hopefully this is something that can just be
queried from a system table. For Postgres, the current logic should work
fine as long as we set gp_segment_id = 1 and segments_per_host = 1. But with
Postgres it doesn't matter anyway whether we hide gpus beyond gpu0 or not,
so even easier would be to just skip calling the device detection function
entirely.
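For concreteness, a minimal sketch of that assignment as it could run in
each segment's transition function (segments_per_host is a stand-in for
whatever detection we settle on in 1 a.):
{code:python}
import os

def mask_gpus(current_seg_id, segments_per_host):
    # Expose exactly one physical gpu to this segment: segment id
    # modulo segments (gpus) per host. After this, the chosen device
    # shows up to TensorFlow as '/gpu:0' on every segment.
    # Must run before TensorFlow initializes CUDA, or it has no effect.
    os.environ['CUDA_VISIBLE_DEVICES'] = str(current_seg_id % segments_per_host)
{code}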
3.
a.) For gpus < segments, I think the best behavior would be for the segments
that don't have a gpu to fall back on using cpu (one possible shape for this
is sketched below). I think it might work as-is, but we should test it. If a
change is required, I suspect it will be minimal. The real question is, will
anyone want to run like this, knowing that the slowest segment is going to be
the one that determines the runtime? (i.e., they probably won't get
performance that's any better than if they ran with all cpus.)
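A hedged sketch of that fallback, assuming we detect visible devices via
TensorFlow's device enumeration (the function name here is hypothetical):
{code:python}
from tensorflow.python.client import device_lib

def choose_device():
    # After CUDA_VISIBLE_DEVICES masking, a segment whose assigned gpu
    # index doesn't exist on this host sees no gpus at all, so checking
    # for any visible gpu doubles as the "did I get a gpu?" test.
    has_gpu = any(d.device_type == 'GPU'
                  for d in device_lib.list_local_devices())
    return '/gpu:0' if has_gpu else '/cpu:0'
{code}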
b.) This should work fine as-is, no changes required. Extra gpus are ignored.
> Change GPU related hardcoded things in madlib_keras.py_in
> ---------------------------------------------------------
>
> Key: MADLIB-1308
> URL: https://issues.apache.org/jira/browse/MADLIB-1308
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Deep Learning
> Reporter: Nandish Jayaram
> Priority: Major
> Fix For: v1.16
>
>
> Based on the code in PR [https://github.com/apache/madlib/pull/355]:
> # Currently in madlib_keras.py_in, we hardcode the following things:
> ## gpus_per_host = 4
> ## device_name = '/cpu:0' or '/gpu:0' (can the device ever be named
> anything other than gpu0 or cpu0?)
> # Look into and document the usage of {{CUDA_VISIBLE_DEVICES}} when gpu_only
> is set to TRUE. Currently we set it to str(current_seg_id % gpus_per_host).
> How does this logic work, and will it always work? How would this logic
> change for Postgres, since it has no segment_id?
> # What happens if:
> ## no of gpus < no of segments
> ## no of gpus > no of segments