yeandy commented on code in PR #22795:
URL: https://github.com/apache/beam/pull/22795#discussion_r950237304


##########
sdks/python/apache_beam/ml/inference/pytorch_inference.py:
##########
@@ -234,16 +263,17 @@ def run_inference(
     # If elements in `batch` are provided as dictionaries from key to Tensors,
     # then iterate through the batch list, and group Tensors to the same key
     key_to_tensor_list = defaultdict(list)
-    for example in batch:
-      for key, tensor in example.items():
-        key_to_tensor_list[key].append(tensor)
-    key_to_batched_tensors = {}
-    for key in key_to_tensor_list:
-      batched_tensors = torch.stack(key_to_tensor_list[key])
-      batched_tensors = _convert_to_device(batched_tensors, self._device)
-      key_to_batched_tensors[key] = batched_tensors
-    predictions = model(**key_to_batched_tensors, **inference_args)
-    return [PredictionResult(x, y) for x, y in zip(batch, predictions)]
+    with torch.no_grad():

Review Comment:
   Necessary to prevent this error:
   ```
   Setting to CPU due to an exception while deserializing state_dict_path. Exception: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 39.59 GiB total capacity; 666.60 MiB already allocated; 13.19 MiB free; 682.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
   ```
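
   For context, a minimal sketch (not the actual `run_inference` code in this PR) of the keyed-tensor path with the forward pass wrapped in `torch.no_grad()`. The `run_keyed_batch` helper, its signature, and the plain `.to(device)` call are illustrative stand-ins for the PR's own code; the point is that without `no_grad()` the forward pass builds the autograd graph on the GPU, which is what triggers the OOM above.
   ```
   from collections import defaultdict

   import torch


   def run_keyed_batch(model, batch, device, inference_args=None):
     """Stacks per-key tensors from a list of dict examples and runs the model."""
     inference_args = inference_args or {}
     key_to_tensor_list = defaultdict(list)
     for example in batch:
       for key, tensor in example.items():
         key_to_tensor_list[key].append(tensor)
     key_to_batched_tensors = {
         key: torch.stack(tensors).to(device)
         for key, tensors in key_to_tensor_list.items()
     }
     # Without no_grad() the forward pass records gradients and keeps
     # intermediate activations alive on the GPU, which can exhaust memory.
     with torch.no_grad():
       predictions = model(**key_to_batched_tensors, **inference_args)
     return list(zip(batch, predictions))
   ```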




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
