gemini-code-assist[bot] commented on code in PR #39011:
URL: https://github.com/apache/beam/pull/39011#discussion_r3432498082


##########
sdks/python/apache_beam/testing/benchmarks/cloudml/criteo_tft/criteo.py:
##########
@@ -38,6 +38,23 @@ def get_transformed_categorical_column_name(column_name_or_id):
   return column_name + '_id'
 
 
+def fill_in_missing(feature, default_value):
+  """Fills missing values in a rank 2 SparseTensor.
+
+  Args:
+    feature: A rank 2 SparseTensor with at most one value per row.
+    default_value: The value to fill in for missing entries.
+
+  Returns:
+    A rank 1 Tensor with missing entries filled in.
+  """
+  feature = tf.sparse.to_dense(
+      tf.SparseTensor(
+          feature.indices, feature.values, [feature.dense_shape[0], 1]),
+      default_value=default_value)
+  return tf.squeeze(feature, axis=1)

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Reconstructing the `SparseTensor` with a hardcoded shape of 
`[feature.dense_shape[0], 1]` can cause runtime crashes if the input 
`SparseTensor` has a column dimension greater than 1 (i.e., 
`feature.dense_shape[1] > 1`). This can happen if any row in the batch contains 
more than one value, or if the sparse tensor is parsed with a larger 
static/dynamic column dimension. In such cases, `feature.indices` may contain 
column indices `>= 1`, which will be out of bounds for the new shape 
`[batch_size, 1]`, leading to an `InvalidArgumentError` during 
`tf.sparse.to_dense`.
   
   Using `tft.sparse_tensor_to_dense_with_shape` is much safer and more robust 
as it correctly handles shape alignment, padding, and truncation.
   
   ```python
   feature = tft.sparse_tensor_to_dense_with_shape(
       feature, [None, 1], default_value=default_value)
   return tf.squeeze(feature, axis=1)
   ```
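
To make the failure mode concrete, here is a minimal plain-Python sketch of the bounds problem. It is not TensorFlow code: `to_dense` is a hypothetical stand-in that only mimics an index bounds check, and it raises `ValueError` where the real op is expected to raise `InvalidArgumentError`.

```python
# Plain-Python stand-in for densifying a rank 2 sparse tensor.
# Hypothetical helper for illustration only; not TensorFlow's implementation.
def to_dense(indices, values, dense_shape, default_value):
  """Densifies (row, col) -> value entries, rejecting out-of-bounds indices."""
  rows, cols = dense_shape
  dense = [[default_value] * cols for _ in range(rows)]
  for (row, col), value in zip(indices, values):
    if not (0 <= row < rows and 0 <= col < cols):
      raise ValueError(
          f'index ({row}, {col}) is out of bounds for shape {dense_shape}')
    dense[row][col] = value
  return dense


# At most one value per row: forcing the column dimension to 1 works.
ok = to_dense([(0, 0), (2, 0)], [7, 9], (3, 1), default_value=-1)

# Row 0 holds two values, so one index has column 1 and no longer fits
# once the shape is forced to [batch_size, 1].
try:
  to_dense([(0, 0), (0, 1)], [7, 8], (1, 1), default_value=-1)
  failed = False
except ValueError:
  failed = True
```

The first call corresponds to the documented contract ("at most one value per row"). The second shows what happens when an input breaks that contract while the shape is still hardcoded.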



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
