Re: [PR] Add ApplyBucketsWithInterpolation TFTransform [beam]

via GitHub Wed, 29 May 2024 08:26:54 -0700


jrmccluskey commented on code in PR #31291:
URL: https://github.com/apache/beam/pull/31291#discussion_r1619089783



##########
sdks/python/apache_beam/ml/transforms/tft.py:
##########
@@ -363,6 +363,42 @@ def apply_transform(
     return output
 
 
+@register_input_dtype(float)
+class ApplyBucketsWithInterpolation(TFTOperation):
+  def __init__(
+      self,
+      columns: List[str],
+      bucket_boundaries: Iterable[Union[int, float]],
+      name: Optional[str] = None):
+    """ Interpolates values within the provided buckets and then normalizes to
+    [0, 1].
+    
+    Input values are bucketized based on the provided boundaries such that the
+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=
+    element < bucket_boundaries[i], if it exists. The values are then
+    normalized to the range [0,1] within the bucket, with NaN values being
+    mapped to 0.5.
+
+    Args:
+      columns: A list of column names to apply the transformation on.
+      bucket_boundaries: A rank 2 Tensor or list representing the bucket

Review Comment:
   It's consistent with what we accept as valid input in ApplyBuckets; both
co-opt the language from TFT. Updating the descriptions to match our function
signature more accurately seems reasonable, so I changed it for both functions.



##########
sdks/python/apache_beam/ml/transforms/tft.py:
##########
@@ -363,6 +363,42 @@ def apply_transform(
     return output
 
 
+@register_input_dtype(float)
+class ApplyBucketsWithInterpolation(TFTOperation):
+  def __init__(
+      self,
+      columns: List[str],
+      bucket_boundaries: Iterable[Union[int, float]],
+      name: Optional[str] = None):
+    """ Interpolates values within the provided buckets and then normalizes to
+    [0, 1].
+    
+    Input values are bucketized based on the provided boundaries such that the
+    input is mapped to a positive index i for which bucket_boundaries[i-1] <=
+    element < bucket_boundaries[i], if it exists. The values are then
+    normalized to the range [0,1] within the bucket, with NaN values being
+    mapped to 0.5.

Review Comment:
   done
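
For readers following the thread, here is a minimal NumPy sketch of the behavior the docstring describes. It is not the code in this PR and does not call TFT. It assumes the semantics match TFT's documented `apply_buckets_with_interpolation`: piecewise-linear interpolation between boundaries, values at or beyond the outer boundaries clamped to 0 and 1, and NaN mapped to 0.5. The function name `interpolate_buckets` is hypothetical.

```python
import numpy as np


def interpolate_buckets(values, bucket_boundaries):
    """Illustrative stand-in (hypothetical helper, not the Beam/TFT code)."""
    x = np.asarray(values, dtype=float)
    # Boundaries must be increasing for np.interp, so sort defensively.
    bounds = np.sort(np.asarray(list(bucket_boundaries), dtype=float))
    # Each boundary is assigned an evenly spaced target in [0, 1].
    targets = np.linspace(0.0, 1.0, num=len(bounds))
    # np.interp interpolates linearly inside each bucket and clamps
    # out-of-range inputs to the first and last targets (0 and 1).
    out = np.interp(x, bounds, targets)
    # Assumption: NaN inputs map to the middle of the range.
    return np.where(np.isnan(x), 0.5, out)


result = interpolate_buckets([-1, 5, 10, 15, 25, float("nan")], [0, 10, 20])
print(result)
```

With boundaries `[0, 10, 20]`, an input of 5 sits halfway through the first of two buckets, so it lands at 0.25 under these assumptions.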



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
