gemini-code-assist[bot] commented on code in PR #35753:
URL: https://github.com/apache/beam/pull/35753#discussion_r2245972899


##########
sdks/python/apache_beam/yaml/yaml_ml.py:
##########
@@ -32,37 +29,14 @@
 from apache_beam.yaml import options
 from apache_beam.yaml.yaml_utils import SafeLineLoader
 
-
-def _list_submodules(package):
-  """
-    Lists all submodules within a given package.
-    """
-  submodules = []
-  for _, module_name, _ in pkgutil.walk_packages(
-      package.__path__, package.__name__ + '.'):
-    if 'test' in module_name:
-      continue
-    submodules.append(module_name)
-  return submodules
-
-
 try:
   from apache_beam.ml.transforms import tft
   from apache_beam.ml.transforms.base import MLTransform
   # TODO(robertwb): Is this all of them?
-  _transform_constructors = {}
+  _transform_constructors = tft.__dict__

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Using `tft.__dict__` to populate `_transform_constructors` is risky as it 
includes not only the transform classes but also everything else in the `tft` 
module's namespace (e.g., imported modules, constants, base classes). This 
could lead to a `TypeError` at runtime if a YAML file specifies a `type` that 
corresponds to a non-callable member from the dictionary.
   
   A safer approach would be to build the dictionary of constructors from 
`tft.__all__`, which is the idiomatic way to define a module's public API.
   
   ```python
   _transform_constructors = {name: getattr(tft, name) for name in tft.__all__ 
if hasattr(tft, name)}
   ```



##########
sdks/python/apache_beam/ml/transforms/base.py:
##########
@@ -216,31 +212,21 @@
 
     # Get all columns for this item
     for col in columns:
-      if isinstance(item, dict):
-        result.append(item[col])
+      result.append(item[col])
   return result
 
 
 def _dict_output_fn(
     columns: Sequence[str],
-    batch: Sequence[Union[Dict[str, Any], beam.Row]],
-    embeddings: Sequence[Any]) -> list[Union[dict[str, Any], beam.Row]]:
+    batch: Sequence[Dict[str, Any]],
+    embeddings: Sequence[Any]) -> List[Dict[str, Any]]:

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Consider adding a type hint for the return value of `_dict_output_fn` for 
better code clarity and maintainability.
   
   ```python
   def _dict_output_fn(
       columns: Sequence[str],
       batch: Sequence[Dict[str, Any]],
       embeddings: Sequence[Any]) -> List[Dict[str, Any]]:
   ```



##########
sdks/python/apache_beam/ml/transforms/base.py:
##########
@@ -183,13 +183,9 @@ def append_transform(self, transform: BaseOperation):
     """
 
 
-def _dict_input_fn(
-    columns: Sequence[str], batch: Sequence[Union[Dict[str, Any],
-                                                  beam.Row]]) -> List[str]:
+def _dict_input_fn(columns: Sequence[str],
+                   batch: Sequence[Dict[str, Any]]) -> List[str]:

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Consider adding a type hint for the return value of `_dict_input_fn` for 
better code clarity and maintainability.
   
   ```python
   def _dict_input_fn(columns: Sequence[str],
                      batch: Sequence[Dict[str, Any]]) -> List[str]:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to