gemini-code-assist[bot] commented on code in PR #38473:
URL: https://github.com/apache/beam/pull/38473#discussion_r3289529374


##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -199,6 +212,15 @@ def py_value_to_js_dict(py_value):
     py_value = py_value._asdict()
   if isinstance(py_value, dict):
     return {key: py_value_to_js_dict(value) for key, value in py_value.items()}
+  elif isinstance(py_value, bytes):
+    return py_value.decode('utf-8', errors='strict')
+  elif isinstance(py_value, (datetime.datetime, datetime.date, datetime.time)):
+    return py_value.isoformat()

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Converting `datetime`, `date`, and `time` objects to ISO strings is a 
breaking change compared to the previous `js2py` implementation, which provided 
JavaScript `Date` objects. Any existing JavaScript UDFs that rely on `Date` 
methods (e.g., `.getFullYear()`, `.getTime()`) will now fail with a 
`TypeError`. If the bridge does not support native `Date` conversion, this 
change should be documented as a breaking change, or the values should be 
explicitly converted to `Date` objects in the JS environment (e.g., `new 
Date(iso_string)`).



##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -178,18 +180,29 @@ def _check_mapping_arguments(
     raise ValueError(f'{transform_name} cannot specify "name" without "path"')
 
 
-# js2py's JsObjectWrapper object has a self-referencing __dict__ property
-# that cannot be pickled without implementing the __getstate__ and
-# __setstate__ methods.
-class _CustomJsObjectWrapper(JsObjectWrapper):
-  def __init__(self, js_obj):
-    super().__init__(js_obj.__dict__['_obj'])
+_THREAD_LOCAL_JS_CACHE = threading.local()
 
-  def __getstate__(self):
-    return self.__dict__.copy()
 
-  def __setstate__(self, state):
-    self.__dict__.update(state)
+class _JsFunctionWrapper:
+  def __init__(self, source_code, entrypoint_name):
+    self.source_code = source_code
+    self.entrypoint_name = entrypoint_name
+
+  def _get_fn(self):
+    cache = _THREAD_LOCAL_JS_CACHE
+    if not hasattr(cache, 'functions'):
+      cache.functions = {}
+
+    cache_key = (self.source_code, self.entrypoint_name)
+    if cache_key not in cache.functions:
+      cache.functions[cache_key] = quickjs.Function(
+          self.entrypoint_name, self.source_code)
+
+    return cache.functions[cache_key]
+
+  def __call__(self, row):
+    fn = self._get_fn()
+    return dicts_to_rows(fn(py_value_to_js_dict(row)))

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The previous implementation wrapped the JavaScript execution in a 
`try-except` block to provide a more descriptive `RuntimeError`. The current 
implementation allows `quickjs.JSException` (or other bridge-related 
exceptions) to bubble up directly. To maintain a consistent user experience and 
provide better context in error messages (e.g., which expression or entrypoint 
failed), consider catching exceptions from the JS engine and re-raising them 
with additional metadata.



##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -199,6 +212,15 @@ def py_value_to_js_dict(py_value):
     py_value = py_value._asdict()
   if isinstance(py_value, dict):
     return {key: py_value_to_js_dict(value) for key, value in py_value.items()}
+  elif isinstance(py_value, bytes):
+    return py_value.decode('utf-8', errors='strict')

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Decoding `bytes` to `utf-8` with `errors='strict'` will cause the pipeline 
to fail if the data contains non-UTF-8 sequences (e.g., raw binary data). Since 
QuickJS (via the `quickjs-ng` Python bridge) can typically handle `bytes` 
objects by converting them to `ArrayBuffer` or `Uint8Array` in JavaScript, 
forcing a conversion to a string may be unnecessary and could lead to data 
corruption for binary fields. If a string is required, consider using 
`errors='replace'` or `errors='ignore'` to avoid crashing the pipeline.



##########
sdks/python/apache_beam/yaml/yaml_mapping.py:
##########
@@ -210,80 +232,36 @@ def py_value_to_js_dict(py_value):
 def _expand_javascript_mapping_func(
     original_fields, expression=None, callable=None, path=None, name=None):
 
-  # Check for installed js2py package
-  if js2py is None:
+  if quickjs is None:
     raise ValueError(
-        "Javascript mapping functions are not supported on"
-        " Python 3.12 or later.")
-
-  # import remaining js2py objects
-  from js2py import base
-  from js2py.constructors import jsdate
-  from js2py.internals import simplex
-
-  js_array_type = (
-      base.PyJsArray,
-      base.PyJsArrayBuffer,
-      base.PyJsInt8Array,
-      base.PyJsUint8Array,
-      base.PyJsUint8ClampedArray,
-      base.PyJsInt16Array,
-      base.PyJsUint16Array,
-      base.PyJsInt32Array,
-      base.PyJsUint32Array,
-      base.PyJsFloat32Array,
-      base.PyJsFloat64Array)
-
-  def _js_object_to_py_object(obj):
-    if isinstance(obj, (base.PyJsNumber, base.PyJsString, base.PyJsBoolean)):
-      return base.to_python(obj)
-    elif isinstance(obj, js_array_type):
-      return [_js_object_to_py_object(value) for value in obj.to_list()]
-    elif isinstance(obj, jsdate.PyJsDate):
-      return obj.to_utc_dt()
-    elif isinstance(obj, (base.PyJsNull, base.PyJsUndefined)):
-      return None
-    elif isinstance(obj, base.PyJsError):
-      raise RuntimeError(obj['message'])
-    elif isinstance(obj, base.PyJsObject):
-      return {
-          key: _js_object_to_py_object(value['value'])
-          for (key, value) in obj.own.items()
-      }
-    elif isinstance(obj, base.JsObjectWrapper):
-      return _js_object_to_py_object(obj._obj)
-
-    return obj
+        "Javascript mapping functions are not supported because the "
+        "quickjs-ng library is not installed.")
 
   if expression:
-    source = '\n'.join(['function(__row__) {'] + [
-        f'  {name} = __row__.{name}'
-        for name in original_fields if name in expression
-    ] + ['  return (' + expression + ')'] + ['}'])
-    js_func = _CustomJsObjectWrapper(js2py.eval_js(source))
+    unpacking_code = '\n'.join([
+        f"  var {name} = __row__['{name}'];" for name in original_fields
+        if name in expression and name.isidentifier()
+    ])

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The logic for unpacking fields into local variables (`var {name} = ...`) has 
two potential issues:
   1. **JS Reserved Words**: `name.isidentifier()` (a Python check) does not 
exclude JavaScript keywords (e.g., `delete`, `class`, `default`). Using these 
as variable names will cause a `SyntaxError` in the JS engine.
   2. **Substring Matching**: `name in expression` can return `True` for 
substrings (e.g., if a field is named `a` and the expression is `Math.abs(x)`, 
it will incorrectly attempt to unpack `a`).
   
   Consider using a more robust way to expose variables, such as using a regex 
with word boundaries to detect field usage in the expression, and checking 
against a list of JavaScript reserved words.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to