weiqingy commented on issue #845:
URL: https://github.com/apache/flink-agents/issues/845#issuecomment-4700838227

   ## Verification outcome: `bytes` is checkpoint-safe → admitting it
   
   I verified the full question (does Python `bytes` survive the Pemja → Flink 
state path as a native, checkpoint-stable JVM type?) in two independent ways, 
and both agree:
   
   **1. Pemja source (the conversion logic).** In Pemja 0.5.5 
(`src/main/c/pemja/core/pyutils.c`), the Python→Java dispatch 
`JcpPyObject_AsJObject` routes `PyBytes_CheckExact` to `JcpPyBytes_AsJObject`, 
whose body is `NewByteArray` + `SetByteArrayRegion` — a genuine JVM `byte[]` 
with no native back-pointer. `bytearray` has no branch and falls through to the 
generic `JcpPyObject_AsJPyObject` wrapper (`Py_INCREF` + a process-local 
pointer) — the unsafe case that crashes on restore. The conversion is 
byte-for-byte identical in 0.5.5 and 0.5.7.
   
   **2. Runtime probe (the actual materialization).** A throwaway Java test 
driving a real embedded Pemja interpreter materialized the values and inspected 
their Java types:
   
   - `b"hello"` → Java `[B` (`byte[]`, len 5) — safe
   - `bytearray(...)` → `pemja.core.object.PyObject` — unsafe wrapper
   - `str` → `java.lang.String` — known-good baseline
   
   This is the `Python-object → Java-object` conversion that 
`FlinkMemoryObject.set()`'s `j_memory_object.set(path, value)` bridge invokes, 
so it reflects the real `memory.set` path.
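   
   For reference, the probe results above can be summarized as a lookup keyed 
on the exact Python type. This is only an illustrative sketch: 
`OBSERVED_JAVA_TYPE` and `expected_java_type` are hypothetical names, not part 
of Pemja or flink-agents, and types the probe did not cover deliberately return 
`None` rather than a guess.

```python
# Java class names reported by the probe, keyed on exact Python type.
OBSERVED_JAVA_TYPE = {
    bytes: "[B",                              # byte[]: safe
    bytearray: "pemja.core.object.PyObject",  # native wrapper: unsafe
    str: "java.lang.String",                  # known-good baseline
}


def expected_java_type(value):
    """Hypothetical helper: Java class name expected for `value`, or None
    when the probe did not cover this type."""
    exact = type(value)
    if exact in OBSERVED_JAVA_TYPE:
        return OBSERVED_JAVA_TYPE[exact]
    if isinstance(value, bytes):
        # bytes subclasses fail PyBytes_CheckExact and wrap as PyObject.
        return "pemja.core.object.PyObject"
    return None  # not covered by the probe


print(expected_java_type(b"hello"))
print(expected_java_type(bytearray(b"hello")))
```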
   
   **Why that settles restore safety.** `byte[]` is a first-class 
Flink-serializable primitive array, so once `bytes` materializes as `byte[]` it 
joins the already-proven `byte[]` checkpoint path. A literal checkpoint-restart 
round-trip still can't run on the MiniCluster: in-place recovery doesn't 
recreate the JVM, so the Pemja conversion path isn't crossed. A permanent 
end-to-end assertion is therefore deferred to the recovery harness in #836 
(noted there).
   
   **Key narrowing — exact type only.** The safe Pemja branch is gated on 
`PyBytes_CheckExact`, so only exact `bytes` is safe; `bytearray` and `bytes` 
subclasses wrap as `PyObject`. This lines up exactly with the validator's 
existing exact-type check (`type(value) in _CHECKPOINT_STABLE_SCALARS`), so 
admitting `bytes` is a one-line addition: exact `bytes` is accepted, 
`bytearray` and `bytes` subclasses stay rejected for free.
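   
   A minimal sketch of that exact-type boundary, assuming the validator looks 
roughly like this. Only the name `_CHECKPOINT_STABLE_SCALARS` and the 
`type(value) in ...` check come from the comment above; the set's other members 
(reduced here to `str`), the function name `check_checkpoint_stable`, and the 
error message are assumptions for illustration.

```python
# Assumption: the real collection holds more scalar types; only the `bytes`
# entry is the one-line addition discussed here.
_CHECKPOINT_STABLE_SCALARS = frozenset({
    str,
    bytes,  # newly admitted: materializes as a JVM byte[]
})


def check_checkpoint_stable(value):
    """Hypothetical validator: accept only exact checkpoint-stable types."""
    # type(value) is the exact class, so subclasses never match
    # (the Python-level analogue of PyBytes_CheckExact).
    if type(value) not in _CHECKPOINT_STABLE_SCALARS:
        raise TypeError(
            f"{type(value).__name__} is not a checkpoint-stable scalar"
        )
    return value


class TaggedBytes(bytes):
    """A bytes subclass; isinstance() would accept it, the exact check won't."""


check_checkpoint_stable(b"hello")  # accepted

for rejected in (bytearray(b"hello"), TaggedBytes(b"hello")):
    try:
        check_checkpoint_stable(rejected)
    except TypeError as exc:
        print("rejected:", exc)
```

   An `isinstance(value, bytes)` check would admit `TaggedBytes`, which is why 
the exact-type comparison matters here.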
   
   PR: #846 (validator + accept/reject tests pinning the exact-type boundary + 
contract docs).
   

