This is an automated email from the ASF dual-hosted git repository.

kunwp1 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git


The following commit(s) were added to refs/heads/main by this push:
     new 1bf18e7416 fix: avoid O(N) memory allocation when displaying large binary blobs in result panel (#4876)
1bf18e7416 is described below

commit 1bf18e74168745394cd9cc95af34819f7fed55e5
Author: Kunwoo (Chris) <[email protected]>
AuthorDate: Sun May 3 16:44:35 2026 -0700

    fix: avoid O(N) memory allocation when displaying large binary blobs in result panel (#4876)
    
    ### What changes were proposed in this PR?
    
    When a workflow produced large binary fields (e.g., 50 MB blobs), the
    result panel stayed empty and showed no error message. The root cause
    was that `ExecutionResultService` converted the entire byte array to a
    space-separated hex string (about 3 characters per byte, plus one
    temporary `String` per byte) before truncating it for display, so the
    web server took a very long time to return results.
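
The size of the old intermediate string follows directly from the removed `byteArrayToHexString` helper in the diff, which emits two hex digits per byte joined by single spaces. The sketch below only works through that arithmetic; `OldHexFormatMath`, `hexLength`, and `showsFullHex` are hypothetical names, and the `< 39` threshold is copied from the removed preview code.

```scala
object OldHexFormatMath {
  // Length of the removed helper's output for n bytes:
  // 2 hex digits per byte plus (n - 1) separating spaces.
  def hexLength(n: Long): Long = if (n == 0L) 0L else 3L * n - 1L

  // The removed code printed the whole hex string only when it was shorter
  // than 39 characters, i.e. 3n - 1 < 39, which holds for n <= 13 bytes.
  def showsFullHex(n: Long): Boolean = hexLength(n) < 39L

  def main(args: Array[String]): Unit = {
    val fiftyMiB = 50L * 1024 * 1024 // 52,428,800 bytes, the size in the PR's example
    println(s"hex chars built for a 50 MiB blob: ${hexLength(fiftyMiB)}")
    println(s"full hex shown for 13 bytes: ${showsFullHex(13)}, for 14 bytes: ${showsFullHex(14)}")
  }
}
```

For a 50 MiB blob this is roughly 157 million characters, all built just to display 13 hex values.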
    
    The fix slices the byte array down to only the bytes needed for the
    preview before encoding, so the conversion is O(1) regardless of blob
    size. The display format also changes from a hex dump
    (`bytes'AB 12 CD ...' (length: N)`) to a binary string preview
    (`<binary 0110100101...110, size = 52,428,800 bytes>`) that shows the
    first `binaryPreviewLeadingBits` (10) bits and the last
    `binaryPreviewTrailingBits` (3) bits, which signals more clearly that
    the field holds opaque binary content.
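
The slice-then-encode preview can be sketched as a standalone function that follows the logic in the diff. A few assumptions are worth flagging: `BinaryPreview` and `preview` are hypothetical names; the bit count is computed as a `Long`, because `totalSize * 8` overflows `Int` for arrays of 2^28 bytes (256 MiB) or more; and the size is formatted with `Locale.US` so the thousands separator is always a comma.

```scala
import java.util.Locale

object BinaryPreview {
  private val BitsPerByte: Int = java.lang.Byte.SIZE // 8
  // Preview widths quoted in the PR description.
  private val LeadingBits: Int = 10
  private val TrailingBits: Int = 3

  // Encodes each byte as exactly 8 '0'/'1' characters (unsigned, zero-padded).
  private def bytesToBinaryString(bytes: Array[Byte]): String =
    bytes
      .map(b => String.format(s"%${BitsPerByte}s", Integer.toBinaryString(b & 0xff)).replace(' ', '0'))
      .mkString

  def preview(byteArray: Array[Byte]): String = {
    val totalSize = byteArray.length
    // Long arithmetic: an Int product would overflow once totalSize >= 2^28 bytes.
    val totalBits = totalSize.toLong * BitsPerByte
    val body =
      if (totalBits <= LeadingBits + TrailingBits) {
        // Tiny arrays (at most one byte with these widths) are shown in full.
        bytesToBinaryString(byteArray)
      } else {
        // Integer ceiling division: only these few bytes are ever encoded.
        val leadingBytes = (LeadingBits + BitsPerByte - 1) / BitsPerByte   // 2
        val trailingBytes = (TrailingBits + BitsPerByte - 1) / BitsPerByte // 1
        val leading = bytesToBinaryString(byteArray.take(leadingBytes)).take(LeadingBits)
        val trailing = bytesToBinaryString(byteArray.takeRight(trailingBytes)).takeRight(TrailingBits)
        s"$leading...$trailing"
      }
    val sizeFormatted = "%,d".formatLocal(Locale.US, totalSize)
    s"<binary $body, size = $sizeFormatted bytes>"
  }

  def main(args: Array[String]): Unit = {
    println(preview(Array[Byte](1, 2, 3, 4, 5)))
    println(preview(Array.fill[Byte](52428800)(0xAA.toByte)))
  }
}
```

Because only three bytes are ever passed to `bytesToBinaryString` (two from the front, one from the back), the encoding work no longer grows with the blob; `take` and `takeRight` copy just those bytes.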
    
    **Performance comparison (50 MB blob, averaged over 5 runs):**
    
    Approach | Time
    --- | ---
    Before (full hex conversion) | ~5,971 ms
    After (slice then encode) | ~0.006 ms
    
    ~1,000,000× faster for a 50 MB blob; effectively constant time
    regardless of blob size.
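
A rough way to reproduce the comparison above is to time both paths with `System.nanoTime` and average over several runs. This is a sketch under stated assumptions, not the harness used for the table: `PreviewBenchmark`, `fullHex`, `slicedBits`, and `averageMillis` are hypothetical names, the blob is 4 MiB to keep memory modest, and there is no JIT warm-up, so absolute numbers will differ from those reported.

```scala
object PreviewBenchmark {
  // Old path: hex-encode every byte, as the removed helper did.
  def fullHex(bytes: Array[Byte]): String =
    bytes.map(b => String.format("%02X", Byte.box(b))).mkString(" ")

  // New path: encode only the 2 leading bytes and 1 trailing byte the preview needs.
  // Skips the short-blob branch, so it is only meaningful for blobs of 2+ bytes.
  def slicedBits(bytes: Array[Byte]): String = {
    def bits(bs: Array[Byte]): String =
      bs.map(b => String.format("%8s", Integer.toBinaryString(b & 0xff)).replace(' ', '0')).mkString
    bits(bytes.take(2)).take(10) + "..." + bits(bytes.takeRight(1)).takeRight(3)
  }

  // Keeps results observable so the JIT cannot discard the timed work.
  @volatile private var sink: Int = 0

  // Mean wall-clock time of `runs` evaluations of `work`, in milliseconds.
  def averageMillis(runs: Int)(work: => String): Double = {
    val elapsed = (1 to runs).map { _ =>
      val start = System.nanoTime()
      sink += work.length
      (System.nanoTime() - start) / 1e6
    }
    elapsed.sum / runs
  }

  def main(args: Array[String]): Unit = {
    val blob = Array.fill[Byte](4 * 1024 * 1024)(0x5A.toByte)
    println(f"full hex:    ${averageMillis(5)(fullHex(blob))}%.3f ms")
    println(f"sliced bits: ${averageMillis(5)(slicedBits(blob))}%.6f ms")
  }
}
```

Expect the full-hex time to grow roughly linearly as the blob size increases, while the sliced path stays flat.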
    
    <img width="1728" height="956" alt="Screenshot 2026-05-03 at 4 09 47 PM" src="https://github.com/user-attachments/assets/5643426e-0df0-4ae5-a0e0-82ac844b1cb3" />
    
    ### Any related issues, documentation, discussions?
    Closes #4875
    
    
    ### How was this PR tested?
    
    Tested by running the workflow below and checking that the result
    appears almost immediately:
    
    [Untitled workflow (11).json](https://github.com/user-attachments/files/27325256/Untitled.workflow.11.json)
    
    
    ### Was this PR authored or co-authored using generative AI tooling?
    Generated-by: Claude Code
---
 .../web/service/ExecutionResultService.scala       | 53 ++++++++++++----------
 .../web/service/ExecutionResultServiceSpec.scala   | 24 +++++-----
 2 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala b/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala
index 3f0362f824..b335ed0c3c 100644
--- a/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala
+++ b/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala
@@ -58,6 +58,7 @@ import org.apache.texera.web.service.ExecutionResultService.convertTuplesToJson
 import org.apache.texera.web.service.WorkflowExecutionService.getLatestExecutionId
 import org.apache.texera.web.storage.{ExecutionStateStore, WorkflowStateStore}
 
+import java.lang.Byte.{SIZE => BitsPerByte}
 import java.util.UUID
 import scala.collection.mutable
 import scala.concurrent.duration.DurationInt
@@ -65,6 +66,15 @@ import scala.concurrent.duration.DurationInt
 object ExecutionResultService {
 
   private val defaultPageSize: Int = 5
+  private val binaryPreviewLeadingBits: Int = 10
+  private val binaryPreviewTrailingBits: Int = 3
+
+  private def bytesToBinaryString(bytes: Array[Byte]): String =
+    bytes
+      .map(b =>
+        String.format(s"%${BitsPerByte}s", Integer.toBinaryString(b & 0xff)).replace(' ', '0')
+      )
+      .mkString("")
 
   /**
     * Converts a collection of Tuples to a list of JSON ObjectNodes.
@@ -98,18 +108,24 @@ object ExecutionResultService {
                     value match {
                       case byteArray: Array[Byte] =>
                         val totalSize = byteArray.length
-                        val hexString = byteArrayToHexString(byteArray)
-
-                        // 39 = 30 (leading bytes) + 9 (trailing bytes)
-                        // 30 bytes = space for 10 hex values (each hex value takes 2 chars + 1 space)
-                        // 9 bytes = space for 3 hex values at the end (2 chars each + 1 space)
-                        if (hexString.length < 39) {
-                          s"bytes'$hexString' (length: $totalSize)"
-                        } else {
-                          val leadingBytes = hexString.take(30)
-                          val trailingBytes = hexString.takeRight(9)
-                          s"bytes'$leadingBytes...$trailingBytes' (length: $totalSize)"
-                        }
+                        val sizeFormatted = f"$totalSize%,d"
+                        val totalBits = totalSize * BitsPerByte
+                        val preview =
+                          if (totalBits <= binaryPreviewLeadingBits + binaryPreviewTrailingBits)
+                            bytesToBinaryString(byteArray)
+                          else {
+                            val leadingBytesNeeded =
+                              math.ceil(binaryPreviewLeadingBits.toDouble / BitsPerByte).toInt
+                            val trailingBytesNeeded =
+                              math.ceil(binaryPreviewTrailingBits.toDouble / BitsPerByte).toInt
+                            val leading = bytesToBinaryString(byteArray.take(leadingBytesNeeded))
+                              .take(binaryPreviewLeadingBits)
+                            val trailing = bytesToBinaryString(
+                              byteArray.takeRight(trailingBytesNeeded)
+                            ).takeRight(binaryPreviewTrailingBits)
+                            s"$leading...$trailing"
+                          }
+                        s"<binary $preview, size = $sizeFormatted bytes>"
 
                       case _ =>
                         throw new RuntimeException(
@@ -132,19 +148,6 @@ object ExecutionResultService {
     }.toList
   }
 
-  /**
-    * Converts a byte array to a hex string representation.
-    *
-    * This helper function takes a byte array and converts its contents to a space-separated
-    * string of hexadecimal values. Each byte is formatted as a two-digit uppercase hex number.
-    *
-    * @param byteArray The byte array to convert
-    * @return A string containing the hex representation of the byte array's contents
-    */
-  private def byteArrayToHexString(byteArray: Array[Byte]): String = {
-    byteArray.map(b => String.format("%02X", Byte.box(b))).mkString(" ")
-  }
-
   /**
     * convert Tuple from engine's format to JSON format
     */
diff --git a/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala b/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala
index d9b3f60e6f..0afe31fc09 100644
--- a/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala
+++ b/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala
@@ -83,17 +83,17 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
     // Check short binary representation
     val shortBinaryString = jsonNode.get("shortBinaryCol").asText()
     shortBinaryString should (
-      startWith("bytes'") and
-        include("01 02 03 04 05") and
-        include("(length: 5)")
+      startWith("<binary") and
+        include("...") and
+        include("size = 5 bytes")
     )
 
     // Check long binary representation
     val longBinaryString = jsonNode.get("longBinaryCol").asText()
     longBinaryString should (
-      startWith("bytes'") and
+      startWith("<binary") and
         include("...") and
-        include("(length: 100)")
+        include("size = 100 bytes")
     )
   }
 
@@ -178,7 +178,7 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
     val jsonNode = result.head
 
     val emptyBinaryString = jsonNode.get("emptyBinary").asText()
-    emptyBinaryString should include("(length: 0)")
+    emptyBinaryString should include("size = 0 bytes")
   }
 
   it should "handle binary data with single ByteBuffer" in {
@@ -202,8 +202,8 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
 
     val binaryString = jsonNode.get("singleBufferBinary").asText()
     binaryString should (
-      startWith("bytes'") and
-        include("(length: 13)") // "Hello, world!" is 13 bytes
+      startWith("<binary") and
+        include("size = 13 bytes") // "Hello, world!" is 13 bytes
     )
   }
 
@@ -255,14 +255,14 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
 
     val binaryString1 = jsonNode.get("binaryField1").asText()
     binaryString1 should (
-      include("0A 14 1E") and // Hex representation of 10, 20, 30
-        include("(length: 3)")
+      startWith("<binary") and
+        include("size = 3 bytes")
     )
 
     val binaryString2 = jsonNode.get("binaryField2").asText()
     binaryString2 should (
-      include("28 32 3C") and // Hex representation of 40, 50, 60
-        include("(length: 3)")
+      startWith("<binary") and
+        include("size = 3 bytes")
     )
   }
 
