This is an automated email from the ASF dual-hosted git repository.
kunwp1 pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/texera.git
The following commit(s) were added to refs/heads/main by this push:
new 1bf18e7416 fix: avoid O(N) memory allocation when displaying large
binary blobs in result panel (#4876)
1bf18e7416 is described below
commit 1bf18e74168745394cd9cc95af34819f7fed55e5
Author: Kunwoo (Chris) <[email protected]>
AuthorDate: Sun May 3 16:44:35 2026 -0700
fix: avoid O(N) memory allocation when displaying large binary blobs in
result panel (#4876)
### What changes were proposed in this PR?
When a workflow produced large binary fields (e.g. 50 MB blobs), the
result panel appeared empty and showed no error message. The root cause
was that `ExecutionResultService` converted the entire byte array to a
hex string (~3× the blob size in memory) and only then truncated it for
display, so the web server took a very long time to fetch results.
The fix slices the byte array down to the few bytes needed for the
preview before encoding, so the conversion cost is O(1) regardless of
blob size. The display format also changes from a hex dump
(`AB 12 CD...`) to a binary-string preview
(`<binary 0110100101...110, size = 52,428,800 bytes>`) that shows
`binaryPreviewLeadingBits` (10) leading bits and
`binaryPreviewTrailingBits` (3) trailing bits, which signals opaque
binary content more clearly.
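The slice-then-encode idea can be sketched in isolation as follows. This is a minimal, standalone illustration rather than the committed code: the object name `BinaryPreviewSketch` and its method names are hypothetical, the 10/3 bit counts are taken from the description above, and the sketch pins `Locale.US` and uses a `Long` bit count as its own assumptions.

```scala
object BinaryPreviewSketch {
  // Bit counts as stated in the PR description; all names here are illustrative.
  val LeadingBits = 10
  val TrailingBits = 3
  val BitsPerByte: Int = java.lang.Byte.SIZE // 8

  // Render each byte as exactly 8 binary digits, zero-padded on the left.
  def toBits(bytes: Array[Byte]): String =
    bytes.map { b =>
      val s = Integer.toBinaryString(b & 0xff)
      "0" * (BitsPerByte - s.length) + s
    }.mkString

  def preview(bytes: Array[Byte]): String = {
    // Assumption of this sketch: count bits as Long so the product cannot overflow Int.
    val totalBits = bytes.length.toLong * BitsPerByte
    val body =
      if (totalBits <= LeadingBits + TrailingBits) toBits(bytes)
      else {
        // Only these few bytes are ever encoded, however large the array is.
        val leadBytes = (LeadingBits + BitsPerByte - 1) / BitsPerByte
        val trailBytes = (TrailingBits + BitsPerByte - 1) / BitsPerByte
        val leading = toBits(bytes.take(leadBytes)).take(LeadingBits)
        val trailing = toBits(bytes.takeRight(trailBytes)).takeRight(TrailingBits)
        s"$leading...$trailing"
      }
    // Locale.US is pinned here so the thousands separator is always a comma.
    val size = "%,d".formatLocal(java.util.Locale.US, bytes.length)
    s"<binary $body, size = $size bytes>"
  }
}
```

For example, `BinaryPreviewSketch.preview(Array[Byte](1, 2, 3, 4, 5))` yields `<binary 0000000100...101, size = 5 bytes>`: the first two bytes supply the 10 leading bits and the last byte supplies the 3 trailing bits.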
**Performance comparison (50 MB blob, averaged over 5 runs):**

| Approach | Time |
| -- | -- |
| Before (full hex conversion) | ~5,971 ms |
| After (slice then encode) | ~0.006 ms |
That is roughly 1,000,000× faster for a 50 MB blob, and the cost is
effectively constant regardless of blob size.
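To make the size difference concrete, the arithmetic behind the two approaches can be written out. This is a rough sketch with hypothetical names (`PreviewCost`, `hexDumpChars`, `previewBytesEncoded`); it counts only the characters of the old space-separated hex dump and the bytes the new preview has to encode, and ignores JVM object overhead and intermediate allocations.

```scala
object PreviewCost {
  // Old format: 2 hex digits per byte, joined by single spaces (n - 1 separators).
  def hexDumpChars(n: Long): Long =
    if (n == 0) 0L else 3L * n - 1L

  // New format: at most ceil(leadingBits / 8) + ceil(trailingBits / 8) bytes are encoded.
  // The default bit counts mirror the values quoted in the description.
  def previewBytesEncoded(n: Long, leadingBits: Int = 10, trailingBits: Int = 3): Long =
    if (n * 8 <= leadingBits + trailingBits) n
    else ((leadingBits + 7) / 8 + (trailingBits + 7) / 8).toLong
}
```

For a blob of 52,428,800 bytes, `hexDumpChars` gives 157,286,399 characters, while `previewBytesEncoded` gives 3 bytes.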
<img width="1728" height="956" alt="Screenshot 2026-05-03 at 4 09 47 PM"
src="https://github.com/user-attachments/assets/5643426e-0df0-4ae5-a0e0-82ac844b1cb3"
/>
### Any related issues, documentation, discussions?
Closes #4875
### How was this PR tested?
Tested by running the workflow below and checking that the result
appears almost immediately:
[Untitled workflow (11).json](https://github.com/user-attachments/files/27325256/Untitled.workflow.11.json)
### Was this PR authored or co-authored using generative AI tooling?
Claude Code
---
.../web/service/ExecutionResultService.scala | 53 ++++++++++++----------
.../web/service/ExecutionResultServiceSpec.scala | 24 +++++-----
2 files changed, 40 insertions(+), 37 deletions(-)
diff --git a/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala b/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala
index 3f0362f824..b335ed0c3c 100644
--- a/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala
+++ b/amber/src/main/scala/org/apache/texera/web/service/ExecutionResultService.scala
@@ -58,6 +58,7 @@ import org.apache.texera.web.service.ExecutionResultService.convertTuplesToJson
import org.apache.texera.web.service.WorkflowExecutionService.getLatestExecutionId
import org.apache.texera.web.storage.{ExecutionStateStore, WorkflowStateStore}
+import java.lang.Byte.{SIZE => BitsPerByte}
import java.util.UUID
import scala.collection.mutable
import scala.concurrent.duration.DurationInt
@@ -65,6 +66,15 @@ import scala.concurrent.duration.DurationInt
object ExecutionResultService {
private val defaultPageSize: Int = 5
+ private val binaryPreviewLeadingBits: Int = 10
+ private val binaryPreviewTrailingBits: Int = 3
+
+ private def bytesToBinaryString(bytes: Array[Byte]): String =
+ bytes
+ .map(b =>
+ String.format(s"%${BitsPerByte}s", Integer.toBinaryString(b & 0xff)).replace(' ', '0')
+ )
+ .mkString("")
/**
* Converts a collection of Tuples to a list of JSON ObjectNodes.
@@ -98,18 +108,24 @@ object ExecutionResultService {
value match {
case byteArray: Array[Byte] =>
val totalSize = byteArray.length
- val hexString = byteArrayToHexString(byteArray)
-
- // 39 = 30 (leading bytes) + 9 (trailing bytes)
- // 30 bytes = space for 10 hex values (each hex value takes 2 chars + 1 space)
- // 9 bytes = space for 3 hex values at the end (2 chars each + 1 space)
- if (hexString.length < 39) {
- s"bytes'$hexString' (length: $totalSize)"
- } else {
- val leadingBytes = hexString.take(30)
- val trailingBytes = hexString.takeRight(9)
- s"bytes'$leadingBytes...$trailingBytes' (length: $totalSize)"
- }
+ val sizeFormatted = f"$totalSize%,d"
+ val totalBits = totalSize * BitsPerByte
+ val preview =
+ if (totalBits <= binaryPreviewLeadingBits + binaryPreviewTrailingBits)
+ bytesToBinaryString(byteArray)
+ else {
+ val leadingBytesNeeded =
+ math.ceil(binaryPreviewLeadingBits.toDouble / BitsPerByte).toInt
+ val trailingBytesNeeded =
+ math.ceil(binaryPreviewTrailingBits.toDouble / BitsPerByte).toInt
+ val leading = bytesToBinaryString(byteArray.take(leadingBytesNeeded))
+ .take(binaryPreviewLeadingBits)
+ val trailing = bytesToBinaryString(
+ byteArray.takeRight(trailingBytesNeeded)
+ ).takeRight(binaryPreviewTrailingBits)
+ s"$leading...$trailing"
+ }
+ s"<binary $preview, size = $sizeFormatted bytes>"
case _ =>
throw new RuntimeException(
@@ -132,19 +148,6 @@ object ExecutionResultService {
}.toList
}
- /**
- * Converts a byte array to a hex string representation.
- *
- * This helper function takes a byte array and converts its contents to a space-separated
- * string of hexadecimal values. Each byte is formatted as a two-digit uppercase hex number.
- *
- * @param byteArray The byte array to convert
- * @return A string containing the hex representation of the byte array's contents
- */
- private def byteArrayToHexString(byteArray: Array[Byte]): String = {
- byteArray.map(b => String.format("%02X", Byte.box(b))).mkString(" ")
- }
-
/**
* convert Tuple from engine's format to JSON format
*/
diff --git a/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala b/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala
index d9b3f60e6f..0afe31fc09 100644
--- a/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala
+++ b/amber/src/test/scala/org/apache/texera/web/service/ExecutionResultServiceSpec.scala
@@ -83,17 +83,17 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
// Check short binary representation
val shortBinaryString = jsonNode.get("shortBinaryCol").asText()
shortBinaryString should (
- startWith("bytes'") and
- include("01 02 03 04 05") and
- include("(length: 5)")
+ startWith("<binary") and
+ include("...") and
+ include("size = 5 bytes")
)
// Check long binary representation
val longBinaryString = jsonNode.get("longBinaryCol").asText()
longBinaryString should (
- startWith("bytes'") and
+ startWith("<binary") and
include("...") and
- include("(length: 100)")
+ include("size = 100 bytes")
)
}
@@ -178,7 +178,7 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
val jsonNode = result.head
val emptyBinaryString = jsonNode.get("emptyBinary").asText()
- emptyBinaryString should include("(length: 0)")
+ emptyBinaryString should include("size = 0 bytes")
}
it should "handle binary data with single ByteBuffer" in {
@@ -202,8 +202,8 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
val binaryString = jsonNode.get("singleBufferBinary").asText()
binaryString should (
- startWith("bytes'") and
- include("(length: 13)") // "Hello, world!" is 13 bytes
+ startWith("<binary") and
+ include("size = 13 bytes") // "Hello, world!" is 13 bytes
)
}
@@ -255,14 +255,14 @@ class ExecutionResultServiceSpec extends AnyFlatSpec with Matchers {
val binaryString1 = jsonNode.get("binaryField1").asText()
binaryString1 should (
- include("0A 14 1E") and // Hex representation of 10, 20, 30
- include("(length: 3)")
+ startWith("<binary") and
+ include("size = 3 bytes")
)
val binaryString2 = jsonNode.get("binaryField2").asText()
binaryString2 should (
- include("28 32 3C") and // Hex representation of 40, 50, 60
- include("(length: 3)")
+ startWith("<binary") and
+ include("size = 3 bytes")
)
}