Copilot commented on code in PR #19558:
URL: https://github.com/apache/datafusion/pull/19558#discussion_r2652619305
##########
datafusion/functions/src/unicode/rpad.rs:
##########
@@ -268,18 +272,20 @@ where
);
}
let length = if length < 0 { 0 } else { length
as usize };
- let graphemes =
-
string.graphemes(true).collect::<Vec<&str>>();
+ // Reuse buffer by clearing and refilling
+ graphemes_buf.clear();
+ graphemes_buf.extend(string.graphemes(true));
- if length < graphemes.len() {
-
builder.append_value(graphemes[..length].concat());
+ if length < graphemes_buf.len() {
+ builder
+
.append_value(graphemes_buf[..length].concat());
} else if fill.is_empty() {
builder.append_value(string);
} else {
builder.write_str(string)?;
fill.chars()
.cycle()
- .take(length - graphemes.len())
+ .take(length - graphemes_buf.len())
.for_each(|ch|
builder.write_char(ch).unwrap());
Review Comment:
The fill characters should be pre-collected into a reusable buffer similar
to how it's done in lpad.rs. Currently, fill.chars() is being called on every
row which allocates a new iterator each time. This misses an optimization
opportunity mentioned in the PR description: "For lpad with fill parameter:
Eliminates 2 Vec allocations per row (graphemes + fill_chars)". The rpad
function should also benefit from the same fill_chars_buf optimization by
declaring a fill_chars_buf at line 219 and reusing it by clearing and extending
with fill.chars() before this usage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]