This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion.git


The following commit(s) were added to refs/heads/main by this push:
     new e894a03bea perf: Use Hashbrown for array_distinct (#20538)
e894a03bea is described below

commit e894a03bea638e35677eaf27876966013dd64bf4
Author: Neil Conway
AuthorDate: Wed Feb 25 13:12:42 2026 -0500

    perf: Use Hashbrown for array_distinct (#20538)
    
    ## Which issue does this PR close?
    
    N/A
    
    ## Rationale for this change
    
    #20364 recently optimized `array_distinct` to use batched row
    conversion. That PR used `std::collections::HashSet`. This PR
    replaces it with `hashbrown::HashSet`, which measurably improves
    performance.
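    
    For illustration only, here is a minimal, std-only sketch of why this
    kind of swap can be a one-line change: deduplication logic written
    against a generic `BuildHasher` does not depend on which hasher backs
    the set. `distinct_with_hasher`, `Fnv1a`, and `FnvBuild` are
    hypothetical names, not DataFusion or hashbrown APIs, and FNV-1a only
    stands in for a fast, non-DoS-resistant hasher. It is an assumption
    (not stated in this commit) that most of the speedup comes from
    hashbrown's default hasher: `std::collections::HashSet` already uses
    hashbrown's table internally but defaults to SipHash via `RandomState`.
    
    ```rust
    use std::collections::hash_map::RandomState;
    use std::collections::HashSet;
    use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

    /// Hypothetical 64-bit FNV-1a hasher, standing in for a fast,
    /// non-cryptographic hasher. It is NOT resistant to HashDoS.
    struct Fnv1a(u64);

    impl Default for Fnv1a {
        fn default() -> Self {
            // FNV-1a 64-bit offset basis.
            Fnv1a(0xcbf2_9ce4_8422_2325)
        }
    }

    impl Hasher for Fnv1a {
        fn write(&mut self, bytes: &[u8]) {
            for &b in bytes {
                self.0 ^= u64::from(b);
                // FNV 64-bit prime.
                self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
            }
        }

        fn finish(&self) -> u64 {
            self.0
        }
    }

    /// Hypothetical alias: creates a fresh `Fnv1a` for each hashed value.
    type FnvBuild = BuildHasherDefault<Fnv1a>;

    /// Hypothetical helper: returns the distinct elements of `items`,
    /// keeping the first occurrence of each value in its original order.
    fn distinct_with_hasher<T, S>(items: &[T]) -> Vec<T>
    where
        T: Hash + Eq + Clone,
        S: BuildHasher + Default,
    {
        // The set borrows elements from `items`; only the hasher type `S` varies.
        let mut seen: HashSet<&T, S> =
            HashSet::with_capacity_and_hasher(items.len(), S::default());
        let mut out = Vec::with_capacity(items.len());
        for item in items {
            // `insert` returns true only the first time a value is seen.
            if seen.insert(item) {
                out.push(item.clone());
            }
        }
        out
    }
    ```
    
    With `rows = vec![3, 1, 3, 2, 1]`, both
    `distinct_with_hasher::<_, RandomState>(&rows)` and
    `distinct_with_hasher::<_, FnvBuild>(&rows)` return `[3, 1, 2]`; only
    the hashing cost differs. Whether DataFusion's `array_distinct`
    preserves first-occurrence order is not stated in this commit.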
    
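    ## What changes are included in this PR?
    
    In `datafusion/functions-nested/src/set_ops.rs`, the
    `use std::collections::HashSet;` import is replaced with
    `use hashbrown::HashSet;`. There are no other code changes.
    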
    ## Are these changes tested?
    
    Yes.
    
    ## Are there any user-facing changes?
    
    No.
---
 datafusion/functions-nested/src/set_ops.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/datafusion/functions-nested/src/set_ops.rs b/datafusion/functions-nested/src/set_ops.rs
index 2348b3c530..150559111f 100644
--- a/datafusion/functions-nested/src/set_ops.rs
+++ b/datafusion/functions-nested/src/set_ops.rs
@@ -34,8 +34,8 @@ use datafusion_expr::{
     ColumnarValue, Documentation, ScalarUDFImpl, Signature, Volatility,
 };
 use datafusion_macros::user_doc;
+use hashbrown::HashSet;
 use std::any::Any;
-use std::collections::HashSet;
 use std::fmt::{Display, Formatter};
 use std::sync::Arc;
 