duongcongtoai commented on code in PR #10148:
URL: https://github.com/apache/datafusion/pull/10148#discussion_r1581815100


##########
datafusion/functions/src/core/getfield.rs:
##########
@@ -107,29 +109,55 @@ impl ScalarUDFImpl for GetFieldFunc {
                 );
             }
         };
+
         match (array.data_type(), name) {
-                (DataType::Map(_, _), ScalarValue::Utf8(Some(k))) => {
-                    let map_array = as_map_array(array.as_ref())?;
-                    let key_scalar = Scalar::new(StringArray::from(vec![k.clone()]));
-                    let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?;
-                    let entries = arrow::compute::filter(map_array.entries(), &keys)?;

Review Comment:
   **Previous implementation**
   
   `map_array.entries()` has the type
   ```
   pub struct StructArray {
       len: usize,
       data_type: DataType,
       nulls: Option<NullBuffer>,
       fields: Vec<ArrayRef>,
   }
   ```
   With the example above, `fields` is a vector of two arrays: the first holds all keys and the second holds all values, flattened across rows
   ```
   [0]: ["a","b","c","a","b","c"]
   [1]: [1,2,100,3,4,200]
   ```
   ```
                       let keys = arrow::compute::kernels::cmp::eq(&key_scalar, map_array.keys())?;
   ```
   This computation yields a boolean array that is true wherever the key equals "c":
   ```
   [false,false,true,false,false,true]
   ```
   The filter below then keeps only the matching entries, which reduces the number of rows:
   ```
                       let entries = arrow::compute::filter(map_array.entries(), &keys)?;
   ```
   ```
   [0]: ["c","c"]
   [1]: [100,200]
   ```
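
The two steps above can be modelled without Arrow at all. The sketch below is only an illustration using plain slices in place of the key and value child arrays; `filter_flat_entries` is a hypothetical helper name, not a function from DataFusion or arrow-rs.

```rust
/// Hypothetical model of the previous implementation.
/// `keys` and `values` stand in for the two flattened child arrays of
/// `map_array.entries()`: every entry of every row, laid end to end.
fn filter_flat_entries<'a>(
    keys: &[&'a str],
    values: &[i64],
    wanted: &str,
) -> (Vec<&'a str>, Vec<i64>) {
    // Step 1: element-wise equality, playing the role of `cmp::eq`.
    let mask: Vec<bool> = keys.iter().map(|k| *k == wanted).collect();

    // Step 2: keep only the entries whose mask bit is set, playing the role of `filter`.
    let mut out_keys = Vec::new();
    let mut out_values = Vec::new();
    for (i, keep) in mask.iter().enumerate() {
        if *keep {
            out_keys.push(keys[i]);
            out_values.push(values[i]);
        }
    }
    (out_keys, out_values)
}
```

Note that the output length is the number of matching entries, not the number of map rows; nothing in this computation knows where one row ends and the next begins.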
   
   **Problem**
   
   However, suppose we add a row in between whose map does not contain the key "c":
   ```
   { a: 1, b: 2, c: 100}
   { a: 1, b: 2}
   { a: 3, b: 4, c: 200}
   ```
   `map_array.entries()` is then represented underneath as
   ```
   [0]: ["a","b","c","a","b","a","b","c"]
   [1]: [1,2,100,1,2,3,4,200]
   ```
   ```
                       let entries = arrow::compute::filter(map_array.entries(), &keys)?;
   ```
   After filtering, the remaining rows are
   ```
   [0]: ["c","c"]
   [1]: [100,200]
   ```
   and the returned result will be
   ```
   { c: 100 }
   { c: 200 }
   ```
   instead of
   ```
   { c: 100 }
   null
   { c: 200 }
   ```
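
To see why row boundaries matter, here is a minimal sketch of a per-row lookup that yields the expected three-row result. It is an assumption-laden illustration, not the code in this PR: `get_per_row` is a hypothetical name, and the `offsets` slice models the offsets buffer that a map array uses to mark where each row's entries start and end.

```rust
/// Hypothetical per-row lookup: one output slot per input row.
/// `offsets` has `rows + 1` elements; row `i` owns the flattened
/// entries in the range `offsets[i]..offsets[i + 1]`.
fn get_per_row(
    keys: &[&str],
    values: &[i64],
    offsets: &[usize],
    wanted: &str,
) -> Vec<Option<i64>> {
    offsets
        .windows(2)
        .map(|w| {
            // Search only within this row's slice of the flattened entries.
            (w[0]..w[1])
                .find(|&i| keys[i] == wanted)
                // A row without the key produces `None`, i.e. a null slot.
                .map(|i| values[i])
        })
        .collect()
}
```

For the three-row example, the offsets would be `[0, 3, 5, 8]`, and the middle row has no "c" entry, so its slot stays null instead of disappearing from the output.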
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

