paleolimbot opened a new issue, #17428:
URL: https://github.com/apache/datafusion/issues/17428

   ### Describe the bug
   
   In the expression `some_function_that_returns_an_extension() IS NULL`, the 
field metadata of the function's result is propagated into the Boolean output 
field. This causes errors when the result is collected by a consumer that loads 
and validates extension types (e.g., pyarrow).
   
   ### To Reproduce
   
   Running the program below prints:
   
   ```
   Regular select:
   Field { name: "is_null", data_type: Boolean, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {"ARROW:extension:metadata": "foofy.foofy"} }
   ```
   
   ```rust
   use std::collections::HashMap;
   
   use datafusion::{
       arrow::datatypes::DataType,
       logical_expr::{ScalarUDFImpl, Signature, Volatility},
       prelude::*,
   };
   
   #[tokio::main]
   async fn main() {
       let ctx = SessionContext::new();
       ctx.register_udf(MakeExtension::default().into());
   
       let batches = ctx
           .sql("SELECT make_extension('foofy zero') IS NULL as is_null")
           .await
           .unwrap()
           .collect()
           .await
           .unwrap();
       println!("Regular select:");
       println!("{:?}", batches[0].schema().field(0));
   }
   
   #[derive(Debug)]
   struct MakeExtension {
       signature: Signature,
   }
   
   impl Default for MakeExtension {
       fn default() -> Self {
           Self {
               signature: Signature::user_defined(Volatility::Immutable),
           }
       }
   }
   
   impl ScalarUDFImpl for MakeExtension {
       fn as_any(&self) -> &dyn std::any::Any {
           self
       }
   
       fn name(&self) -> &str {
           "make_extension"
       }
   
       fn signature(&self) -> &Signature {
           &self.signature
       }
   
       fn coerce_types(&self, arg_types: &[DataType]) -> datafusion::error::Result<Vec<DataType>> {
           Ok(arg_types.to_vec())
       }
   
       fn return_type(&self, _arg_types: &[DataType]) -> datafusion::error::Result<DataType> {
           unreachable!("This shouldn't have been called")
       }
   
       fn return_field_from_args(
           &self,
           args: datafusion::logical_expr::ReturnFieldArgs,
       ) -> datafusion::error::Result<datafusion::arrow::datatypes::FieldRef> {
           Ok(args.arg_fields[0]
               .as_ref()
               .clone()
               .with_metadata(HashMap::from([(
                   "ARROW:extension:metadata".to_string(),
                   "foofy.foofy".to_string(),
               )]))
               .into())
       }
   
       fn invoke_with_args(
           &self,
           args: datafusion::logical_expr::ScalarFunctionArgs,
       ) -> datafusion::error::Result<datafusion::logical_expr::ColumnarValue> {
           Ok(args.args[0].clone())
       }
   }
   ```
   
   ### Expected behavior
   
   I would have expected Boolean output with no field metadata.
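   
   To make the expectation concrete, here is a minimal, dependency-free sketch 
   of the rule. `SketchField`, `extension_input`, and `is_null_output_field` are 
   hypothetical names invented for illustration; they are not DataFusion or 
   arrow-rs APIs, and this is not how DataFusion derives the field internally.
   
   ```rust
   use std::collections::HashMap;
   
   /// Hypothetical stand-in for an Arrow field (NOT the arrow-rs `Field` type).
   #[derive(Debug, Clone, PartialEq)]
   struct SketchField {
       name: String,
       data_type: String,
       nullable: bool,
       metadata: HashMap<String, String>,
   }
   
   /// Models the field returned by `make_extension(...)` in the reproducer:
   /// the argument's type plus the extension metadata entry.
   fn extension_input() -> SketchField {
       SketchField {
           name: "make_extension".to_string(),
           data_type: "Utf8".to_string(),
           nullable: true,
           metadata: HashMap::from([(
               "ARROW:extension:metadata".to_string(),
               "foofy.foofy".to_string(),
           )]),
       }
   }
   
   /// Expected rule for `<input> IS NULL`: the output is a non-nullable
   /// Boolean, and nothing from the input's metadata is carried over.
   fn is_null_output_field(_input: &SketchField, name: &str) -> SketchField {
       SketchField {
           name: name.to_string(),
           data_type: "Boolean".to_string(),
           nullable: false,
           // Deliberately empty: the extension metadata describes the input
           // values, not the Boolean result.
           metadata: HashMap::new(),
       }
   }
   ```
   
   In the reproducer itself, the same expectation could presumably be checked 
   with `assert!(batches[0].schema().field(0).metadata().is_empty());`, which 
   currently fails given the output shown above.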
   
   ### Additional context
   
   cc @timsaucer (😬 😬 )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

