[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2196: feat: Support GetIndexedFieldExpr for ScalarValue

GitBox Mon, 09 May 2022 03:59:59 -0700


alamb commented on code in PR #2196:
URL: https://github.com/apache/arrow-datafusion/pull/2196#discussion_r867879803



##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -491,6 +491,16 @@ async fn test_crypto_expressions() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn test_array_index() -> Result<()> {
+    test_expression!("([5,4,3,2,1])[1]", "5");

Review Comment:
   https://www.postgresql.org/docs/current/arrays.html
   
   I verified that the subscripts are 1 based 👍 
   
   > The array subscript numbers are written within square brackets. By default 
PostgreSQL uses a one-based numbering convention for arrays, that is, an array 
of n elements starts with array[1] and ends with array[n].
   
   
   
   
   



##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -491,6 +491,16 @@ async fn test_crypto_expressions() -> Result<()> {
     Ok(())
 }
 
+#[tokio::test]
+async fn test_array_index() -> Result<()> {
+    test_expression!("([5,4,3,2,1])[1]", "5");
+    test_expression!("([5,4,3,2,1])[5]", "1");
+    test_expression!("([5,4,3,2,1])[100]", "NULL");
+    test_expression!("([5,4,3,2,1])[-1]", "NULL");

Review Comment:
   I wonder if you want to potentially try nested lists. Something like
   
   ```suggestion
       test_expression!("([5,4,3,2,1])[-1]", "NULL");
       test_expression!("([[123],[4,5,6]])[2]", "[4,5,6]");
   ```



##########
datafusion/physical-expr/src/expressions/get_indexed_field.rs:
##########
@@ -107,9 +136,69 @@ impl PhysicalExpr for GetIndexedFieldExpr {
                 }
                 (dt, key) => Err(DataFusionError::NotImplemented(format!("get 
indexed field is only possible on lists with int64 indexes. Tried {} with {} 
index", dt, key))),
             },
-            ColumnarValue::Scalar(_) => Err(DataFusionError::NotImplemented(
-                "field access is not yet implemented for scalar 
values".to_string(),
-            )),
+            ColumnarValue::Scalar(scalar) => match (scalar.get_datatype(), 
&self.key) {

Review Comment:
   I am not sure I fully follow this code -- Since it is creating a `ArrayRef` 
from the `ColumnarValue::Scalar`, I wonder why it can't use the same code as 
the `ColumnarValue::Array` case and call `to_arrow()`?
   
   
https://github.com/cube-js/arrow-datafusion/blob/scalar-array-index/ballista/rust/client/src/columnar_batch.rs#L150
   
   So for example, rather than 
   
   ```rust
       fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
           let arg = self.arg.evaluate(batch)?;
           match arg {
               ColumnarValue::Array(array) => match (array.data_type(), 
&self.key) {
   ...
   ```
   
   It could look like:
   ```rust
       fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
           let array = self.arg.evaluate(batch)?
             // convert to Arrayref
             .to_arrow();
   
           match (array.data_type(), &self.key) {
   ...
   ```
   
   That way the same code would be used for the array and scalar cases of 
`ColumnarValue`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2196: feat: Support GetIndexedFieldExpr for ScalarValue

Reply via email to