scovich commented on issue #8082: URL: https://github.com/apache/arrow-rs/issues/8082#issuecomment-3258435567
Hi @sdf-jkl -- this is definitely a challenging one, so I would strongly recommend some "whiteboard designing" and pathfinding efforts before actually trying to code up a full solution.

Pathing through shredded variant objects is pretty straightforward because it's just projection. For example,
```sql
SELECT variant_get(v, 'a.b.c', INT) AS c FROM t
```
ideally just translates to
```sql
SELECT v.typed_value.a.typed_value.b.typed_value.c.typed_value AS c FROM t
```
(if everything is shredded perfectly).

In contrast, pathing through a shredded variant array requires slicing. For example,
```sql
SELECT variant_get(v, 'a[0].b[1].c[2]', INT) AS c FROM t
```
... would, if perfectly shredded, require first projecting out the following path:
```
v.typed_value.a.typed_value.list.element.typed_value.b.typed_value.list.element.typed_value.c.typed_value.list.element
```
... and then slicing the result down to the single requested row (because the projected column contains every `a[i].b[j].c[k]` for _ALL_ `i`, `j`, and `k`).

If the path selects an array instead of an array element, e.g. `a[0].b[1].c`, then the slicing has to return just the contiguous subset of rows that belong to that sub-sub-list.

And that's just for the perfect shredding case. If it's imperfectly shredded, then at least some of the slicing has to happen inside binary variant values instead, which will involve a different kind of slicing and dicing.

NOTE: None of this is impossible... it's just complex. And a lot of the infrastructure you'll need, like slicing and dicing of binary variant values, is still at least partly work in progress.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
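To make the slicing step in the comment above concrete, here is a minimal sketch in plain Rust. It does not use the arrow-rs API. It assumes a perfectly shredded column with no nulls and models each `list` level as an Arrow-style offsets vector over its child. The names `ListLevel`, `Projected`, `element_at`, `list_at`, and `sample` are hypothetical and exist only for illustration.

```rust
use std::ops::Range;

/// One list level in an Arrow-style layout: list `i` owns the child
/// positions `offsets[i]..offsets[i + 1]`.
struct ListLevel {
    offsets: Vec<usize>,
}

impl ListLevel {
    /// Child positions covered by list `parent`.
    fn range(&self, parent: usize) -> Range<usize> {
        self.offsets[parent]..self.offsets[parent + 1]
    }

    /// Child position of element `i` of list `parent`, or `None` if the
    /// list is shorter than `i + 1`.
    fn element(&self, parent: usize, i: usize) -> Option<usize> {
        let r = self.range(parent);
        let idx = r.start.checked_add(i)?;
        (idx < r.end).then_some(idx)
    }
}

/// Hypothetical stand-in for the projected column: three nested list
/// levels (`a`, `b`, `c`) over one flat leaf buffer. Nulls are ignored.
struct Projected {
    a: ListLevel,
    b: ListLevel,
    c: ListLevel,
    leaf: Vec<i32>,
}

impl Projected {
    /// `a[i].b[j].c[k]` for one row: walk the offsets down to one leaf value.
    fn element_at(&self, row: usize, i: usize, j: usize, k: usize) -> Option<i32> {
        let a_idx = self.a.element(row, i)?;
        let b_idx = self.b.element(a_idx, j)?;
        let leaf_idx = self.c.element(b_idx, k)?;
        Some(self.leaf[leaf_idx])
    }

    /// `a[i].b[j].c` for one row: the contiguous leaf slice of that sub-sub-list.
    fn list_at(&self, row: usize, i: usize, j: usize) -> Option<&[i32]> {
        let a_idx = self.a.element(row, i)?;
        let b_idx = self.b.element(a_idx, j)?;
        Some(&self.leaf[self.c.range(b_idx)])
    }
}

/// Three rows: row 0 has two `a` elements, row 1 has one, row 2 has none.
fn sample() -> Projected {
    Projected {
        a: ListLevel { offsets: vec![0, 2, 3, 3] },
        b: ListLevel { offsets: vec![0, 2, 3, 5] },
        c: ListLevel { offsets: vec![0, 3, 6, 7, 9, 12] },
        leaf: vec![10, 11, 12, 20, 21, 22, 30, 40, 41, 50, 51, 52],
    }
}
```

With this sample data, `sample().element_at(0, 0, 1, 2)` walks row 0 to the leaf value `22`, while `sample().list_at(0, 0, 1)` returns the contiguous slice `[20, 21, 22]`. Row 2 has an empty `a` list, so both lookups return `None` there. The imperfectly shredded case, where part of the path lives inside binary variant values, is not covered by this sketch.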
