I think compatibility with Postgres derivatives generally makes sense as a goal, even if the behaviour isn't ideal - definitely much easier than rewriting everything to a different syntax.
With your rewrite proposal, would you also be able to express these queries by unnesting in the FROM clause? SELECT unnest(t.a) c1, unnest(t.b) c1 FROM tbl t => SELECT _tmp1 c1, _tmp2 c2 FROM tbl t, UNNEST(t.a, t.b) (_tmp1, _tmp2) I'm not sure a feature flag would work well for this, since you might potentially have a more complex FROM clause with other unnesting and I'm not sure how you would model that. I suspect it works out cleaner to model the zipped arrays as a TableRef - for most purposes in analysis and planning it's going to be treated the same as a TableRef that is just a plain array of structs - the analysis/planner code for TableRefs, joins, etc isn't the easiest to extend. I guess you could also rewrite it to something like this too, which would really let you isolate the special handling to a simple SELECT block: SELECT v._tmp1 c1, v._tmp2 c2 FROM tbl t, (SELECT t.a __tmp1, t.b __tmp2 FROM t.a ZIP JOIN t.b) v Another question: what happens if the two arrays are from different table references? Is that allowed? Do they still get zipped together? On Mon, 13 Sept 2021 at 04:43, Gabor Kaszab <gaborkas...@apache.org> wrote: > Hey All, > Thanks for taking a look and sharing your view. The whole idea of having > multiple UNNEST() functions in the select list (and zipping the results) > came from a user as a direct feature request as they try to move to Impala > from Greenplum and as they say they can't rewrite their existing queries. > > The most straightforward way would be to implement UNNEST() as a UDTF but > Impala doesn't support them, and anyway, as mentioned above Impala doesn't > support even UDFs with nested type parameters. > > My idea was something like introducing an unnest SlotRef or such, change > the SQL parser to translate each UNNEST(array) into this SlotRef and then > do a query rewrite based on that so that we can process the unnest. My > preference would be to rewrite the query to the syntax we have now with > joining the arrays in the FROM clause as collection table refs and then > have a feature flag (e.g. unnest_join_type) to indicate whether we want the > joined or the zipped behaviour. For the zipped one we can modify the > existing UNNEST node to be able to receive more arrays and then do the zip > itself. > > What do you think? > > Gabor > > On Mon, Sep 13, 2021 at 9:35 AM 彭斗 <pengdou1...@126.com> wrote: > > > Hi All, > > > > > > Currently IMPALA don't support complex data type as input/output > > parameters for Bulltin Functions / User Defined Functions, > > If IMPALA can support this, the Unnest() or other operation on complex > > data type may easy to achieve. > > > > > > > > > > -- > > > > Best Regards > > Dou > > > > > > > > > > > > At 2021-09-13 15:08:52, "Quanlong Huang" <huangquanl...@gmail.com> > wrote: > > >This seems like the explode() UDTF in Hive: > > > > > > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode(array) > > > > > >Hive doesn't support multiple UDTF in the SelectList. I think we can > make > > >the same choice that only supports one UNNEST() in the SelectList. If > > >multiple columns need to be unfolded, they should be wrapped in the same > > >UNNEST() function with a keyword/func specifying how to join the rows. > > E.g. > > > > > >-- cross join by default > > >select UNNEST(array_a, array_b) from tbl; > > >-- specifying the join type > > >select UNNEST(zip array_a, array_b) from tbl; > > >select UNNEST(arrays_zip(array_a, array_b)) from tbl; > > > > > >explode() in Hive only supports one column. The above syntax is inspired > > by > > >SparkSQL: > > > > > > https://stackoverflow.com/questions/33220916/explode-transpose-multiple-columns-in-spark-sql-table > > >https://spark.apache.org/docs/latest/api/sql/index.html#arrays_zip > > > > > >On a separate note, what about using the same name(explode) as Hive? I > > >think it'd be helpful for migrations into Impala, especially if we also > > >support the LATERAL VIEW syntax. > > > > > >Thanks, > > >Quanlong > > > > > >On Sun, Sep 12, 2021 at 8:56 AM Tim Armstrong < > tim.g.armstr...@gmail.com> > > >wrote: > > > > > >> The syntax/behaviour seems kinda unusual to me - do any other systems > > aside > > >> from postgres implement this? UNNEST() in the from clause seems more > > >> common, and it is generally not consistent with the rest of SQL for > > SELECT > > >> clause entries to affect the set of rows returned. > > >> > > >> I'm not sure but it seems like it might be tricky to make the select > > clause > > >> in the Impala frontend do this. > > >> > > >> Some way of doing this in FROM clause seems more intuitive and > > consistent > > >> with the rest of sql, e.g. > > >> > > >> select id, e1, e2 > > >> from tbl t, UNNEST(t.arr1, t.arr2) (e1, e2) > > >> > > >> I'm tempted to say that it should be called ZIP() since that's the > name > > of > > >> this operation in functional programming, but I don't think inventing > > new > > >> SQL keywords is really ideal. > > >> > > >> I'm also a little curious about the use case for this. I think ideally > > you > > >> would represent two arrays with correlated indices as an array of > > structs > > >> instead of two arrays, but I can appreciate that schema designs may > not > > >> always be fully normalised. > > >> > > >> On Fri, 10 Sept 2021 at 18:38, Aman Sinha <amansi...@gmail.com> > wrote: > > >> > > >> > Hi Gabor, > > >> > An implicit assumption in the proposed syntax is that there is some > > >> > association between an item at a particular position in the first > > array > > >> and > > >> > the same position in the second array. > > >> > However, in general the arrays are not ordered in a semantic sense. > > e.g > > >> > car_make: {"Honda", "Toyota"}, car_model: {"Prius", "Accord"}. > The > > >> > proposed UNNEST behavior will not produce any rows where {Honda, > > Accord} > > >> > appear together which would be unexpected. Although one could make > > the > > >> > argument that in order to get this behavior you could fall back to > the > > >> > current syntax of putting the arrays in the FROM clause, it could be > > >> > confusing to explain when one should be used vs other. It would be > > >> useful > > >> > to see what other systems (apart from Postgres) which have more > native > > >> > complex types support do in this case (e.g Presto, Couchbase, > > Bigquery). > > >> > > > >> > One alternative I can think of is to make the association between > the > > >> items > > >> > in multiple arrays be more explicit e.g by using an outer join on > the > > >> > ordinal position. That way, items at the same position appear > together > > >> in a > > >> > row and for non-matches we produce Null for the smaller array. I > > haven't > > >> > thought through the exact syntax for this yet. > > >> > > > >> > -Aman > > >> > > > >> > On Fri, Sep 10, 2021 at 11:04 AM Gabor Kaszab < > > gaborkas...@cloudera.com> > > >> > wrote: > > >> > > > >> > > Hey, > > >> > > > > >> > > I'm working on implementing a new UNNEST() operation that does > what > > it > > >> > > says, unnests arrays. But here the trick would come when there are > > >> > multiple > > >> > > UNNESTS() in the select list. Let me explain the desired behaviour > > with > > >> > an > > >> > > example: > > >> > > > > >> > > Let's assume the following table: > > >> > > [image: Screenshot 2021-09-10 at 19.58.53.png] > > >> > > An the following query: > > >> > > SELECT id, UNNEST(arr1), UNNEST(arr2) FROM tbl; > > >> > > > > >> > > Here the expected result is to not produce a join on the two > arrays > > as > > >> we > > >> > > would with the original way of giving them in the select lit, but > to > > >> put > > >> > > the values of the arrays "next to each other" like this: > > >> > > [image: Screenshot 2021-09-10 at 19.59.04.png] > > >> > > I have investigated some options how this could be implemented and > > >> > created > > >> > > a doc for my notes. Let me share this with the community so that > > people > > >> > can > > >> > > share their preference. Any feedback is welcome! > > >> > > > > >> > > > > >> > > > > >> > > > >> > > > https://docs.google.com/document/d/184EKJwMME4SNzyfOTueI-nz-IL-WUiBeaS8Zhi2XzMo/edit?usp=sharing > > >> > > > > >> > > Cheers, > > >> > > Gabor > > >> > > > > >> > > > > >> > > > > >> > > > >> > > >