The semantics for zip with different length arguments tend to be either ignore tail of longer argument <https://docs.python.org/3.3/library/functions.html#zip> as Python does with zip or to reuse shorter arguments to fill out to the length of the longest argument as R does with cbind and rbind <https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>. R is nice about returning with a warning if the reused value doesn't come out even at the end.
> cbind(c(1,2), c(4,5,6)) > [,1] [,2] > [1,] 1 4 > [2,] 2 5 > [3,] 1 6 > Warning message: > In cbind(c(1, 2), c(4, 5, 6)) : > number of rows of result is not a multiple of vector length (arg 1) I think that either definition is fine, but that python's truncate style is probably easier and makes more sense in a database environment. The most common use case for R's semantics is to build tables with rows that have all combinations of sets of values (i.e. the cross product). IN a database, we already have a better mechanism to build the cross product so having zip behave like Python is nice. On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <amansi...@gmail.com> wrote: > > I thought flatten() would be the answer, however, if I flatten the > columns, I get the following result: > > Regarding the flatten() output, this is expected because doing a 'SELECT > flatten(a), flatten(b) FROM T' is equivalent to doing a cross-product of > the 2 arrays. > > In your example, both arrays are the same length, but what would you expect > the output to be if they were different ? I don't see a direct SQL way of > doing it but > even with UDFs the semantics should be defined. > > Aman > > On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cgi...@gmail.com> wrote: > > > That’s a good idea. I’ll work on a equivalent ZIP() function and submit > > as a separate PR. > > — C > > > > > On Apr 10, 2019, at 20:44, Paul Rogers <par0...@yahoo.com.INVALID> > > wrote: > > > > > > Hi Charles, > > > > > > In Python [1], the "zip" function does this task: > > > > > > > > > zip([1, 2, 3], [4, 5, 6]) --> [(1, 4), (2, 5), (3, 6)] > > > > > > > > > When you gathered the list of functions for the Drill book, did you > come > > across anything like this in Drill? I presume you didn't, hence the > > question. I did a quick (incomplete) check and didn't see any likely > > candidates. > > > > > > Perhaps you could create such a function. > > > > > > Once you have the zipped result, you could flatten to get the pairs as > > rows. > > > > > > > > > Thanks, > > > - Paul > > > > > > > > > > > > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre < > > cgi...@gmail.com> wrote: > > > > > > Hello Drillers, > > > I have a query question for you. I have some really ugly data that has > > a field like this: > > > > > > compound_field : { “field_1”: [1,2,3], > > > “field_2”:[4,5,6] > > > ) > > > > > > I would like to map fields 1 and 2 to columns so that the end result > is: > > > > > > field1 | field2 > > > 1 | 4 > > > 2 | 5 > > > 3 | 5 > > > > > > I thought flatten() would be the answer, however, if I flatten the > > columns, I get the following result: > > > > > > field1 | field2 > > > 1 | 4 > > > 1 | 5 > > > 1 | 6 > > > > > > Does anyone have any suggestions? > > > Thanks, > > > —C > > > > >