Re: Query Question

Ted Dunning Thu, 11 Apr 2019 17:49:49 -0700

The semantics for zip with arguments of different lengths tend to be either
to ignore the tail of the longer argument, as Python does with zip
<https://docs.python.org/3.3/library/functions.html#zip>, or to reuse
(recycle) the shorter arguments to fill out to the length of the longest
argument, as R does with cbind and rbind
<https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/cbind>.
R is nice enough to return a warning if the recycled values don't come out
even at the end.


> cbind(c(1,2), c(4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    1    6
Warning message:
In cbind(c(1, 2), c(4, 5, 6)) :
  number of rows of result is not a multiple of vector length (arg 1)
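
A minimal Python sketch of the two conventions, for illustration only.
Python's built-in zip already truncates; truncate_zip is just a thin
wrapper around it. recycle_zip is a hypothetical helper (not part of any
library) that imitates R's recycling rule, including a warning when the
longest length is not a multiple of a shorter argument's length; its
warning text is made up and does not match R's.

```python
import warnings


def truncate_zip(*seqs):
    """Python-style zip: stop at the end of the shortest argument."""
    return list(zip(*seqs))


def recycle_zip(*seqs):
    """Hypothetical R-style zip: reuse shorter arguments until the longest
    one is exhausted, warning when a length does not divide evenly."""
    if not seqs:
        return []
    n = max(len(s) for s in seqs)
    if n == 0:
        return []
    for i, s in enumerate(seqs, start=1):
        if len(s) == 0:
            raise ValueError(f"argument {i} is empty and cannot be recycled")
        if n % len(s) != 0:
            warnings.warn(
                f"result length {n} is not a multiple of length of argument {i}"
            )
    # Index each argument modulo its own length so shorter ones wrap around.
    return [tuple(s[k % len(s)] for s in seqs) for k in range(n)]


print(truncate_zip([1, 2], [4, 5, 6]))  # [(1, 4), (2, 5)]
print(recycle_zip([1, 2], [4, 5, 6]))   # [(1, 4), (2, 5), (1, 6)], plus a warning
```

The recycled row (1, 6) corresponds to the third row of the cbind output
above, while the truncating version simply drops the unmatched 6.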


I think either definition is fine, but Python's truncating style is
probably simpler and makes more sense in a database environment. The most
common use case for R's semantics is building tables whose rows contain
all combinations of sets of values (i.e. the cross product). In a database,
we already have a better mechanism for building the cross product, so having
zip behave like Python's is nice.
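
The difference between the cross product and zip can be seen on the arrays
from the original question. This is a Python approximation, not Drill code:
it assumes that flattening two array columns in the same SELECT behaves like
a nested loop over both arrays (as Aman explains further down), which
itertools.product models, while zip pairs elements by position.

```python
from itertools import product

field_1 = [1, 2, 3]
field_2 = [4, 5, 6]

# Every combination of values: what flattening both arrays independently yields.
cross = list(product(field_1, field_2))

# Positional pairing: what a zip-style function would yield.
paired = list(zip(field_1, field_2))

print(len(cross))  # 9
print(cross[:3])   # [(1, 4), (1, 5), (1, 6)]
print(paired)      # [(1, 4), (2, 5), (3, 6)]
```

The first three cross-product rows are exactly the rows Charles reports
getting from flatten().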


On Thu, Apr 11, 2019 at 8:40 AM Aman Sinha <amansi...@gmail.com> wrote:

> > I thought flatten() would be the answer; however, if I flatten the
> > columns, I get the following result:
>
> Regarding the flatten() output, this is expected: doing 'SELECT
> flatten(a), flatten(b) FROM T' is equivalent to taking the cross product of
> the two arrays.
>
> In your example, both arrays are the same length, but what would you expect
> the output to be if they were different? I don't see a direct SQL way of
> doing it, but even with UDFs the semantics should be defined.
>
> Aman
>
> On Thu, Apr 11, 2019 at 6:37 AM Charles Givre <cgi...@gmail.com> wrote:
>
> > That’s a good idea. I’ll work on an equivalent ZIP() function and submit it
> > as a separate PR.
> > — C
> >
> > > On Apr 10, 2019, at 20:44, Paul Rogers <par0...@yahoo.com.INVALID> wrote:
> > >
> > > Hi Charles,
> > >
> > > In Python [1], the "zip" function does this task:
> > >
> > >
> > > list(zip([1, 2, 3], [4, 5, 6])) --> [(1, 4), (2, 5), (3, 6)]
> > >
> > >
> > > When you gathered the list of functions for the Drill book, did you come
> > > across anything like this in Drill? I presume you didn't, hence the
> > > question. I did a quick (incomplete) check and didn't see any likely
> > > candidates.
> > >
> > > Perhaps you could create such a function.
> > >
> > > Once you have the zipped result, you could flatten it to get the pairs as
> > > rows.
> > >
> > >
> > > Thanks,
> > > - Paul
> > >
> > >
> > >
> > > On Wednesday, April 10, 2019, 5:26:10 PM PDT, Charles Givre <cgi...@gmail.com> wrote:
> > >
> > > Hello Drillers,
> > > I have a query question for you. I have some really ugly data that has
> > > a field like this:
> > >
> > > compound_field : { "field_1": [1,2,3],
> > >                    "field_2": [4,5,6]
> > > }
> > >
> > > I would like to map fields 1 and 2 to columns so that the end result is:
> > >
> > > field1 | field2
> > > 1      | 4
> > > 2      | 5
> > > 3      | 6
> > >
> > > I thought flatten() would be the answer; however, if I flatten the
> > > columns, I get the following result:
> > >
> > > field1 | field2
> > > 1      |  4
> > > 1      |  5
> > > 1      |  6
> > >
> > > Does anyone have any suggestions?
> > > Thanks,
> > > —C
> >
> >
>
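
Paul's zip-then-flatten suggestion can be sketched in Python on Charles's
record. Both helpers are hypothetical stand-ins written for illustration:
zip_arrays plays the role of the proposed ZIP() function (it is not an
existing Drill function, and its output shape is an assumption), and
flatten mimics FLATTEN by emitting one output row per array element. The
zip step truncates to the shortest array, following the Python-style
semantics argued for above.

```python
def zip_arrays(*arrays):
    """Hypothetical ZIP(): pair array elements by position, truncating to
    the shortest array (Python-style semantics)."""
    return [list(group) for group in zip(*arrays)]


def flatten(rows, column):
    """Mimic FLATTEN: emit one row per element of the array in `column`,
    replacing the array with that element and copying the other columns."""
    out = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)
            new_row[column] = element
            out.append(new_row)
    return out


record = {"compound_field": {"field_1": [1, 2, 3], "field_2": [4, 5, 6]}}

cf = record["compound_field"]
# Step 1: zip the two arrays into one array of pairs.
zipped = [{"pair": zip_arrays(cf["field_1"], cf["field_2"])}]
# Step 2: flatten the pairs into rows, then split each pair into columns.
rows = [
    {"field1": r["pair"][0], "field2": r["pair"][1]}
    for r in flatten(zipped, "pair")
]

for r in rows:
    print(r["field1"], "|", r["field2"])
```

This prints the three rows Charles asked for (1 | 4, 2 | 5, 3 | 6) rather
than the nine-row cross product.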
