I do agree that we should distinguish between empty lists and nulls. I consider this related to the outstanding task to allow for untyped nulls, as empty lists like this in JSON does not give us enough type information to materialize any known type.
This does have the unfortunate property that we are actually losing information today during the JSON read. There is very clearly a list there, we just don't know the inner type. In this case we leave it out of the schema entirely because we require a Drill type to specify both repeatability/nullability and a data type (int, varchar, map, etc.). We should materialize a list type with a untyped null as the inner type, so we at least know that it is not a scalar. The confusing behavior with flatten (or other operations that take a list, like repeated_contains) is that it will fail with a message saying there is no repeated_contains function implementation that takes a nullable bigint type, because this is what we fill in when there is nothing at that element in the schema (the way we left it after encountering an untyped empty list in the JSON reader). As I understand it, Steven's outstanding work to enable changing types with the union vector does allow for untyped nulls, I'm not sure if this also covers untyped empty lists. He does currently has the union type disabled by default, and it can be turned on with an option. We should discuss if we should implement some of the concepts he adds with union type in the case where it is not enabled. I assume that after some more thorough testing we will want to default turn it on, but in the meantime we might want to ease some users pain points like this. On Neeraja's point about whether or not an empty list should produce an output row when flattened, please see my comment here [1]. If there is a need for this feature we can consider adding it, perhaps with an alternate function name, but I think that the current behavior of flatten concerning empty lists (to ignore them) is useful for the reasons I mention on the JIRA. [1] - https://issues.apache.org/jira/browse/DRILL-2153 On Mon, Nov 2, 2015 at 1:09 PM, Andries Engelbrecht < [email protected]> wrote: > Seems empty list is getting the vote for select. For consistency would it > then make sense to have flatten produce a single row with an empty list in > the result? > > > —Andries > > > On Nov 2, 2015, at 12:55 PM, Neeraja Rentachintala < > [email protected]> wrote: > > > > Empty list sounds like the appropriate thing. > > What would be the result of we do flatten on this list? > > I think user should get a row back in the results. > > > > > > > > On Mon, Nov 2, 2015 at 11:53 AM, Hsuan Yi Chu <[email protected]> > wrote: > > > >> I agree. I think in any context, empty list should appear as [] in the > >> result. > >> > >> Any opposition? > >> > >> On Mon, Nov 2, 2015 at 11:50 AM, Zelaine Fong <[email protected]> > wrote: > >> > >>> Wouldn't it make more sense to return an empty list in this case? Null > >> is > >>> not quite the same as an empty list. So, I would think you'd want a > >>> distinction between the two. > >>> > >>> -- Zelaine > >>> > >>> On Mon, Nov 2, 2015 at 11:41 AM, Hsuan Yi Chu <[email protected]> > >> wrote: > >>> > >>>> There could be inconsistent results: > >>>> https://issues.apache.org/jira/browse/DRILL-4007 > >>>> > >>>> On Mon, Nov 2, 2015 at 8:17 AM, Andries Engelbrecht < > >>>> [email protected]> wrote: > >>>> > >>>>> Currently in 1.2 I observe the following. > >>>>> > >>>>> 0: jdbc:drill:> select a from dfs.json.`/test.json`; > >>>>> +-------+ > >>>>> | a | > >>>>> +-------+ > >>>>> | null | > >>>>> +———+ > >>>>> > >>>>> But then flatten produces an error > >>>>> > >>>>> 0: jdbc:drill:> select flatten(a) from dfs.json.`/test.json`; > >>>>> Error: SYSTEM ERROR: ClassCastException: Cannot cast > >>>>> org.apache.drill.exec.vector.NullableIntVector to > >>>>> org.apache.drill.exec.vector.complex.RepeatedValueVector > >>>>> > >>>>> Ideally both should return similar responses, in most cases it may be > >>>>> easier to work with a null being returned than no record or an empty > >>>>> string. Returning [] might be interpreted as a string data type > >> pending > >>>>> user experience or tools being used?? > >>>>> > >>>>> —Andries > >>>>> > >>>>> > >>>>> > >>>>>> On Oct 31, 2015, at 4:44 PM, Hsuan Yi Chu <[email protected]> > >>> wrote: > >>>>>> > >>>>>> Hi, > >>>>>> For example, say, we are querying a field with empty list: > >>>>>> > >>>>>> select a from `XXX,json` > >>>>>> > >>>>>> { > >>>>>> a: [] > >>>>>> } > >>>>>> > >>>>>> What is the expected result in the sqlline? Which one of the > >>> following > >>>>>> should it be? > >>>>>> > >>>>>> 1. [] > >>>>>> 2. null > >>>>>> 3. no record > >>>>>> > >>>>>> Thanks. > >>>>> > >>>>> > >>>> > >>> > >> > >
