0 is a legit value for some uses of numbers, so we do need an
out-of-band value.  :-)

On 12/19/15 10:16 PM, Wail Alkowaileet wrote:
I have a small thought on that one ... would 0 = null work for numerical
sparse lists? Or would it be better to extend the complex types to have
"vectors and matrices"?

On Fri, Dec 18, 2015 at 5:45 PM, Mike Carey wrote:

Agreed.  We probably need a mini design doc here. The short-term urgency
seems to be a need to represent lists that can include nulls, as this is
blocking JPL and is also something easily produced by queries (AQL or
SQL++).  Longer term, one can imagine the representation varying (at the
lowest level of detail) by list, e.g., you might represent dense and
sparse lists quite differently, you might use compression for certain
kinds of lists, etc.


On 12/18/15 1:57 AM, Till Westmann wrote:

Hi Ildar,

it seems that we have 2 separate points here:
1) There are bugs in the way we decide which list representation to use
and
2) we could add support for (and an optimized representation for) a list
of a fixed but nullable type.
It seems that - by fixing 1) - we could get rid of the issues you’ve
listed.

But I also think that it would be nice to support lists of a nullable
type (feels like an omission that we don’t support that in the language) -
and potentially provide an efficient representation for them.
However, it is not clear to me how we would do this.
A few thoughts:
- Would we maintain the current representation for homogeneous lists of
non-nullable types?
- Would we introduce a new type tag for “nullable lists”?
- Would we redefine the current representation to mean something else?
Do you have thoughts on those?

Cheers,
Till

On 16 Dec 2015, at 8:12, Ildar Absalyamov wrote:

Hi devs,
Recently I have been playing around with lists and with functions that
receive or return lists, and I have noticed one particular behavior that
seems to be incorrect.
As you might know, internally we support 2 types of lists: homogeneous,
where all the items are untagged and the item type is stored in the type
definition, and heterogeneous, where the items are, on the contrary,
tagged and the list item type is effectively ANY.
The decision about which of the two types to use is usually made by the
parser or altered by IntroduceEnforcedListTypeRule, which effectively turns
a heterogeneous list into a homogeneous one if all the items have the same
type.
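
To make the distinction concrete, here is a minimal sketch of the two layouts and of the "same type for every item" check. It is only an illustration: the tag values, the byte layout, and the function names (type_tag, encode_heterogeneous, encode_homogeneous, enforce_list_type) are all hypothetical and are not AsterixDB's actual serialization format or the rule's real implementation.

```python
import struct

# Hypothetical one-byte type tags (assumed values, not AsterixDB's).
TAG_INT64 = 1
TAG_STRING = 2
TAG_NULL = 3


def type_tag(item):
    """Map a Python value to one of the hypothetical type tags."""
    if item is None:
        return TAG_NULL
    if isinstance(item, int):
        return TAG_INT64
    if isinstance(item, str):
        return TAG_STRING
    raise TypeError(f"unsupported item type: {type(item).__name__}")


def encode_payload(item):
    """Encode only the value bytes of an item, without any tag."""
    if item is None:
        return b""
    if isinstance(item, int):
        return struct.pack(">q", item)
    data = item.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def encode_heterogeneous(items):
    """Item type is ANY: every item carries its own tag byte."""
    out = bytearray(struct.pack(">I", len(items)))
    for item in items:
        out.append(type_tag(item))
        out += encode_payload(item)
    return bytes(out)


def encode_homogeneous(items, item_tag):
    """Item type lives in the type definition: items are written untagged."""
    out = bytearray(struct.pack(">I", len(items)))
    for item in items:
        if type_tag(item) != item_tag:
            raise TypeError("item does not match the declared item type")
        out += encode_payload(item)
    return bytes(out)


def enforce_list_type(items):
    """Return the shared tag if all items have one type, else None."""
    tags = {type_tag(item) for item in items}
    return tags.pop() if len(tags) == 1 else None
```

In this sketch a list such as [1, None, 3] has two distinct tags, so enforce_list_type returns None and the list stays heterogeneous, which is the situation the rest of this message is about.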
Right now we allow only homogeneous lists to be defined as a field in
some type, and we also restrict the item type to be non-nullable:
create type listType as {
"id": int64,
"list": [int64]   // [int64?] is not possible
}

This constraint applies at both the language level and the serialization
level. Under that restriction, the only way to load a list that contains
null values is to make the corresponding field open (open lists are
heterogeneous by definition).

1) It seems like we’re missing an optimization opportunity when we are
dealing with large sparse lists. Serialization in this case could use a bit
mask to specify which items in the list are not null, and then encode
only those items.
2) I believe that if we alter IntroduceEnforcedListTypeRule to turn such a
list into a homogeneous list with a nullable item type, we might resolve
issues
https://issues.apache.org/jira/browse/ASTERIXDB-905,
https://issues.apache.org/jira/browse/ASTERIXDB-867, and
https://issues.apache.org/jira/browse/ASTERIXDB-1131 all at once.
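
The bit-mask idea in 1) could look roughly like the sketch below for a list of nullable int64 items. The layout (a 4-byte count, then one presence bit per item, then only the non-null values) and the function names are assumptions made for illustration, not an existing AsterixDB format.

```python
import struct


def encode_nullable_int64_list(items):
    """Encode as: item count, presence bit mask, then non-null values only."""
    n = len(items)
    mask = bytearray((n + 7) // 8)  # one bit per item, rounded up to bytes
    payload = bytearray()
    for i, item in enumerate(items):
        if item is not None:
            mask[i // 8] |= 1 << (i % 8)  # mark item i as present
            payload += struct.pack(">q", item)
    return struct.pack(">I", n) + bytes(mask) + bytes(payload)


def decode_nullable_int64_list(buf):
    """Inverse of encode_nullable_int64_list."""
    (n,) = struct.unpack_from(">I", buf, 0)
    mask_len = (n + 7) // 8
    mask = buf[4:4 + mask_len]
    offset = 4 + mask_len
    items = []
    for i in range(n):
        if mask[i // 8] & (1 << (i % 8)):
            (value,) = struct.unpack_from(">q", buf, offset)
            offset += 8
            items.append(value)
        else:
            items.append(None)  # absent bit means null, no bytes consumed
    return items
```

With this layout each null costs one bit instead of a full tagged item, and a stored 0 stays distinct from null because presence is tracked outside the value bytes.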

Thoughts?

Best regards,
Ildar


