Hi Igor,
Glad the community was able to provide a bit of help.
Let's talk about another topic. You said: "And main purpose will be hiding of
repeated map meta keys ("key","value") and simulation of real map
functionality."
On the one hand, we are all accustomed to thinking of a Java (or Python) map as
a black box: store (key, value) pairs, retrieve values by key. This is the
programming view. I wonder, however, if it is the best SQL view.
Drill is, of course, SQL-based. It may be easier to bring the data to SQL than
to bring SQL to the data. SQL works on tables (relations) and is very powerful
when doing so. Standard SQL does not, however, provide tools to work with
dictionaries. (There is an extension, SQL++, that might provide such tools.
But even if Drill supported SQL++, no front-end tool provides such support,
AFAIK.)
So, how do we bring the DICT type to SQL? We do so by noting that a DICT is
really a little table of (key, value) pairs (with a uniqueness constraint on
the key). Once we adopt this view, we can apply (I hope!) the nested table
mechanism recently added to Drill.
This means that the user DOES want to know the names of the key and value
columns: they are columns in a tuple (relation) that can be joined and
filtered. Suppose each customer has a DICT of contact information, with keys
such as "office", "home", "cell",... and phone numbers as values. You can use
SQL to find the office numbers:
SELECT custName, contactInfo.value AS phone WHERE contactInfo.key = 'office'...
So, rather than wanting to hide the (key, value) structure of a DICT, we could
argue that exposing that structure allows the DICT to look like a relation, and
thus exploit existing Drill features. In fact, this may make Drill more
powerful when working with Hive maps than Hive itself is (if Hive treats maps
as opaque objects).
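To make the "DICT as a little table" idea concrete, here is a rough sketch in
plain Java (not Drill code). The Customer class, the phonesFor() method, and the
sample data are all made up for illustration. It flattens each DICT into
(custName, key, value) rows and filters on the key, which is roughly what the
query above would ask Drill to do:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DictAsRelation {
    // Hypothetical customer row: a name plus a DICT of contact information.
    static final class Customer {
        final String custName;
        final Map<String, String> contactInfo;

        Customer(String custName, Map<String, String> contactInfo) {
            this.custName = custName;
            this.contactInfo = contactInfo;
        }
    }

    // Treat each DICT as a nested table of (key, value) rows, keep the rows
    // whose key matches, and project (custName, value).
    static List<String> phonesFor(List<Customer> customers, String wantedKey) {
        return customers.stream()
            .flatMap(c -> c.contactInfo.entrySet().stream()
                .filter(e -> e.getKey().equals(wantedKey))
                .map(e -> c.custName + ": " + e.getValue()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer("Alice", Map.of("office", "555-0100", "home", "555-0101")),
            new Customer("Bob", Map.of("cell", "555-0102")));
        System.out.println(phonesFor(customers, "office")); // [Alice: 555-0100]
    }
}
```

A customer with no "office" entry simply contributes no rows, which is the
behavior you would expect from a filter on a nested table.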
You also showed the SQLLine output you would like for a DICT column. This
example exposes a "lie" (a short-cut) that SQLLine exploits. SQLLine asks Drill
to convert a column to a Java Object of some sort, then calls toString() on
that object to produce the value you see in the SQLLine output.
Some examples. An array (repeated) column is a set of values. Drill converts
the repeated value to a Java array, which toString() converts to something like
"[1, 2, 3]". The same is true of MAP: Drill converts it to a Java Map, which
toString() converts to a JSON-like presentation.
So, your DICT (or repeated map) type should provide a getObject() method that
converts the repeated map to a Java Map. SQLLine will then convert the map
object to the display format you showed in your example. (My guess is that a
repeated map today produces an array of Java Map objects: you want a single
Java Map built from the key/value pairs.)
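The conversion I have in mind might look something like the sketch below. This
is plain Java, not actual Drill vector or reader code: the DictToMap class and
the toJavaMap() method are hypothetical, and I am assuming the repeated map
arrives as a list of Java Maps, each with a "key" and a "value" entry.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DictToMap {
    // Collapse a list of {"key": k, "value": v} entries into a single Map,
    // enforcing the uniqueness constraint on the key.
    static Map<Object, Object> toJavaMap(List<Map<String, Object>> entries) {
        Map<Object, Object> result = new LinkedHashMap<>(); // keeps entry order
        for (Map<String, Object> entry : entries) {
            Object key = entry.get("key");
            if (result.containsKey(key)) {
                throw new IllegalStateException("Duplicate DICT key: " + key);
            }
            result.put(key, entry.get("value"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Row 1 of the example output: [{"key":1, "value":"v1"}, {"key":2, "value":"v2"}]
        List<Map<String, Object>> row1 = List.of(
            Map.<String, Object>of("key", 1, "value", "v1"),
            Map.<String, Object>of("key", 2, "value", "v2"));
        System.out.println(toJavaMap(row1)); // {1=v1, 2=v2}
    }
}
```

Note that a plain java.util map prints as {1=v1, 2=v2}; the JSON-like text in
the display would come from the toString() of whatever Map class Drill actually
returns.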
A JDBC user can use the getObject() method to retrieve a Java Map
representation of a Drill DICT. (This functionality is not available in ODBC
AFAIK.) The same is true for anyone brave enough to use the native Drill client
API.
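On the JDBC side, client code could then look roughly like this. It is a sketch
under the assumption that getObject() on a DICT column returns a java.util.Map,
as proposed above; the DictColumnReader class and the readDict() helper are
hypothetical, and only the standard java.sql API is used.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;

public class DictColumnReader {
    // Read a DICT column from the current row as a Java Map.
    // Assumes the driver's getObject() hands back a java.util.Map for DICT.
    static Map<?, ?> readDict(ResultSet rs, String column) throws SQLException {
        Object value = rs.getObject(column);
        if (value == null) {
            return null; // SQL NULL
        }
        if (!(value instanceof Map)) {
            throw new SQLException("Column " + column + " is not a Map but a "
                + value.getClass().getName());
        }
        return (Map<?, ?>) value;
    }
}
```

The instanceof check is there because, per my guess above, a repeated map today
may come back as an array of Map objects rather than a single Map.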
Thanks,
- Paul
On Monday, June 3, 2019, 7:08:42 AM PDT, Igor Guzenko
<[email protected]> wrote:
Hi all,
So finally, I'm going to abandon the renaming ticket DRILL-7097 and the
related PR (1803).
Next, DRILL-7096 should be rewritten to cover the addition of a new DICT
type. But, if I understand correctly, because the new type is based on a
repeated vector, its results will be returned like this:
row | dict_column MAP<INT, STRING>
------------------------------------------------------------------------------------------------------
1 | [{"key":1, "value":"v1"}, {"key":2, "value":"v2"} ]
2 | [{"key":0, "value":"v7"}, {"key":2, "value":"v2"}, {"key":4, "value":"v4"} ]
3 | [{"key":-1, "value":"o"}]
And main purpose will be hiding of repeated map meta keys
("key","value") and simulation of real map functionality.
I believe that it actually won't be so easy to reuse all the existing
functionality for repeated maps to return logically correct
results for DICT, because this uses repeated maps in an unexpected
way. Also, I'd like to hear Bohdan's thoughts about
such an application of repeated maps instead of a new vector.
Thanks, Igor