Hi Igor,

Glad the community was able to provide a bit of help.

Let's talk about another topic. You said: "And main purpose will be
hiding of repeated map meta keys
("key","value") and simulation of real map functionality."

On the one hand, we are all accustomed to thinking of a Java (or Python) map as 
a black box: store (key, value) pairs, retrieve values by key. This is the 
programming view. I wonder, however, if it is the best SQL view.

Drill is, of course, SQL-based. It may be easier to bring the data to SQL than 
to bring SQL to the data. SQL works on tables (relations) and is very powerful 
when doing so. Standard SQL does not, however, provide tools to work with 
dictionaries. (There is an extension, SQL++, that might provide such 
tools. But, even if Drill supported SQL++, no front-end tool provides 
such support AFAIK.)

So, how do we bring the DICT type to SQL? We do so by noting that a DICT is 
really a little table of (key, value) pairs (with a uniqueness constraint on 
the key). Once we adopt this view, we can apply (I hope!) the nested table 
mechanism recently added to Drill.

This means that the user DOES want to know the name of the key and value 
columns: they are columns in a tuple (relation) that can be joined and 
filtered. Suppose each customer has a DICT of contact information, with keys 
such as "office", "home", "cell", ... and phone numbers as values. You can use 
SQL to find the office numbers:


SELECT custName, contactInfo.value AS phone FROM ... WHERE contactInfo.key = 'office' ...


So, rather than wanting to hide the (key, value) structure of a DICT, we could 
argue that exposing that structure allows the DICT to look like a relation, and 
thus exploit existing Drill features. In fact, this may make Drill more 
powerful when working with Hive maps than Hive itself is (if Hive treats maps 
as opaque objects).
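
To make the "DICT as a little table" idea concrete, here is a minimal sketch in plain Java (not Drill code). It models each customer's contactInfo as a list of (key, value) rows and filters on the key the way the query above would. The Customer and Entry classes and the officePhones helper are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DictAsTable {
    // One (key, value) row of the DICT, e.g. ("office", "555-0100").
    static final class Entry {
        final String key;
        final String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    // Hypothetical customer row: a name plus a nested table of contact entries.
    static final class Customer {
        final String custName;
        final List<Entry> contactInfo;
        Customer(String custName, List<Entry> contactInfo) {
            this.custName = custName;
            this.contactInfo = contactInfo;
        }
    }

    // Plays the role of the key filter in the query: keep only the rows whose
    // key is "office" and project (custName, value).
    static List<String[]> officePhones(List<Customer> customers) {
        List<String[]> rows = new ArrayList<>();
        for (Customer c : customers) {
            for (Entry e : c.contactInfo) {
                if ("office".equals(e.key)) {
                    rows.add(new String[] {c.custName, e.value});
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Customer> customers = Arrays.asList(
            new Customer("Ann", Arrays.asList(
                new Entry("home", "555-0101"), new Entry("office", "555-0100"))),
            new Customer("Bob", Arrays.asList(new Entry("cell", "555-0102"))));
        for (String[] row : officePhones(customers)) {
            System.out.println(row[0] + " | " + row[1]);
        }
    }
}
```

The point is only that once key and value are ordinary named columns, filtering and projecting a DICT is the same operation as on any other nested table.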


You also showed the SqlLine output you would like for a DICT column. This 
example exposes a "lie" (a short-cut) that SqlLine exploits. SqlLine asks Drill 
to convert a column to a Java Object of some sort, then SqlLine calls 
toString() on that object to produce the value you see in SqlLine output.

Some examples. An array (repeated) column is a set of values. Drill converts 
the repeated value to a Java array, which toString() converts to something like 
"[1, 2, 3]". The same is true of MAP: Drill converts it to a Java Map, which 
toString() converts to a JSON-like presentation.
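
A small sketch of that display mechanism, using only standard java.util collections (an assumption: Drill's own value classes are not used here, and the display helper is a hypothetical stand-in for what the shell does). Note that a List stands in for the repeated column, since a raw Java array such as int[] does not print its elements from toString(). Also, a plain java.util.Map prints as {a=1, b=x}, so the JSON-like form presumably comes from the particular Map class Drill returns.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ToStringDisplay {
    // Mimics the display step: take whatever Object the column produced
    // and render it with toString().
    static String display(Object columnValue) {
        return String.valueOf(columnValue);
    }

    public static void main(String[] args) {
        // Stand-in for a repeated (array) column value.
        List<Integer> repeated = Arrays.asList(1, 2, 3);

        // Stand-in for a MAP column value; LinkedHashMap keeps insertion order.
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", "x");

        System.out.println(display(repeated)); // prints [1, 2, 3]
        System.out.println(display(map));      // prints {a=1, b=x}
    }
}
```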

So, your DICT (or repeated map) type should provide a getObject() method that 
converts the repeated map to a Java Map. SqlLine will convert the map object to 
the display format you showed in your example. (My guess is that a repeated map 
today produces an array of Java Map objects: you want a single Java Map built 
from the key/value pairs.)
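
Here is a rough sketch of that folding step in plain Java, under the assumption (my guess above) that the repeated map arrives as a List of Maps, each holding a "key" and a "value" member. The toDict and entry helpers are hypothetical names; this is not the actual Drill getObject() code, just the shape of the conversion.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RepeatedMapToDict {
    // Folds [{key=1, value=v1}, {key=2, value=v2}] into the single map {1=v1, 2=v2}.
    static Map<Object, Object> toDict(List<Map<String, Object>> entries) {
        Map<Object, Object> dict = new LinkedHashMap<>();
        for (Map<String, Object> e : entries) {
            Object key = e.get("key");
            // A DICT has a uniqueness constraint on the key; fail loudly if violated.
            if (dict.containsKey(key)) {
                throw new IllegalStateException("duplicate DICT key: " + key);
            }
            dict.put(key, e.get("value"));
        }
        return dict;
    }

    // Builds one {key=..., value=...} element of the repeated map.
    static Map<String, Object> entry(Object key, Object value) {
        Map<String, Object> e = new LinkedHashMap<>();
        e.put("key", key);
        e.put("value", value);
        return e;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> row1 = Arrays.asList(entry(1, "v1"), entry(2, "v2"));
        System.out.println(row1);         // prints [{key=1, value=v1}, {key=2, value=v2}]
        System.out.println(toDict(row1)); // prints {1=v1, 2=v2}
    }
}
```

With a conversion like this behind getObject(), the display side needs no changes: it still just calls toString() on whatever object comes back.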


A JDBC user can use the getObject() method to retrieve a Java Map 
representation of a Drill DICT. (This functionality is not available in ODBC 
AFAIK.) The same is true for anyone brave enough to use the native Drill client 
API.


Thanks,
- Paul

 

    On Monday, June 3, 2019, 7:08:42 AM PDT, Igor Guzenko 
<ihor.huzenko....@gmail.com> wrote:  
 
 Hi all,

So finally, I'm going to abandon the renaming ticket DRILL-7097 and
related PR (1803).

Next, DRILL-7096 should be rewritten to cover the addition of a new DICT
type. But, if I understand correctly, since it is based on a repeated
vector, the result for the new type will now be returned like:

row | dict_column MAP<INT, STRING>
----+------------------------------------------------------------------------------
  1 | [{"key":1, "value":"v1"}, {"key":2, "value":"v2"}]
  2 | [{"key":0, "value":"v7"}, {"key":2, "value":"v2"}, {"key":4, "value":"v4"}]
  3 | [{"key":-1, "value":"o"}]

And main purpose will be hiding of repeated map meta keys
("key","value") and simulation of real map functionality.

I believe that it actually won't be so easy to reuse all the existing
functionality for repeated maps to return logically correct
results for DICT, because this uses repeated maps in an unexpected
way. Also, I'd like to hear thoughts from Bohdan about
such an application of repeated maps instead of a new vector.

Thanks, Igor

  
