Hi Charles,

You asked three questions.

* How do we write arrays?
* How do we write maps?
* What tools are available in the code to help?

Let’s start with maps because I happen to be mucking with those at the moment. 
A map in Drill is really just a nested record; it is not a map like you’d find 
in Java or Python. [1] is a conceptual write-up of how maps work in Drill.

To write to a map, you first create one map vector per record batch. The map 
vector is a container that holds one vector for each member. The trick is to 
realize that a Drill map is not an independent collection of name/value pairs 
per record. It is instead a single collection of vectors shared by ALL records 
in a batch, which is why it behaves like a nested record (tuple) rather than a 
map in the classic sense. Once you create the vector for a map member, you can 
write to it just like a top-level vector.
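
To make the "shared by all records" idea concrete, here is a minimal sketch in 
plain Java. It is only a conceptual model, not Drill code: the class 
MapColumnSketch and its methods are hypothetical names invented for 
illustration, and the real vector classes manage buffers rather than Java lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only; MapColumnSketch is a hypothetical class, NOT a Drill class.
class MapColumnSketch {
    // One value list ("vector") per map member, shared by every record in the batch.
    private final Map<String, List<String>> members = new LinkedHashMap<>();
    private int recordCount = 0;

    // Declare a member for the batch, like adding a child vector to the map.
    void addMember(String name) {
        List<String> column = new ArrayList<>();
        // Back-fill nulls so the new column lines up with records already written.
        for (int i = 0; i < recordCount; i++) {
            column.add(null);
        }
        members.put(name, column);
    }

    // Write one record: every declared member gets exactly one slot (null if absent).
    // Keys in 'values' that were never declared as members are ignored.
    void writeRecord(Map<String, String> values) {
        for (Map.Entry<String, List<String>> e : members.entrySet()) {
            e.getValue().add(values.get(e.getKey()));
        }
        recordCount++;
    }

    // Read a member's value for a given record index.
    String get(String member, int record) {
        return members.get(member).get(record);
    }

    int recordCount() {
        return recordCount;
    }

    public static void main(String[] args) {
        MapColumnSketch map = new MapColumnSketch();
        map.addMember("name");
        map.addMember("city");

        Map<String, String> first = new HashMap<>();
        first.put("name", "alice");
        first.put("city", "paris");
        map.writeRecord(first);

        Map<String, String> second = new HashMap<>();
        second.put("name", "bob"); // no "city" for this record
        map.writeRecord(second);

        System.out.println(map.get("name", 1)); // prints "bob"
        System.out.println(map.get("city", 1)); // prints "null"
    }
}
```

Note that the second record still occupies a slot in the "city" column even 
though it supplied no value. That is the difference from a per-record 
name/value map: the set of members belongs to the batch, not to the record.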

Array vectors are just like other vectors: there is one vector for the entire 
record batch. Arrays have an extra twist: an indirection vector that points to 
the first entry for each record. All values from your field2 go into that 
single vector, and the indirection vector has an entry per record that 
points to the start of that record’s values. (The number of values in record i 
is found by taking the difference between the entry for record i+1 and that 
for record i.)
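
The layout described above can be sketched in plain Java as follows. Again, 
this is a conceptual model under stated assumptions, not Drill code: 
RepeatedVectorSketch and its methods are hypothetical names, and the sketch 
keeps one extra trailing entry in the indirection list so that the 
"entry i+1 minus entry i" rule also works for the last record.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Conceptual model only; RepeatedVectorSketch is a hypothetical class, NOT a Drill class.
class RepeatedVectorSketch {
    // All array values for the whole batch, stored back to back.
    private final List<String> values = new ArrayList<>();
    // offsets.get(i) is the index in 'values' where record i starts.
    // Assumption of this sketch: one extra final entry marks the end of the last record.
    private final List<Integer> offsets = new ArrayList<>();

    RepeatedVectorSketch() {
        offsets.add(0); // record 0 starts at position 0
    }

    // Append one record's array (possibly empty) to the batch.
    void writeRecord(List<String> recordValues) {
        values.addAll(recordValues);
        offsets.add(values.size()); // start of the next record
    }

    int recordCount() {
        return offsets.size() - 1;
    }

    // Number of values in a record: entry for record i+1 minus entry for record i.
    int length(int record) {
        return offsets.get(record + 1) - offsets.get(record);
    }

    // Copy out the values that belong to one record.
    List<String> read(int record) {
        return new ArrayList<>(values.subList(offsets.get(record), offsets.get(record + 1)));
    }

    public static void main(String[] args) {
        RepeatedVectorSketch field2 = new RepeatedVectorSketch();
        field2.writeRecord(Arrays.asList("a", "b"));
        field2.writeRecord(Collections.<String>emptyList()); // a record with an empty array
        field2.writeRecord(Arrays.asList("c"));

        System.out.println(field2.length(0)); // prints 2
        System.out.println(field2.length(1)); // prints 0
        System.out.println(field2.read(2));   // prints [c]
    }
}
```

An empty array costs nothing in the values list: the record simply has two 
equal neighboring entries in the indirection list.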

The code does provide vector readers and writers, but I’m not very familiar 
with them.

The best place to see this in action is the JSON record reader, specifically 
the JsonReader class.

Perhaps others can provide better, more concrete suggestions.

Thanks,

- Paul

[1] https://github.com/paul-rogers/drill/wiki/Drill-Maps


On Mar 26, 2017, at 1:18 PM, Charles Givre <cgi...@gmail.com> wrote:

Hello all,
I’m working on a format plugin for a filetype that will have a mix of Strings 
and nested fields.  Basically something like this:

field1:  String
field2:  Array
etc…
My preference is to keep the nested data in the nested format rather than 
de-nest it, but I suppose that is always an option.

I’ve gotten the format plugin to write Strings to the Drill buffer, but I’m not 
quite sure how to get it to write an Array or Map.  I’ve found the Map and List 
writer objects, but I’m not quite sure how to use them in this context.  Are 
there any examples that someone could point me to, or could someone explain how 
this can be done?
Thanks,
— C
