paul-rogers commented on issue #1870: DRILL-7359: Add support for DICT type in 
RowSet Framework
URL: https://github.com/apache/drill/pull/1870#issuecomment-552769421
 
 
   The schema builder issue has been nagging at me. I observed above that the 
current way of handling nested types is getting too complex, with all the 
special cases and "resumeFoo" methods.
   
   The current design was originally designed to build a `BatchSchema` with 
`MaterializedField`s,. Those classes require a top-down construction. (That is 
also why the final `build()` method returns a `BatchSchema` and why we must 
call `buildSchema()` to get a `TupleMetadata`.)
   
   It turns out that the `SchemaMetadata` classes *do not* have the same 
constraints. They can actually be built bottom-up. It never occurred to me to 
clean up the old-style of building when we repurposed `SchemaBuilder` to build 
metadata objects.
   
   So, since we can change the structure, we could borrow a technique from, I 
think, certain JSON builders. Rather than having the schema builder create the 
nested type builder, ask the user to do so explicitly:
   
   ```
   TupleMetadata schema = new SchemaBuilder()
     .addMap(new MapBuilder("myMap")
       .add("name", MinorType.VARCHAR)
       .addMap(new MapBuilder("nested")
         .add("id", MinorType.INT)
         .build())
       .build())
     .add("last", MinorType.INT)
     .build();
   ```
   
   Adopting this model as additional benefits. It is currently a bit awkward to 
create a bare `ColumnMetadata`. We could create helper methods to create each 
kind of column schema. Something like:
   
   ```
   foo = SchemaBuilder.column("foo", MinorType.INT, DataMode.REQUIRED);
   bar = SchemaBuilder.dict("bar", MinorType.VARCHAR)
     .value(SchemaBuilder.map(...))
     .build();
   ```
   
   Lots of things we can play with.
   
   Given all of this, my advice for the `DICT` type is to figure out something 
that works with the structure we already have. Then, as a separate PR, we can 
clean up the `SchemaBuilder` to make it easier to use for the many complex 
types we now support. That way, you don't have to solve the schema builder 
problem in this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to