Attendees: Andries, Daniel, Hanifi, Jacques, Jason, Jinfeng, Khurram, Kristine, Mehant, Neeraja, Parth, Sudheesh (host)
Minutes based on notes from Sudheesh:

1) Jacques is working on the following:
   a) RPC changes - Sudheesh/Parth reported an unexpected regression in perf numbers. Tests are being rerun.
   b) Apache log format plugin.
   c) Support for double quotes.
   d) Allow JSON literals.

2) Parquet filter pushdown - Patch from Adam Gilmore is awaiting review. This patch will conflict with Steven's work on metadata caching; metadata caching needs to go in first.

3) JDBC storage plugin - Patch from Magnus. Parth to follow up to get updated code.

4) Discussion on embedded types:
   a) Two kinds of common problems are being hit:
      1) Soft schema change - lots of initial nulls, and then a type appears or the type changes to a type that can be promoted to the initial type. Drill assumes the type to be nullable INT if it cannot determine the type (a sketch follows these notes). There was discussion on using nullable VARCHAR/VARBINARY instead of nullable INT. The suggestion was that we need to introduce some additional types:
         i) Introduce a LATE binding type (type is not yet known).
         ii) Introduce a NULL type (only null).
         iii) Schema sampling to determine the schema, for use with fast schema.
      2) Hard schema change - a schema change that is not transitionable.
   b) Open questions - How do we materialize this to the user? How do clients expect to handle schema change events? What does a BI tool like Tableau do if a new column is introduced? What is the expectation of a JDBC/ODBC application (what do the standards specify, if anything)? Neeraja to follow up and specify.
   c) Proposal to add support for embedded types where each value carries type information (covered in DRILL-3228; a small example follows these notes). This requires a detailed design before we begin implementation.

5) Discussion on 'Insert into' (based on Mehant's post):
   a) In general, the feature is expected to behave as in any database. Complications arise when the user chooses to insert a different schema or different partitions than the original table.
   b) Jacques's main concern: do we want Drill to be flexible, able to add columns and to omit columns while inserting, or do we want it to behave like a traditional data warehouse, where we do ordinal matching and are strict about the number of columns being inserted into the target table?
   c) We should validate the schema where we can (e.g., Parquet); however, we should start by validating metadata for queries and use that feature in Insert, as opposed to building it into Insert.
   d) If we allow inserting with a different schema and then cannot read the file back, that would be embarrassing.
   e) If we are trying to solve a specific BI tool use case for inserts, then we should explore solving that specific use case and treat the insert like CTAS today (a sketch follows these notes).

6) Discussion on 'Drop table':
   a) Strict identification of the table - don't drop tables that Drill can't query (a sketch follows these notes).
   b) Fail if there is a file that does not match.
   c) If impersonation is not enabled, then drop only Drill-owned tables.

More detailed notes on #5 ('Insert into') and #6 ('Drop table') to be posted by Jacques.
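
Sketch for 4a (soft schema change). A minimal illustration of why the nullable INT default is a problem; the file name, path, and columns are made up for this example:

    /* hypothetical file dfs.tmp.`soft_schema.json`:
       {"id": 1, "comment": null}
       {"id": 2, "comment": null}
       ...thousands of null rows...
       {"id": 5000, "comment": "first string value"}        */
    SELECT id, comment
    FROM dfs.tmp.`soft_schema.json`;
    -- With only leading nulls the reader has no evidence of the real type and
    -- defaults `comment` to nullable INT, so the later VARCHAR value can
    -- surface as a schema change error downstream instead of being promoted.
    -- A LATE or NULL type, or schema sampling, would defer that guess.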
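
Sketch for 4c (embedded types, DRILL-3228). A made-up data shape that motivates per-value type information, where the same field holds different types across records:

    /* {"key": "a", "value": 100}
       {"key": "b", "value": "one hundred"}
       {"key": "c", "value": {"amount": 100, "unit": "USD"}}  */
    SELECT key, `value`
    FROM dfs.tmp.`mixed_types.json`;
    -- Today `value` must resolve to a single column type; under the proposal
    -- each value would carry its own type information instead.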
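
Sketch for 5e (treating insert like CTAS). The CTAS statement exists today; the INSERT INTO form is only proposed here, and the table names and paths are illustrative:

    -- CTAS as it works today:
    CREATE TABLE dfs.tmp.sales AS
    SELECT order_id, region, amount
    FROM dfs.`/staging/sales_june.json`;

    -- Proposed follow-on: append to the same table, with the open question
    -- being whether columns are matched by name or strictly by ordinal
    -- against the target table:
    -- INSERT INTO dfs.tmp.sales
    -- SELECT order_id, region, amount
    -- FROM dfs.`/staging/sales_july.json`;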
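
Sketch for 6a (strict identification on drop). DROP TABLE is only under discussion here; the table name is illustrative:

    -- Would succeed only if everything under the table location is data that
    -- Drill itself can query (e.g., Parquet it wrote); if an unrecognized
    -- file is present, the command fails rather than deleting anything.
    DROP TABLE dfs.tmp.sales;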