Some thoughts around concurrency for INSERTs and SELECTs:

For the consistency model we could aim to provide something close to
snapshot isolation.
Only one INSERT to a particular table is allowed at a time.
SELECT queries run concurrently while the INSERT is going on, but in
reality there is a finer granularity of steps:

 For the INSERT operation:
   1. Create a temp directory (or hidden directory) to store the files,
      say foo/tmpdir for the table foo.
   2. Start writing files to foo/tmpdir.
   3. When complete, rename foo/tmpdir to indicate that it is available
      for queries.  <--- Q: is there a need for short-term,
      directory-level locking?
   4. Call REFRESH METADATA, which merges the existing schema with the
      new files and collects stats on the new files.
      4(a). During this process, a metadata write lock needs to be
      acquired.

For any query that is running concurrently with the INSERT, we should
only block for step 4(a). This is a relatively short period of time
compared to the overall elapsed time of the INSERT.
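
A minimal sketch of steps 1-4, assuming a per-table
ReentrantReadWriteLock guards the metadata (illustrative only:
java.nio stands in for the DFS API, and writeParquetFiles /
refreshMetadata are hypothetical helpers, not Drill code):

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class InsertFlow {
      final ReentrantReadWriteLock metadataLock = new ReentrantReadWriteLock();

      void insert(Path tableDir) throws IOException {
        // Steps 1-2: write the new files into a hidden temp directory.
        Path tmpDir = Files.createDirectory(tableDir.resolve(".tmp_insert"));
        writeParquetFiles(tmpDir);

        // Step 3: a single atomic rename publishes the files to new scans;
        // no directory-level lock should be needed if the rename is atomic.
        Files.move(tmpDir, tableDir.resolve("insert_" + System.nanoTime()),
            StandardCopyOption.ATOMIC_MOVE);

        // Step 4(a): merge schema and stats under a short-lived write lock.
        metadataLock.writeLock().lock();
        try {
          refreshMetadata(tableDir);
        } finally {
          metadataLock.writeLock().unlock();
        }
      }

      void writeParquetFiles(Path dir) { /* elided */ }
      void refreshMetadata(Path dir) { /* elided */ }
    }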

The SELECT queries themselves have the following steps:
  1. Query planning
     1(a) Read the table metadata (presumably from the metadata file) by
          first acquiring a read lock on the metadata, reading the
          metadata, and releasing the lock.
     1(b) Perform query optimization
  2. Query execution and returning results

The only blocking step is 1(a).
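
Continuing the sketch above, step 1(a) is the shared side of the same
lock (TableMetadata and readMetadataFile are again hypothetical names):

    TableMetadata planningSnapshot() {
      metadataLock.readLock().lock();
      try {
        // 1(a): deserialize the table metadata under the shared lock.
        return readMetadataFile();
      } finally {
        metadataLock.readLock().unlock();
      }
    }

Steps 1(b) and 2 then run against the returned snapshot with no locks
held.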

Q:  Is it possible to completely avoid any locks, and what would be the
consistency model that we could claim?  We are not building a
transactional system, but we do need well-defined, deterministic
behavior in the presence of concurrency.
Q:  If we do acquire locks, what should be the nature of these... are
they persistent, etc.?
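
One possible lock-free scheme (an assumption, not a settled design):
never mutate metadata in place, write each version to a new immutable
file, and atomically swap a small "current" pointer. Readers always
see some complete version, which amounts to per-query snapshot
semantics without any persistent lock:

    import java.io.IOException;
    import java.nio.file.*;

    class VersionedMetadata {
      // Writer: publish a new immutable version, then swap the pointer.
      void publish(Path tableDir, byte[] metadata, long version)
          throws IOException {
        Path versioned = tableDir.resolve("metadata.v" + version);
        Files.write(versioned, metadata);
        Path tmp = Files.write(tableDir.resolve(".current.tmp"),
            versioned.getFileName().toString().getBytes());
        // Atomic rename: a concurrent reader sees the old or the new
        // pointer, never a torn mix.
        Files.move(tmp, tableDir.resolve("current"),
            StandardCopyOption.ATOMIC_MOVE);
      }

      // Reader: follow the pointer; no lock is ever taken.
      byte[] readCurrent(Path tableDir) throws IOException {
        String name = new String(
            Files.readAllBytes(tableDir.resolve("current")));
        return Files.readAllBytes(tableDir.resolve(name));
      }
    }

The consistency claim would then be per-query snapshot reads rather
than anything stronger.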

Aman

On Thu, Jul 30, 2015 at 9:28 AM, Aman Sinha <asi...@maprtech.com> wrote:

> Two more observations:
> 1.  I would think the merged schema should be deterministic and should
>     not depend on the order of scanning the files.  How can this be
>     enforced?  Timestamp order?  Lexicographical order?  These files
>     are being read in parallel, so we have to think this through (a
>     small sketch of one option follows after this list).
> 2.  Due to merging of schemas, a column that was of REQUIRED type in
>     the original parquet file may end up having NULL values after
>     merging.  If we do a CTAS of the merged schema, this column will
>     be of OPTIONAL type in parquet.  Is this acceptable?
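>
> One way to enforce determinism, assuming a hypothetical Schema type
> with a merge() method: collect per-file schemas in parallel, but
> reduce them in sorted path order so the result is independent of the
> read order:
>
>     import java.util.*;
>
>     static Schema mergeDeterministically(Map<String, Schema> schemaByPath) {
>       // Lexicographic path order is stable no matter which reader
>       // finished first.
>       List<String> paths = new ArrayList<>(schemaByPath.keySet());
>       Collections.sort(paths);
>       Schema merged = schemaByPath.get(paths.get(0));
>       for (int i = 1; i < paths.size(); i++) {
>         merged = merged.merge(schemaByPath.get(paths.get(i)));
>       }
>       return merged;
>     }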
>
> On Wed, Jul 29, 2015 at 7:42 PM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
>
>> One comment... I think this is an overreach:
>>
>> "The union schema is saved in a .drill file"
>>
>> There is no point in saving something derived in a .drill file.  Maybe a
>> cache file (extending the other work on metadata caching).  The idea for
>> .drill files is that they are supplementary to what can be acquired from
>> the data directly.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Wed, Jul 29, 2015 at 6:58 PM, Parth Chandra <par...@apache.org> wrote:
>>
>> > Notes based on the discussion in the hangout, Jacques' notes, and
>> > later discussion:
>> >
>> >
>> >    1. For Parquet files:
>> >       - Merge schema information into a union schema.
>> >       - The union schema is saved in a .drill file.
>> >       - The union schema can be cached.
>> >       - The union schema should maintain the ordinal position of
>> >         fields.
>> >    2. At the time the table is read, we can check the directory
>> >       timestamp and regenerate the merged schema.
>> >    3. Insert into does not need to update the merged schema. (Should
>> >       it check whether an incompatible schema is being written? An
>> >       incompatible schema is one that cannot be merged successfully;
>> >       a merge sketch follows after this list.)
>> >       -   Suggestion 1 - Allow flexible insert. (Issue a warning if
>> >           there is going to be a schema change conflict.) Queries on
>> >           columns with no conflict can still run.
>> >       -   Suggestion 2 - Do not allow insert unless the schema can
>> >           be merged.
>> >    4. Rules for merging can be relaxed as we improve our ability to
>> >       promote from one type to another.
>> >    5. The concept of a schema change is different from that of a
>> >       merged schema. Schema change is needed for reading schema-less
>> >       sources like JSON.
>> >    6. If there is a different column order (or a difference in the
>> >       number of columns) in the table files, an insert based on
>> >       ordinal positions cannot be allowed.
>> >    7. Suggestion - go with a name-based insert. Ordinal-based insert
>> >       can be enabled based on a switch (?).
>> >    8. Reuse partitioning information from 'create table' in 'insert
>> >       into'. No partitioning info is needed in the 'insert into'
>> >       statement.
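>> >
>> > A sketch of points 1 and 3, using placeholder Field/Type classes
>> > (not Drill code): merge by name, keep the first schema's ordinal
>> > order, demote fields missing on one side to OPTIONAL, and fail on
>> > type conflicts:
>> >
>> >     import java.util.*;
>> >
>> >     static List<Field> unionSchema(List<Field> a, List<Field> b) {
>> >       Set<String> namesInB = new HashSet<>();
>> >       for (Field f : b) namesInB.add(f.name);
>> >       // LinkedHashMap preserves the ordinal order of 'a'.
>> >       LinkedHashMap<String, Field> merged = new LinkedHashMap<>();
>> >       for (Field f : a) {
>> >         merged.put(f.name, namesInB.contains(f.name) ? f : f.asOptional());
>> >       }
>> >       for (Field f : b) {
>> >         Field existing = merged.get(f.name);
>> >         if (existing == null) {
>> >           merged.put(f.name, f.asOptional());  // appended after 'a'
>> >         } else if (!existing.type.equals(f.type)) {
>> >           // Incompatible per point 3: reject or warn (suggestion 1/2).
>> >           throw new IllegalStateException("cannot merge " + f.name);
>> >         }
>> >       }
>> >       return new ArrayList<>(merged.values());
>> >     }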
>> >
>> >
>> >
>> > On Mon, Jul 27, 2015 at 6:43 PM, Khurram Faraaz <kfar...@maprtech.com>
>> > wrote:
>> >
>> > > I have a question related to insert into partitioned tables. I see
>> > > that DB2, Oracle, and Postgres do not support a PARTITION BY clause
>> > > in their INSERT INTO implementations. MySQL does, however, provide
>> > > support for a PARTITION BY clause in its INSERT INTO implementation.
>> > > I do not know what the standard ANSI SQL specification says about
>> > > the INSERT INTO statement. We may want to keep it compliant with
>> > > standard SQL syntax.
>> > >
>> > > >> B ) Support insert into a partitioned table
>> > > >> ----------------------
>> > >
>> > > >> Ex: INSERT INTO T1 col1, col2, col3  partition by col1, col2
>> > > >> SELECT col4, col5, col6 from T2
>> > >
>> > > >> * Target column specification is required when inserting data
>> > > >>   into an already partitioned table
>> > > >> * Requirements A.3-A.7 above apply for insert into partitioned
>> > > >>   tables as well
>> > > >> * A partition by clause along with one or more columns is
>> > > >>   required
>> > > >> * All the columns specified in the partition by clause must
>> > > >>   exist in the target column list
>> > > >> * Partition by columns specified do not need to match the list
>> > > >>   of columns that the original table was partitioned with (i.e.
>> > > >>   if the original table is partitioned with col1, col2, new data
>> > > >>   during insert can be partitioned by col3 or just with col1 or
>> > > >>   col2...)
>> > >
>> > > On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid <baid.meh...@gmail.com>
>> > > wrote:
>> > >
>> > > > I wanted to start a conversation around supporting the "Insert
>> > > > into Table" feature. As of 1.2 we initially want to support
>> > > > inserting into a table with Parquet files. Support for JSON, CSV,
>> > > > and other sources will follow as future enhancements.
>> > > >
>> > > > Aman, Jinfeng, Neeraja, and I had an initial discussion about
>> > > > this, and Neeraja provided a good summary of our discussion
>> > > > (pasted below), also stating some of the requirements for this
>> > > > feature.
>> > > >
>> > > >  A ) Support Insert into a non-partitioned table
>> > > > ---------------------
>> > > >
>> > > > Ex: INSERT INTO T1 [col1, col2, col3]  SELECT col4, col5, col6
>> > > > from T2 (source table T2, target table T1)
>> > > > Requirements:
>> > > >
>> > > > 1. The target table column list specification is optional for the
>> > > >    Insert statement.
>> > > > 2. When specified, the column list in the Insert statement should
>> > > >    contain all the columns present in the target table (i.e. no
>> > > >    support for partial insert).
>> > > > 3. The column names specified for the source table do not need to
>> > > >    match the target table column names. Matching is performed by
>> > > >    ordinal position.
>> > > > 4. The number of source table columns specified must be the same
>> > > >    as the number of target table columns.
>> > > > 5. The types of the specified source table columns must match the
>> > > >    types of the target table columns.
>> > > > 6. Specification of * is not allowed in the Select table syntax.
>> > > > 7. The Select table syntax can specify constant values for one or
>> > > >    more columns.
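>> > > >
>> > > > A minimal validation sketch for requirements 3-5 (TargetColumn
>> > > > and Type are illustrative placeholders, not Drill classes):
>> > > >
>> > > >     static void validateInsert(List<TargetColumn> target,
>> > > >                                List<Type> sourceTypes) {
>> > > >       // Requirement 4: column counts must line up.
>> > > >       if (target.size() != sourceTypes.size()) {
>> > > >         throw new IllegalArgumentException("column count mismatch");
>> > > >       }
>> > > >       // Requirements 3 and 5: match by ordinal, then check types.
>> > > >       for (int i = 0; i < target.size(); i++) {
>> > > >         if (!target.get(i).type.equals(sourceTypes.get(i))) {
>> > > >           throw new IllegalArgumentException(
>> > > >               "type mismatch at column " + (i + 1));
>> > > >         }
>> > > >       }
>> > > >     }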
>> > > >
>> > > >
>> > > >  B ) Support insert into a partitioned table
>> > > > ----------------------
>> > > >
>> > > > Ex: INSERT INTO T1 col1, col2, col3  partition by col1, col2
>> > > > SELECT col4, col5, col6 from T2
>> > > >
>> > > >  * Target column specification is required when inserting data
>> > > >    into an already partitioned table
>> > > >  * Requirements A.3-A.7 above apply for insert into partitioned
>> > > >    tables as well
>> > > >  * A partition by clause along with one or more columns is
>> > > >    required
>> > > >  * All the columns specified in the partition by clause must
>> > > >    exist in the target column list
>> > > >  * Partition by columns specified do not need to match the list
>> > > >    of columns that the original table was partitioned with (i.e.
>> > > >    if the original table is partitioned with col1, col2, new data
>> > > >    during insert can be partitioned by col3 or just with col1 or
>> > > >    col2...)
>> > > >
>> > > >
>> > > > A couple of open questions from the design perspective:
>> > > >
>> > > > 1. How do we perform validation - of data types, the number of
>> > > > columns being inserted, etc.? In addition to validation, we need
>> > > > to make sure that when we insert into an existing table we insert
>> > > > data with the existing column names (the select column list can
>> > > > have different names). This poses problems around needing to know
>> > > > the metadata at planning time. Two approaches that have been
>> > > > floating around are:
>> > > >         * DotDrill files: We can store metadata, partitioning
>> > > > columns, and other useful information here, and we can perform
>> > > > validation at planning time. However, the challenges with
>> > > > introducing DotDrill files include
>> > > >              - consistency between metadata and the actual data
>> > > > (nothing prevents users from copying files in directly)
>> > > >              - security around DotDrill files (can be dealt with
>> > > > in the same way we perform security checks for Drill tables in
>> > > > HDFS)
>> > > >              - an interface to change the DotDrill file, in case
>> > > > we need to add a column to the table, add a new partition, etc.
>> > > >
>> > > >         * Explicit syntax / no-metadata approach: Another
>> > > > approach is to avoid DotDrill files and use explicit syntax to
>> > > > glean as much information as possible from the SQL statement
>> > > > itself. Some of the challenges with this approach are
>> > > >              - Gathering metadata information: Since we have no
>> > > > idea what the existing schema is, we would need to perform a
>> > > > "mini scan" to learn the schema at planning time in order to
>> > > > perform some validation. The problem with this approach is
>> > > > determining how many files we need to read in order to learn the
>> > > > schema. If we use a sample set and not all the files have the
>> > > > same schema, we could have non-deterministic results depending on
>> > > > the sample of files read. Reading all the files and merging the
>> > > > schemas also seems like an expensive cost to pay. (A small sketch
>> > > > of one mitigation follows below.)
>> > > >              - From the user's perspective, while inserting into
>> > > > a partitioned table, the user will have to specify the
>> > > > partitioning columns again in the Insert statement, despite
>> > > > having specified the partition columns in the CTAS.
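>> > > >
>> > > > The mitigation mentioned above, purely illustrative: sample a
>> > > > fixed prefix of the lexicographically sorted file list, so
>> > > > repeated planning attempts at least see the same files even when
>> > > > the full merge is too expensive:
>> > > >
>> > > >     import java.util.*;
>> > > >
>> > > >     static List<String> sampleFiles(List<String> allFiles, int n) {
>> > > >       List<String> sorted = new ArrayList<>(allFiles);
>> > > >       Collections.sort(sorted);  // order-independent sample
>> > > >       return sorted.subList(0, Math.min(n, sorted.size()));
>> > > >     }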
>> > > >
>> > > > 2. What is a reasonable assumption for a Drill table in terms of
>> > > > changing schema? Is having the exact same schema for all files in
>> > > > a table too rigid an assumption at this point?
>> > > >
>> > > > One thing to remember with the DotDrill file is to also consider
>> > > > the repercussions on Drop table, Show tables, Describe table,
>> > > > etc.; i.e., it might make it easier to support these operations.
>> > > >
>> > > > Thanks
>> > > > Mehant
>> > > >
>> > >
>> >
>>
>
>
