Some thoughts around concurrency for INSERTs and SELECTs:

For the consistency model we could aim to provide something close to snapshot
isolation. Only one INSERT to a particular table is allowed at a time. SELECT
queries run concurrently while the INSERT is going on, but in reality there is
a finer granularity of steps.

For an INSERT operation:
1. Create a temp directory (or hidden directory) to store the files, say
   foo/tmpdir for the table foo.
2. Start writing files to foo/tmpdir.
3. When complete, rename foo/tmpdir to indicate that it is available for
   queries.  <--- Q: is there a need for short-term, directory-level locking?
4. Call REFRESH METADATA, which merges the existing schema with the new files
   and collects stats on the new files.
   4(a). During this process, a metadata write lock needs to be acquired.

Any query running concurrently with the INSERT should only block for step
4(a). This is a relatively short period of time compared to the overall
elapsed time of the INSERT.

The SELECT queries themselves have the following steps:
1. Query planning
   1(a) Read the table metadata (presumably from the metadata file) by first
        acquiring a read lock on the metadata, reading the metadata, and
        releasing the lock.
   1(b) Perform query optimization
2. Query execution and return of results

The only blocking step is 1(a).

Q: is it possible to completely avoid any locks, and what consistency model
could we then claim? We are not building a transactional system, but we do
need well-defined, deterministic behavior in the presence of concurrency.

Q: if we do acquire locks, what should their nature be... are they
persistent, etc.?
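As a rough, non-binding illustration of the locking above: a single-node
sketch using java.util.concurrent's ReentrantReadWriteLock, where SELECT
planning holds the read lock only for step 1(a) and the INSERT holds the
write lock only for step 4(a). The class and method names are made up, and a
distributed deployment would presumably need a coordinated (e.g.
ZooKeeper-based) lock rather than an in-process one.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Sketch only: SELECT planning takes the read lock while reading the table
 * metadata (step 1(a)); an INSERT takes the write lock only for the
 * REFRESH METADATA merge (step 4(a)), so concurrent readers block only for
 * that short window.
 */
public class TableMetadataLock {
  private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
  // Stand-in for the persisted union schema; in reality this would live in
  // the metadata cache file rather than in memory.
  private final List<String> columnSchema = new ArrayList<>();

  /** Planning step 1(a): read the table metadata under the read lock. */
  public List<String> readMetadata() {
    rwLock.readLock().lock();
    try {
      return new ArrayList<>(columnSchema);
    } finally {
      rwLock.readLock().unlock();
    }
  }

  /** INSERT step 4/4(a): merge the schemas of the newly written files under the write lock. */
  public void refreshMetadata(List<String> newFileColumns) {
    rwLock.writeLock().lock();
    try {
      for (String col : newFileColumns) {
        if (!columnSchema.contains(col)) {
          columnSchema.add(col);      // naive union-schema merge
        }
      }
      // ...stats on the new files would be collected here as well...
    } finally {
      rwLock.writeLock().unlock();
    }
  }
}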
Aman

On Thu, Jul 30, 2015 at 9:28 AM, Aman Sinha <asi...@maprtech.com> wrote:

> Two more observations:
> 1. I would think the merged schema should be deterministic and not depend
>    on the order of scanning the files. How can this be enforced? Timestamp
>    order, lexicographical order? ... but these are being read in parallel,
>    so we have to think through this.
> 2. Due to merging of schemas, a column that was of REQUIRED type in the
>    original parquet file may end up having NULL values after merging. If we
>    do a CTAS of the merged schema, this column will be of OPTIONAL type in
>    parquet. Is this acceptable?
>
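One way to make the merge independent of read order (a hedged sketch, not
something decided in the thread): collect the per-file schemas first, then
merge them in lexicographical path order, and demote to OPTIONAL any column
that is missing from at least one file. The class and method names below are
hypothetical.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration of an order-independent union-schema merge. */
public class SchemaMergeSketch {

  /**
   * columnsByFile maps a file path to that file's column names (in file
   * order). Footers can still be read in parallel; sorting the paths before
   * merging makes both the ordinal position of merged columns and their
   * REQUIRED/OPTIONAL status deterministic.
   */
  public static Map<String, Boolean> mergeSchemas(Map<String, List<String>> columnsByFile) {
    Map<String, List<String>> byPath = new TreeMap<>(columnsByFile);  // lexicographic order
    Map<String, Boolean> unionSchema = new LinkedHashMap<>();         // column -> isRequired
    // Pass 1: build the union, preserving ordinal position from the sorted file order.
    for (List<String> fileColumns : byPath.values()) {
      for (String col : fileColumns) {
        unionSchema.putIfAbsent(col, Boolean.TRUE);
      }
    }
    // Pass 2: a column missing from any file cannot remain REQUIRED after the merge.
    for (List<String> fileColumns : byPath.values()) {
      for (String col : unionSchema.keySet()) {
        if (!fileColumns.contains(col)) {
          unionSchema.put(col, Boolean.FALSE);                        // becomes OPTIONAL
        }
      }
    }
    return unionSchema;
  }
}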
> On Wed, Jul 29, 2015 at 7:42 PM, Jacques Nadeau <jacq...@dremio.com>
> wrote:
>
>> One comment... I think this is an overreach:
>>
>> "The union schema is saved in a .drill file"
>>
>> There is no point in saving something derived in a .drill file. Maybe a
>> cache file (extending the other work on metadata caching). The idea for
>> .drill files is that they are supplementary to what can be acquired from
>> the data directly.
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Wed, Jul 29, 2015 at 6:58 PM, Parth Chandra <par...@apache.org> wrote:
>>
>> > Notes based on discussion in the hangout, Jacques' notes and later
>> > discussion:
>> >
>> > 1. For Parquet files
>> >    - Merge schema information into a union schema
>> >    - The union schema is saved in a .drill file
>> >    - The union schema can be cached
>> >    - The union schema should maintain the ordinal position of fields.
>> > 2. At the time the table is read, we can check the directory timestamp
>> >    and regenerate the merged schema.
>> > 3. Insert into does not need to update the merged schema. (Should it
>> >    check whether an incompatible schema is being written? An
>> >    incompatible schema is one that cannot be merged successfully.)
>> >    - Suggestion 1 - Allow a flexible insert. (Issue a warning if there
>> >      is going to be a schema change conflict.) Queries on columns with
>> >      no conflict can still run.
>> >    - Suggestion 2 - Do not allow the insert unless the schema can be
>> >      merged.
>> > 4. Rules for merging can be relaxed as we improve our ability to
>> >    promote from one type to another.
>> > 5. The concept of a schema change is different from a merged schema.
>> >    Schema change is needed for reading schema-less sources like JSON.
>> > 6. If there is a different column order (or a difference in the number
>> >    of columns) in the table files, an insert based on ordinal positions
>> >    cannot be allowed.
>> > 7. Suggestion - go with a name-based insert. Ordinal-based insert can
>> >    be enabled via a switch (?)
>> > 8. Reuse partitioning information from 'create table' in 'insert into'.
>> >    No partitioning info is needed in the 'insert into' statement.
>> >
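To illustrate item 7 above (one possible reading, not a settled design): a
sketch in which INSERT columns are matched to the target table by name, with
ordinal matching only available behind a switch and only when the column
counts agree (per item 6). All class and method names are hypothetical.

import java.util.List;

/** Sketch of name-based vs. ordinal-based column mapping for INSERT. */
public class InsertColumnMapper {

  /**
   * Returns, for each target column, the index of the source column that
   * feeds it. Columns are matched by name by default; ordinal matching is
   * only used when explicitly enabled and the column counts are equal.
   */
  public static int[] mapColumns(List<String> targetColumns,
                                 List<String> sourceColumns,
                                 boolean allowOrdinal) {
    int[] mapping = new int[targetColumns.size()];
    if (allowOrdinal) {
      if (sourceColumns.size() != targetColumns.size()) {
        throw new IllegalArgumentException("ordinal insert requires matching column counts");
      }
      for (int i = 0; i < mapping.length; i++) {
        mapping[i] = i;
      }
      return mapping;
    }
    for (int i = 0; i < targetColumns.size(); i++) {
      int src = indexOfIgnoreCase(sourceColumns, targetColumns.get(i));
      if (src < 0) {
        throw new IllegalArgumentException("no source column for target column "
            + targetColumns.get(i));
      }
      mapping[i] = src;
    }
    return mapping;
  }

  private static int indexOfIgnoreCase(List<String> cols, String name) {
    for (int i = 0; i < cols.size(); i++) {
      if (cols.get(i).equalsIgnoreCase(name)) {
        return i;
      }
    }
    return -1;
  }
}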
>> > On Mon, Jul 27, 2015 at 6:43 PM, Khurram Faraaz <kfar...@maprtech.com>
>> > wrote:
>> >
>> > > I have a question related to insert into partitioned tables. I see that
>> > > DB2, Oracle, and Postgres do not support a PARTITION BY clause in their
>> > > INSERT INTO implementations. MySQL does, however, provide support for a
>> > > PARTITION BY clause in its INSERT INTO implementation. I do not know
>> > > what the standard ANSI SQL specification says about the INSERT INTO
>> > > statement. We may want to keep it compliant with standard SQL syntax.
>> > >
>> > > >> B ) Support insert into a partitioned table
>> > > >> ----------------------
>> > > >>
>> > > >> Ex: INSERT INTO T1 col1, col2, col3 partition by col1, col2 SELECT
>> > > >> col4, col5, col6 from T2
>> > > >>
>> > > >> * Target column specification is required when inserting data into
>> > > >>   an already partitioned table
>> > > >> * Requirements A.3-A.7 above apply for insert into partitioned
>> > > >>   tables as well
>> > > >> * A partition by clause along with one or more columns is required
>> > > >> * All the columns specified in the partition by clause must exist in
>> > > >>   the target column list
>> > > >> * Partition by columns specified do not need to match the list of
>> > > >>   columns that the original table was partitioned with (i.e. if the
>> > > >>   original table is partitioned with col1, col2, new data during
>> > > >>   insert can be partitioned by col3 or just with col1 or col2)
>> > >
>> > > On Mon, Jul 27, 2015 at 5:40 PM, Mehant Baid <baid.meh...@gmail.com>
>> > > wrote:
>> > >
>> > > > I wanted to start a conversation around supporting the "Insert into
>> > > > Table" feature. As of 1.2 we initially want to support inserting into
>> > > > a table with Parquet files. Support for JSON, CSV and other sources
>> > > > will follow as future enhancements.
>> > > >
>> > > > Aman, Jinfeng, Neeraja and I had an initial discussion about this,
>> > > > and Neeraja provided a good summary of our discussion (pasted below),
>> > > > also stating some of the requirements for this feature.
>> > > >
>> > > > A ) Support insert into a non-partitioned table
>> > > > ---------------------
>> > > >
>> > > > Ex: INSERT INTO T1 [col1, col2, col3] SELECT col4, col5, col6 from T2
>> > > > (Source table: T2, Target table: T1)
>> > > >
>> > > > Requirements:
>> > > >
>> > > > 1. The target table column list specification is optional for the
>> > > >    Insert statement
>> > > > 2. When specified, the column list in the Insert statement should
>> > > >    contain all the columns present in the target table (i.e. no
>> > > >    support for partial insert)
>> > > > 3. The column names specified for the source table do not need to
>> > > >    match the target table column names. The match is performed based
>> > > >    on ordinal.
>> > > > 4. The # of source table columns specified must be the same as the #
>> > > >    of target table columns
>> > > > 5. The types of the specified source table columns must match the
>> > > >    types of the target table columns
>> > > > 6. Specification of * is not allowed in the Select syntax
>> > > > 7. The Select syntax can specify constant values for one or more
>> > > >    columns
>> > > >
>> > > >
>> > > > B ) Support insert into a partitioned table
>> > > > ----------------------
>> > > >
>> > > > Ex: INSERT INTO T1 col1, col2, col3 partition by col1, col2 SELECT
>> > > > col4, col5, col6 from T2
>> > > >
>> > > > * Target column specification is required when inserting data into an
>> > > >   already partitioned table
>> > > > * Requirements A.3-A.7 above apply for insert into partitioned tables
>> > > >   as well
>> > > > * A partition by clause along with one or more columns is required
>> > > > * All the columns specified in the partition by clause must exist in
>> > > >   the target column list
>> > > > * Partition by columns specified do not need to match the list of
>> > > >   columns that the original table was partitioned with (i.e. if the
>> > > >   original table is partitioned with col1, col2, new data during
>> > > >   insert can be partitioned by col3 or just with col1 or col2)
>> > > >
>> > > >
>> > > > A couple of open questions from the design perspective are:
>> > > >
>> > > > 1. How do we perform validation? Validation of data types, number of
>> > > > columns being inserted, etc. In addition to validation we need to
>> > > > make sure that when we insert into an existing table we insert data
>> > > > with the existing column names (the select column list can have
>> > > > different names). This poses problems around needing to know the
>> > > > metadata at planning time. Two approaches that have been floating
>> > > > around are:
>> > > >     * DotDrill files: We can store metadata, partitioning columns and
>> > > > other useful information here, and we can perform validation at
>> > > > planning time. However, the challenges with introducing DotDrill
>> > > > files include
>> > > >         - consistency between metadata and the actual data (nothing
>> > > > prevents users from copying files directly)
>> > > >         - security around DotDrill files (can be dealt with in the
>> > > > same way we perform security checks for Drill tables in HDFS)
>> > > >         - an interface to change the DotDrill file, in case we need
>> > > > to add a column to the table or add a new partition, etc.
>> > > >
>> > > >     * Explicit syntax / no metadata approach: Another approach is to
>> > > > avoid DotDrill files and use explicit syntax to glean as much
>> > > > information as possible from the SQL statement itself. Some of the
>> > > > challenges with this approach are
>> > > >         - Gathering metadata information: Since we have no idea what
>> > > > the existing schema is, we would need to perform a "mini scan" to
>> > > > learn the schema at planning time in order to perform some
>> > > > validation. The problem with this approach is: how do we determine
>> > > > how many files we need to read in order to learn the schema? If we
>> > > > use a sample set and not all the files have the same schema, we could
>> > > > have non-deterministic results based on the sample of files read.
>> > > > Also, reading all the files and merging the schema seems like an
>> > > > expensive cost to pay.
>> > > >         - From the user's perspective, while inserting into a
>> > > > partitioned table, the user will have to specify the partitioning
>> > > > columns again in the Insert statement, despite having specified the
>> > > > partition columns in the CTAS.
>> > > >
>> > > > 2. What is a reasonable assumption for a Drill table in terms of a
>> > > > changing schema? Is having the exact same schema for all files in a
>> > > > table too rigid an assumption at this point?
>> > > >
>> > > > One thing to remember with the DotDrill file approach is to also
>> > > > consider the repercussions on Drop table, Show tables, Describe
>> > > > table, etc.; i.e., it might make it easier to support these
>> > > > operations.
>> > > >
>> > > > Thanks
>> > > > Mehant