[ 
https://issues.apache.org/jira/browse/HIVE-3463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459424#comment-13459424
 ] 

Alexander Alten-Lorenz commented on HIVE-3463:
----------------------------------------------

More insight:

1. The drop table command deletes entries in different tables as UNIT 
transactions, so if there is a communication break or any other uncertain 
condition, database could rollback that individual transaction leaving database 
in inconsistent state. 

2. The drop table is not serialized action, which is causing Replication break 

3. As cascade is not defined in the FK definition during table creation, the 
stale could present in certain conditions. 

4. The database with id 511 is showing the tables of default database who’s 
db_id is 1 

5. The select on any portioned table with a limit operator scans through all 
the partitions using UNIT transactions, which just query single entry per 
transaction, means to say, if a table has got 100000 partitions, it does 100000 
queries to database before showing the result. Instead, it should have done one 
bulk query fetching all the partition details at once. 

Example:
In our database we are having 3500 partitioned tables, with total of 950000 
partitions. Every day we get hit on 90% of tables. And 60% of the users say, 
the queries takes its own sweet time to return, even though we kept the 
metastore level performance(MySQL) at 2miliSec to 20 miliSec, the query time at 
hive is always varies between few minutes to few hours. Even simple USE 
statement takes up to 5 min in some cases. (new thing I am bringing it up, as 
it took some time for me to confirm this) 

6. Dead locks happening on hive tables, during normal operations, and quite a 
number of rollbacks happening at database side. What I noticed is the 
transaction serialization issue, which is issue a delete before an update, see 
below for the detail on the dead lock. With upgrade to mysql 5.5, we are not 
seeing it not so frequently now.

One nit, Cascading for InnoDB is a great feature, since Hive doesn't support 
MySQL's binlog replication (Transaction level 'READ-COMMITTED' in InnoDB is not 
safe for binlog mode 'STATEMENT'). I was digging a bit deeper, that could be 
avoided by setting "SET GLOBAL binlog_format = 'ROW';: inside of MySQL. But 
this needs to be confirmed first.

I raise the FR up.
                
> Add CASCADING to MySQL's InnoDB schema
> --------------------------------------
>
>                 Key: HIVE-3463
>                 URL: https://issues.apache.org/jira/browse/HIVE-3463
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 0.9.0
>            Reporter: Alexander Alten-Lorenz
>            Priority: Minor
>
> Cascading could help to cleanup the tables when a FK is deleted.
> http://dev.mysql.com/doc/refman/5.5/en/innodb-foreign-key-constraints.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to