The "initialize trafodion, upgrade" executes in the child tdm_arkcmp process. The "initialize trafodion" from step d. executes in the grandchild tdm_arkcmp process. It does not know that it is underneath an "initialize trafodion, upgrade", which is probably a good thing from a modularity standpoint.
-----Original Message-----
From: Roberta Marton [mailto:[email protected]]
Sent: Friday, July 1, 2016 11:27 AM
To: [email protected]
Subject: RE: A chicken-and-egg issue with metadata upgrade

Does the "initialize trafodion" performed in step 4 run in the master executor or in a compiler process? If in the master executor, are there any session attributes that indicate that we are doing an upgrade and what state we are executing? If so, could the privilege manager code check this state and just return instead of doing the upgrade in step 4?

Roberta

-----Original Message-----
From: Dave Birdsall [mailto:[email protected]]
Sent: Friday, July 1, 2016 10:55 AM
To: [email protected]
Subject: A chicken-and-egg issue with metadata upgrade

Hi,

This e-mail concerns https://github.com/apache/incubator-trafodion/pull/565.

Roberta Marton pointed out to me (privately) that I needed to test “initialize trafodion, upgrade” for the case where privileges had been previously enabled. This morning I did this, and I ran into the following failure:

>>initialize trafodion, upgrade;
Metadata Upgrade: started
Version Check: started
Metadata need to be upgraded from Version 1.0.1 to 2.1.0.
Upgrade needed for Catalogs, Privileges, Repository.
Version Check: done
Drop Old Metadata: started
Drop Old Metadata: done
Backup Current Metadata: started
Backup Current Metadata: done
Drop Current Metadata: started
Drop Current Metadata: done
Initialize New Metadata: started
Restore from Old Metadata: started
Restore from Old Metadata: done
Drop Old Metadata: started
Drop Old Metadata: done
Metadata Upgrade: failed

*** ERROR[1431] Object TRAFODION._PRIVMGR_MD_.OBJECT_PRIVILEGES exists in HBase. This could be due to a concurrent transactional ddl operation in progress on this table.

--- SQL operation failed with errors.
>>

I debugged the failure and analyzed the cause. Here’s what happens.
The “initialize trafodion, upgrade” logic is implemented in CmpSeabaseMDupgrade::executeSeabaseMDupgrade (sqlcomp/CmpSeabaseDDLupgrade.cpp). It implements a state machine that roughly speaking does this:

a. Checks current version information, determining if the metadata needs upgrading and if so whether this software knows how to upgrade it
b. Assuming the answers in step a are both “Yes”, uses HBase snapshots to make a copy of the existing metadata. The new tables created have “old” names.
c. Drops the current metadata tables.
d. Does an “INITIALIZE TRAFODION”. So a completely new set of tables is created. (There is no optimization of creating just the changed tables, for example. Keeps it simple.)
e. Copies the data from the old metadata tables to the new ones. The DML to do this is pre-defined (where?)
f. Customize the new metadata as needed. (examples?)
g. Validate that the copy was successful. At the moment, this step is stubbed.
h. Delete information about the old metadata tables from the new metadata tables using SQL DELETEs on object_uid.
i. Using HBase drop, drop the old metadata tables.
j. Update metadata views as needed.
k. Upgrade privilege manager tables as needed.
l. Update repository tables as needed.
m. Update the version info in the VERSION table.
n. Report success

The state machine architecture was chosen so that status messages could be returned to the caller (sqlci or trafci say) as the steps are progressing. That is, the method CmpSeabaseMDupgrade::executeSeabaseMDupgrade returns to its caller whenever it has something interesting to report. The SQL executor then redrives CmpSeabaseMDupgrade::executeSeabaseMDupgrade with another call to move to the next step or substep.

The thing to notice is steps d., e. and k. When we get to step d., we do an “initialize trafodion” DDL statement. This is a common recursive technique used in many places in DDL processing.
For example, “drop schema cascade” under the covers executes “drop table” statements for any tables that exist in a schema.

The “initialize trafodion” logic creates new metadata tables of the proper shape. But it also tries to create privilege manager tables. The privilege manager code in PrivMgrMDAdmin::initializeMetadata (sqlcomp/PrivMgrMD.cpp) looks in the metadata to see if the privilege manager tables exist, and if they don’t, tries to create them.

And herein lies the problem. This logic is being executed as part of step d. in the state machine. At this point, new metadata tables have been created, but they are pristine. Knowledge about other tables that existed at the time of the upgrade has not been copied into them yet. That happens in step e. So, the privilege manager tables do in fact exist in HBase but not in the metadata. So, when PrivMgrMDAdmin::initializeMetadata tries to create the first one, it gets a 1431 error, which causes the upgrade to fail.

It looks like the state machine isn’t expecting to upgrade the privilege manager tables until step k. If we had gotten that far, I think it would have worked, because the metadata about existing privilege manager tables would have been copied in step e. to the new metadata tables.

I am guessing this problem wasn’t detected before because there was no privilege manager upgrade logic at the time of the last Trafodion metadata upgrade. So this chicken-and-egg problem was not realized.

How to fix it? I can think of a few approaches.

1. Change the “initialize trafodion” logic so that if it gets a 1431 error from PrivMgrMDAdmin::initializeMetadata, it simply ignores it and moves on. I explored this idea some, but don’t like it. The error is detected a few layers down from the CmpSeabaseMDupgrade::executeSeabaseMDupgrade logic. Seems like too much risk of there being detritus lying around that doesn’t get cleaned up.

2. Treat the privilege manager tables at the same time as the metadata tables.
That is, in step b. make snapshots of the privilege manager tables, in step c. drop the current ones, and so on. This is fairly major surgery on the existing code. I’d like to find an easier way.

3. Add a new option, “initialize trafodion, minimal”, that does the same thing as “initialize trafodion”, except that it skips the privilege manager step. That allows the state machine to create or upgrade the privilege manager tables in step k, as the state machine expects. Change step d. to execute “initialize trafodion, minimal” instead of “initialize trafodion”. As of this moment, I favor this approach.

As part of my testing, I unit tested repository upgrade. That works, but the design for repository upgrade seems similar to that of the privilege manager code. I’m not yet sure why it works, and I will investigate that before deciding on an approach.

But I wanted to throw this conundrum out there, to see what other folks think.

Dave
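
For readers who want to see the ordering problem in isolation, here is a minimal Python sketch of the flow described in this thread. It is a toy model under stated assumptions, not Trafodion code: HBase and the metadata are plain sets, the status strings are illustrative, and every name (Cluster, Error1431, init_privmgr, initialize, upgrade, MD_TABLE) is hypothetical. Only steps b, c, d, e, k and n are modeled. The generator stands in for the state machine: each yield models a return to the caller with a status message, and each next() models the executor redriving it.

```python
# Toy model of the upgrade ordering problem. All names are hypothetical.
PRIVMGR_TABLE = "TRAFODION._PRIVMGR_MD_.OBJECT_PRIVILEGES"
MD_TABLE = "TRAFODION._MD_.OBJECTS"  # stand-in for the metadata tables


class Error1431(Exception):
    """Stands in for ERROR[1431]: object exists in HBase but not in metadata."""


class Cluster:
    def __init__(self):
        # Assumed starting point: privileges were enabled before the upgrade.
        self.hbase = {MD_TABLE, PRIVMGR_TABLE}     # physical tables
        self.metadata = {MD_TABLE, PRIVMGR_TABLE}  # objects the metadata knows

    def create_table(self, name):
        if name in self.hbase and name not in self.metadata:
            raise Error1431(name)
        self.hbase.add(name)
        self.metadata.add(name)


def init_privmgr(cluster):
    # Check-then-create, as described for PrivMgrMDAdmin::initializeMetadata.
    if PRIVMGR_TABLE not in cluster.metadata:
        cluster.create_table(PRIVMGR_TABLE)


def initialize(cluster, minimal=False):
    # Models "initialize trafodion"; minimal=True models approach 3.
    cluster.create_table(MD_TABLE)
    if not minimal:
        init_privmgr(cluster)


def upgrade(cluster, minimal):
    yield "started"
    old_rows = set(cluster.metadata)      # step b: back up current metadata
    yield "b: backup done"
    cluster.hbase.discard(MD_TABLE)       # step c: drop current metadata
    cluster.metadata.clear()
    yield "c: drop done"
    try:
        initialize(cluster, minimal)      # step d: new, pristine metadata
    except Error1431 as err:
        yield f"failed: 1431 on {err}"
        return
    yield "d: initialize done"
    cluster.metadata |= old_rows          # step e: copy old metadata rows
    yield "e: restore done"
    init_privmgr(cluster)                 # step k: privilege manager tables
    yield "n: upgrade succeeded"


for minimal in (False, True):
    print(f"minimal={minimal}:", list(upgrade(Cluster(), minimal)))
```

In this model the default run stops at step d, because the privilege manager table is present in the HBase set but missing from the freshly cleared metadata set. The minimal run defers the privilege manager work to step k, by which point step e has restored the metadata rows, so the check-then-create logic finds the table and does nothing.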
