[ https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-31099:
----------------------------------
        Parent: SPARK-30034
    Issue Type: Sub-task  (was: Improvement)

> Create migration script for metastore_db
> -----------------------------------------
>
>                 Key: SPARK-31099
>                 URL: https://issues.apache.org/jira/browse/SPARK-31099
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> When an existing Derby database (in ./metastore_db) was created by a build using the Hive 1.2.x profile, it fails to upgrade itself to the Hive 2.3.x schema.
> Repro steps:
> 1. Build OSS or DBR master with SBT with -Phive-1.2 -Phive -Phive-thriftserver. Make sure there is no existing ./metastore_db directory in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This populates the ./metastore_db directory, which hosts the Derby-based metastore database. The database schema is created by Hive 1.2.x.
> 3. Re-build OSS or DBR master with SBT with -Phive -Phive-thriftserver (dropping the Hive 1.2 profile makes the build use the default Hive 2.3 profile).
> 4. Repeat step 2. This makes Hive 2.3.x load the Derby database created in step 2, which triggers an in-place schema upgrade, and that is where the following error is reported.
> 5. Delete ./metastore_db and re-run step 4. The error is no longer reported.
> (These steps are condensed into a shell sketch at the end of this description.)
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
>     at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>     at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
>     at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
>     at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
>     at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>     at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>     at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>     at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>     at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>     at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
>     at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
>     at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>     at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>     at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>     at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>     at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
>     at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
>     at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
>     at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
>     at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
>     at org.datanucleus.store.query.Query.execute(Query.java:1726)
>     at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
>     at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
>     at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:184)
>     at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
>     at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
>     at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>     at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>     at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>     at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
>     at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
>     at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>     at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>     at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
>     at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>     at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
>     at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
>     at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
>     at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
>     at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:343)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:369)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:280)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:316)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:272)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:359)
>     at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:472)
>     at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1(PoolingHiveClient.scala:267)
>     at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1$adapted(PoolingHiveClient.scala:266)
>     at org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
>     at org.apache.spark.sql.hive.client.PoolingHiveClient.databaseExists(PoolingHiveClient.scala:266)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:286)
>     at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:145)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:106)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:144)
>     at com.databricks.spark.util.NoopProgressReporter$.withStatusCode(ProgressReporter.scala:52)
>     at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:143)
>     at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:286)
>     at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:212)
>     at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:199)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:47)
>     at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:62)
>     at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:94)
>     at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:94)
>     at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:270)
>     at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:191)
>     at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:43)
>     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
>     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
>     at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
>     at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
>     at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3612)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:115)
>     at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:246)
>     at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:100)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>     at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:76)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:196)
>     at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3610)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:231)
>     at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
>     at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:662)
>     at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:657)
>     at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
>     at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
>     at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
>     at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
>     at $line50594476574342420814.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
>     at $line50594476574342420814.$read$$iw$$iw$$iw.<init>(<console>:36)
>     at $line50594476574342420814.$read$$iw$$iw.<init>(<console>:38)
>     at $line50594476574342420814.$read$$iw.<init>(<console>:40)
>     at $line50594476574342420814.$read.<init>(<console>:42)
>     at $line50594476574342420814.$read$.<init>(<console>:46)
>     at $line50594476574342420814.$read$.<clinit>(<console>)
>     at $line50594476574342420814.$eval$.$print$lzycompute(<console>:7)
>     at $line50594476574342420814.$eval$.$print(<console>:6)
>     at $line50594476574342420814.$eval.$print(<console>)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
>     at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
>     at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
>     at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
>     at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
>     at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
>     at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>     at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
>     at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
>     at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
>     at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
>     at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
>     at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
>     at org.apache.spark.repl.Main$.doMain(Main.scala:78)
>     at org.apache.spark.repl.Main$.main(Main.scala:58)
>     at org.apache.spark.repl.Main.main(Main.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: ERROR 42601: In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>     at org.apache.derby.impl.sql.compile.ColumnDefinitionNode.bindAndValidateDefault(Unknown Source)
>     at org.apache.derby.impl.sql.compile.TableElementList.validate(Unknown Source)
>     at org.apache.derby.impl.sql.compile.AlterTableNode.bindStatement(Unknown Source)
>     at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>     at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>     at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
>     ... 157 more
> ...
> 20/03/09 13:57:05 ERROR ObjectStore: Version information found in metastore differs 1.2.0 from expected schema version 2.3.0. Schema verififcation is disabled hive.metastore.schema.verification
> 20/03/09 13:57:05 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore krismok@10.0.0.76
> {code}
> It would be great if there were a migration script to upgrade metastore_db from the older version to the new version.
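>
> For convenience, here are the repro steps above condensed into shell commands. This is a sketch only: the build/sbt invocation, the package target, and the piped non-interactive spark-shell usage are assumptions that may need adjusting for a local setup.
> {code:bash}
> # Step 1: build with the Hive 1.2 profile; ensure no stale metastore exists.
> rm -rf metastore_db
> build/sbt -Phive-1.2 -Phive -Phive-thriftserver package
>
> # Step 2: populate ./metastore_db with a Hive 1.2.x schema.
> echo 'spark.sql("show databases").show()' | bin/spark-shell
>
> # Step 3: rebuild without -Phive-1.2, falling back to the default Hive 2.3 profile.
> build/sbt -Phive -Phive-thriftserver package
>
> # Step 4: re-run the query; Hive 2.3.x now tries to upgrade the 1.2.x Derby
> # database in place and fails with the ALTER TABLE error above.
> echo 'spark.sql("show databases").show()' | bin/spark-shell
>
> # Step 5: removing the old database makes the error disappear.
> rm -rf metastore_db
> echo 'spark.sql("show databases").show()' | bin/spark-shell
> {code}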
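>
> The failing statement itself points at a manual workaround: Derby only accepts an added NOT NULL column when a DEFAULT is supplied, which is exactly what the auto-generated ALTER TABLE above omits. A hand-applied fix, sketched here with Derby's ij tool (assumptions: ij is on PATH, the spark-shell JVM is not holding the database open, and the constraint name is illustrative), adds the column with an explicit DEFAULT before the Hive 2.3.x build touches the database:
> {code:bash}
> # Sketch only: run from the directory containing ./metastore_db,
> # with no other JVM attached to the embedded Derby database.
> ij <<'SQL'
> connect 'jdbc:derby:metastore_db';
> -- Add the column with a DEFAULT, satisfying Derby's NOT NULL rule;
> -- existing rows are backfilled with 'N'.
> ALTER TABLE TBLS ADD COLUMN IS_REWRITE_ENABLED CHAR(1) NOT NULL DEFAULT 'N';
> -- Re-create the intended value check as a separate named constraint.
> ALTER TABLE TBLS ADD CONSTRAINT TBLS_REWRITE_CHECK CHECK (IS_REWRITE_ENABLED IN ('Y','N'));
> exit;
> SQL
> {code}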
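>
> For the migration script itself, one possible starting point, as a hedged sketch rather than an official procedure: wrap Hive's schematool, which applies Hive's own Derby upgrade scripts rather than relying on the DataNucleus auto-upgrade shown failing above, and should also update the recorded schema version so the later "Version information found in metastore differs" error goes away. The Hive install path, the JDBC URL, and the default APP/mine Derby credentials below are assumptions about a stock setup:
> {code:bash}
> # Sketch only: back up the 1.2.x metastore first; the upgrade is one-way.
> cp -r metastore_db metastore_db.bak-1.2.0
>
> # Assumption: a local Hive 2.3.x distribution providing bin/schematool.
> export HIVE_HOME=/path/to/apache-hive-2.3.x-bin
>
> # Upgrade the embedded Derby schema from the 1.2.0 layout to 2.3.0.
> "$HIVE_HOME/bin/schematool" -dbType derby \
>   -url "jdbc:derby:;databaseName=metastore_db;create=false" \
>   -userName APP -passWord mine \
>   -upgradeSchemaFrom 1.2.0
>
> # Verify: should now report schema version 2.3.0.
> "$HIVE_HOME/bin/schematool" -dbType derby \
>   -url "jdbc:derby:;databaseName=metastore_db;create=false" \
>   -userName APP -passWord mine -info
> {code}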