[ https://issues.apache.org/jira/browse/SPARK-31099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-31099:
----------------------------------
        Parent: SPARK-30034
    Issue Type: Sub-task  (was: Improvement)

> Create migration script for metastore_db
> ----------------------------------------
>
>                 Key: SPARK-31099
>                 URL: https://issues.apache.org/jira/browse/SPARK-31099
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Gengliang Wang
>            Priority: Major
>
> When an existing Derby database in ./metastore_db was created with the Hive 1.2.x profile, it fails to upgrade itself to the Hive 2.3.x schema.
> Repro steps:
> 1. Build OSS or DBR master with SBT using -Phive-1.2 -Phive -Phive-thriftserver. Make sure there is no existing ./metastore_db directory in the repo.
> 2. Run bin/spark-shell, and then spark.sql("show databases"). This populates the ./metastore_db directory, which hosts the Derby-based metastore database. The database is created with the Hive 1.2.x schema.
> 3. Re-build OSS or DBR master with SBT using -Phive -Phive-thriftserver (this drops the Hive 1.2 profile, so the default Hive 2.3 profile is used).
> 4. Repeat Step (2). This triggers Hive 2.3.x to load the Derby database created in Step (2), which in turn triggers a schema upgrade step, and that is where the following error is reported.
> 5. Delete the ./metastore_db directory and re-run Step (4); the error is no longer reported. (These steps are collected as a shell sketch after the stack trace below.)
> {code:java}
> 20/03/09 13:57:04 ERROR Datastore: Error thrown executing ALTER TABLE TBLS ADD IS_REWRITE_ENABLED CHAR(1) NOT NULL CHECK (IS_REWRITE_ENABLED IN ('Y','N')) : In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
> java.sql.SQLSyntaxErrorException: In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
>       at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
>       at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>       at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:879)
>       at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:830)
>       at org.datanucleus.store.rdbms.table.TableImpl.validateColumns(TableImpl.java:257)
>       at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3398)
>       at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2896)
>       at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:119)
>       at org.datanucleus.store.rdbms.RDBMSStoreManager.manageClasses(RDBMSStoreManager.java:1627)
>       at org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:672)
>       at org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:425)
>       at org.datanucleus.store.rdbms.query.JDOQLQuery.compileQueryFull(JDOQLQuery.java:865)
>       at org.datanucleus.store.rdbms.query.JDOQLQuery.compileInternal(JDOQLQuery.java:347)
>       at org.datanucleus.store.query.Query.executeQuery(Query.java:1816)
>       at org.datanucleus.store.query.Query.executeWithArray(Query.java:1744)
>       at org.datanucleus.store.query.Query.execute(Query.java:1726)
>       at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:374)
>       at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216)
>       at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.ensureDbInit(MetaStoreDirectSql.java:184)
>       at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.<init>(MetaStoreDirectSql.java:144)
>       at org.apache.hadoop.hive.metastore.ObjectStore.initializeHelper(ObjectStore.java:410)
>       at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:342)
>       at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:303)
>       at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
>       at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
>       at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58)
>       at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:628)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:594)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:588)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:655)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:431)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>       at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>       at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6902)
>       at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
>       at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:70)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>       at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1707)
>       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>       at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>       at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
>       at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
>       at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
>       at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
>       at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
>       at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
>       at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:343)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:369)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$retryLocked$1(HiveClientImpl.scala:280)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:316)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:272)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:359)
>       at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:472)
>       at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1(PoolingHiveClient.scala:267)
>       at org.apache.spark.sql.hive.client.PoolingHiveClient.$anonfun$databaseExists$1$adapted(PoolingHiveClient.scala:266)
>       at org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
>       at org.apache.spark.sql.hive.client.PoolingHiveClient.databaseExists(PoolingHiveClient.scala:266)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:286)
>       at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$2(HiveExternalCatalog.scala:145)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.maybeSynchronized(HiveExternalCatalog.scala:106)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$withClient$1(HiveExternalCatalog.scala:144)
>       at com.databricks.spark.util.NoopProgressReporter$.withStatusCode(ProgressReporter.scala:52)
>       at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:143)
>       at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:286)
>       at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:212)
>       at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:199)
>       at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:47)
>       at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:62)
>       at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:94)
>       at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:94)
>       at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:270)
>       at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:191)
>       at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:43)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:39)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:39)
>       at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:45)
>       at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:231)
>       at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3612)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:115)
>       at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:246)
>       at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:100)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:76)
>       at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:196)
>       at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3610)
>       at org.apache.spark.sql.Dataset.<init>(Dataset.scala:231)
>       at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:101)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:98)
>       at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:662)
>       at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
>       at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:657)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:28)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32)
>       at $line50594476574342420814.$read$$iw$$iw$$iw$$iw.<init>(<console>:34)
>       at $line50594476574342420814.$read$$iw$$iw$$iw.<init>(<console>:36)
>       at $line50594476574342420814.$read$$iw$$iw.<init>(<console>:38)
>       at $line50594476574342420814.$read$$iw.<init>(<console>:40)
>       at $line50594476574342420814.$read.<init>(<console>:42)
>       at $line50594476574342420814.$read$.<init>(<console>:46)
>       at $line50594476574342420814.$read$.<clinit>(<console>)
>       at $line50594476574342420814.$eval$.$print$lzycompute(<console>:7)
>       at $line50594476574342420814.$eval$.$print(<console>:6)
>       at $line50594476574342420814.$eval.$print(<console>)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
>       at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
>       at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
>       at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
>       at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
>       at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
>       at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
>       at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
>       at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
>       at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894)
>       at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762)
>       at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464)
>       at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485)
>       at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239)
>       at org.apache.spark.repl.Main$.doMain(Main.scala:78)
>       at org.apache.spark.repl.Main$.main(Main.scala:58)
>       at org.apache.spark.repl.Main.main(Main.scala)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:498)
>       at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
>       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: ERROR 42601: In an ALTER TABLE statement, the column 'IS_REWRITE_ENABLED' has been specified as NOT NULL and either the DEFAULT clause was not specified or was specified as DEFAULT NULL.
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>       at org.apache.derby.impl.sql.compile.ColumnDefinitionNode.bindAndValidateDefault(Unknown Source)
>       at org.apache.derby.impl.sql.compile.TableElementList.validate(Unknown Source)
>       at org.apache.derby.impl.sql.compile.AlterTableNode.bindStatement(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepMinion(Unknown Source)
>       at org.apache.derby.impl.sql.GenericStatement.prepare(Unknown Source)
>       at org.apache.derby.impl.sql.conn.GenericLanguageConnectionContext.prepareInternalStatement(Unknown Source)
>       ... 157 more
> ...
> 20/03/09 13:57:05 ERROR ObjectStore: Version information found in metastore differs 1.2.0 from expected schema version 2.3.0. Schema verififcation is disabled hive.metastore.schema.verification
> 20/03/09 13:57:05 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore krismok@10.0.0.76
> {code}
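> For convenience, the repro steps above as a shell sketch. This is an untested sketch: it assumes a Spark checkout as the working directory and the build/sbt launcher; adjust build targets to taste:
> {code:bash}
> # Step 1: build with the Hive 1.2 profile; start from a clean metastore_db.
> rm -rf metastore_db
> build/sbt -Phive-1.2 -Phive -Phive-thriftserver package
>
> # Step 2: create the Derby metastore with the Hive 1.2.x schema.
> echo 'spark.sql("show databases").show()' | bin/spark-shell
>
> # Step 3: re-build with the default Hive 2.3 profile (no -Phive-1.2).
> build/sbt -Phive -Phive-thriftserver package
>
> # Step 4: loading the old metastore_db under Hive 2.3.x triggers the failed
> # upgrade shown in the stack trace above.
> echo 'spark.sql("show databases").show()' | bin/spark-shell
>
> # Step 5: after removing the directory, Hive 2.3.x re-creates the schema
> # cleanly and the error goes away.
> rm -rf metastore_db
> echo 'spark.sql("show databases").show()' | bin/spark-shell
> {code}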
> It would be great if there were a migration script to upgrade the metastore_db from the older schema version to the new one.
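> As a starting point, here is a minimal sketch of a manual migration for this one failure, using Derby's ij tool to apply the statement DataNucleus fails on, but with an explicit DEFAULT clause (Derby rejects adding a NOT NULL column unless a non-NULL DEFAULT is supplied, which is exactly the error above). The 'N' default, the jar paths, and the omission of the original CHECK constraint are assumptions for illustration, not a vetted upgrade script; Hive's own versioned upgrade-*.derby.sql scripts would be the authoritative source for a real migration:
> {code:bash}
> # Sketch only: run from the directory containing ./metastore_db, with Spark
> # stopped so the embedded Derby database is not locked.
> # DERBY_JARS is a placeholder -- point it at real derby.jar/derbytools.jar.
> DERBY_JARS="/path/to/derby.jar:/path/to/derbytools.jar"
>
> java -cp "$DERBY_JARS" org.apache.derby.tools.ij <<'SQL'
> CONNECT 'jdbc:derby:./metastore_db';
> -- Derby accepts the new NOT NULL column once a DEFAULT is supplied.
> ALTER TABLE TBLS ADD COLUMN IS_REWRITE_ENABLED CHAR(1) NOT NULL DEFAULT 'N';
> DISCONNECT;
> EXIT;
> SQL
> {code}
> A full migration script would have to cover every schema change between the Hive 1.2 and 2.3 metastore schemas and bump the recorded version in the VERSION table, not just this one column, so reusing Hive's Derby upgrade scripts is probably the right approach.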


