I have a different opinion on this. In production deployments (at least the ones I am aware of), databases are generally managed at the org/group level. Privacy policies like ACLs are usually applied at the database level and need first-level management by admins. With such a setup, it feels safer to have database creation done through a separate process and let Hudi hive sync only create/alter tables (the current setup).
Open to hearing others' thoughts.

Regards,
Balaji.V

On Wed, Nov 6, 2019 at 12:01 PM Bhavani Sudha <bhavanisud...@gmail.com> wrote:

> +1 I think we should create db if it does not exist.
>
> On Tue, Nov 5, 2019 at 11:08 PM Pratyaksh Sharma <pratyaks...@gmail.com>
> wrote:
>
> > Hi,
> >
> > While doing hive sync using HiveSyncTool, we first check if the target
> > table exists in hive. If not, we try to create it. However in this flow, if
> > the database itself does not exist, we do not create the database before
> > creating hive table, which results in exception like below -
> >
> > org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException [Error 10072]: Database does not exist: test_db
> >   at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:335)
> >   at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:199)
> >   at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:262)
> >   at org.apache.hive.service.cli.operation.Operation.run(Operation.java:247)
> >   at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:575)
> >   at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:561)
> >   at sun.reflect.GeneratedMethodAccessor108.invoke(Unknown Source)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >   at java.lang.reflect.Method.invoke(Method.java:498)
> >   at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> >   at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> >   at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:422)
> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> >   at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> >   at com.sun.proxy.$Proxy68.executeStatementAsync(Unknown Source)
> >   at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
> >   at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:566)
> >   at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
> >   at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
> >   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> >   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> >   at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> >   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> >   ... 3 more
> > Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Database does not exist: test_db
> >   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getDatabase(BaseSemanticAnalyzer.java:2154)
> >
> > So just wanted to discuss if we should try creating database first in above
> > case using query like -
> >
> > CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
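
To make the proposal in the quoted message concrete, here is a rough sketch of the statement ordering it implies: issue an idempotent CREATE DATABASE IF NOT EXISTS before the table DDL. This is only an illustration. The function names and the auto_create_database flag are hypothetical and are not part of HiveSyncTool; the flag defaults to off so that the current behaviour (tables only) is what you get unless it is asked for.

```python
import re

# Hypothetical helpers for illustration only; not Hudi's HiveSyncTool API.
# Accept only plain identifiers so the name can be embedded in DDL safely.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def create_database_ddl(database_name: str) -> str:
    """Build an idempotent Hive statement that creates the database if missing."""
    if not _IDENTIFIER.fullmatch(database_name):
        raise ValueError(f"unsafe database name: {database_name!r}")
    return f"CREATE DATABASE IF NOT EXISTS `{database_name}`"


def plan_sync_statements(database_name: str,
                         create_table_ddl: str,
                         auto_create_database: bool = False) -> list:
    """Return the DDL statements to run, in order.

    With auto_create_database left at False (an assumed default), only the
    table DDL is returned, which mirrors the current setup described above.
    """
    statements = []
    if auto_create_database:
        statements.append(create_database_ddl(database_name))
    statements.append(create_table_ddl)
    return statements
```

With the flag off, a missing database would still surface as the SemanticException shown in the trace, which is the outcome a deployment with admin-managed databases may actually want.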