xushiyan commented on issue #6808: URL: https://github.com/apache/hudi/issues/6808#issuecomment-1310314615
@schlichtanders the issue is rooted in

```
"--conf spark.hadoop.javax.jdo.option.ConnectionURL='jdbc:derby:memory:databaseName=metastore_db;create=true'",  # noqa
```

where `memory` is set as the subsubprotocol, which means Derby won't persist any data. You should leave it empty, like `jdbc:derby:databaseName=metastore_db;create=true`, so it uses the default `directory` mode, which persists to the file system. See https://db.apache.org/derby/docs/10.14/ref/rrefjdbc37352.html

Another note: the embedded driver has a limitation that only one connection can stay open to that database, so if you run a spark-shell with sample code like the one below to perform hive-sync, you'll run into `Another instance of Derby may have already booted the database`:

https://github.com/apache/hudi/blob/6508b11d7c1c1e4cb22aac86f9977cd951a91c9b/packaging/bundle-validation/spark_hadoop_mr/write.scala

So it really depends on how you set up the unit tests. For functional tests, which usually involve some local servers and multiple processes, the client driver works better, and it's still much more lightweight than Postgres.
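For reference, here is a rough sketch of what the corrected setup could look like in a Scala test. The app name, master setting, and the client-driver alternative in the comments are illustrative assumptions, not taken from the linked sample; the directory-mode URL below follows Hive's default metastore URL form.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch of a test-scoped SparkSession whose embedded Hive metastore
// uses Derby in directory mode instead of the non-persistent memory mode.
object DerbyMetastoreSketch extends App {
  val spark = SparkSession.builder()
    .appName("derby-metastore-sketch") // placeholder name
    .master("local[2]")
    // Directory mode: no `memory:` subsubprotocol, so Derby writes a metastore_db/
    // directory under the working dir and the Hive metadata survives the JVM exit.
    .config("spark.hadoop.javax.jdo.option.ConnectionURL",
      "jdbc:derby:;databaseName=metastore_db;create=true")
    // Alternative for functional tests with multiple processes: run a Derby
    // network server (default port 1527) and point at it with the client driver,
    // which avoids the embedded driver's single-connection limit, e.g.:
    // .config("spark.hadoop.javax.jdo.option.ConnectionURL",
    //   "jdbc:derby://localhost:1527/metastore_db;create=true")
    // .config("spark.hadoop.javax.jdo.option.ConnectionDriverName",
    //   "org.apache.derby.jdbc.ClientDriver")
    .enableHiveSupport()
    .getOrCreate()

  // ... perform Hudi writes with hive-sync enabled here ...

  spark.stop()
}
```

The commented-out lines sketch the client-driver option mentioned above: a Derby network server lets several processes share the same metastore, which the embedded driver cannot do.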