xushiyan commented on issue #6808:
URL: https://github.com/apache/hudi/issues/6808#issuecomment-1310314615

   @schlichtanders the issue is rooted in 
   
   ```
       "--conf spark.hadoop.javax.jdo.option.ConnectionURL='jdbc:derby:memory:databaseName=metastore_db;create=true'",  # noqa
   ```
   
   where `memory` is set as the subsubprotocol, which means the database lives 
only in memory and won't persist any data. You should leave the subsubprotocol 
out, like `jdbc:derby:databaseName=metastore_db;create=true`, so it will use the 
default `directory` mode, which persists to the file system. See 
https://db.apache.org/derby/docs/10.14/ref/rrefjdbc37352.html
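
To make the difference concrete, here is a minimal sketch in Python that inspects an embedded Derby JDBC URL and reports which subsubprotocol it selects. The helper names `derby_subsubprotocol` and `is_in_memory` are hypothetical and exist only for this illustration; the list of subsubprotocols is taken from the Derby reference linked above, and the parsing is deliberately simplistic.

```python
# Subsubprotocols listed in the Derby reference manual for embedded URLs.
KNOWN_SUBSUBPROTOCOLS = ("directory", "memory", "classpath", "jar")


def derby_subsubprotocol(jdbc_url: str) -> str:
    """Return the subsubprotocol of an embedded Derby JDBC URL.

    Hypothetical helper for illustration only. When the URL names no
    subsubprotocol, Derby defaults to "directory".
    """
    prefix = "jdbc:derby:"
    if not jdbc_url.startswith(prefix):
        raise ValueError(f"not a Derby JDBC URL: {jdbc_url!r}")
    rest = jdbc_url[len(prefix):]
    if rest.startswith("//"):
        # Network client URLs (jdbc:derby://host:port/db) are out of scope here.
        raise ValueError("network client URL, not an embedded URL")
    for name in KNOWN_SUBSUBPROTOCOLS:
        if rest.startswith(name + ":"):
            return name
    return "directory"


def is_in_memory(jdbc_url: str) -> bool:
    """True when the URL asks Derby for a non-persistent in-memory database."""
    return derby_subsubprotocol(jdbc_url) == "memory"
```

Calling `is_in_memory` on the URL from the failing configuration returns `True`, while the URL without `memory:` falls back to `directory`.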
   
   
   Another note: the embedded driver has a limitation where only one connection 
can stay open with that database, so if you run a spark-shell with sample code 
like the script below to perform hive-sync, you'll run into 
`Another instance of Derby may have already booted the database`
   
   
https://github.com/apache/hudi/blob/6508b11d7c1c1e4cb22aac86f9977cd951a91c9b/packaging/bundle-validation/spark_hadoop_mr/write.scala
   
   So it really depends on how you set up the unit tests. For functional tests, 
which usually involve some local servers and multiple processes, the client 
driver works better, and it's much more lightweight than postgres.
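
As a rough sketch of what switching to the client driver could look like, the Python snippet below builds the matching `--conf` arguments. It assumes a Derby network server is already running and reachable; the function name `metastore_conf_args` and the host and database name are placeholders invented for this example, and 1527 is Derby's default network server port. Treat it as a starting point under those assumptions, not a verified setup for the Hudi bundle validation.

```python
from typing import List

# Derby's network client driver class (as opposed to the embedded driver).
CLIENT_DRIVER = "org.apache.derby.jdbc.ClientDriver"


def metastore_conf_args(
    host: str = "localhost",    # assumption: server runs on the same machine
    port: int = 1527,           # Derby network server default port
    db: str = "metastore_db",   # assumption: same database name as in the issue
) -> List[str]:
    """Build spark-submit/spark-shell args pointing Hive at a Derby network server.

    Hypothetical helper for illustration only.
    """
    url = f"jdbc:derby://{host}:{port}/{db};create=true"
    return [
        "--conf",
        f"spark.hadoop.javax.jdo.option.ConnectionURL={url}",
        "--conf",
        f"spark.hadoop.javax.jdo.option.ConnectionDriverName={CLIENT_DRIVER}",
    ]
```

Because every process connects over the network instead of booting the database itself, several processes can share the same metastore.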


