[GitHub] [hudi] xushiyan commented on issue #6808: [SUPPORT] Cannot sync to spark embedded derby hive meta store (the default one)

GitBox Thu, 10 Nov 2022 05:56:12 -0800


xushiyan commented on issue #6808:
URL: https://github.com/apache/hudi/issues/6808#issuecomment-1310314615


   @schlichtanders the issue is rooted in 
   
   ```
       "--conf spark.hadoop.javax.jdo.option.ConnectionURL='jdbc:derby:memory:databaseName=metastore_db;create=true'",  # noqa
   ```
   
   Here `memory` is set as the subsubprotocol, which means Derby keeps the
   database in memory and won't persist any data. Leave the subsubprotocol
   empty, as in `jdbc:derby:databaseName=metastore_db;create=true`, so that
   Derby uses the default `directory` mode, which persists to the file system.
   See https://db.apache.org/derby/docs/10.14/ref/rrefjdbc37352.html
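
As a rough illustration of the difference between the two URLs above, here is a minimal Python sketch. The helper names `derby_embedded_url` and `metastore_conf` are hypothetical (they are not part of Hudi, Spark, or Derby); they only assemble strings in the same shape as the URLs quoted in this comment.

```python
from typing import Optional


def derby_embedded_url(database: str, subsubprotocol: Optional[str] = None) -> str:
    """Build an embedded-Derby JDBC URL shaped like the ones quoted in this comment.

    Hypothetical helper. With subsubprotocol=None the slot is left empty, so
    Derby falls back to its default `directory` mode and persists to disk.
    """
    prefix = "jdbc:derby:" + (f"{subsubprotocol}:" if subsubprotocol else "")
    return f"{prefix}databaseName={database};create=true"


def metastore_conf(url: str) -> str:
    """Format the spark-submit argument that carries the metastore connection URL."""
    return f"--conf spark.hadoop.javax.jdo.option.ConnectionURL='{url}'"


# The in-memory variant from the issue: nothing survives the JVM.
in_memory = derby_embedded_url("metastore_db", subsubprotocol="memory")
# The suggested variant: no subsubprotocol, so the default `directory` mode applies.
on_disk = derby_embedded_url("metastore_db")

print(metastore_conf(in_memory))
print(metastore_conf(on_disk))
```

The only difference between the two results is the `memory:` segment after `jdbc:derby:`.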
   
   
   Another note: the embedded driver has a limitation where only one process
   (JVM) can have that database booted at a time. Hence, if you run a
   spark-shell with sample code like the script below to perform hive-sync
   while another process still holds the database, you'll run into
   `Another instance of Derby may have already booted the database`.
   
   
https://github.com/apache/hudi/blob/6508b11d7c1c1e4cb22aac86f9977cd951a91c9b/packaging/bundle-validation/spark_hadoop_mr/write.scala
   
   So it really depends on how you set up the unit tests. For functional tests,
   which usually involve some local servers and multiple processes, the client
   driver works better, and it's much more lightweight than Postgres.
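
For the multi-process case, a client-driver setup could look roughly like the sketch below. It assumes a Derby network server is already running and reachable (1527 is Derby's default port), that the Derby client jar is on the Spark classpath, and that the driver class name follows Derby 10.14, the version linked above. The helper name `derby_client_confs` is hypothetical and not part of Hudi or Spark.

```python
from typing import List


def derby_client_confs(database: str, host: str = "localhost", port: int = 1527) -> List[str]:
    """Build spark-submit arguments pointing the Hive metastore at a Derby network server.

    Hypothetical helper. Assumes the server is already started on host:port;
    several processes can then connect to the same database through it.
    """
    url = f"jdbc:derby://{host}:{port}/{database};create=true"
    return [
        f"--conf spark.hadoop.javax.jdo.option.ConnectionURL='{url}'",
        # Client (network) driver instead of the embedded one.
        "--conf spark.hadoop.javax.jdo.option.ConnectionDriverName=org.apache.derby.jdbc.ClientDriver",
    ]


confs = derby_client_confs("metastore_db")
for conf in confs:
    print(conf)
```

Each test process would pass the same arguments, so they all talk to one server rather than each trying to boot the database itself.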


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

