What does your data source structure look like? Can't you release it at the end of the buildScan method?
What technology is used in the transactional data endpoint?

> On 24.05.2019 at 15:36, Abhishek Somani <abhisheksoman...@gmail.com> wrote:
>
> Hi experts,
>
> I am trying to create a custom Spark DataSource (v1) to read from a
> transactional data endpoint, and I need to acquire a lock with the endpoint
> before fetching data and release the lock after reading. Note that the lock
> acquisition and release need to happen in the driver JVM.
>
> I have created a custom RDD for this purpose, and tried acquiring the lock
> in MyRDD.getPartitions() and releasing it at the end of the job by
> registering a QueryExecutionListener.
>
> As I have now learnt, this is not the right approach, because the RDD can be
> reused by further actions WITHOUT getPartitions() being called again (an
> RDD's partitions are cached). For example, if someone calls
> Dataset.collect() twice, MyRDD.getPartitions() is invoked the first time, so
> I acquire the lock and release it at the end. The second time collect() is
> called, however, getPartitions() is NOT invoked again, as the RDD is reused
> and its partitions have already been cached.
>
> Can someone advise me on the right places to acquire and release a lock with
> my data endpoint in this scenario?
>
> Thanks a lot,
> Abhishek Somani

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
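Since the cached partitions mean getPartitions() cannot be relied on to fire per action, one possible direction (an assumption on my part, not something the thread settles) is to tie the lock's lifetime to job execution on the driver, e.g. via SparkListener.onJobStart/onJobEnd, with reference counting so overlapping jobs share a single lock. Below is a minimal, Spark-free sketch of such a reference-counted manager; EndpointLockManager, acquireEndpointLock and releaseEndpointLock are hypothetical names standing in for the real endpoint calls:

```scala
// Hypothetical reference-counted lock manager for the driver JVM:
// the first job to start acquires the endpoint lock, and the last
// job to finish releases it. The two function parameters stand in
// for the real endpoint's lock/unlock calls (assumptions, not Spark APIs).
class EndpointLockManager(acquireEndpointLock: () => Unit,
                          releaseEndpointLock: () => Unit) {
  private var refCount = 0

  // Call from a driver-side hook (e.g. a SparkListener's onJobStart)
  // whenever a job that reads this source begins.
  def onJobStart(): Unit = synchronized {
    if (refCount == 0) acquireEndpointLock()
    refCount += 1
  }

  // Call from the matching hook (e.g. onJobEnd) when the job finishes;
  // the lock is released only when no job is still reading.
  def onJobEnd(): Unit = synchronized {
    refCount -= 1
    if (refCount == 0) releaseEndpointLock()
  }
}
```

In real Spark code the two hooks could be driven by a SparkListener registered on the driver's SparkContext, since onJobStart/onJobEnd fire for every action, including a second Dataset.collect() that skips getPartitions(). The remaining difficulty, which this sketch does not solve, is telling which jobs actually touch this data source.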