What does your data source structure look like?
Can't you release it at the end of the buildScan method?
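
That would only be safe if you can materialize the data before buildScan returns, since the RDD it hands back is read lazily, after the method has exited. A minimal sketch of what I mean, assuming the data is small enough to fetch eagerly on the driver (EndpointClient and its methods are placeholders, not your API):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.StructType

trait EndpointClient {
  def acquireLock(): Unit
  def releaseLock(): Unit
  def fetchAll(): Seq[Row]
}

class EndpointRelation(
    override val sqlContext: SQLContext,
    override val schema: StructType,
    endpoint: EndpointClient) extends BaseRelation with TableScan {

  // Runs on the driver when the scan is planned.
  override def buildScan(): RDD[Row] = {
    endpoint.acquireLock()
    try {
      // Materialize everything while the lock is held.
      val rows = endpoint.fetchAll()
      sqlContext.sparkContext.parallelize(rows)
    } finally {
      endpoint.releaseLock() // safe only because the data is already fetched
    }
  }
}

If the data has to be read lazily on the executors, this pattern does not apply, which is why I am asking about the endpoint.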

What technology is used in the transactional data endpoint?


> On 24.05.2019 at 15:36, Abhishek Somani <abhisheksoman...@gmail.com> wrote:
> 
> Hi experts,
> 
> I am trying to create a custom Spark DataSource (v1) to read from a 
> transactional data endpoint, and I need to acquire a lock with the endpoint 
> before fetching data and release the lock after reading. Note that the lock 
> acquisition and release need to happen in the driver JVM.
> 
> I have created a custom RDD for this purpose. I tried acquiring the lock in 
> MyRDD.getPartitions() and releasing it at the end of the job by registering 
> a QueryExecutionListener.
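> 
> Simplified, the setup looks roughly like this (MyEndpointClient and its 
> methods are placeholders for the real client):
> 
> import org.apache.spark.{Partition, SparkContext, TaskContext}
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.execution.QueryExecution
> import org.apache.spark.sql.util.QueryExecutionListener
> 
> trait MyEndpointClient extends Serializable {
>   def acquireLock(): Unit
>   def releaseLock(): Unit
>   def listSplits(): Seq[String]
>   def read(split: String): Iterator[Row]
> }
> 
> case class MyPartition(index: Int, split: String) extends Partition
> 
> class MyRDD(sc: SparkContext, client: MyEndpointClient) extends RDD[Row](sc, Nil) {
> 
>   // Runs on the driver, once per RDD: this is where I acquire the lock today.
>   override def getPartitions: Array[Partition] = {
>     client.acquireLock()
>     client.listSplits().zipWithIndex
>       .map { case (s, i) => MyPartition(i, s): Partition }.toArray
>   }
> 
>   // Runs on the executors; reads one split while the lock is held.
>   override def compute(split: Partition, context: TaskContext): Iterator[Row] =
>     client.read(split.asInstanceOf[MyPartition].split)
> }
> 
> // Releases the lock when the action that triggered the job finishes.
> class ReleaseLockListener(client: MyEndpointClient) extends QueryExecutionListener {
>   override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
>     client.releaseLock()
>   override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
>     client.releaseLock()
> }
> 
> // Registered once on the driver:
> //   spark.listenerManager.register(new ReleaseLockListener(client))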
> 
> Now, as I have learnt, this is not the right approach, because the RDD can get 
> reused by further actions WITHOUT getPartitions() being called again (an RDD's 
> partitions are computed once and then cached). For example, if someone calls 
> Dataset.collect() twice, the first time MyRDD.getPartitions() will be invoked, 
> I will acquire the lock, and it will be released at the end. The second time 
> collect() is called, however, getPartitions() will NOT be called again, as the 
> RDD is reused and its partitions have already been cached.
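> 
> Concretely (the format name "my-source" and the session val spark are just 
> for illustration):
> 
> val df = spark.read.format("my-source").load()
> df.collect() // getPartitions() runs: lock acquired, released by the listener
> df.collect() // partitions are cached: getPartitions() is NOT called, so no
>              // lock is held, yet compute() reads from the endpoint again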
> 
> Can someone advise me on the right places to acquire and release the lock 
> with my data endpoint in this scenario?
> 
> Thanks a lot,
> Abhishek Somani
