bolkedebruin commented on PR #34729:
URL: https://github.com/apache/airflow/pull/34729#issuecomment-1748698174

   > What I’m envisioning is something like
   > 
   > ```python
   > warehouse_mnt = afs.mount("s3://warehouse")  # Can have conn_id too, it’s orthogonal.
   > 
   > @dag
   > def my_dag():
   > 
   >   @task
   >   def load_file(src):
   >      with afs.open(src) as f:
   >        f.read()
   > 
   >   load_file(warehouse_mnt / "my_data.csv")
   > ```
   > 
   > Instead of exposing the mount to the user, we encapsulate the data inside
   > the Mount object and expose a Path-like interface to let the user operate
   > on it directly. You can work with the mount directly as well, either by
   > passing a mount point explicitly to `mount` or by accessing
   > `mnt.mount_location` (or whatever, returns the location as a string) and
   > working with that.
   > 
   
   I think I like that. If we can use both patterns, that would be pretty cool.
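
   For example, a rough sketch of the two patterns side by side (reusing the
   `afs.mount` API from the quote above; `mount_location` and the `/` operator
   are still up for discussion, so treat the names as placeholders):

   ```python
   warehouse_mnt = afs.mount("s3://warehouse")

   @task
   def load_file(src):
       # src can be whatever the mount hands us: a Path-like object or a plain string
       with afs.open(src) as f:
           return f.read()

   # Pattern 1: Path-like interface, never touching the mount point directly
   load_file(warehouse_mnt / "my_data.csv")

   # Pattern 2: work with the mount location as a plain string
   load_file(warehouse_mnt.mount_location + "/my_data.csv")
   ```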
   
   > The Dataset part, I’m thinking now, is pretty simple: just make the Mount
   > object inherit from Dataset (or _is_ a Dataset?) so that the same object
   > can be used for both purposes without duplicating the URL if you need
   > that. Not that useful, but the two are really the same idea (a reference
   > to some resource) that I feel shouldn’t be two things.
   
   Not entirely sure about this. To me, for now, they are quite different: a
   dataset points to data, while a mount provides an interface that allows you
   to manipulate file-like objects. So not really a reference to a resource,
   imho. But maybe I am seeing that wrongly. If you have an example of how you
   think that would work on the user side, it would help.
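
   Something along these lines is roughly what I would want to see spelled out
   (purely hypothetical sketch, assuming Mount subclasses Dataset and reusing
   the `afs` names from above; none of this is settled API):

   ```python
   # Hypothetical: one object drives both scheduling and file access
   warehouse = afs.mount("s3://warehouse")  # would also be a Dataset

   @dag(schedule=[warehouse])  # used as a Dataset: run when it is updated
   def consumer():

       @task(outlets=[warehouse])  # used as a Dataset: mark it as updated
       def transform():
           # used as a mount: Path-like file access
           with afs.open(warehouse / "my_data.csv") as f:
               return f.read()

       transform()
   ```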
   

