Hello, I'm working on a DSv2 implementation with a user base that is 100% PySpark-based.
There's some interesting additional DS-level functionality I'd like to expose from the Java side to PySpark, e.g. I/O metrics, which source site provided the data, and so on. Does anyone have an example of how to expose that to PySpark? We provide a Python library for scientists to use, so I can also supply the Python half; I just don't know where to begin.

Part of the mental block I'm having is that when a user does the following in PySpark:

    df = spark.read.format('edu.vanderbilt.accre.laurelin.Root') \
        .option("tree", "tree") \
        .load('small-flat-tree.root')

they don't have a reference to any of my DS objects -- "df" is a DataFrame object, which I don't own. Does anyone have a tip?

Thanks
Andrew
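For context, the kind of access pattern I'm imagining on the Python side (the Metrics class and its methods below are hypothetical, nothing that exists today) is reaching through py4j to a static helper that my Java-side readers would populate:

    # Sketch only: assumes a hypothetical Java-side static registry,
    # e.g. edu.vanderbilt.accre.laurelin.Metrics, that the DSv2
    # readers update as they run. All names here are made up.
    jvm = spark.sparkContext._jvm
    laurelin_metrics = jvm.edu.vanderbilt.accre.laurelin.Metrics

    df = spark.read.format('edu.vanderbilt.accre.laurelin.Root') \
        .option("tree", "tree") \
        .load('small-flat-tree.root')
    df.count()  # force an actual read so the source records something

    # Pull the Java-side values back into Python via the py4j gateway
    print(laurelin_metrics.getBytesRead())
    print(laurelin_metrics.getSourceSite())

But I don't know whether that's the idiomatic way to do it, or how our Python library should wrap it up cleanly for scientists.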