Hello again,

Is it possible to grab a handle to the underlying DataSourceReader backing a DataFrame? Since there's no clean way to add extra methods to Dataset<Row>, being able to grab the DataSource backing the DataFrame would be a good escape hatch.
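For what it's worth, a minimal sketch of the py4j escape hatch I have in mind. `Dataset.queryExecution().logical()` is a real Spark API, but `_jdf` is a PySpark internal, and actually locating the DataSourceV2Relation node (and the reader it holds) inside the returned plan is version-dependent and not shown:

```python
def backing_plan(df):
    """Hop through py4j from a PySpark DataFrame to its JVM-side logical plan.

    df._jdf is the underlying Java Dataset<Row> (PySpark internal);
    queryExecution().logical() returns the analyzed logical plan. Walking
    that plan to find the DataSourceV2Relation is left out because the
    node's class and fields differ across Spark versions.
    """
    return df._jdf.queryExecution().logical()
```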
Cheers
Andrew

On Mon, Sep 30, 2019 at 3:48 PM Andrew Melo <andrew.m...@gmail.com> wrote:
>
> Hello,
>
> I'm working on a DSv2 implementation with a userbase that is 100% pyspark
> based.
>
> There's some interesting additional DS-level functionality I'd like to
> expose from the Java side to pyspark -- e.g. I/O metrics, which source
> site provided the data, etc...
>
> Does someone have an example of how to expose that to pyspark? We
> provide a python library for scientists to use, so I can also provide
> the python half, I just don't know where to begin. Part of the mental
> issue I'm having is that when a user does the following in pyspark:
>
> df = spark.read.format('edu.vanderbilt.accre.laurelin.Root') \
>     .option("tree", "tree") \
>     .load('small-flat-tree.root')
>
> They don't have a reference to any of my DS objects -- "df" is a
> DataFrame object, which I don't own.
>
> Does anyone have a tip?
> Thanks
> Andrew

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
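To make the "python half" concrete: since py4j simply forwards method calls to the JVM, the Python side of this can be a thin delegation wrapper around whatever helper object the Java side hands back. This is only a sketch; the method names `bytesRead` and `sourceSite` are hypothetical stand-ins for whatever the DataSource actually exposes:

```python
class SourceMetrics:
    """Python-facing wrapper around a JVM-side metrics object (a py4j proxy).

    The Java method names used below (bytesRead, sourceSite) are
    hypothetical; substitute whatever the Java helper actually provides.
    """

    def __init__(self, jmetrics):
        # jmetrics is a py4j JavaObject; attribute calls forward to the JVM
        self._jm = jmetrics

    def bytes_read(self):
        return self._jm.bytesRead()

    def source_site(self):
        return self._jm.sourceSite()
```

A scientist's session would then look like `SourceMetrics(handle).bytes_read()`; the open question in this thread is how to obtain that handle from a `df` the library doesn't own.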