Hello,

During testing of our DSv2 implementation (on 2.4.3, FWIW), it appears that our DataSourceReader is being instantiated multiple times for the same dataframe. For example, the following snippet:
    Dataset<Row> df = spark
        .read()
        .format("edu.vanderbilt.accre.laurelin.Root")
        .option("tree", "Events")
        .load("testdata/pristine/2018nanoaod1june2019.root");

constructs edu.vanderbilt.accre.laurelin.Root twice and then calls createReader once (as an aside, this seems like a lot for 1000 columns? "CodeGenerator: Code generated in 8162.847517 ms"). But then each subsequent operation on that dataframe (e.g. df.count()) calls createReader again, instead of reusing the existing DataSourceReader.

Is that the expected behavior? Because of the file format, it's quite expensive to deserialize all the various metadata, so I was holding the deserialized version in the DataSourceReader; but if Spark is repeatedly constructing new readers, then that doesn't help. If this is the expected behavior, how should I handle it as a consumer of the API?

Thanks!
Andrew
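In case it helps frame the question: the workaround I've been considering is to memoize the expensive metadata outside the reader itself, e.g. in a process-wide cache keyed by file path, so that repeated createReader() calls can reuse it even if Spark discards the reader instance. A minimal sketch of the pattern (the loadMetadata/getOrLoad names here are made up for illustration, not Laurelin's or Spark's actual API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache expensive deserialized metadata across
// repeated createReader() calls, keyed by file path. Only the first
// call per path pays the deserialization cost.
public class MetadataCache {
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();
    static int loads = 0; // counts how many expensive deserializations ran

    // Stand-in for the expensive metadata deserialization step.
    static String loadMetadata(String path) {
        loads++;
        return "metadata-for:" + path;
    }

    // createReader() would call this instead of deserializing directly.
    public static String getOrLoad(String path) {
        return CACHE.computeIfAbsent(path, MetadataCache::loadMetadata);
    }
}
```

That keeps the cost per-JVM rather than per-reader, but it feels like I'm working around the API rather than with it, hence the question.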