Re: DSv2 reader lifecycle

2019-11-06 Thread Andrew Melo
Hi Ryan,
Thanks for the pointers
On Thu, Nov 7, 2019 at 8:13 AM Ryan Blue wrote:
> Hi Andrew,
>
> This is expected behavior for DSv2 in 2.4. A separate reader is configured
> for each operation because the configuration will change. A count, for
> example, doesn't need to project any columns,

Re: DSv2 reader lifecycle

2019-11-06 Thread Ryan Blue
Hi Andrew,
This is expected behavior for DSv2 in 2.4. A separate reader is configured for each operation because the configuration will change. A count, for example, doesn't need to project any columns, but a count distinct will. Similarly, if your read has different filters we need to apply

DSv2 reader lifecycle

2019-11-05 Thread Andrew Melo
Hello,
During testing of our DSv2 implementation (on 2.4.3 FWIW), it appears that our DataSourceReader is being instantiated multiple times for the same dataframe. For example, the following snippet
Dataset df = spark
    .read()