Re: DSv2 reader lifecycle

2019-11-06 Thread Andrew Melo

Hi Ryan,

Thanks for the pointers

On Thu, Nov 7, 2019 at 8:13 AM Ryan Blue wrote:
> Hi Andrew,
>
> This is expected behavior for DSv2 in 2.4. A separate reader is configured
> for each operation because the configuration will change. A count, for
> example, doesn't need to project any columns,

Re: DSv2 reader lifecycle

2019-11-06 Thread Ryan Blue

Hi Andrew,

This is expected behavior for DSv2 in 2.4. A separate reader is configured for each operation because the configuration will change. A count, for example, doesn't need to project any columns, but a count distinct will. Similarly, if your read has different filters we need to apply
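
To illustrate the point above that a count needs no columns while a count distinct does, here is a toy model in plain Java. It is only a sketch and does not use Spark's API: ToySource, ToyReader, createReader, and pruneColumns are hypothetical names invented for this example, and the column names are made up. It shows why one reader per operation is natural when each operation carries its own projection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: none of these types are Spark classes.
class ReaderLifecycleSketch {

    // Hypothetical reader that holds per-operation configuration.
    static final class ToyReader {
        private List<String> columns;

        ToyReader(List<String> fullSchema) {
            this.columns = new ArrayList<>(fullSchema);
        }

        // Narrow the reader to the columns one operation needs.
        void pruneColumns(List<String> required) {
            this.columns = new ArrayList<>(required);
        }

        List<String> columns() {
            return Collections.unmodifiableList(columns);
        }
    }

    // Hypothetical source that hands out a fresh reader per operation.
    static final class ToySource {
        private final List<String> schema;
        private int readersCreated = 0;

        ToySource(List<String> schema) {
            this.schema = new ArrayList<>(schema);
        }

        ToyReader createReader() {
            readersCreated++;
            return new ToyReader(schema);
        }

        int readersCreated() {
            return readersCreated;
        }
    }

    public static void main(String[] args) {
        ToySource source = new ToySource(Arrays.asList("id", "name", "value"));

        // A count needs no columns at all.
        ToyReader countReader = source.createReader();
        countReader.pruneColumns(Collections.emptyList());

        // A count distinct on "name" must read that column.
        ToyReader distinctReader = source.createReader();
        distinctReader.pruneColumns(Arrays.asList("name"));

        System.out.println("count reads: " + countReader.columns());
        System.out.println("count distinct reads: " + distinctReader.columns());
        System.out.println("readers created: " + source.readersCreated());
    }
}
```

Because each ToyReader is mutated by pruneColumns, sharing one instance between the two operations would let the second projection overwrite the first. In this toy model, that is the reason to create a separate reader for each operation.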

DSv2 reader lifecycle

2019-11-05 Thread Andrew Melo

Hello,

During testing of our DSv2 implementation (on 2.4.3 FWIW), it appears that our DataSourceReader is being instantiated multiple times for the same dataframe. For example, the following snippet

    Dataset df = spark
        .read()