Re: DSv2 reader lifecycle

2019-11-06 Thread Andrew Melo
Hi Ryan, Thanks for the pointers. On Thu, Nov 7, 2019 at 8:13 AM Ryan Blue wrote: > Hi Andrew, > > This is expected behavior for DSv2 in 2.4. A separate reader is configured > for each operation because the configuration will change. A count, for > example, doesn't need to project any columns, …

Re: Build customized resource manager

2019-11-06 Thread Klaus Ma
Any suggestions? - Klaus On Mon, Nov 4, 2019 at 5:04 PM Klaus Ma wrote: > Hi team, > > AFAIK, we built k8s/yarn/mesos as resource managers; but I'd like to do > some enhancements to them, e.g. integrate with Volcano > in k8s. Is it possible to do that …

Re: DSv2 reader lifecycle

2019-11-06 Thread Ryan Blue
Hi Andrew, This is expected behavior for DSv2 in 2.4. A separate reader is configured for each operation because the configuration will change. A count, for example, doesn't need to project any columns, but a count distinct will. Similarly, if your read has different filters, we need to apply …
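A minimal PySpark sketch of the behavior Ryan describes (assuming a local session, a hypothetical parquet path, and a hypothetical user_id column; none of these come from the thread). Each query plans its own scan, which is why DSv2 in 2.4 configures a separate reader per operation:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.read.parquet("/tmp/events")  # hypothetical source

    # A plain count needs no columns at all: the scan's ReadSchema is empty.
    df.groupBy().count().explain()

    # A count distinct must read the column being counted.
    df.select("user_id").distinct().explain()

    # A filtered read gets its own scan with the predicate pushed down.
    df.filter(df["user_id"] > 100).explain()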

Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

2019-11-06 Thread Enrico Minack
So you say the optimized inner join with no conditions is also a valid query? Then I agree the optimizer is not breaking the query, hence it is not a bug. Enrico On 06.11.19 at 15:53, Sean Owen wrote: You asked for an inner join but it turned into a cross-join. This might be surprising, …

Re: [SPARK-29176][DISCUSS] Optimization should change join type to CROSS

2019-11-06 Thread Sean Owen
You asked for an inner join but it turned into a cross-join. This might be surprising, hence the error, which you can disable. The query is not invalid in any case. It's just stopping you from doing something you may not have meant to, and which may be expensive. However, I think we've already changed the …

[SPARK-29176][DISCUSS] Optimization should change join type to CROSS

2019-11-06 Thread Enrico Minack
Hi, I would like to discuss issue SPARK-29176 to see if this is considered a bug and, if so, to sketch out a fix. In short, the issue is that a valid inner join with a condition gets optimized so that no condition is left, but the join type is still INNER. Then CheckCartesianProducts throws an …
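A minimal PySpark repro of the class of query at issue (a sketch; the exact query in SPARK-29176 may differ). A join condition that references only one side is pushed down as a filter, leaving an INNER join with no condition, which CheckCartesianProducts then rejects:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").getOrCreate()
    left = spark.range(10)
    right = spark.range(10)

    # The condition references only the left side, so the optimizer pushes
    # it down as a filter; the inner join ends up with no condition, and
    # CheckCartesianProducts raises an AnalysisException in 2.4.
    left.join(right, left["id"] < 5).count()

    # Workaround: allow implicit cartesian products explicitly.
    spark.conf.set("spark.sql.crossJoin.enabled", "true")
    left.join(right, left["id"] < 5).count()  # now runs as a cross join

Spark 3.0 flips spark.sql.crossJoin.enabled to true by default, which is presumably the change Sean alludes to above.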

Re: [DISCUSS] Remove sorting of fields in PySpark SQL Row construction

2019-11-06 Thread Wenchen Fan
Sounds reasonable to me. We should make the behavior consistent within Spark. On Tue, Nov 5, 2019 at 6:29 AM Bryan Cutler wrote: > Currently, when a PySpark Row is created with keyword arguments, the > fields are sorted alphabetically. This has created a lot of confusion with > users because it …
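To illustrate the confusion Bryan describes, a small sketch of the pre-3.0 behavior: fields passed as keyword arguments are sorted alphabetically, so declaration order is lost and disagrees with the positional Row factory:

    from pyspark.sql import Row

    # In Spark 2.4, keyword arguments are sorted alphabetically by field
    # name, regardless of the order they were written in.
    r = Row(zip="94025", city="Menlo Park")
    print(r)     # Row(city='Menlo Park', zip='94025')
    print(r[0])  # 'Menlo Park' -- positional access follows the sorted order

    # A positional Row factory keeps its declared order, so the two styles
    # disagree:
    Person = Row("name", "age")
    print(Person("Alice", 23))        # Row(name='Alice', age=23)
    print(Row(name="Alice", age=23))  # Row(age=23, name='Alice')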