Re: AQE effectiveness

2020-08-21 Thread Maryann Xue
t 10:54 PM Koert Kuipers wrote: > in our inhouse spark version i changed this without trouble and it didnt > even break any tests > just some minor changes in CacheManager it seems > > On Thu, Aug 20, 2020 at 1:12 PM Maryann Xue > wrote: > >> No. The worst case of enab

Re: AQE effectiveness

2020-08-20 Thread Maryann Xue
Koert Kuipers wrote: > i see. it makes sense to maximize re-use of cached data. i didn't realize > we have two potentially conflicting goals here. > > > On Thu, Aug 20, 2020 at 12:41 PM Maryann Xue > wrote: > >> AQE has been turned off deliberately so t

Re: AQE effectiveness

2020-08-20 Thread Maryann Xue
AQE has been turned off deliberately so that the `outputPartitioning` of the cached relation won't be changed by AQE partition coalescing or skew join optimization, which means the `outputPartitioning` can potentially be used by relations built on top of the cache. On second thought, we should probably

Re: FYI: The evolution on `CHAR` type behavior

2020-03-17 Thread Maryann Xue
It would be super weird not to support VARCHAR as a SQL engine. Banning CHAR is probably fine, as its semantics are genuinely confusing. We can issue a warning when parsing VARCHAR with a limit and suggest using String instead. On Tue, Mar 17, 2020 at 10:27 AM Wenchen Fan wrote: > I agree

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Maryann Xue
ld be great. Was there one written up internally that you could >> share? >> >> On Wed, Oct 2, 2019 at 10:40 AM Maryann Xue >> wrote: >> >>> > It lists 3 cases for how a filter is built, but nothing about the >>> overall approac

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Maryann Xue
lly that you could > share? > > On Wed, Oct 2, 2019 at 10:40 AM Maryann Xue > wrote: > >> > It lists 3 cases for how a filter is built, but nothing about the >> overall approach or design that helps when trying to find out where it >> should be placed in the opt

Re: [DISCUSS] Out of order optimizer rules?

2019-10-02 Thread Maryann Xue
> It lists 3 cases for how a filter is built, but nothing about the overall approach or design that helps when trying to find out where it should be placed in the optimizer rules. The overall idea/design of DPP can be simply put as using the result of one side of the join to prune partitions of a
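The idea in the snippet above (use the result of one side of a join to prune partitions on the other side) can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's implementation: the partitioned "fact" table is modeled as a dict keyed by partition value, and the names `dimension_keys`, `prune_partitions`, and `join_with_pruning` are hypothetical helpers invented for this illustration.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Row = Dict[str, object]


def dimension_keys(dim_rows: List[Row],
                   predicate: Callable[[Row], bool],
                   key: str) -> Set[Hashable]:
    """Evaluate the filtered side of the join and collect its join keys."""
    return {row[key] for row in dim_rows if predicate(row)}


def prune_partitions(partitions: Dict[Hashable, List[Row]],
                     keep: Set[Hashable]) -> Dict[Hashable, List[Row]]:
    """Keep only the partitions whose partition value can still match."""
    return {value: rows for value, rows in partitions.items() if value in keep}


def join_with_pruning(fact_partitions: Dict[Hashable, List[Row]],
                      dim_rows: List[Row],
                      predicate: Callable[[Row], bool],
                      key: str) -> Tuple[List[Row], List[Hashable]]:
    """Join the two sides, scanning only the fact partitions that survive pruning.

    Returns the joined rows and the sorted list of partitions actually scanned.
    """
    matching_dim = [row for row in dim_rows if predicate(row)]
    keys = {row[key] for row in matching_dim}
    surviving = prune_partitions(fact_partitions, keys)

    joined: List[Row] = []
    for part_value, rows in surviving.items():
        for fact_row in rows:
            for dim_row in matching_dim:
                if dim_row[key] == part_value:
                    joined.append({**fact_row, **dim_row})
    return joined, sorted(surviving)
```

In this toy, a selective filter on the small side shrinks the set of keys, and every fact partition outside that set is skipped before any of its rows are read. How and where the real optimizer builds such a filter is the subject of the thread and is not modeled here.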

Re: Why hint does not traverse down subquery alias

2019-06-11 Thread Maryann Xue
BTW, I've actually just done some work on hint error handling, which might be helpful to what you mentioned: https://github.com/apache/spark/pull/24653 On Tue, Jun 11, 2019 at 8:04 PM Maryann Xue wrote: > I believe in the SQL standard, the original name cannot be accessed once > it’s a

Re: Why hint does not traverse down subquery alias

2019-06-11 Thread Maryann Xue
id = t2.id; > 2) select /*+ broadcast(t1) */ * from db.t1 a1 join db.t2 a2 on a1.id = > a2.id; > > 2) is the same as 1) but with aliases. Many users were surprised that 2) > stopped working. > > Thanks, > John > > > On Tue, Jun 11, 2019 at 4:38 PM Maryann Xue wrote:

Re: Why hint does not traverse down subquery alias

2019-06-11 Thread Maryann Xue
Yes, and for a good reason: the hint relation has exactly the same scope as other elements of queries/sub-queries. Suppose there's a query like: select /*+ broadcast(s) */ * from (select a, b from s) t join (select a, b from t) s on t.a = s.b If we allowed the hint resolving to "cross" the
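The scoping argument in the snippet above can be made concrete with a toy model of name resolution. This is a sketch under stated assumptions, not Spark's analyzer: `Relation`, `Query`, and `resolve_hint` are hypothetical names, and a hint is assumed to match only the aliases visible in the query block where it appears.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Relation:
    """A relation in a FROM clause: a base table or an aliased subquery."""
    alias: str                          # name visible in the enclosing query block
    subquery: Optional["Query"] = None  # None means this is a base table


@dataclass
class Query:
    """One query block with the relations named in its FROM clause."""
    relations: List[Relation]


def resolve_hint(query: Query, name: str) -> List[Relation]:
    """Match a hint name against this query block's own aliases only.

    The function deliberately does not descend into subqueries, so a name
    hidden behind an alias is not visible to the hint.
    """
    return [rel for rel in query.relations if rel.alias == name]


# The query from the message: the outer alias `t` wraps base table `s`,
# and the outer alias `s` wraps base table `t`.
outer = Query(relations=[
    Relation(alias="t", subquery=Query(relations=[Relation(alias="s")])),
    Relation(alias="s", subquery=Query(relations=[Relation(alias="t")])),
])

matches = resolve_hint(outer, "s")
```

Here `broadcast(s)` has exactly one candidate, the outer alias `s`, which is the subquery over base table `t`. If resolution were allowed to look through aliases, the base table `s` inside the other subquery would match too, and the hint would become ambiguous.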

Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

2019-02-18 Thread Maryann Xue
+1 On Mon, Feb 18, 2019 at 10:46 PM John Zhuge wrote: > +1 > > On Mon, Feb 18, 2019 at 8:43 PM Dongjoon Hyun > wrote: > >> +1 >> >> Dongjoon. >> >> On 2019/02/19 04:12:23, Wenchen Fan wrote: >> > +1 >> > >> > On Tue, Feb 19, 2019 at 10:50 AM Ryan Blue >> > wrote: >> > >> > > Hi everyone, >>