Antonio-RiveroMartnez commented on PR #27386: URL: https://github.com/apache/superset/pull/27386#issuecomment-1977412955
> > > > time_offsets joins DataFrames, which is not where you would need to apply the join, would you? You need the join to happen at query pre-execution level, not when you get the results of individual queries. `time_offset` works when you have the total data at your disposal, for example when you're working with other vizs, but not, for example, when working with paginated data like in tables.
> > >
> > > Okay, let's narrow the scope of the discussion, I'm not sure what join you are talking about. The goal of this PR is to do one thing: build the comparison time range in the frontend rather than the backend. So, let's keep the discussion around this PR.
> >
> > Yes, totally fine with that, just giving you context of why/when the query object needs to identify something "new" for time comparisons, because again, you are not always able to compare using multiple queries sent from the frontend. Meaning the backend would be generating those ranges too, so that's why I'm not worried, but just pointing out that some of the code deleted here would actually be required to handle those cases, and that instead of having this in two places, "front" and "back", I would rather have it in just one place.
> >
> > If it's easier that way, we can move ahead with these changes the moment CI passes, and when I'm adjusting the other PR I will point out and reintroduce the necessary code and we can discuss there.
> >
> > Thanks as always.
>
> Thanks for your reply, I hope you could provide some examples of why/when/what the exact `JOIN` is in your [PR](https://github.com/apache/superset/pull/27355), but maybe the current PR is not a good place to talk about `join`.
>
> Theoretically, the frontend- and backend-built time ranges are exactly the same because all the time ranges will be parsed by the pyparser.
>
> I think we might need to think of a new way to address code like this https://github.com/apache/superset/pull/27355/files#diff-af5e15a55ff9c81f441eb6b9b10dae13b2ee17fec614d3e3ad0057ed0dc814aeR1434, I don't think such code is maintainable, but we can discuss that in your PR.

I tagged you in that other PR already if you want to discuss the `JOIN`. But again, IMHO the backend would still need to build these ranges because, as I said, it's not always a good idea (or even valid) to have separate `queryObjects` to build the comparison. Meaning, the "frontend" doing it like in here only addresses one type of comparison scenario, while having it on the backend might serve all scenarios, or at least ease reusability when needed.

TL;DR: the code you're removing here will be needed and reintroduced again very shortly, so I see no point in removing it in the first place.

I'm going to try to explain why building the time range in the frontend is only "good" for one type of comparison. Imagine your queries are like this:

```
Query A: a metric from 1900-01-01 to 1950-01-01
Results: 1000

Query B: a metric from 1950-01-01 to 2000-01-01
Results: 100
```

Here having the frontend send two queries is OK and totally fine, like you have in this PR.

Now, what's the limitation of this approach? Let's imagine another type of query for another type of viz, for example the Table chart. Same exercise as before:

```
Query A: the users that have bought the most items, along with the number of items,
         from 1900-01-01 to 1950-01-01, ordered by # items bought desc
Results: User1: 100, User2: 90, User3: 80

Query B: the users that have bought the most items, along with the number of items,
         from 1950-01-01 to 2000-01-01, ordered by # items bought desc
Results: User4: 95, User1: 70, User2: 65
```

In that scenario it's also fine to build two queries in the frontend with individual time ranges and process each queryObject individually to run the metrics, etc. Again, just fine with your PR here.

Limitation? You can paginate a result set in the Table chart (or any other chart that does the same, or would do the same later on). If you paginate using, again, just the frontend with isolated queries like you have in this PR, you would get something like:

```
Query A: the users that have bought the most items, along with the number of items,
         from 1900-01-01 to 1950-01-01, ordered by # items bought desc, LIMIT 2
Results: User1: 100, User2: 90

Query B: the users that have bought the most items, along with the number of items,
         from 1950-01-01 to 2000-01-01, ordered by # items bought desc, LIMIT 2
Results: User4: 95, User1: 70
```

When creating the comparison metrics you would assume "User2 in query A has 90 items and is not present in the result set of query B, so it's 0", which is, as you can imagine, invalid.
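To make that concrete, here is a minimal pandas sketch (made-up data and column names, not Superset code) of what a post-execution, DataFrame-level join over the two paginated result sets, which is essentially what a `time_offsets`-style merge does, would produce:

```python
# Minimal sketch with hypothetical data: merging the two LIMIT 2 pages
# after the queries ran, the way a DataFrame-level join would.
import pandas as pd

page_a = pd.DataFrame({"user": ["User1", "User2"], "items": [100, 90]})  # 1900-1950, LIMIT 2
page_b = pd.DataFrame({"user": ["User4", "User1"], "items": [95, 70]})   # 1950-2000, LIMIT 2

merged = page_a.merge(page_b, on="user", how="left", suffixes=("", "_prev"))
merged["items_prev"] = merged["items_prev"].fillna(0)
print(merged)
#     user  items  items_prev
# 0  User1    100        70.0
# 1  User2     90         0.0   <- wrong: User2 really has 65 in the second range,
#                                  it just fell outside that query's page
```

No amount of post-processing can recover User2's real value here, because the row was never fetched; that's why the join has to happen before the LIMIT is applied.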
So, how to solve that? We could produce the queryObject needed in the backend itself (with the `JOIN` we can discuss in the other PR) and let the backend build the time ranges needed when constructing the queries; there's a rough sketch of that idea at the bottom of this comment.

But yet again, why not just two queries from the frontend with different ranges?

1. It's redundant (as per the definition of redundant) to send two queryObjects when we can produce both with just one. My mistake, because I used Mixed Time-Series as the base when building the BigNumber with time comparison, so I introduced that pattern there.
2. Data might have changed between the execution times of the two queries, which may leave you with inconsistent results.

Hopefully this little exercise helped to clarify my point of view on why we should NOT bring that time_range building to the frontend only (if at all), and also helps to show why time_offsets are not usable here either. I just want to be clear about the limitation and the fact that solving the comparison for good, like in the other PR, would require the backend to process these ranges, which is at most adding an optional field in the extras or similar in the queryObject. But if others are happy with this, we can move forward acknowledging what I stated here.

Thanks as always @zhaoyongjie for the help and for helping to bring more robust solutions whenever possible.
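P.S. For illustration only, this is roughly the shape of the single backend-built comparison query I mean. The schema, table, and column names are hypothetical, written as plain SQL in a Python string; it is not the exact query the other PR generates:

```python
# Rough sketch, hypothetical schema: one query where the previous-period values
# are joined in BEFORE pagination, so every returned row carries its comparison.
COMPARISON_SQL = """
WITH current_period AS (
    SELECT user_id, SUM(items) AS items
    FROM purchases
    WHERE purchased_at >= '1950-01-01' AND purchased_at < '2000-01-01'
    GROUP BY user_id
),
previous_period AS (
    SELECT user_id, SUM(items) AS items
    FROM purchases
    WHERE purchased_at >= '1900-01-01' AND purchased_at < '1950-01-01'
    GROUP BY user_id
)
SELECT cur.user_id,
       cur.items               AS items,
       COALESCE(prev.items, 0) AS items_prev
FROM current_period cur
LEFT JOIN previous_period prev ON prev.user_id = cur.user_id
ORDER BY cur.items DESC
LIMIT 2 OFFSET 0  -- pagination applied after the join, so every page stays consistent
"""
```

Because the LIMIT is applied after the join, every user on the requested page already carries its previous-period value, which is exactly what two isolated frontend queries cannot guarantee.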