[jira] [Created] (CALCITE-5565) Implement BigQuery LOG function
Tanner Clary created CALCITE-5565: - Summary: Implement BigQuery LOG function Key: CALCITE-5565 URL: https://issues.apache.org/jira/browse/CALCITE-5565 Project: Calcite Issue Type: Task Reporter: Tanner Clary Assignee: Tanner Clary Calcite currently supports the {{LN}} and {{LOG10}} function. BigQuery offers the {{LOG}} function which allows for the base of the logarithm to be specified as a second argument. If the second argument is not provided, the base is defaulted to e (making it identical to the {{LN}} function. Example: {{LOG(64, 10)}} would return {{2}}. Example 2: {{LOG(10)}} would return the same as {{LN(10)}}. [BigQuery docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#log] -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: Future of the Spatial Type Extension
> On Mar 6, 2023, at 3:37 PM, Bertil Chapuis wrote: > > Thank you for your answers and for the pointers. > >> PS Regarding which specification we choose to implement. The four principles >> you outline sound good to me. It’s always better to follow the standard. If >> leading implementations (e.g. PostGIS and H2GIS) diverge from the standard, >> we can make a note, and possibly support them as secondary implementations. >> >> Sometimes there are compatible extensions. E.g. the OpenGIS function has two >> arguments, and the PostGIS version accepts two or three arguments. If >> someone calls Calcite's implementation of the function with three arguments, >> it is reasonable that they get the PostGIS behavior. > > My feeling is that spatial is a rather small niche and that developers do not > necessarily seek compatibility with Postgis or H2GIS. Personally, I almost > always have to refer to the documentation of the RDBMS when writing queries > that use spatial functions and I have few expectations regarding their > behaviour beyond what is stated in the documentation. In this regard, I feel > that a good standalone documentation is preferable to a documentation with > references to other implementations that may differ in subtle ways. It is a truism that the majority of SQL is generated by programs, not people. Those programs have already been written for industry-standard dialects (e.g. Oracle, Postgres) and therefore the best semantics is to be consistent with those standards. For geospatial SQL, I suspect that a somewhat higher percentage of queries are written by people. Still, the principle of least surprise dictates that we should comply with the de facto standard unless there are compelling reasons not to. If we depart from the standard for particular functions, let’s do it consciously, explicitly, and rarely. Calcite does not currently have detailed documentation. We have benefited hugely from products that do, such as Postgres and H2GIS, because our users can safely assume that our semantics is consistent with the other products. Documenting our functions exhaustively - say, moving from one page for all functions to one page per function - would be a huge endeavor and I can’t see it happening with the current community. The best we could reasonably achieve would be separate pages for a very small number of exceptional functions. > >>> Similar issues have come up with non-GIS functions. For example, the >>> DATEDIFF function [1]. Snowflake and MSSQL have ‘DATEDIFF(timeUnit, >>> datetime, datetime2)’, whereas MySQL has ‘DATEDIFF(date, date2)’. We >>> document which specification we implement, and potentially we could >>> implement both specifications, as either functions with different names, or >>> by selectively enabling the behavior that the user wants. >>> >>> The most important thing is to document which specification we are >>> implementing (when there is ambiguity). Then if someone would prefer the >>> other specification, they can contribute an implementation, and we can >>> selectively enable it. > > Do you think we should eventually add a @LibraryOperator for spatial type > functions (i.e. OpenGIS, Postgis, H2GIS)? There is still a TODO (prior to the > changes I made) in the SpatialTypeFunctions class asking the following > question: Should we create aliases for functions in upper-case [1]? My > current understanding of this question is that a possibility may be to have a > method named “buffer” in the class and an alias named “ST_Buffer” registered > somewhere else. Appart from resolving lots of sonar code smells related to > method names, this approach would allow to more easily introduce new methods > that don’t have ST_ counterparts in other RDBMS. It would be good to move all spatial functions out of the “core” module at some point. If we did that, it would probably make sense to declare the geospatial functions in a class with similar structure to SqlLibraryOperators so that they can be selectively enabled. At this point I don’t see much benefit for adding separate libraries for OpenGIS, Postgis, H2GIS. But that might change. As far as I can tell (correct me if I am wrong) they are mostly the same functions, with slightly different specifications. If so, a ‘compliance’ flag might be more appropriate. I wouldn’t worry about code smells. Something done for a good reason isn’t a code smell. It is a false negative from a robot. > After merging CALCITE-5367 [2], I think we will have a relatively decent > support for spatial types in core. At this stage, I think it would be good to > see if the organisation of the code can be improved before investigating > support for more advanced spatial functions (e.g. 3D, ST_AsMVT, > ST_AsFlatGeobuf, etc.). Sounds good. Identify problems and propose solutions. Algebraic approaches - e.g. decomposing functions into smaller composable units, like how AVG decomposes into SUM and CO
Re: A new logo for avatica?
Oops, yes. I want people to see it as a standalone component. On Tue, Mar 7, 2023 at 8:38 AM Michael Mior wrote: > > > I don't want people to see it as a standalone component. > > Can I assume you meant that you *do* want it to be seen as a standalone > component? > > -- > Michael Mior > mm...@apache.org > > > On Mon, Mar 6, 2023 at 8:50 PM Julian Hyde wrote: > > > A logo would be great. > > > > I don't think the logo should be a variation of Calcite's logo. > > Although Avatica is developed by the Calcite community, I don't want > > people to see it as a standalone component. > > > > By the way, the name Avatica is inspired by the scientific name of the > > Barn spider, Araneus cavaticus. So, if people are looking for an > > inspiration for a logo, a spider might be a good place to start. > > > > Julian > > > > On Mon, Mar 6, 2023 at 12:33 AM Francis Chuang > > wrote: > > > > > > Calcite had a new logo designed in 2018/2019 [1] [2]. > > > > > > I was wondering if it would be possible to get one made for Avatica. As > > > Avatica is a subproject, perhaps its logo should be a variation of the > > > Calcite logo to indicate that? > > > > > > Francis > > > > > > [1] https://lists.apache.org/thread/px48o0o16h6t3l22gvxkoxpnozcwocfn > > > [2] https://lists.apache.org/thread/6pv73tdrox1lpf8ohz46gj78j4t3ko7s > >
Re: A new logo for avatica?
> I don't want people to see it as a standalone component. Can I assume you meant that you *do* want it to be seen as a standalone component? -- Michael Mior mm...@apache.org On Mon, Mar 6, 2023 at 8:50 PM Julian Hyde wrote: > A logo would be great. > > I don't think the logo should be a variation of Calcite's logo. > Although Avatica is developed by the Calcite community, I don't want > people to see it as a standalone component. > > By the way, the name Avatica is inspired by the scientific name of the > Barn spider, Araneus cavaticus. So, if people are looking for an > inspiration for a logo, a spider might be a good place to start. > > Julian > > On Mon, Mar 6, 2023 at 12:33 AM Francis Chuang > wrote: > > > > Calcite had a new logo designed in 2018/2019 [1] [2]. > > > > I was wondering if it would be possible to get one made for Avatica. As > > Avatica is a subproject, perhaps its logo should be a variation of the > > Calcite logo to indicate that? > > > > Francis > > > > [1] https://lists.apache.org/thread/px48o0o16h6t3l22gvxkoxpnozcwocfn > > [2] https://lists.apache.org/thread/6pv73tdrox1lpf8ohz46gj78j4t3ko7s >
[jira] [Created] (CALCITE-5564) Add 2-argument support for PERCENTILE_CONT/DISC
Tanner Clary created CALCITE-5564: - Summary: Add 2-argument support for PERCENTILE_CONT/DISC Key: CALCITE-5564 URL: https://issues.apache.org/jira/browse/CALCITE-5564 Project: Calcite Issue Type: Improvement Reporter: Tanner Clary Assignee: Tanner Clary Calcite currently has implementations for the {{PERCENTILE_CONT}} and {{PERCENTILE_DISC}} functions. Their syntax may be found [here|https://learn.microsoft.com/en-us/sql/t-sql/functions/percentile-cont-transact-sql?view=sql-server-ver16]. BigQuery offers these functions as well, but the syntax is slightly different, and may be found [here|https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#percentile_cont]. The main difference is that instead of using a {{WITHIN GROUP}} clause, the array is passed in directly as the first argument to the function. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (CALCITE-5563) Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input
asdfgh19 created CALCITE-5563: - Summary: Add a break to the inner loop of RelSubset#getParents and RelSubset#getParentSubsets after we find a matching Relsubset from its parent input Key: CALCITE-5563 URL: https://issues.apache.org/jira/browse/CALCITE-5563 Project: Calcite Issue Type: Improvement Components: core Reporter: asdfgh19 Assignee: asdfgh19 Fix For: 1.34.0 {code:java} /** * Returns the collection of RelNodes one of whose inputs is in this * subset. */ Set getParents() { final Set list = new LinkedHashSet<>(); for (RelNode parent : set.getParentRels()) { for (RelSubset rel : inputSubsets(parent)) { // see usage of this method in propagateCostImprovements0() if (rel == this) { list.add(parent); break; } } } return list; } /** * Returns the collection of distinct subsets that contain a RelNode one * of whose inputs is in this subset. */ Set getParentSubsets(VolcanoPlanner planner) { final Set list = new LinkedHashSet<>(); for (RelNode parent : set.getParentRels()) { for (RelSubset rel : inputSubsets(parent)) { if (rel.set == set && rel.getTraitSet().equals(traitSet)) { list.add(planner.getSubsetNonNull(parent)); break; } } } return list; }{code} Once we have found a matching Relsubset from its parent input, we can immediately end the inner loop. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [QUESTION]: Bug Fix Release
Will do. Sent from my iPhone > On Mar 6, 2023, at 17:12, Stamatis Zampetakis wrote: > > Hey Charles, > > Please test Drill with the latest calcite-1.34.0-SNAPSHOT [1] and if all is > good on your end I will prepare an RC for vote. > > Best, > Stamatis > > [1] > https://repository.apache.org/content/groups/snapshots/org/apache/calcite/calcite-core/1.34.0-SNAPSHOT/ > >> On Sun, Mar 5, 2023 at 7:16 PM Charles Givre wrote: >> >> Julian, >> Now that Drill is on main Calcite instead of the fork, I'll commit that >> the Drill community will do our best to try Drill with the RC candidates to >> see if we can catch issues during the release cycle. >> Thanks, >> -- C >> >> On Mar 5, 2023, at 12:20 PM, Julian Hyde wrote: >>> >>> It was indeed a regression, but it didn’t break any of Calcite’s tests >> and no one spoke up during the release vote. Mistakes are expensive to fix >> after a release, cheaper during the release vote, and cheapest of all if >> found by the test suite. >>> On Mar 5, 2023, at 6:33 AM, Charles Givre wrote: That would be great! Again I’m only asking because this was a >> regression. I really do appreciate it. Thanks! Sent from my iPhone > On Mar 4, 2023, at 13:59, Stamatis Zampetakis >> wrote: > > If we get the 1.34.0 out a bit sooner than usual I guess this will be >> good > enough for Drill. If the others agree I can try to prepare an RC during > next week. WDYT ? > > Best, > Stamatis > > >> On Sat, Mar 4, 2023, 6:13 PM Alessandro Solimando < >> alessandro.solima...@gmail.com> wrote: >> >> The second option Benchao mentions is what Hive currently does as >> well. >> >> Best regards, >> Alessandro >> On Sat 4 Mar 2023, 13:19 Benchao Li, wrote: Hi Charles, Thank for reaching out! IIRC, the idea of releasing bugfix version has been brought up in >> the >> past, >>> but I couldn't find the discussion (in Jira and dev ML). >>> >>> I'd like to share my understanding why we chose not to release bug >> fix >>> versions, please correct me if I'm wrong, >>> - Calcite has many bug fixes that span multi versions (even more >> that 10 >>> versions), then only keeping several (such as 3) bug fix releases >> does >> not >>> solve all these problems. >>> - Actually we usually do not distinguish too much between "bugfix" >> and >> "new >>> feature", so maintaining bug fix releases is not that easy. >>> - Calcite lacks reviewers and also release managers, only keeping >> linear >>> releasing in rhythm could save us some efforts. >>> >>> For regressions, I agree that this hurts downstream projects. For >> such >>> cases, there are two approaches come into my mind: >>> - We can release a new version quickly than usual. >>> - The projects that need the fix/feature before our next scheduled >> release, >>> they could copy these files into their projects, as we already did in >>> Flink[1]. They could remove these files once they adopt the new >> release >> of >>> Calcite. >>> >>> I hope this helps. >>> >>> [1] >>> >>> >> >> https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite >>> >>> >>> Charles Givre 于2023年3月2日周四 06:22写道: >>> Hello Calcite Devs, I wanted to thank everyone for the recent release of Calcite 1.33. >> I >> am the PMC Chair for Apache Drill and we just released Drill 1.21[0] >> which >>> is now using the latest version of Calcite instead of our 2-3 year old >> fork! However, we encountered a small issue with Calcite 1.33 that does >> not affect just Drill. Specifically, there was a regression which was >> caused by CALCITE-5447[1] which effectively broke the DATE_TRUNC function. >> The bugfix has been fixed and merged in CALCITE-5522[2]. In any event, given that this function is fairly important and the >>> lengthy release schedules of both Drill and Calcite, I wanted to ask whether >> the Calcite might consider doing a quick bugfix release with this and >> any >>> other regressions that may have popped up in 1.33 and have since been >> fixed. Thank you very much for all your work! Best, -- Charles [0]: >>> >> >> https://github.com/apache/drill-site/blob/master/blog/_posts/en/2023-02-21-drill-1.21.0-released.md [1]: https://issues.apache.org/jira/browse/CALCITE-5447 [2]: https://issues.apache.org/jira/browse/CALCITE-5522 >>> >>> >>> >>> -- >>> >>> Best, >>> Benchao Li >>> >> >> >>
Re: Insert data exceeding fixed length column.
I rewrite test with end-to-end usage (hope it`s correct) and put it into blank.iq (test is near) and found that it not follow the sql standard. Also probably someone can suggest how can i write test with dynamic params involved ? I found that Quidem has no appropriate functionality. Thanks ! create table fix_char_col (i int, c char(5)); (0 rows modified) !update insert into fix_char_col values (1, 'a'), (2, 'aa'), (3, ' aa'), (4, 'aa'), (5, ' aa'); (1 row modified) !update SELECT * FROM fix_char_col ORDER BY i; +---+---+ | I | C | +---+---+ | 1 | a | | 2 | a | | 3 | | | 4 | aa| | 5 | aa | +---+---+ (2 rows) !ok and as a result : expected: < +---+---+ < | I | C | < +---+---+ < | 1 | a | < | 2 | a | < | 3 | | < | 4 | aa| < | 5 | aa | < +---+---+ calcite output: +---+-+ | I | C | +---+-+ | 1 | a | | 2 | aa | | 3 | aa | | 4 | aa | | 5 | aa | +---+-+ Ok, thanks i understand what you are talking about, i will move the test and fill issue if it necessary. I agree that that query should throw when executed. If it doesn’t, you should log a bug. However, that test in SqlToRelConverterTest doesn’t prove that Calcite doesn’t throw at execute time. On Feb 8, 2023, at 11:20 PM, stanilovsky evgeny wrote: Hello, seems sql standard regulate situation when insertion (char, varchar) data exceeds fixed len column as : If the declared type T is fixed length string with length in characters L and the length in characters M of V is larger than L, then Case: a) If the right part M–L characters of V are all space`s, then the value of T is set to the first L characters of V. b) If one or more of the right part M–L characters of V are not space`s, then exception is raised. But fast test [1] in SqlToRelConverterTest highlights that calcite not satisfy of such a rule. I miss something or need to fill the issue ? Thanks ! [1] @Test void testInsert1() { String moreThanVarcharLimit = "a".repeat(30); final String sql = "insert into emp (empno, ename, job, mgr, hiredate,\n" + " sal, comm, deptno, slacker)\n" + "values(1, '" + moreThanVarcharLimit + "', 'job', 0,\n" + " timestamp '1970-01-01 00:00:00', 1, 1, 1, false)"; sql(sql).ok(); }
[jira] [Created] (CALCITE-5562) The result data is disordered when u tried to get data from an ordered set
Yuxin Wu created CALCITE-5562: - Summary: The result data is disordered when u tried to get data from an ordered set Key: CALCITE-5562 URL: https://issues.apache.org/jira/browse/CALCITE-5562 Project: Calcite Issue Type: Bug Reporter: Yuxin Wu *It's based on CSV-TEST-MODEL. The same as {{lex=MYSQL}}.* {quote}!connect jdbc:calcite:model=src/test/resources/mode {quote} Step1: You need to get a table named 'emps' with data as below; 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps; +---+---+ | NAME | EMPID | +---+---+ | Fred | 30 | | Eric | 3 | | John | 2 | | Wilma | 1 | | Alice | 2 | +---+---+ Step2: Get an ordered set by using {_}{color:#172b4d}*{{select name, empid from emps order by empid}}*{color}{_}; 0: jdbc:calcite:model=src/test/resources/mode> select name, empid from emps order by empid; +---+---+ | NAME | EMPID | +---+---+ | Wilma | 1 | | John | 2 | | Alice | 2 | | Eric | 3 | | Fred | 30 | +---+---+ Step3: Using the statement in step2 as a sub query like this: 0: jdbc:calcite:model=src/test/resources/mode> select * from (select name, empid from emps order by empid); +---+---+ | NAME | EMPID | +---+---+ | Fred | 30 | | Eric | 3 | | John | 2 | | Wilma | 1 | | Alice | 2 | +---+---+ *THE QUESTION IS:* *The result data is disordered while it's supposed to be the same as it is in step2.* -- This message was sent by Atlassian Jira (v8.20.10#820010)