You’re proposing lazy consensus. We don’t do that for commits. (Yeah, we use a mixture of CTR and RTC. But adding lazy consensus muddies things too much, I think.)
For this change, I think you should do RTC, and require a +1 from at least one committer. Therefore, someone please review https://issues.apache.org/jira/browse/CALCITE-2651 <https://issues.apache.org/jira/browse/CALCITE-2651>. > On Nov 14, 2018, at 1:27 PM, Andrei Sereda <and...@sereda.cc> wrote: > > Thanks, all, for your input. > > Following this discussion, I have prepared a PR > https://github.com/apache/calcite/pull/919 > > Let me know if you agree with the approach. If there are no objections / > comments, I will commit it in a couple of days. > > Propagation of Statement.setFetchSize(int) into actual query will be done > in separate commit (I’m still checking DataContext usages) > > On Fri, Oct 26, 2018 at 10:45 AM Kevin Risden <kris...@apache.org> wrote: > >> #1 sounds reasonable. >> #2 will have to see what that means for actual aggregations. Sounds like it >> could work. >> #3 not sure what is meant here. I think fetch size is a hint and not >> required. I wouldn't default to 10k though. >> >> Kevin Risden >> >> >> On Thu, Oct 25, 2018 at 4:25 PM Andrei Sereda <and...@sereda.cc> wrote: >> >>> There is new Composite Aggregation >>> < >>> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html >>>> >>> (still in beta) which allows pagination with aggregates. It is available >> on >>> versions >= 5.6 (unfortunately we have to support 2.6+). >>> >>> To sum up: >>> >>> 1. For non-aggregate queries always use scrolling or search-after by >>> default (not pagination) . >>> 2. For aggregates use pagination. Either Terms Aggregation (with >>> partitioning) or Composite Aggregation. >>> 3. If possible Retrieve batch size from JDBC api (known as >>> Statement.setFetchSize(int)). If not default to 10k ? >>> >>> Does it sound reasonable ? >>> >>> >>> On Thu, Oct 25, 2018 at 3:43 PM >>> Kevin Risden >>> <kris...@apache.org> wrote: >>> >>>> What I was saying is that scrolling is the only way to ensure you get >>>> correct results coming back from Elasticsearch if you are going to do >>> more >>>> processing. >>>> >>>> >>>> >>> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html >>>> >>>> As long as you delete the scroll at the end its not the end of the >> world. >>>> There is no deep paging unless you use scrolling or a search after >> which >>> is >>>> basically the same idea as scrolling >>>> >>>> >>> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html >>>> . >>>> >>>> I would say that scrolling should be the default behavior and then >> should >>>> look at optimizing for aggregations, smaller result sets, etc. If you >>> look >>>> at scrolling as an optimization the default results won't be correct >> (ie: >>>> only returning 10 results when there are a lot more). For default >>>> pagination there is a limit of 10000 results anyway unless you use >>>> scrolling or search after. >>>> >>>> Kevin Risden >>>> >>>> >>>> On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <and...@sereda.cc> >> wrote: >>>> >>>>> Hi Kevin, >>>>> >>>>> You suggest to use scrolling for all elastic queries ? Even when >> there >>>>> are predicates ? Some questions : >>>>> >>>>> 1) Scrolling has a runtime overhead for elastic cluster. Having it >>>>> enabled by default (against vendor recommendation) is risky. Does it >>>>> (not) cause issues in Solr ? >>>>> 2) Scrolling does not work with aggregates, which means I'll have to >>>>> fallback to pagination (for whole result set). >>>>> 3) When result set is small scrolling / pagination is not necessary. >>>>> >>>>> Having said that I don't see any way to automatically enable / >> disable >>>>> scrolling based on RelTree (pls let me know if there is one). >>>>> >>>>> You seem to agree with Stamatis that `select * from elastic` (1) >>>>> should return all documents (rows) not just 10 (ES default). If so, >>>>> one either has to use pagination or scrolling. >>>>> What about queries `select count(*), foo from elastic group by foo` >>>>> (2) (term aggregation). In this case only pagination works. If >> calcite >>>>> returns all rows for (1) it should do the same for (2). Should one >>>>> use pagination with aggregates (2) and scrolling for simple queries >>>>> (1) ? >>>>> What if user specifies a high `LIMIT` (eg 10M) should we batch the >>>> result ? >>>>> Should user explicitly enable scrolling (eg. JDBC api) ? If so, >> should >>>>> one throw an exception when scrolling and aggregation is used or >>>>> quietly default to pagination ? >>>>> >>>>> Thanks everybody for your input, it helps. >>>>> >>>>> Regards, >>>>> Andrei. >>>>> >>>>> >>>>> On Thu, Oct 25, 2018 at 2:08 PM >>>>> Kevin Risden >>>>> <kris...@apache.org> wrote: >>>>>> >>>>>>> >>>>>>> There is one more “issue”. Currently select * from elastic >> returns >>> at >>>>> most >>>>>>> 10 rows (in calcite). This is consistent with elastic behaviour >>> which >>>>>>> limits result set to 10 documents (unless size is specified). >>>>>> >>>>>> >>>>>> In Solr land for the Calcite integration it uses the /export >> handler >>> or >>>>>> streaming expressions when there is no limit specified (these are >>> just >>>>> like >>>>>> scrolling or called CursorMark in Solr). It falls back to /select >>> with >>>>> the >>>>>> default limit of 10 rows. >>>>>> >>>>>> I think that scroll should be the default almost always since it is >>> the >>>>>> only way to return all the results. Even with a limit = 100000 >> would >>>>> cause >>>>>> problems with the default query handler. Paging deep into the >> result >>>> set >>>>> is >>>>>> not something that Lucene (Elasticsearch or Solr) does well. >>>>>> >>>>>> >>>>>> Kevin Risden >>>>>> >>>>>> >>>>>> On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> >>> wrote: >>>>>> >>>>>>> Do you need to generate a different plan (i.e. a different tree >> of >>>>>>> RelNodes) for scrolling vs non-scrolling? If so, it’s certainly >>>>>>> inconvenient that you don’t know until execute time whether they >>> want >>>>>>> scrolling. A possible solution would be to generate TWO plans - >> one >>>>>>> scrolling, one non-scrolling inside the prepared statement - and >>> pick >>>>> which >>>>>>> one based on the runtime context. >>>>>>> >>>>>>> Julian >>>>>>> >>>>>>>> On Oct 25, 2018, at 1:42 AM, Christian Beikov < >>>>>>> christian.bei...@gmail.com> wrote: >>>>>>>> >>>>>>>> Hey Andrei, >>>>>>>> >>>>>>>> I don't have an answer for how you can access these settings >> from >>>>> within >>>>>>> the adapter nor how one could do that via RelNodes but the >>> suggestion >>>>> to >>>>>>> use DataContext for that purpose sounds reasonable. Maybe someone >>>> else >>>>> has >>>>>>> an idea? >>>>>>>> >>>>>>>> Anyway, since these are settings that don't affect the general >>>>> semantics >>>>>>> of the query/statement and also usually require a special API to >> be >>>>> used, >>>>>>> I'd rather see these aspects not end up in the query string. >>>>>>>> >>>>>>>> Am 25.10.2018 um 02:15 schrieb Andrei Sereda: >>>>>>>>> Christian, >>>>>>>>> >>>>>>>>> I like TYPE_SCROLL_INSENSITIVE / fetchSize in >> PreparedStatement >>>>>>>>> generally but have some reservations (questions) : >>>>>>>>> >>>>>>>>> How to pass resultSetType / fetchSize from PreparedStatement >> to >>>>>>> RelNodes ? >>>>>>>>> What if user doesn’t use JDBC (eg. RelBuilders) ? >>>>>>>>> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov >>>>>>>>> <christian.bei...@gmail.com> wrote: >>>>>>>>>> In JDBC one can configure a fetch size which would reflect >> the >>>>> amount >>>>>>> of >>>>>>>>>> rows to be fetched initially, but also subsequently. >>>>>>>>>> >>>>>>> >>>>> >>>> >>> >> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int) >>>>>>>>>> >>>>>>>>>> According to what you are writing, ES behvior is what >>>>>>>>>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view >>>> that >>>>>>> isn't >>>>>>>>>> affected by changes. >>>>>>>>>> >>>>>>>>>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, >>> R3, >>>>> R4, >>>>>>>>>> ... and view R1, R2, then R3 is deleted and you fetch the >> next >>>>> rows, >>>>>>> you >>>>>>>>>> wouldn't see R3. >>>>>>>>>> >>>>>>>>>> According to the JDBC spec >>>>>>>>>> ( >>>>>>> >>>>> >>>> >>> >> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int) >>>>>>>>>> ) you don't have to support all modes. Usually, user code >>> doesn't >>>>> use >>>>>>>>>> scrolling that much, but at least forward scrolling makes >>> sense. >>>>>>>>>> >>>>>>>>>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda: >>>>>>>>>>> Hi Julian, >>>>>>>>>>> >>>>>>>>>>> Scrolling (in elastic) does not only mean “open a cursor” >> but >>>> also >>>>>>> iterate >>>>>>>>>>> over consistent snapshot. From docs: >>>>>>>>>>> >>>>>>>>>>> The results that are returned from a scroll request reflect >>> the >>>>> state >>>>>>> of >>>>>>>>>>> the index at the time that the initial search request was >>> made, >>>>> like a >>>>>>>>>>> snapshot in time. Subsequent changes to documents (index, >>> update >>>>> or >>>>>>> delete) >>>>>>>>>>> will only affect later search requests. >>>>>>>>>>> >>>>>>>>>>> So pagination (fetch / offset) can’t exactly replicate this >>>>>>> functionality. >>>>>>>>>>> >>>>>>>>>>> The problem with scrolling (in elastic) is that it is >>> expensive >>>>> and >>>>>>> can’t >>>>>>>>>>> (shouldn’t) be enabled it by default. >>>>>>>>>>> >>>>>>>>>>> There is one more “issue”. Currently select * from elastic >>>>> returns at >>>>>>> most >>>>>>>>>>> 10 rows (in calcite). This is consistent with elastic >>> behaviour >>>>> which >>>>>>>>>>> limits result set to 10 documents (unless size is >> specified). >>>> When >>>>>>>>>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE >>>>>>>>>>> < >>>>>>> >>>>> >>>> >>> >> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE >>>>>>>> >>>>>>>>>>> or SQL hint) does it mean return whole elastic index ? I’m >> not >>>> at >>>>>>> ease with >>>>>>>>>>> returning different results based on hints or cursor >> settings. >>>>>>>>>>> >>>>>>>>>>> Andrei. >>>>>>>>>>> >>>>>>>>>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde < >>>>> jhyde.apa...@gmail.com> >>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> It seems to me that Elasticsearch scroll means return a >>> cursor >>>> - >>>>> a >>>>>>>>>>>> collection of rows that you iterate over, and you may not >>> read >>>>> all >>>>>>> of them. >>>>>>>>>>>> This is the default operation of JDBC. >>>>>>>>>>>> >>>>>>>>>>>> So, I guess we need to give the user a way to signal their >>>>> intent to >>>>>>> read >>>>>>>>>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and >>>>> ALL_ROWS >>>>>>>>>>>> hints[1] seem close to this. We would want the hints to be >>>> acted >>>>>>> upon by >>>>>>>>>>>> both the optimizer and the JDBC transport. >>>>>>>>>>>> >>>>>>>>>>>> Related is pagination. SQL has FETCH and OFFSET, which >> allow >>>> you >>>>> to >>>>>>>>>>>> retrieve different pieces of a large result set in separate >>>>>>> statements or >>>>>>>>>>>> (using query parameters) executions. It would be useful if >>> the >>>>>>> server could >>>>>>>>>>>> be given a hint to cache a statement across page requests. >>>>>>>>>>>> >>>>>>>>>>>> Julian >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> >>>>>>> >>>>> >>> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924 >>>>>>>>>>>> >>>>>>>>>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov < >>>>>>>>>>>> christian.bei...@gmail.com> wrote: >>>>>>>>>>>>> Hey, >>>>>>>>>>>>> >>>>>>>>>>>>> not sure if this should be an SQL keyword. JDBC specifies >>>>> various >>>>>>>>>>>> constants that can be used at statement creation time: >>>>>>>>>>>> >>>>> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html >>>>>>>>>>>>> Not sure though if or how these configurations are >>> accessible >>>>> for >>>>>>> data >>>>>>>>>>>> stores or dialects, but IMO using these would be the proper >>>> way. >>>>>>>>>>>>> Regards >>>>>>>>>>>>> >>>>>>>>>>>>> Christian >>>>>>>>>>>>> >>>>>>>>>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda: >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I was thinking about adding [scrolling functionality]( >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>> >>>>> >>>> >>> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html >>>>>>>>>>>> ) >>>>>>>>>>>>>> to elastic search adapter. Since scrolling has >>> non-negligible >>>>>>> effect on >>>>>>>>>>>> the >>>>>>>>>>>>>> cluster it should be selectively enabled on per query >>> basis. >>>>> So, >>>>>>> likely, >>>>>>>>>>>>>> user has to explicitly set "scroll flag" somewhere. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Most natural way seems in SQL. [Calcite sql grammar]( >>>>>>>>>>>>>> https://calcite.apache.org/docs/reference.html) has >>> `SCROLL` >>>>>>> keyword >>>>>>>>>>>>>> (unused to my knowledge). There were also discussions >> about >>>>> adding >>>>>>>>>>>> hints to >>>>>>>>>>>>>> Calcite. >>>>>>>>>>>>>> >>>>>>>>>>>>>> ### Examples >>>>>>>>>>>>>> ```sql >>>>>>>>>>>>>> -- special sql keyword ? >>>>>>>>>>>>>> SCROLL select * from elastic; >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- assuming hints are available in calcite >>>>>>>>>>>>>> /* HINT: scroll */ select * from elastic; >>>>>>>>>>>>>> ``` >>>>>>>>>>>>>> >>>>>>>>>>>>>> What people think about this use-case ? Are there better >>>> ideas >>>>> ? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Andrei. >>>>>>>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>> >>