#1 sounds reasonable. #2 will have to see what that means for actual aggregations. Sounds like it could work. #3 not sure what is meant here. I think fetch size is a hint and not required. I wouldn't default to 10k though.
Kevin Risden On Thu, Oct 25, 2018 at 4:25 PM Andrei Sereda <and...@sereda.cc> wrote: > There is new Composite Aggregation > < > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html > > > (still in beta) which allows pagination with aggregates. It is available on > versions >= 5.6 (unfortunately we have to support 2.6+). > > To sum up: > > 1. For non-aggregate queries always use scrolling or search-after by > default (not pagination) . > 2. For aggregates use pagination. Either Terms Aggregation (with > partitioning) or Composite Aggregation. > 3. If possible Retrieve batch size from JDBC api (known as > Statement.setFetchSize(int)). If not default to 10k ? > > Does it sound reasonable ? > > > On Thu, Oct 25, 2018 at 3:43 PM > Kevin Risden > <kris...@apache.org> wrote: > > > What I was saying is that scrolling is the only way to ensure you get > > correct results coming back from Elasticsearch if you are going to do > more > > processing. > > > > > > > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html > > > > As long as you delete the scroll at the end its not the end of the world. > > There is no deep paging unless you use scrolling or a search after which > is > > basically the same idea as scrolling > > > > > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-after.html > > . > > > > I would say that scrolling should be the default behavior and then should > > look at optimizing for aggregations, smaller result sets, etc. If you > look > > at scrolling as an optimization the default results won't be correct (ie: > > only returning 10 results when there are a lot more). For default > > pagination there is a limit of 10000 results anyway unless you use > > scrolling or search after. > > > > Kevin Risden > > > > > > On Thu, Oct 25, 2018 at 3:33 PM Andrei Sereda <and...@sereda.cc> wrote: > > > > > Hi Kevin, > > > > > > You suggest to use scrolling for all elastic queries ? Even when there > > > are predicates ? Some questions : > > > > > > 1) Scrolling has a runtime overhead for elastic cluster. Having it > > > enabled by default (against vendor recommendation) is risky. Does it > > > (not) cause issues in Solr ? > > > 2) Scrolling does not work with aggregates, which means I'll have to > > > fallback to pagination (for whole result set). > > > 3) When result set is small scrolling / pagination is not necessary. > > > > > > Having said that I don't see any way to automatically enable / disable > > > scrolling based on RelTree (pls let me know if there is one). > > > > > > You seem to agree with Stamatis that `select * from elastic` (1) > > > should return all documents (rows) not just 10 (ES default). If so, > > > one either has to use pagination or scrolling. > > > What about queries `select count(*), foo from elastic group by foo` > > > (2) (term aggregation). In this case only pagination works. If calcite > > > returns all rows for (1) it should do the same for (2). Should one > > > use pagination with aggregates (2) and scrolling for simple queries > > > (1) ? > > > What if user specifies a high `LIMIT` (eg 10M) should we batch the > > result ? > > > Should user explicitly enable scrolling (eg. JDBC api) ? If so, should > > > one throw an exception when scrolling and aggregation is used or > > > quietly default to pagination ? > > > > > > Thanks everybody for your input, it helps. > > > > > > Regards, > > > Andrei. > > > > > > > > > On Thu, Oct 25, 2018 at 2:08 PM > > > Kevin Risden > > > <kris...@apache.org> wrote: > > > > > > > > > > > > > > There is one more “issue”. Currently select * from elastic returns > at > > > most > > > > > 10 rows (in calcite). This is consistent with elastic behaviour > which > > > > > limits result set to 10 documents (unless size is specified). > > > > > > > > > > > > In Solr land for the Calcite integration it uses the /export handler > or > > > > streaming expressions when there is no limit specified (these are > just > > > like > > > > scrolling or called CursorMark in Solr). It falls back to /select > with > > > the > > > > default limit of 10 rows. > > > > > > > > I think that scroll should be the default almost always since it is > the > > > > only way to return all the results. Even with a limit = 100000 would > > > cause > > > > problems with the default query handler. Paging deep into the result > > set > > > is > > > > not something that Lucene (Elasticsearch or Solr) does well. > > > > > > > > > > > > Kevin Risden > > > > > > > > > > > > On Thu, Oct 25, 2018 at 1:31 PM Julian Hyde <jh...@apache.org> > wrote: > > > > > > > > > Do you need to generate a different plan (i.e. a different tree of > > > > > RelNodes) for scrolling vs non-scrolling? If so, it’s certainly > > > > > inconvenient that you don’t know until execute time whether they > want > > > > > scrolling. A possible solution would be to generate TWO plans - one > > > > > scrolling, one non-scrolling inside the prepared statement - and > pick > > > which > > > > > one based on the runtime context. > > > > > > > > > > Julian > > > > > > > > > > > On Oct 25, 2018, at 1:42 AM, Christian Beikov < > > > > > christian.bei...@gmail.com> wrote: > > > > > > > > > > > > Hey Andrei, > > > > > > > > > > > > I don't have an answer for how you can access these settings from > > > within > > > > > the adapter nor how one could do that via RelNodes but the > suggestion > > > to > > > > > use DataContext for that purpose sounds reasonable. Maybe someone > > else > > > has > > > > > an idea? > > > > > > > > > > > > Anyway, since these are settings that don't affect the general > > > semantics > > > > > of the query/statement and also usually require a special API to be > > > used, > > > > > I'd rather see these aspects not end up in the query string. > > > > > > > > > > > > Am 25.10.2018 um 02:15 schrieb Andrei Sereda: > > > > > >> Christian, > > > > > >> > > > > > >> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement > > > > > >> generally but have some reservations (questions) : > > > > > >> > > > > > >> How to pass resultSetType / fetchSize from PreparedStatement to > > > > > RelNodes ? > > > > > >> What if user doesn’t use JDBC (eg. RelBuilders) ? > > > > > >> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov > > > > > >> <christian.bei...@gmail.com> wrote: > > > > > >>> In JDBC one can configure a fetch size which would reflect the > > > amount > > > > > of > > > > > >>> rows to be fetched initially, but also subsequently. > > > > > >>> > > > > > > > > > > > https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int) > > > > > >>> > > > > > >>> According to what you are writing, ES behvior is what > > > > > >>> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view > > that > > > > > isn't > > > > > >>> affected by changes. > > > > > >>> > > > > > >>> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, > R3, > > > R4, > > > > > >>> ... and view R1, R2, then R3 is deleted and you fetch the next > > > rows, > > > > > you > > > > > >>> wouldn't see R3. > > > > > >>> > > > > > >>> According to the JDBC spec > > > > > >>> ( > > > > > > > > > > > https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int) > > > > > >>> ) you don't have to support all modes. Usually, user code > doesn't > > > use > > > > > >>> scrolling that much, but at least forward scrolling makes > sense. > > > > > >>> > > > > > >>> Am 24.10.2018 um 21:38 schrieb Andrei Sereda: > > > > > >>>> Hi Julian, > > > > > >>>> > > > > > >>>> Scrolling (in elastic) does not only mean “open a cursor” but > > also > > > > > iterate > > > > > >>>> over consistent snapshot. From docs: > > > > > >>>> > > > > > >>>> The results that are returned from a scroll request reflect > the > > > state > > > > > of > > > > > >>>> the index at the time that the initial search request was > made, > > > like a > > > > > >>>> snapshot in time. Subsequent changes to documents (index, > update > > > or > > > > > delete) > > > > > >>>> will only affect later search requests. > > > > > >>>> > > > > > >>>> So pagination (fetch / offset) can’t exactly replicate this > > > > > functionality. > > > > > >>>> > > > > > >>>> The problem with scrolling (in elastic) is that it is > expensive > > > and > > > > > can’t > > > > > >>>> (shouldn’t) be enabled it by default. > > > > > >>>> > > > > > >>>> There is one more “issue”. Currently select * from elastic > > > returns at > > > > > most > > > > > >>>> 10 rows (in calcite). This is consistent with elastic > behaviour > > > which > > > > > >>>> limits result set to 10 documents (unless size is specified). > > When > > > > > >>>> returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE > > > > > >>>> < > > > > > > > > > > > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE > > > > > > > > > > > >>>> or SQL hint) does it mean return whole elastic index ? I’m not > > at > > > > > ease with > > > > > >>>> returning different results based on hints or cursor settings. > > > > > >>>> > > > > > >>>> Andrei. > > > > > >>>> > > > > > >>>> On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde < > > > jhyde.apa...@gmail.com> > > > > > wrote: > > > > > >>>> > > > > > >>>>> It seems to me that Elasticsearch scroll means return a > cursor > > - > > > a > > > > > >>>>> collection of rows that you iterate over, and you may not > read > > > all > > > > > of them. > > > > > >>>>> This is the default operation of JDBC. > > > > > >>>>> > > > > > >>>>> So, I guess we need to give the user a way to signal their > > > intent to > > > > > read > > > > > >>>>> all rows versus only the first few. Oracle’s FIRST_ROWS and > > > ALL_ROWS > > > > > >>>>> hints[1] seem close to this. We would want the hints to be > > acted > > > > > upon by > > > > > >>>>> both the optimizer and the JDBC transport. > > > > > >>>>> > > > > > >>>>> Related is pagination. SQL has FETCH and OFFSET, which allow > > you > > > to > > > > > >>>>> retrieve different pieces of a large result set in separate > > > > > statements or > > > > > >>>>> (using query parameters) executions. It would be useful if > the > > > > > server could > > > > > >>>>> be given a hint to cache a statement across page requests. > > > > > >>>>> > > > > > >>>>> Julian > > > > > >>>>> > > > > > >>>>> [1] > > > > > >>>>> > > > > > > > > > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924 > > > > > >>>>> > > > > > >>>>>> On Oct 24, 2018, at 11:19 AM, Christian Beikov < > > > > > >>>>> christian.bei...@gmail.com> wrote: > > > > > >>>>>> Hey, > > > > > >>>>>> > > > > > >>>>>> not sure if this should be an SQL keyword. JDBC specifies > > > various > > > > > >>>>> constants that can be used at statement creation time: > > > > > >>>>> > > > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html > > > > > >>>>>> Not sure though if or how these configurations are > accessible > > > for > > > > > data > > > > > >>>>> stores or dialects, but IMO using these would be the proper > > way. > > > > > >>>>>> Regards > > > > > >>>>>> > > > > > >>>>>> Christian > > > > > >>>>>> > > > > > >>>>>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda: > > > > > >>>>>>> Hello, > > > > > >>>>>>> > > > > > >>>>>>> > > > > > >>>>>>> I was thinking about adding [scrolling functionality]( > > > > > >>>>>>> > > > > > >>>>> > > > > > > > > > > > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html > > > > > >>>>> ) > > > > > >>>>>>> to elastic search adapter. Since scrolling has > non-negligible > > > > > effect on > > > > > >>>>> the > > > > > >>>>>>> cluster it should be selectively enabled on per query > basis. > > > So, > > > > > likely, > > > > > >>>>>>> user has to explicitly set "scroll flag" somewhere. > > > > > >>>>>>> > > > > > >>>>>>> Most natural way seems in SQL. [Calcite sql grammar]( > > > > > >>>>>>> https://calcite.apache.org/docs/reference.html) has > `SCROLL` > > > > > keyword > > > > > >>>>>>> (unused to my knowledge). There were also discussions about > > > adding > > > > > >>>>> hints to > > > > > >>>>>>> Calcite. > > > > > >>>>>>> > > > > > >>>>>>> ### Examples > > > > > >>>>>>> ```sql > > > > > >>>>>>> -- special sql keyword ? > > > > > >>>>>>> SCROLL select * from elastic; > > > > > >>>>>>> > > > > > >>>>>>> -- assuming hints are available in calcite > > > > > >>>>>>> /* HINT: scroll */ select * from elastic; > > > > > >>>>>>> ``` > > > > > >>>>>>> > > > > > >>>>>>> What people think about this use-case ? Are there better > > ideas > > > ? > > > > > >>>>>>> > > > > > >>>>>>> Regards, > > > > > >>>>>>> Andrei. > > > > > >>>>>>> > > > > > > > > > > > > > > > >