[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626997#comment-17626997 ] Stamatis Zampetakis commented on CALCITE-2322: -- I plan to do a final pass in the following weeks and merge this unless I discover something major. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626710#comment-17626710 ] Julian Hyde commented on CALCITE-2322: -- [~zabetak], If you have time, could you get this PR into a state you are happy with and merge it? I think that it's more important to merge this PR than to get it perfect to everyone's satisfaction. I now think {{fetch_size}} is a good name. JDK has a method {{ResultSet.setFetchSize(int rows)}}, so it is clear that {{fetch_size}} refers to rows. I would also accept whatever name [~zabetak] decides. I apologize for my bike-shedding on this. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371654#comment-17371654 ] Julian Hyde commented on CALCITE-2322: -- I agree with [~vlsi] that it is more useful to "fetch N bytes" than "fetch N rows". (Efficiency is about doing a reasonable amount of work per network round-trip and per batch of rows, and since the scarce resources are network packet size and CPU cache, things are best measured in bytes.) @vlsi wrote: {quote}I don't really know how to add a byte-sized property side by side with fetch_row_count.{quote} I think the ideal strategy will be a mix of rows and bytes, something like "fetch 100 rows or 64K bytes, whichever is larger, but in any case not more than 1024K bytes". That strategy wouldn't make much sense if decomposed into 4 properties ("minRows", "maxRows", "minBytes", "maxBytes") because someone might forget to set one property and end up with a default that contradicts the other properties. So I would have a new property called "fetchPolicy", which combines them all. (The format of "fetchPolicy" is TBD, but you can think of it as a JSON string containing various numbers.) I'm fine with Vladimir's suggestion "fetch_size_rows". I don't like the "default_" prefix because I think of "fetch_size_rows" as a property that can be set at one level (in this case, when the connection is created) and overridden at other levels (say by setting a property during the life of the connection, or by overriding for a particular statement). There are many other properties, e.g. "locale", "timeZone", that could be set in a similar way. Using the same property name throughout, without "default" prefix, makes it easier to override at different points in the lifecycle. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371618#comment-17371618 ] Zac commented on CALCITE-2322: -- {quote}In other words: suppose you configure fetch_row_count=42, and you do not call setFetchSize with Java API. Is the system allowed to fetch 1000 rows behind the scenes? {quote} It seems reasonable that fetch_row_count could also be _just_ a hint and not a mandate, given that is what [setFetchSize|https://docs.oracle.com/javase/8/docs/api/java/sql/Statement.html#setFetchSize-int-] does: {quote}Gives the JDBC driver a hint as to the number of rows that should be fetched from the database {quote} Furthermore, it seems like in the future, "dynamic fetch" would have a separate configuration that would be parallel to fetch_row_count (the user would not both set an explicit row count and whatever the corresponding "dynamic fetch" configuration is), and the documentation could clearly state that using both would have undefined results (or throw an exception, etc). Regarding the name, I do see your point, but (FWIW) I personally find "default_" to be a confusing prefix. In my mind, a default value is what's used when no other value is specified. So to be explicitly setting a value named default seems confusing (possibly there's some synonym of default that would work, but i'm not coming up with anything). That said, i'll change it to whatever you and [~julianhyde] decide. I'm definitely fine with changing it to fetch_size_rows. [~julianhyde] - are you ok with "fetch_size_rows" (/ would you prefer to name it "default_fetch_size_rows")? > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371594#comment-17371594 ] Vladimir Sitnikov commented on CALCITE-2322: Just in case, I think {{(default_)fetch_size_rows}} and {{(default_)fetch_size_bytes}} are more consistent than {{fetch_row_count}} and {{fetch_byte_count}} (?) {{fetch_bytes}} (?). I don't really know how to add a byte-sized property side by side with {{fetch_row_count}}. Julian asked for a byte-size property in 2018 (https://issues.apache.org/jira/browse/CALCITE-2322?focusedCommentId=16484416=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16484416), and I missed the comments completely. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371587#comment-17371587 ] Vladimir Sitnikov commented on CALCITE-2322: Just in case, the users rarely need "row count" when they configure fetch size. In 99.42% of the cases what they need is to make sure fetch size is big enough (to avoid too many roundtrips), and they want it small enough (to avoid out of memory conditions). So, in my opinion, what they really need is "fetch_size_bytes" rather than "fetch_size_rows". Unfortunately, JDBC spec is broken there, and there's no API for "fetch size bytes", however, the databases might support that (e.g. "please fetch 1 rows maximum, but stop as soon as you overflow 1 megabyte"). Suppose dynamic fetch would be implemented in the future. In other words, Avatica could fetch more or fewer rows depending on the volume of the data (e.g. to prevent too small or too large resultsets). I guess it is more or less the direction many JDBC drivers are moving. In that case, default_... makes it easier to reason: ok, this is a default fetch size we start from, then it could adapt. In other words: suppose you configure fetch_row_count=42, and you do not call setFetchSize with Java API. Is the system allowed to fetch 1000 rows behind the scenes? My point is that default_fetch_row_count=42 (or default_fetch_size_rows or even default_fetch_size_bytes) would keep the door open. However, I do not insist on the naming since there is no single right answer. As far as I know, Microsoft SQL Server JDBC driver supports only dynamic fetch algorithm (I guess they just ignore setFetchSize calls). > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371515#comment-17371515 ] Zac commented on CALCITE-2322: -- [~vladimirsitnikov] - you bring up a good point, however i feel we might be able to solve this with clear documentation (as [~julianhyde] suggests). What do you think of this: [https://github.com/apache/calcite-avatica/pull/148/files#diff-141a82cee99f7d96b403700fcfc1b03d0bd5d40ea29f925a5b4ed3544a6d7164R178-R180] Does that help remove some of the ambiguity? If not, do you have suggestions for improving that description? > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371507#comment-17371507 ] Julian Hyde commented on CALCITE-2322: -- In my opinion it is fairly obvious that it can be overridden per statement. But we should say so in the comments. Comments should also say how to set this parameter so that we don’t override the providers default. And there should be another value that means “unlimited”. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17371499#comment-17371499 ] Vladimir Sitnikov commented on CALCITE-2322: "fetch_row_count" implies that Avatica would **always** use the given row count. That is too restrictive, and it creates ambiguities: "what if I specify fetch_row_count both at connection and at the statement level?" "default_fetch_row_count" makes perfect sense for the connection property: it tells the connection which value to start with, and then the implementation should be free to adjust it. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370948#comment-17370948 ] Zac commented on CALCITE-2322: -- Thanks for the guidance [~julianhyde]! I'll open an updated PR shortly. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370941#comment-17370941 ] Julian Hyde commented on CALCITE-2322: -- Sorry, cancel that. I didn't read the context. I think the property should be called "fetch_row_count": * not "fetch_size" (because that implies bytes, not rows); * not "default_fetch_row_count" (because it is understood that this is a default that can be overridden per-statement; furthermore, there is a default default, which is 100); * not "fetchRowCount" (because Avatica uses snake_case, even though Calcite uses lowerCamelCase). Please add the new property to [Avatica client reference.|https://calcite.apache.org/avatica/docs/client_reference.html] Re. camel vs. snake. I'll log a separate Jira case to allow properties to be specified in any form - UPPER_SNAKE or lower_snake or lowerCamel or UpperCamel. For now, stick with Avatica's house style, which is lower_snake. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370933#comment-17370933 ] Julian Hyde commented on CALCITE-2322: -- Properties are defined in [class CalciteConnectionProperty|https://github.com/apache/calcite/blob/4bc916619fd286b2c0cc4d5c653c96a68801d74e/core/src/main/java/org/apache/calcite/config/CalciteConnectionProperty.java#L39]. Follow the examples there. CALCITE-2995 is an example change that added a property. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: avatica, core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685674#comment-16685674 ] ASF GitHub Bot commented on CALCITE-2322: - Github user F21 commented on the issue: https://github.com/apache/calcite-avatica/pull/49 @kminder We are planning to release Avatica 1.13.0 soon. Would you be able to take another look at this PR? > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505460#comment-16505460 ] Julian Hyde commented on CALCITE-2322: -- Oh crap. Calcite connect string parameters use {{lowerCamelCase}} (see https://calcite.apache.org/docs/adapter.html#jdbc-connect-string-parameters); and now I see that Avatica connect string parameters use {{snake_case}}. Now we somehow need to bridge between two conflicting styles. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505395#comment-16505395 ] ASF GitHub Bot commented on CALCITE-2322: - Github user laurentgo commented on a diff in the pull request: https://github.com/apache/calcite-avatica/pull/49#discussion_r193906310 --- Diff: core/src/main/java/org/apache/calcite/avatica/AvaticaStatement.java --- @@ -108,6 +109,7 @@ protected AvaticaStatement(AvaticaConnection connection, this.resultSetType = resultSetType; this.resultSetConcurrency = resultSetConcurrency; this.resultSetHoldability = resultSetHoldability; +this.fetchSize = connection.config().fetchSize(); // Default to connection config fetch size. --- End diff -- the value is both set in the field declaration and in the constructor, just suggesting to remove the default value in the field initialization... > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505392#comment-16505392 ] ASF GitHub Bot commented on CALCITE-2322: - Github user laurentgo commented on a diff in the pull request: https://github.com/apache/calcite-avatica/pull/49#discussion_r193906103 --- Diff: core/src/main/java/org/apache/calcite/avatica/BuiltInConnectionProperty.java --- @@ -77,7 +77,10 @@ TRUSTSTORE_PASSWORD("truststore_password", Type.STRING, null, false), HOSTNAME_VERIFICATION("hostname_verification", Type.ENUM, HostnameVerification.STRICT, - HostnameVerification.class, false); + HostnameVerification.class, false), + + /** Fetch size limit, default is 100 rows. */ + FETCH_SIZE("fetch_size", Type.NUMBER, AvaticaStatement.DEFAULT_FETCH_SIZE, false); --- End diff -- yes, default_fetch_size for the jdbc property, which means the enum become DEFAULT_FETCH_SIZE (that was my point :) ) > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16505070#comment-16505070 ] Kevin Minder commented on CALCITE-2322: --- Sorry, I'm still uncertain. I specifically chose "fetch_size" because it aligned with the style of all of the other URL parameters in BuiltInConnectionProperty (i.e. avatica_user, avatica_password, httpclient_factory, httpclient_impl, truststore_password, hostname_verification). Can you confirm that you want a URL like below? jdbc:phoenix:thin:url=...;avatica_user=u;avatica_password=p;httpclient_impl=c;truststore_password=t;fetchSize=1000 {code:java} /** Avatica-based authentication user name */ AVATICA_USER("avatica_user", Type.STRING, null, false), /** Avatica-based authentication password */ AVATICA_PASSWORD("avatica_password", Type.STRING, null, false), /** Factory for constructing http clients. */ HTTP_CLIENT_FACTORY("httpclient_factory", Type.PLUGIN, AvaticaHttpClientFactoryImpl.class.getName(), false), /** HttpClient implementation class name. */ HTTP_CLIENT_IMPL("httpclient_impl", Type.STRING, null, false), /** Principal to use to perform Kerberos login. */ PRINCIPAL("principal", Type.STRING, null, false), /** Keytab to use to perform Kerberos login. */ KEYTAB("keytab", Type.STRING, null, false), /** Truststore for SSL/TLS communication */ TRUSTSTORE("truststore", Type.STRING, null, false), /** Password for the truststore */ TRUSTSTORE_PASSWORD("truststore_password", Type.STRING, null, false), HOSTNAME_VERIFICATION("hostname_verification", Type.ENUM, HostnameVerification.STRICT, HostnameVerification.class, false), /** Fetch size limit, default is 100 rows. */ FETCH_SIZE("fetch_size", Type.NUMBER, AvaticaStatement.DEFAULT_FETCH_SIZE, false);{code} Most of the ones you mention (i.e. escapeProcessing, fetchDirection, maxRows, poolable, queryTimeout) don't even appear to be settable via the connection URL. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498553#comment-16498553 ] Julian Hyde commented on CALCITE-2322: -- Same would go for other Statement properties (escapeProcessing, fetchDirection, maxRows, poolable, queryTimeout) and Connection properties (autoCommit, catalog, holdability, readOnly, transactionIsolation) we might add in the future. We already have a "schema" connection property, which maps to Connection.setSchema(String). > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498547#comment-16498547 ] Julian Hyde commented on CALCITE-2322: -- My vote would be for "fetchSize" rather than "defaultFetchSize". It is shorter, but I think it is still clear that this is the initial value of Statement.getFetchSize that can be overridden by calling Statement.setFetchSize. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498451#comment-16498451 ] ASF GitHub Bot commented on CALCITE-2322: - Github user kminder commented on a diff in the pull request: https://github.com/apache/calcite-avatica/pull/49#discussion_r192495397 --- Diff: core/src/main/java/org/apache/calcite/avatica/AvaticaStatement.java --- @@ -108,6 +109,7 @@ protected AvaticaStatement(AvaticaConnection connection, this.resultSetType = resultSetType; this.resultSetConcurrency = resultSetConcurrency; this.resultSetHoldability = resultSetHoldability; +this.fetchSize = connection.config().fetchSize(); // Default to connection config fetch size. --- End diff -- I'm not clear on what you are suggesting here. Are you suggesting this be defaulted to 0 and have the real value resolved in getFetchSize()? I went with the approach I did because a setting made via the connection url is accessed through the connection object. The connection reference isn't stored within AvaticaStatement so the connection value needs to be propagated down somehow. You are correct that if the value is explicitly set to 0 either via url param or via Statement.setFetchSize or ResultSet.setFetchSize I have no idea what will happen. All I can say is that I didn't change that behavior with this change. It will do whatever is would have done before. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498429#comment-16498429 ] ASF GitHub Bot commented on CALCITE-2322: - Github user kminder commented on a diff in the pull request: https://github.com/apache/calcite-avatica/pull/49#discussion_r192492050 --- Diff: core/src/main/java/org/apache/calcite/avatica/BuiltInConnectionProperty.java --- @@ -77,7 +77,10 @@ TRUSTSTORE_PASSWORD("truststore_password", Type.STRING, null, false), HOSTNAME_VERIFICATION("hostname_verification", Type.ENUM, HostnameVerification.STRICT, - HostnameVerification.class, false); + HostnameVerification.class, false), + + /** Fetch size limit, default is 100 rows. */ + FETCH_SIZE("fetch_size", Type.NUMBER, AvaticaStatement.DEFAULT_FETCH_SIZE, false); --- End diff -- I'm not clear on your suggestion. Are you suggesting that the url connection parameter be named "DEFAULT_FETCH_SIZE" or as you suggest above "defaultFetchSize". You mention having looked at a few other drivers I've looked too and don't see much consistency. I chose "fetch_size" because that seemed consistent with the other parameters. Would "default_fetch_size" be agreeable? > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494573#comment-16494573 ] ASF GitHub Bot commented on CALCITE-2322: - Github user laurentgo commented on a diff in the pull request: https://github.com/apache/calcite-avatica/pull/49#discussion_r191620980 --- Diff: core/src/main/java/org/apache/calcite/avatica/BuiltInConnectionProperty.java --- @@ -77,7 +77,10 @@ TRUSTSTORE_PASSWORD("truststore_password", Type.STRING, null, false), HOSTNAME_VERIFICATION("hostname_verification", Type.ENUM, HostnameVerification.STRICT, - HostnameVerification.class, false); + HostnameVerification.class, false), + + /** Fetch size limit, default is 100 rows. */ + FETCH_SIZE("fetch_size", Type.NUMBER, AvaticaStatement.DEFAULT_FETCH_SIZE, false); --- End diff -- I would recommend to name it `DEFAULT_FETCH_SIZE` > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494572#comment-16494572 ] ASF GitHub Bot commented on CALCITE-2322: - Github user laurentgo commented on a diff in the pull request: https://github.com/apache/calcite-avatica/pull/49#discussion_r191620296 --- Diff: core/src/main/java/org/apache/calcite/avatica/AvaticaStatement.java --- @@ -108,6 +109,7 @@ protected AvaticaStatement(AvaticaConnection connection, this.resultSetType = resultSetType; this.resultSetConcurrency = resultSetConcurrency; this.resultSetHoldability = resultSetHoldability; +this.fetchSize = connection.config().fetchSize(); // Default to connection config fetch size. --- End diff -- let's remove the default value set for the field declaration > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487313#comment-16487313 ] Kevin Minder commented on CALCITE-2322: --- An additional parameter for fetch_bytes or buffer_size seems like a useful addition. The interplay between them however may become confusing. What would the behavior be if the bytes required to supply fetch_size rows exceeds the byte could suppled by fetch_bytes? > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486585#comment-16486585 ] ASF GitHub Bot commented on CALCITE-2322: - GitHub user kminder opened a pull request: https://github.com/apache/calcite-avatica/pull/49 CALCITE-2322: Add fetch size support to connection url and JDBC state… Adds the support for both a URL connection param fetch_size and the JDBD Statement API setFetchSize. This can be used to tune performance for queries that return a large number of rows. You can merge this pull request into a Git repository by running: $ git pull https://github.com/kminder/calcite-avatica calcite-2322_add-fetch-size-support-to-connection-url-and-jdbc-statement Alternatively you can review and apply these changes as the patch at: https://github.com/apache/calcite-avatica/pull/49.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #49 commit c67cd9ac190532e7255b7936812665999b6e334f Author: Kevin MinderDate: 2018-05-23T01:46:23Z CALCITE-2322: Add fetch size support to connection url and JDBC statement > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > Attachments: CALCITE-2322.patch > > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16486588#comment-16486588 ] Kevin Minder commented on CALCITE-2322: --- Created pull request: https://github.com/apache/calcite-avatica/pull/49 > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > Attachments: CALCITE-2322.patch > > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484416#comment-16484416 ] Julian Hyde commented on CALCITE-2322: -- We prefer pull requests to patches, if you don't mind. What are your thoughts on also having a fetch-bytes parameter? (Not necessarily as part of this change.) It's frustrating that ODBC and JDBC specify row counts, whereas all of the infrastructure between client and server (fixed-size buffer, MTU) cares about byte counts. Mapping between the two is not straightforward because we don't always have good estimates for row width. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > Attachments: calcite-avatica.patch > > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (CALCITE-2322) Add fetch size support to connection url and JDBC statement
[ https://issues.apache.org/jira/browse/CALCITE-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484409#comment-16484409 ] Kevin Minder commented on CALCITE-2322: --- I'll have to review your development workflow but I've attached a patch that implements this. > Add fetch size support to connection url and JDBC statement > --- > > Key: CALCITE-2322 > URL: https://issues.apache.org/jira/browse/CALCITE-2322 > Project: Calcite > Issue Type: Improvement > Components: core >Affects Versions: 1.11.0 >Reporter: Kevin Minder >Assignee: Julian Hyde >Priority: Major > Attachments: calcite-avatica.patch > > > Currently the remote driver defaults to hard coded fetch size of 100 rows. > When a connection is operating in HTTP mode having such a small fetch size > can add enormous overhead. This is especially true if TLS connections are > used and made worse if each connection flows throw multiple proxies. > Consider that 100K rows returned 100 rows at a time will make 1K HTTP POST > requests. One might say that nobody should ever do that but some tools like > Spotfire may end up doing this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)