[jira] [Commented] (DRILL-7107) Unable to connect to Drill 1.15 through ZK
[ https://issues.apache.org/jira/browse/DRILL-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801211#comment-16801211 ]

ASF GitHub Bot commented on DRILL-7107:
---------------------------------------

bitblender commented on pull request #1702: DRILL-7107 Unable to connect to Drill 1.15 through ZK
URL: https://github.com/apache/drill/pull/1702#discussion_r268885320

File path: exec/java-exec/src/main/java/org/apache/drill/exec/coord/zk/ZKClusterCoordinator.java

 @@ -81,23 +78,20 @@
   private ConcurrentHashMap endpointsMap = new ConcurrentHashMap();
   private static final Pattern ZK_COMPLEX_STRING = Pattern.compile("(^.*?)/(.*)/([^/]*)$");

 - public ZKClusterCoordinator(DrillConfig config, String connect)
 -     throws IOException, DrillbitStartupException {
 -   this(config, connect, null);
 + public ZKClusterCoordinator(DrillConfig config, String connect) {
 +   this(config, connect, new DefaultACLProvider());
   }

 - public ZKClusterCoordinator(DrillConfig config, BootStrapContext context)
 -     throws IOException, DrillbitStartupException {
 -   this(config, null, context);
 + public ZKClusterCoordinator(DrillConfig config, ACLProvider aclProvider) {
 +   this(config, null, aclProvider);

Review comment: I tried writing a test where the Drillbits (inside ClusterFixture) are set up with ZK_APPLY_SECURE_ACL=false (to avoid the need to set up a secure ZK server within the unit test) and the ClientFixture is set up with ZK_APPLY_SECURE_ACL=true (to simulate the failure). Starting a test with different values for the same property turns out to be quite hard because the ClusterFixture internally instantiates a ClientFixture. Changing this behavior might affect other tests.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Unable to connect to Drill 1.15 through ZK
> ------------------------------------------
>
> Key: DRILL-7107
> URL: https://issues.apache.org/jira/browse/DRILL-7107
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Karthikeyan Manivannan
> Assignee: Karthikeyan Manivannan
> Priority: Major
> Fix For: 1.16.0
>
> After upgrading to Drill 1.15, users are no longer able to connect to Drill using a ZK quorum. They get the following "Unable to setup ZK for client" error:
> [~]$ sqlline -u "jdbc:drill:zk=172.16.2.165:5181;auth=maprsasl"
> Error: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client. (state=,code=0)
> java.sql.SQLNonTransientConnectionException: Failure in connecting to Drill: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client.
>   at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:174)
>   at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
>   at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:67)
>   at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
>   at org.apache.drill.jdbc.Driver.connect(Driver.java:72)
>   at sqlline.DatabaseConnection.connect(DatabaseConnection.java:130)
>   at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:179)
>   at sqlline.Commands.connect(Commands.java:1247)
>   at sqlline.Commands.connect(Commands.java:1139)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
>   at sqlline.SqlLine.dispatch(SqlLine.java:722)
>   at sqlline.SqlLine.initArgs(SqlLine.java:416)
>   at sqlline.SqlLine.begin(SqlLine.java:514)
>   at sqlline.SqlLine.start(SqlLine.java:264)
>   at sqlline.SqlLine.main(SqlLine.java:195)
> Caused by: org.apache.drill.exec.rpc.RpcException: Failure setting up ZK for client.
>   at org.apache.drill.exec.client.DrillClient.connect(DrillClient.java:340)
>   at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:165)
>   ... 18 more
> Caused by: java.lang.NullPointerException
>   at org.apache.drill.exec.coord.zk.ZKACLProviderFactory.findACLProvider(ZKACLProviderFactory.java:68)
>   at org.apache.drill.exec.coord.zk.ZKACLProviderFactory.getACLProvider(ZKACLProviderFactory.java:47)
>   at org.apache.drill.exec.coord.zk.ZKClusterCoordinator.<init>(ZKClusterCoordinator
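The ZK_COMPLEX_STRING pattern quoted in the diff above is what splits a "complex" connect string into its quorum, ZK root, and cluster-id parts. A minimal standalone sketch of that parsing (the class name and sample connect string are illustrative, not Drill's actual code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZkConnectParser {
    // Same pattern as in the diff: group 1 = ZK quorum (reluctant, so it stops
    // at the first '/'), group 2 = ZK root, group 3 = trailing cluster id.
    static final Pattern ZK_COMPLEX_STRING = Pattern.compile("(^.*?)/(.*)/([^/]*)$");

    // Returns {quorum, zkRoot, clusterId}, or null if the string has no
    // root/cluster-id suffix.
    public static String[] parse(String connect) {
        Matcher m = ZK_COMPLEX_STRING.matcher(connect);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("host1:5181,host2:5181/drill/drillbits1");
        // parts = {"host1:5181,host2:5181", "drill", "drillbits1"}
        for (String p : parts) {
            System.out.println(p);
        }
    }
}
```

A plain quorum string such as "host1:5181" does not match the pattern, which is why callers must fall back to defaults for the root and cluster id in that case.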
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801044#comment-16801044 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on issue #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#issuecomment-476368650

@vvysotskyi made changes and verified that tests passed.

> Implement JDBC Statement.setMaxRows() with System Option
> --------------------------------------------------------
>
> Key: DRILL-7048
> URL: https://issues.apache.org/jira/browse/DRILL-7048
> Project: Apache Drill
> Issue Type: New Feature
> Components: Client - JDBC, Query Planning & Optimization
> Affects Versions: 1.15.0
> Reporter: Kunal Khatua
> Assignee: Kunal Khatua
> Priority: Major
> Labels: doc-impacting
> Fix For: 1.17.0
>
> With DRILL-6960, the webUI will get an auto-limit on the number of results fetched.
> Since more of the plumbing is already there, it makes sense to provide the same for the JDBC client.
> In addition, it would be nice if the Server can have a pre-defined value as well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a max limit on the resultset size as well.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators
[ https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801040#comment-16801040 ]

Robert Hou commented on DRILL-7108:
-----------------------------------

I have verified this fix.

> With statistics enabled TPCH 16 has two additional exchange operators
> ---------------------------------------------------------------------
>
> Key: DRILL-7108
> URL: https://issues.apache.org/jira/browse/DRILL-7108
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.16.0
> Reporter: Robert Hou
> Assignee: Gautam Parai
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.16.0
>
> TPCH 16 with sf 100 runs 14% slower. Here is the query:
> {noformat}
> select
>   p.p_brand,
>   p.p_type,
>   p.p_size,
>   count(distinct ps.ps_suppkey) as supplier_cnt
> from
>   partsupp ps,
>   part p
> where
>   p.p_partkey = ps.ps_partkey
>   and p.p_brand <> 'Brand#21'
>   and p.p_type not like 'MEDIUM PLATED%'
>   and p.p_size in (38, 2, 8, 31, 44, 5, 14, 24)
>   and ps.ps_suppkey not in (
>     select
>       s.s_suppkey
>     from
>       supplier s
>     where
>       s.s_comment like '%Customer%Complaints%'
>   )
> group by
>   p.p_brand,
>   p.p_type,
>   p.p_size
> order by
>   supplier_cnt desc,
>   p.p_brand,
>   p.p_type,
>   p.p_size;
> {noformat}
[jira] [Closed] (DRILL-7108) With statistics enabled TPCH 16 has two additional exchange operators
[ https://issues.apache.org/jira/browse/DRILL-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Hou closed DRILL-7108.
-----------------------------
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800974#comment-16800974 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268800813

File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java

 @@ -270,4 +271,10 @@ public void setResultSet(AvaticaResultSet resultSet) {
   public void setUpdateCount(int value) {
     updateCount = value;
   }
 +
 + @Override
 + public void setLargeMaxRows(long maxRowCount) throws SQLException {
 +   execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`=" + maxRowCount);
 +   this.maxRowCount = maxRowCount;

Review comment: We need this here to ensure that when `getLargeMaxRows()` is called, we are reading it back from the value that was set using `setLargeMaxRows()`. Avatica only holds the value that has been set.
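The caching described in the review comment above — push the limit to the server, then remember it locally so the getter can answer without a server round trip — can be sketched like this (MaxRowsHolder is a hypothetical stand-in for the statement class; the real method also executes the ALTER SESSION shown in the diff):

```java
public class MaxRowsHolder {
    // 0 means "no limit", mirroring the JDBC setMaxRows() convention.
    private long maxRowCount;

    // Sketch of the setLargeMaxRows() pattern: in the real statement this
    // also runs: execute("ALTER SESSION SET `" + QUERY_MAX_ROWS + "`=" + maxRowCount);
    // then caches the value locally.
    public void setLargeMaxRows(long maxRowCount) {
        this.maxRowCount = maxRowCount;
    }

    // The getter returns the cached value, since Avatica only holds
    // whatever was last set on the client side.
    public long getLargeMaxRows() {
        return maxRowCount;
    }

    public static void main(String[] args) {
        MaxRowsHolder stmt = new MaxRowsHolder();
        stmt.setLargeMaxRows(500L);
        System.out.println(stmt.getLargeMaxRows()); // 500
    }
}
```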
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800919#comment-16800919 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268772086

File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillPreparedStatement.java

 @@ -32,4 +33,25 @@
   */
  public interface DrillPreparedStatement extends PreparedStatement {

 + /**
 +  * @throws SQLException
 +  *   Any SQL exception
 +  */
 + @Override
 + int getMaxRows() throws SQLException;

Review comment: I think I was declaring it because it seemed Avatica preferred using `setLargeMaxRows() / getLargeMaxRows()`. I cannot really override this anyway (all calls to `setMaxRows() / getMaxRows()` are redirected to the `...Large...` methods), so I'll remove it.
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800917#comment-16800917 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268771066

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java

 @@ -107,6 +108,20 @@ public QueryContext(final UserSession session, final DrillbitContext drillbitCon
     this.table = drillbitContext.getOperatorTable();
   }

 +   // Checking for limit on ResultSet rowcount and if user attempting to override the system value
 +   int sessionMaxRowCount = queryOptions.getOption(ExecConstants.QUERY_MAX_ROWS).num_val.intValue();
 +   int defaultMaxRowCount = queryOptions.getOptionManager(OptionScope.SYSTEM).getOption(ExecConstants.QUERY_MAX_ROWS).num_val.intValue();
 +   if (sessionMaxRowCount > 0 && defaultMaxRowCount > 0) {
 +     this.autoLimitRowCount = Math.min(sessionMaxRowCount, defaultMaxRowCount);
 +   } else {
 +     this.autoLimitRowCount = Math.max(sessionMaxRowCount, defaultMaxRowCount);
 +   }
 +   if (autoLimitRowCount == defaultMaxRowCount && defaultMaxRowCount != sessionMaxRowCount) {
 +     // Required to indicate via OptionScope=QueryLevel that session limit is overridden by system limit
 +     queryOptions.setLocalOption(ExecConstants.QUERY_MAX_ROWS, autoLimitRowCount);
 +   }
 +   logger.debug("ResultSet size is auto-limited to {} rows [Session: {} / Default: {}]", this.autoLimitRowCount, sessionMaxRowCount, defaultMaxRowCount);

Review comment: Good point. Will mark it for when autoLimit is non-zero.
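The session/system resolution rule in the diff above (the stricter limit wins when both are set; otherwise whichever is non-zero applies, with 0 meaning "no limit") can be isolated as a small helper. A sketch under that reading; the class name is illustrative:

```java
public class AutoLimit {
    // Mirrors the QueryContext logic quoted above:
    // - both limits positive  -> the smaller (stricter) one wins
    // - only one limit set    -> that one applies
    // - neither set           -> 0, i.e. no auto-limit
    public static int resolve(int sessionMaxRowCount, int defaultMaxRowCount) {
        if (sessionMaxRowCount > 0 && defaultMaxRowCount > 0) {
            return Math.min(sessionMaxRowCount, defaultMaxRowCount);
        }
        return Math.max(sessionMaxRowCount, defaultMaxRowCount);
    }

    public static void main(String[] args) {
        System.out.println(resolve(100, 50)); // stricter system limit wins: 50
        System.out.println(resolve(0, 50));   // only system limit set: 50
        System.out.println(resolve(100, 0));  // only session limit set: 100
        System.out.println(resolve(0, 0));    // no limit: 0
    }
}
```

The follow-up check in the diff (re-setting the option at query scope when the system limit overrode the session one) exists only so that query profiles show which limit actually applied.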
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800913#comment-16800913 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268769689

File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java

 @@ -283,6 +298,21 @@
   public RemoteFunctionRegistry getRemoteFunctionRegistry() {
     return drillbitContext.getRemoteFunctionRegistry();
   }

 + /**
 +  * Returns the maximum size of auto-limited resultset
 +  * @return Maximum size of auto-limited resultSet
 +  */
 + public int getAutoLimitRowCount() {

Review comment: OK. Since I was using it in 4 other places, it seemed cleaner to have a single API instead of resolving it through `getOptions().getOption...`
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800911#comment-16800911 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268768631

File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java

 @@ -283,7 +283,9 @@
     new OptionDefinition(ExecConstants.NDV_BLOOM_FILTER_FPOS_PROB_VALIDATOR),
     new OptionDefinition(ExecConstants.RM_QUERY_TAGS_VALIDATOR, new OptionMetaData(OptionValue.AccessibleScopes.SESSION_AND_QUERY, false, false)),
 -   new OptionDefinition(ExecConstants.RM_QUEUES_WAIT_FOR_PREFERRED_NODES_VALIDATOR)
 +   new OptionDefinition(ExecConstants.RM_QUEUES_WAIT_FOR_PREFERRED_NODES_VALIDATOR),
 +   new OptionDefinition(ExecConstants.RETURN_RESULT_SET_FOR_DDL_VALIDATOR),

Review comment: Looks like this slipped through when resolving merge conflicts with the latest master branch.
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800909#comment-16800909 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268767703

File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java

 @@ -72,12 +73,18 @@
  public class PreparedStatementTest extends JdbcTestBase {
   private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PreparedStatementTest.class);

 + private static final Random RANDOMIZER = new Random(20150304);
   private static final String SYS_VERSION_SQL = "select * from sys.version";
   private static final String SYS_RANDOM_SQL =
     "SELECT cast(random() as varchar) as myStr FROM (VALUES(1)) " +
     "union SELECT cast(random() as varchar) as myStr FROM (VALUES(1)) " +
     "union SELECT cast(random() as varchar) as myStr FROM (VALUES(1)) ";
 + private static final String SYS_OPTIONS_SQL = "SELECT * FROM sys.options";
 + private static final String SYS_OPTIONS_SQL_LIMIT_10 = "SELECT * FROM sys.options LIMIT 12";
 + private static final String ALTER_SYS_OPTIONS_MAX_ROWS_LIMIT_X = "ALTER SYSTEM SET `" + ExecConstants.QUERY_MAX_ROWS + "`=";

Review comment: 👍 Will fix for `StatementTest` as well.
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800904#comment-16800904 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268766977

File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillPreparedStatementImpl.java

 @@ -259,4 +261,17 @@
   public void setObject(int parameterIndex, Object x, SQLType targetSqlType) throws SQLException {
     checkOpen();
     super.setObject(parameterIndex, x, targetSqlType);
   }

 + @Override
 + public void setLargeMaxRows(long maxRowCount) throws SQLException {
 +   Statement setMaxStmt = this.connection.createStatement();
 +   setMaxStmt.execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`=" + maxRowCount);

Review comment: 👍
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800905#comment-16800905 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268767018

File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java

 @@ -270,4 +271,10 @@ public void setResultSet(AvaticaResultSet resultSet) {
   public void setUpdateCount(int value) {
     updateCount = value;
   }

 + @Override
 + public void setLargeMaxRows(long maxRowCount) throws SQLException {
 +   execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`=" + maxRowCount);

Review comment: 👍
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800903#comment-16800903 ]

ASF GitHub Bot commented on DRILL-7048:
---------------------------------------

kkhatua commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option
URL: https://github.com/apache/drill/pull/1714#discussion_r268766549

File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java

 @@ -361,6 +361,9 @@ void close() {
     ExecConstants.JDBC_BATCH_QUEUE_THROTTLING_THRESHOLD);
   resultsListener = new ResultsListener(this, batchQueueThrottlingThreshold);
   currentBatchHolder = new RecordBatchLoader(client.getAllocator());
 +
 + // Set Query Timeout and MaxRows

Review comment: Actually, we don't need it here any more. I'll fix the comment.
[jira] [Commented] (DRILL-7051) Upgrade jetty
[ https://issues.apache.org/jira/browse/DRILL-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800870#comment-16800870 ]

ASF GitHub Bot commented on DRILL-7051:
---------------------------------------

sohami commented on pull request #1681: DRILL-7051: Upgrade jetty
URL: https://github.com/apache/drill/pull/1681#discussion_r268740814

File path: pom.xml

 @@ -2553,7 +2548,13 @@
   4.0.1
   provided
 +
 + javax.ws.rs
 + javax.ws.rs-api

Review comment: Why do we need this and the other javax servlet dependencies?

> Upgrade jetty
> -------------
>
> Key: DRILL-7051
> URL: https://issues.apache.org/jira/browse/DRILL-7051
> Project: Apache Drill
> Issue Type: Improvement
> Components: Web Server
> Affects Versions: 1.15.0
> Reporter: Veera Naranammalpuram
> Assignee: Vitalii Diravka
> Priority: Major
> Fix For: 1.16.0
>
> Is Drill using a version of the jetty web server that's really old? The jars suggest it's using jetty 9.1, built sometime in 2014:
> {noformat}
> -rw-r--r-- 1 veeranaranammalpuram staff  15988 Nov 20 2017 jetty-continuation-9.1.1.v20140108.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 103288 Nov 20 2017 jetty-http-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 101519 Nov 20 2017 jetty-io-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff  95906 Nov 20 2017 jetty-security-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 401593 Nov 20 2017 jetty-server-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 110992 Nov 20 2017 jetty-servlet-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 119215 Nov 20 2017 jetty-servlets-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 341683 Nov 20 2017 jetty-util-9.1.5.v20140505.jar
> -rw-r--r-- 1 veeranaranammalpuram staff  38707 Dec 21 15:42 jetty-util-ajax-9.3.19.v20170502.jar
> -rw-r--r-- 1 veeranaranammalpuram staff 111466 Nov 20 2017 jetty-webapp-9.1.1.v20140108.jar
> -rw-r--r-- 1 veeranaranammalpuram staff  41763 Nov 20 2017 jetty-xml-9.1.1.v20140108.jar
> {noformat}
> This version is shown as deprecated:
> https://www.eclipse.org/jetty/documentation/current/what-jetty-version.html#d0e203
> Opening this to upgrade jetty to the latest stable supported version.
[jira] [Updated] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva updated DRILL-7077:
------------------------------------
    Labels: doc-impacting ready-to-commit  (was: doc-impacting)

> Add Function to Facilitate Time Series Analysis
> -----------------------------------------------
>
> Key: DRILL-7077
> URL: https://issues.apache.org/jira/browse/DRILL-7077
> Project: Apache Drill
> Issue Type: New Feature
> Reporter: Charles Givre
> Assignee: Charles Givre
> Priority: Major
> Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
> When analyzing time-based data, you will often have to aggregate by time grains. While some time grains are easy to calculate, others, such as quarter, can be quite difficult. These functions enable a user to quickly and easily aggregate data by various units of time. Usage is as follows:
> {code:java}
> SELECT
> FROM
> GROUP BY nearestDate(,
> {code}
> So let's say that a user wanted to count the number of hits on a web server per 15 minutes; the query might look like this:
> {code:java}
> SELECT nearestDate(`eventDate`, '15MINUTE') AS eventDate,
> COUNT(*) AS hitCount
> FROM dfs.`log.httpd`
> GROUP BY nearestDate(`eventDate`, '15MINUTE')
> {code}
> Currently supports the following time units:
> * YEAR
> * QUARTER
> * MONTH
> * WEEK_SUNDAY
> * WEEK_MONDAY
> * DAY
> * HOUR
> * HALF_HOUR / 30MIN
> * QUARTER_HOUR / 15MIN
> * MINUTE
> * 30SECOND
> * 15SECOND
> * SECOND
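The nearestDate() function described above works by truncating a timestamp down to the start of the enclosing grain. A standalone sketch of the QUARTER_HOUR / 15MIN case (an illustration of the rounding idea, not Drill's implementation):

```java
import java.time.LocalDateTime;

public class NearestDate {
    // Truncate a timestamp down to the nearest 15-minute boundary, so that
    // all events in the same quarter-hour group together, as in
    // GROUP BY nearestDate(`eventDate`, '15MINUTE').
    public static LocalDateTime toQuarterHour(LocalDateTime ts) {
        int flooredMinute = (ts.getMinute() / 15) * 15;  // 0, 15, 30, or 45
        return ts.withMinute(flooredMinute).withSecond(0).withNano(0);
    }

    public static void main(String[] args) {
        LocalDateTime hit = LocalDateTime.of(2019, 3, 25, 10, 47, 33);
        System.out.println(toQuarterHour(hit)); // 2019-03-25T10:45
    }
}
```

The other grains listed above follow the same shape: keep the fields coarser than the grain, floor the field at the grain, and zero everything finer.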
[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800832#comment-16800832 ] ASF GitHub Bot commented on DRILL-7077: --- arina-ielchiieva commented on issue #1680: DRILL-7077: Add Function to Facilitate Time Series Analysis URL: https://github.com/apache/drill/pull/1680#issuecomment-476264054 +1, LGTM. Please squash the commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add Function to Facilitate Time Series Analysis > --- > > Key: DRILL-7077 > URL: https://issues.apache.org/jira/browse/DRILL-7077 > Project: Apache Drill > Issue Type: New Feature >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.16.0 > > > When analyzing time-based data, you will often have to aggregate by time > grains. While some time grains are easy to calculate, others, such as > quarter, can be quite difficult. These functions enable a user to quickly and > easily aggregate data by various units of time. Usage is as follows: > {code:java} > SELECT <fields> FROM <table> GROUP BY nearestDate(<date_field>, <time_interval>) {code} > So, if a user wanted to count the number of hits on a web server > per 15 minutes, the query might look like this: > {code:java} > SELECT nearestDate(`eventDate`, '15MINUTE') AS eventDate, > COUNT(*) AS hitCount > FROM dfs.`log.httpd` > GROUP BY nearestDate(`eventDate`, '15MINUTE'){code} > Currently supports the following time units: > * YEAR > * QUARTER > * MONTH > * WEEK_SUNDAY > * WEEK_MONDAY > * DAY > * HOUR > * HALF_HOUR / 30MIN > * QUARTER_HOUR / 15MIN > * MINUTE > * 30SECOND > * 15SECOND > * SECOND > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800825#comment-16800825 ] ASF GitHub Bot commented on DRILL-7077: --- ihuzenko commented on pull request #1680: DRILL-7077: Add Function to Facilitate Time Series Analysis URL: https://github.com/apache/drill/pull/1680#discussion_r268716085 ## File path: contrib/udfs/src/test/java/org/apache/drill/exec/udfs/TestNearestDateFunctions.java ## @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.udfs; + +import org.apache.drill.categories.SqlFunctionTest; +import org.apache.drill.categories.UnlikelyTest; +import org.apache.drill.common.exceptions.DrillRuntimeException; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterFixtureBuilder; +import org.apache.drill.test.ClusterTest; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.experimental.categories.Category; + +import java.time.LocalDateTime; + +import static org.junit.Assert.assertTrue; + +@Category({UnlikelyTest.class, SqlFunctionTest.class}) +public class TestNearestDateFunctions extends ClusterTest { + + @BeforeClass + public static void setup() throws Exception { +ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher); +startCluster(builder); + } + + @Test + public void testNearestDate() throws Exception { Review comment: ok, cool) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Add Function to Facilitate Time Series Analysis > --- > > Key: DRILL-7077 > URL: https://issues.apache.org/jira/browse/DRILL-7077 > Project: Apache Drill > Issue Type: New Feature >Reporter: Charles Givre >Assignee: Charles Givre >Priority: Major > Labels: doc-impacting > Fix For: 1.16.0 > > > When analyzing time based data, you will often have to aggregate by time > grains. While some time grains will be easy to calculate, others, such as > quarter, can be quite difficult. These functions enable a user to quickly and > easily aggregate data by various units of time. 
Usage is as follows: > {code:java} > SELECT <fields> FROM <table> GROUP BY nearestDate(<date_field>, <time_interval>) {code} > So, if a user wanted to count the number of hits on a web server > per 15 minutes, the query might look like this: > {code:java} > SELECT nearestDate(`eventDate`, '15MINUTE') AS eventDate, > COUNT(*) AS hitCount > FROM dfs.`log.httpd` > GROUP BY nearestDate(`eventDate`, '15MINUTE'){code} > Currently supports the following time units: > * YEAR > * QUARTER > * MONTH > * WEEK_SUNDAY > * WEEK_MONDAY > * DAY > * HOUR > * HALF_HOUR / 30MIN > * QUARTER_HOUR / 15MIN > * MINUTE > * 30SECOND > * 15SECOND > * SECOND > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
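The fixed-width grains listed above (HALF_HOUR, QUARTER_HOUR, 30SECOND, and so on) all reduce to flooring a timestamp to a bucket boundary. As a rough illustration of that idea in plain Java — a minimal sketch, not Drill's implementation — any fixed bucket width can be handled with integer arithmetic on the epoch seconds; calendar grains such as QUARTER need calendar-aware logic instead:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Minimal sketch (not Drill's code): floor a timestamp to an arbitrary
// fixed-width bucket by truncating its epoch-second value.
public class TimeBucket {
  public static LocalDateTime floorTo(LocalDateTime ts, long bucketSeconds) {
    long epoch = ts.toEpochSecond(ZoneOffset.UTC);
    long floored = epoch - Math.floorMod(epoch, bucketSeconds);
    return LocalDateTime.ofEpochSecond(floored, 0, ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    LocalDateTime t = LocalDateTime.of(2019, 3, 25, 10, 38, 44);
    System.out.println(floorTo(t, 15 * 60)); // 2019-03-25T10:30  (QUARTER_HOUR)
    System.out.println(floorTo(t, 30));      // 2019-03-25T10:38:30  (30SECOND)
  }
}
```

Grouping on such a floored value is exactly what `GROUP BY nearestDate(...)` achieves in the query above, without the per-row arithmetic leaking into the SQL.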
[jira] [Commented] (DRILL-7077) Add Function to Facilitate Time Series Analysis
[ https://issues.apache.org/jira/browse/DRILL-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800823#comment-16800823 ] ASF GitHub Bot commented on DRILL-7077: --- ihuzenko commented on pull request #1680: DRILL-7077: Add Function to Facilitate Time Series Analysis URL: https://github.com/apache/drill/pull/1680#discussion_r268714887 ## File path: contrib/udfs/src/main/java/org/apache/drill/exec/udfs/NearestDateUtils.java ## @@ -0,0 +1,149 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.udfs; + +import org.apache.drill.common.exceptions.DrillRuntimeException; + +import java.time.temporal.TemporalAdjusters; +import java.time.LocalDateTime; +import java.time.DayOfWeek; +import java.time.temporal.ChronoUnit; +import java.util.Arrays; + +public class NearestDateUtils { + /** + * Specifies the time grouping to be used with the nearest date function + */ + private enum TimeInterval { +YEAR, +QUARTER, +MONTH, +WEEK_SUNDAY, +WEEK_MONDAY, +DAY, +HOUR, +HALF_HOUR, +QUARTER_HOUR, +MINUTE, +HALF_MINUTE, +QUARTER_MINUTE, +SECOND + } + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(NearestDateUtils.class); + + /** + * This function takes a Java LocalDateTime object, and an interval string and returns + * the nearest date closest to that time. For instance, if you specified the date as 2018-05-04 and YEAR, the function + * will return 2018-01-01 + * + * @param d the original datetime before adjustments + * @param interval The interval string to deduct from the supplied date + * @return the modified LocalDateTime + */ + public final static java.time.LocalDateTime getDate(java.time.LocalDateTime d, String interval) { +java.time.LocalDateTime newDate = d; +int year = d.getYear(); +int month = d.getMonth().getValue(); +int day = d.getDayOfMonth(); +int hour = d.getHour(); +int minute = d.getMinute(); +int second = d.getSecond(); +TimeInterval adjustmentAmount; +try { + adjustmentAmount = TimeInterval.valueOf(interval.toUpperCase()); +} catch (IllegalArgumentException e) { + throw new DrillRuntimeException(String.format("[%s] is not a valid time statement. 
Expecting: %s", interval, Arrays.asList(TimeInterval.values()))); +} +switch (adjustmentAmount) { + case YEAR: +newDate = LocalDateTime.of(year, 1, 1, 0, 0, 0); +break; + case QUARTER: +newDate = LocalDateTime.of(year, ((month - 1) / 3) * 3 + 1, 1, 0, 0, 0); +break; + case MONTH: +newDate = LocalDateTime.of(year, month, 1, 0, 0, 0); +break; + case WEEK_SUNDAY: +newDate = newDate.with(TemporalAdjusters.previousOrSame(DayOfWeek.SUNDAY)) +.truncatedTo(ChronoUnit.DAYS); +break; + case WEEK_MONDAY: +newDate = newDate.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY)) +.truncatedTo(ChronoUnit.DAYS); +break; + case DAY: +newDate = LocalDateTime.of(year, month, day, 0, 0, 0); +break; + case HOUR: +newDate = LocalDateTime.of(year, month, day, hour, 0, 0); +break; + case HALF_HOUR: +if (minute >= 30) { + minute = 30; +} else { + minute = 0; +} +newDate = LocalDateTime.of(year, month, day, hour, minute, 0); +break; + case QUARTER_HOUR: +if (minute >= 45) { + minute = 45; +} else if (minute >= 30) { + minute = 30; +} else if (minute >= 15) { + minute = 15; +} else { + minute = 0; +} +newDate = LocalDateTime.of(year, month, day, hour, minute, 0); +break; + case MINUTE: +newDate = LocalDateTime.of(year, month, day, hour, minute, 0); +break; + case HALF_MINUTE: +if (second >= 30) { + second = 30; +} else { + second = 0; +} +newDate = LocalDateTime.of(year, month, day, hour, minute, second);
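One subtle spot in the switch above is the QUARTER case: with 1-based months, the first month of a quarter is `((month - 1) / 3) * 3 + 1` under integer division (January–March map to 1, April–June to 4, and so on). A small standalone check of just that rule — a sketch separate from the PR code, not part of the patch:

```java
import java.time.LocalDateTime;

// Standalone check of calendar-quarter truncation. With 1-based months,
// ((month - 1) / 3) * 3 + 1 yields 1, 4, 7, or 10 -- the quarter's first month.
public class QuarterStart {
  public static LocalDateTime quarterStart(LocalDateTime d) {
    int firstMonth = ((d.getMonthValue() - 1) / 3) * 3 + 1;
    return LocalDateTime.of(d.getYear(), firstMonth, 1, 0, 0, 0);
  }

  public static void main(String[] args) {
    System.out.println(quarterStart(LocalDateTime.of(2019, 3, 25, 10, 0, 0)));  // 2019-01-01T00:00
    System.out.println(quarterStart(LocalDateTime.of(2019, 11, 2, 8, 15, 0)));  // 2019-10-01T00:00
  }
}
```

Dropping the `- 1` would push March into April's quarter and make December compute month 13, which `LocalDateTime.of` rejects, so the offset matters.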
[jira] [Commented] (DRILL-7032) Ignore corrupt rows in a PCAP file
[ https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800813#comment-16800813 ] ASF GitHub Bot commented on DRILL-7032: --- cgivre commented on issue #1637: DRILL-7032: Ignore corrupt rows in a PCAP file URL: https://github.com/apache/drill/pull/1637#issuecomment-476254011 Commits squashed. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Ignore corrupt rows in a PCAP file > -- > > Key: DRILL-7032 > URL: https://issues.apache.org/jira/browse/DRILL-7032 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: OS: Ubuntu 18.4 > Drill version: 1.15.0 > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) >Reporter: Giovanni Conte >Assignee: Charles Givre >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > It would be useful for Drill to have some ability to ignore corrupt rows in a > PCAP file instead of throwing a Java exception. > This is because there are many pcap files with corrupted lines, and this > functionality would avoid having to pre-fix the packet captures (example > attached file). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7032) Ignore corrupt rows in a PCAP file
[ https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800807#comment-16800807 ] ASF GitHub Bot commented on DRILL-7032: --- arina-ielchiieva commented on issue #1637: DRILL-7032: Ignore corrupt rows in a PCAP file URL: https://github.com/apache/drill/pull/1637#issuecomment-476251224 +1, LGTM. @cgivre please squash the commits. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Ignore corrupt rows in a PCAP file > -- > > Key: DRILL-7032 > URL: https://issues.apache.org/jira/browse/DRILL-7032 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: OS: Ubuntu 18.4 > Drill version: 1.15.0 > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) >Reporter: Giovanni Conte >Assignee: Charles Givre >Priority: Major > Fix For: 1.16.0 > > > It would be useful for Drill to have some ability to ignore corrupt rows in a > PCAP file instead of throwing a Java exception. > This is because there are many pcap files with corrupted lines, and this > functionality would avoid having to pre-fix the packet captures (example > attached file). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (DRILL-7032) Ignore corrupt rows in a PCAP file
[ https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-7032: Labels: ready-to-commit (was: ) > Ignore corrupt rows in a PCAP file > -- > > Key: DRILL-7032 > URL: https://issues.apache.org/jira/browse/DRILL-7032 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: OS: Ubuntu 18.4 > Drill version: 1.15.0 > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) >Reporter: Giovanni Conte >Assignee: Charles Givre >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > It would be useful for Drill to have some ability to ignore corrupt rows in a > PCAP file instead of throwing a Java exception. > This is because there are many pcap files with corrupted lines, and this > functionality would avoid having to pre-fix the packet captures (example > attached file). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7032) Ignore corrupt rows in a PCAP file
[ https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800779#comment-16800779 ] ASF GitHub Bot commented on DRILL-7032: --- cgivre commented on pull request #1637: DRILL-7032: Ignore corrupt rows in a PCAP file URL: https://github.com/apache/drill/pull/1637#discussion_r268688621 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java ## @@ -324,7 +333,12 @@ public int getDst_port() { byte[] data = null; if (packetLength >= payloadDataStart) { data = new byte[packetLength - payloadDataStart]; - System.arraycopy(raw, ipOffset + payloadDataStart, data, 0, data.length); + try { +System.arraycopy(raw, ipOffset + payloadDataStart, data, 0, data.length); + } catch (Exception e) { +isCorrupt = true; +logger.info("Error while parsing PCAP data: ", e.getMessage()); Review comment: Thanks @arina-ielchiieva. Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Ignore corrupt rows in a PCAP file > -- > > Key: DRILL-7032 > URL: https://issues.apache.org/jira/browse/DRILL-7032 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: OS: Ubuntu 18.4 > Drill version: 1.15.0 > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) >Reporter: Giovanni Conte >Assignee: Charles Givre >Priority: Major > Fix For: 1.16.0 > > > It would be useful for Drill to have some ability to ignore corrupt rows in a > PCAP file instead of throwing a Java exception. > This is because there are many pcap files with corrupted lines, and this > functionality would avoid having to pre-fix the packet captures (example > attached file). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
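The diff above catches the `IndexOutOfBoundsException` that `System.arraycopy` throws when a truncated capture makes the computed offsets run past the buffer. An alternative to catching is to validate the bounds up front — the following is a hedged sketch of that idea, not the actual Drill patch; the names (`raw`, `ipOffset`, `payloadDataStart`, `packetLength`) only mirror the diff for readability:

```java
// Sketch of a bounds-checked payload copy for a possibly-truncated packet.
// Illustration only; not the code merged in DRILL-7032.
public class SafeCopy {
  public static byte[] copyPayload(byte[] raw, int ipOffset, int payloadDataStart, int packetLength) {
    int len = packetLength - payloadDataStart;
    int from = ipOffset + payloadDataStart;
    // Any copy that would run past the captured bytes marks the row as corrupt.
    if (len < 0 || from < 0 || from + len > raw.length) {
      return null; // caller skips the packet instead of failing the query
    }
    byte[] data = new byte[len];
    System.arraycopy(raw, from, data, 0, len);
    return data;
  }

  public static void main(String[] args) {
    byte[] raw = new byte[10];
    System.out.println(copyPayload(raw, 0, 4, 8).length); // 4
    System.out.println(copyPayload(raw, 0, 4, 20));       // null: truncated capture
  }
}
```

Either way the row is skipped rather than aborting the scan; the explicit check just avoids using exceptions for expected control flow on every corrupt packet.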
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800735#comment-16800735 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268648485 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ## @@ -107,6 +108,20 @@ public QueryContext(final UserSession session, final DrillbitContext drillbitCon this.table = drillbitContext.getOperatorTable(); } +// Checking for limit on ResultSet rowcount and if user attempting to override the system value +int sessionMaxRowCount = queryOptions.getOption(ExecConstants.QUERY_MAX_ROWS).num_val.intValue(); +int defaultMaxRowCount = queryOptions.getOptionManager(OptionScope.SYSTEM).getOption(ExecConstants.QUERY_MAX_ROWS).num_val.intValue(); +if (sessionMaxRowCount > 0 && defaultMaxRowCount > 0) { + this.autoLimitRowCount = Math.min(sessionMaxRowCount, defaultMaxRowCount); +} else { + this.autoLimitRowCount = Math.max(sessionMaxRowCount, defaultMaxRowCount); +} +if (autoLimitRowCount == defaultMaxRowCount && defaultMaxRowCount != sessionMaxRowCount) { + // Required to indicate via OptionScope=QueryLevel that session limit is overridden by system limit + queryOptions.setLocalOption(ExecConstants.QUERY_MAX_ROWS, autoLimitRowCount); +} +logger.debug("ResultSet size is auto-limited to {} rows [Session: {} / Default: {}]", this.autoLimitRowCount, sessionMaxRowCount, defaultMaxRowCount); Review comment: This message will be logged even for the case when auto limit does not happen. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
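The min/max pairing in the QueryContext diff above reduces to one rule: 0 means "no limit", so when both the session and system values are positive the smaller wins, and otherwise whichever one is set wins. A standalone sketch of just that rule (the real code also records the override scope, which is omitted here):

```java
// Sketch of the effective-row-limit rule from the QueryContext diff:
// 0 means unlimited, so take the min of two positive limits, else
// fall back to whichever limit is actually set.
public class RowLimit {
  public static int effectiveLimit(int sessionLimit, int systemLimit) {
    if (sessionLimit > 0 && systemLimit > 0) {
      return Math.min(sessionLimit, systemLimit);
    }
    return Math.max(sessionLimit, systemLimit);
  }

  public static void main(String[] args) {
    System.out.println(effectiveLimit(0, 0));     // 0 -> unlimited
    System.out.println(effectiveLimit(500, 0));   // 500 -> only session limit set
    System.out.println(effectiveLimit(500, 100)); // 100 -> system cap wins
  }
}
```

This also makes the reviewer's point concrete: when both values are 0, no auto-limit happens, so logging the debug message unconditionally reports a limit of 0 even for unlimited queries.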
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800743#comment-16800743 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268658212 ## File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillPreparedStatementImpl.java ## @@ -259,4 +261,17 @@ public void setObject(int parameterIndex, Object x, SQLType targetSqlType) throw checkOpen(); super.setObject(parameterIndex, x, targetSqlType); } + + @Override + public void setLargeMaxRows(long maxRowCount) throws SQLException { +Statement setMaxStmt = this.connection.createStatement(); +setMaxStmt.execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`="+maxRowCount); Review comment: ```suggestion setMaxStmt.execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`=" + maxRowCount); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. 
no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
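Per the diff under review, `setLargeMaxRows()` is implemented by issuing an `ALTER SESSION` statement rather than plumbing the value through the RPC layer. A sketch of just the statement construction — note that `exec.query.max_rows` is an assumed placeholder here; the real option name is whatever `ExecConstants.QUERY_MAX_ROWS` resolves to in the Drill source:

```java
// Sketch of building the ALTER SESSION statement that the setLargeMaxRows()
// override executes. The option name below is an assumption for illustration,
// not necessarily the value of ExecConstants.QUERY_MAX_ROWS.
public class MaxRowsSql {
  static final String QUERY_MAX_ROWS = "exec.query.max_rows"; // assumed name

  public static String alterSession(long maxRowCount) {
    return "ALTER SESSION SET `" + QUERY_MAX_ROWS + "`=" + maxRowCount;
  }

  public static void main(String[] args) {
    System.out.println(alterSession(100)); // ALTER SESSION SET `exec.query.max_rows`=100
  }
}
```

Keeping the concatenation spaced as `"`=" + maxRowCount` (the reviewer's suggestion in the next comment) is purely stylistic; the generated SQL is identical.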
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800737#comment-16800737 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268650777 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java ## @@ -283,7 +283,9 @@ new OptionDefinition(ExecConstants.NDV_BLOOM_FILTER_FPOS_PROB_VALIDATOR), new OptionDefinition(ExecConstants.RM_QUERY_TAGS_VALIDATOR, new OptionMetaData(OptionValue.AccessibleScopes.SESSION_AND_QUERY, false, false)), - new OptionDefinition(ExecConstants.RM_QUEUES_WAIT_FOR_PREFERRED_NODES_VALIDATOR) + new OptionDefinition(ExecConstants.RM_QUEUES_WAIT_FOR_PREFERRED_NODES_VALIDATOR), + new OptionDefinition(ExecConstants.RETURN_RESULT_SET_FOR_DDL_VALIDATOR), Review comment: This one is already specified above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. 
no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800739#comment-16800739 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268656547 ## File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillCursor.java ## @@ -361,6 +361,9 @@ void close() { ExecConstants.JDBC_BATCH_QUEUE_THROTTLING_THRESHOLD ); resultsListener = new ResultsListener(this, batchQueueThrottlingThreshold); currentBatchHolder = new RecordBatchLoader(client.getAllocator()); + +// Set Query Timeout and MaxRows Review comment: Looks like `MaxRows` wasn't set here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800740#comment-16800740 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268656219 ## File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillStatement.java ## @@ -53,6 +53,28 @@ void setQueryTimeout( int seconds ) JdbcApiSqlException, SQLException; + /** + * @throws SQLException + *Any SQL exception + */ + @Override + int getMaxRows() throws SQLException; Review comment: The same question as above. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800744#comment-16800744 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268660625 ## File path: exec/jdbc/src/test/java/org/apache/drill/jdbc/PreparedStatementTest.java ## @@ -72,12 +73,18 @@ public class PreparedStatementTest extends JdbcTestBase { private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(PreparedStatementTest.class); + private static final Random RANDOMIZER = new Random(20150304); private static final String SYS_VERSION_SQL = "select * from sys.version"; private static final String SYS_RANDOM_SQL = "SELECT cast(random() as varchar) as myStr FROM (VALUES(1)) " + "union SELECT cast(random() as varchar) as myStr FROM (VALUES(1)) " + "union SELECT cast(random() as varchar) as myStr FROM (VALUES(1)) "; + private static final String SYS_OPTIONS_SQL = "SELECT * FROM sys.options"; + private static final String SYS_OPTIONS_SQL_LIMIT_10 = "SELECT * FROM sys.options LIMIT 12"; + private static final String ALTER_SYS_OPTIONS_MAX_ROWS_LIMIT_X = "ALTER SYSTEM SET `"+ExecConstants.QUERY_MAX_ROWS+"`="; Review comment: ```suggestion private static final String ALTER_SYS_OPTIONS_MAX_ROWS_LIMIT_X = "ALTER SYSTEM SET `" + ExecConstants.QUERY_MAX_ROWS + "`="; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800738#comment-16800738 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268655998 ## File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/DrillPreparedStatement.java ## @@ -32,4 +33,25 @@ */ public interface DrillPreparedStatement extends PreparedStatement { + /** + * @throws SQLException + *Any SQL exception + */ + @Override + int getMaxRows() throws SQLException; Review comment: What was the reason for declaring this method and the method below?They both are already available in `Statement` interface. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800742#comment-16800742 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268658531 ## File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ## @@ -270,4 +271,10 @@ public void setResultSet(AvaticaResultSet resultSet) { public void setUpdateCount(int value) { updateCount = value; } + + @Override + public void setLargeMaxRows(long maxRowCount) throws SQLException { +execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`="+maxRowCount); Review comment: And please fix spaces here This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
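The spacing nit raised in the review above can be sketched as follows. This is a minimal, self-contained illustration: the option-name constant below is a stand-in for `ExecConstants.QUERY_MAX_ROWS`, not Drill's actual value.

```java
// Sketch of the cleaned-up statement builder with consistent spacing
// around "=". The option name is hypothetical, standing in for
// ExecConstants.QUERY_MAX_ROWS.
public class MaxRowsSql {
    static final String QUERY_MAX_ROWS = "exec.query.max_rows"; // illustrative value

    public static String alterSessionSql(long maxRowCount) {
        // Spaces around "=" and around the concatenation operators,
        // as the reviewer requests
        return "ALTER SESSION SET `" + QUERY_MAX_ROWS + "` = " + maxRowCount;
    }

    public static void main(String[] args) {
        System.out.println(alterSessionSql(1000));
    }
}
```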
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800736#comment-16800736 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268647133 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ## @@ -283,6 +298,21 @@ public RemoteFunctionRegistry getRemoteFunctionRegistry() { return drillbitContext.getRemoteFunctionRegistry(); } + /** + * Returns the maximum size of auto-limited resultset + * @return Maximum size of auto-limited resultSet + */ + public int getAutoLimitRowCount() { Review comment: Please remove this method and method below and replace their usage by `getOptions().getOption...` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-7048) Implement JDBC Statement.setMaxRows() with System Option
[ https://issues.apache.org/jira/browse/DRILL-7048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800741#comment-16800741 ] ASF GitHub Bot commented on DRILL-7048: --- vvysotskyi commented on pull request #1714: DRILL-7048: Implement JDBC Statement.setMaxRows() with System Option URL: https://github.com/apache/drill/pull/1714#discussion_r268659959 ## File path: exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ## @@ -270,4 +271,10 @@ public void setResultSet(AvaticaResultSet resultSet) { public void setUpdateCount(int value) { updateCount = value; } + + @Override + public void setLargeMaxRows(long maxRowCount) throws SQLException { +execute("ALTER SESSION SET `" + ExecConstants.QUERY_MAX_ROWS + "`="+maxRowCount); +this.maxRowCount = maxRowCount; Review comment: Looks like `maxRowCount` is taken from avatica. Does it provide similar functionality we want to implement? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Implement JDBC Statement.setMaxRows() with System Option > > > Key: DRILL-7048 > URL: https://issues.apache.org/jira/browse/DRILL-7048 > Project: Apache Drill > Issue Type: New Feature > Components: Client - JDBC, Query Planning & Optimization >Affects Versions: 1.15.0 >Reporter: Kunal Khatua >Assignee: Kunal Khatua >Priority: Major > Labels: doc-impacting > Fix For: 1.17.0 > > > With DRILL-6960, the webUI will get an auto-limit on the number of results > fetched. > Since more of the plumbing is already there, it makes sense to provide the > same for the JDBC client. > In addition, it would be nice if the Server can have a pre-defined value as > well (default 0; i.e. no limit) so that an _admin_ would be able to ensure a > max limit on the resultset size as well. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
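The "default 0; i.e. no limit" semantics described in the issue, where both the client's `setMaxRows()` and a server-side admin option can cap the result set, can be sketched as a small helper. This is one plausible combination rule (take the smaller non-zero limit), not necessarily the precedence Drill actually implements.

```java
// Sketch of combining a client-side and a server-side row limit,
// where 0 means "no limit" on either side (per the JIRA description).
public class RowLimit {
    public static long effectiveLimit(long clientMax, long serverMax) {
        if (clientMax == 0) {
            return serverMax;          // only the admin limit applies
        }
        if (serverMax == 0) {
            return clientMax;          // only the client limit applies
        }
        return Math.min(clientMax, serverMax); // tighter limit wins
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(500, 100));
    }
}
```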
[jira] [Commented] (DRILL-7011) Allow hybrid model in the Row set-based scan framework
[ https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800645#comment-16800645 ] ASF GitHub Bot commented on DRILL-7011: --- arina-ielchiieva commented on pull request #1711: DRILL-7011: Support schema in scan framework URL: https://github.com/apache/drill/pull/1711#discussion_r268603780 ## File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/text/compliant/TestCsvWithSchema.java ## @@ -82,6 +167,468 @@ public void testSchema() throws Exception { .addRow(10, new LocalDate(2019, 3, 20), "it works!", 1234.5D, 20L, "") .build(); RowSetUtilities.verify(expected, actual); +} finally { + resetV3(); + resetSchema(); +} + } + + + /** + * Use a schema with explicit projection to get a consistent view + * of the table schema, even if columns are missing, rows are ragged, + * and column order changes. + * + * Force the scans to occur in distinct fragments so the order of the + * file batches is random. + */ + @Test + public void testMultiFileSchema() throws Exception { +RowSet expected1 = null; +RowSet expected2 = null; +try { + enableV3(true); + enableSchema(true); + enableMultiScan(); + String tablePath = buildTwoFileTable("multiFileSchema", raggedMulti1Contents, reordered2Contents); + run(SCHEMA_SQL, tablePath); + + // Wildcard expands to union of schema + table. In this case + // all table columns appear in the schema (though not all schema + // columns appear in the table.) 
+ + String sql = "SELECT id, `name`, `date`, gender, comment FROM " + tablePath; + TupleMetadata expectedSchema = new SchemaBuilder() + .add("id", MinorType.INT) + .add("name", MinorType.VARCHAR) + .addNullable("date", MinorType.DATE) + .add("gender", MinorType.VARCHAR) + .add("comment", MinorType.VARCHAR) + .buildSchema(); + expected1 = new RowSetBuilder(client.allocator(), expectedSchema) + .addRow(1, "arina", new LocalDate(2019, 1, 18), "female", "ABC") + .addRow(2, "javan", new LocalDate(2019, 1, 19), "male", "ABC") + .addRow(4, "albert", new LocalDate(2019, 5, 4), "", "ABC") + .build(); + expected2 = new RowSetBuilder(client.allocator(), expectedSchema) + .addRow(3, "bob", new LocalDate(2001, 1, 16), "NA", "ABC") + .build(); + + // Loop 10 times so that, as the two reader fragments read the two + // files, we end up with (acceptable) races that read the files in + // random order. + + for (int i = 0; i < 10; i++) { +boolean sawSchema = false; +boolean sawFile1 = false; +boolean sawFile2 = false; +Iterator iter = client.queryBuilder().sql(sql).rowSetIterator(); +while (iter.hasNext()) { + RowSet result = iter.next(); + if (result.rowCount() == 3) { +sawFile1 = true; +new RowSetComparison(expected1).verifyAndClear(result); + } else if (result.rowCount() == 1) { +sawFile2 = true; +new RowSetComparison(expected2).verifyAndClear(result); + } else { +assertEquals(0, result.rowCount()); +sawSchema = true; + } +} +assertTrue(sawSchema); +assertTrue(sawFile1); +assertTrue(sawFile2); + } +} finally { + expected1.clear(); + expected2.clear(); + client.resetSession(ExecConstants.ENABLE_V3_TEXT_READER_KEY); + client.resetSession(ExecConstants.STORE_TABLE_USE_SCHEMA_FILE); + client.resetSession(ExecConstants.MIN_READER_WIDTH_KEY); +} + } + + /** + * Test the schema we get in V2 when the table read order is random. + * Worst-case: the two files have different column counts and + * column orders. 
+ * + * Though the results are random, we iterate 10 times which, in most runs, + * shows the random variation in schemas: + * + * Sometimes the first batch has three columns, sometimes four. + * Sometimes the column `id` is in position 0, sometimes in position 1 + * (correlated with the above). + * Due to the fact that sometimes the first file (with four columns) + * is returned first, sometimes the second file (with three columns) is + * returned first. + * + */ + @Test + public void testSchemaRaceV2() throws Exception { +try { + enableV3(false); + enableSchema(false); + enableMultiScan(); + String tablePath = buildTwoFileTable("schemaRaceV2", multi1Contents, reordered2Contents); + boolean sawFile1First = false; + boolean sawFile2First = false; + boolean sawFullSchema = false; + boolean sawPartialSchema = false; + boolean sawIdAsCol0 = false; + boolean sawIdAsCol1 = false; + String sql = "SELECT * FRO
[jira] [Commented] (DRILL-7011) Allow hybrid model in the Row set-based scan framework
[ https://issues.apache.org/jira/browse/DRILL-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800618#comment-16800618 ] ASF GitHub Bot commented on DRILL-7011: --- arina-ielchiieva commented on issue #1711: DRILL-7011: Support schema in scan framework URL: https://github.com/apache/drill/pull/1711#issuecomment-476166889 @paul-rogers Actually, when I was presenting the schema provisioning design, there was a proposal to add a schema property `drill.is_full_schema`. By default it is `false`, so we assume the schema is partial. If a user wants to indicate that the schema is strict and that all columns except those indicated in the schema should be ignored, they need to create the schema as follows: `create schema (col int) for table dfs.tmp.t. properties ('drill.is_full_schema' = 'true')` Since most of the `default` property problems are related to star queries, we can state the following: 1. For queries with a defined list of columns (aka projection queries: `select id, name from t`), we apply the schema consistently. 2. For star queries, when the schema property `drill.is_full_schema` is set to `false`, we might get inconsistent results with default values, but that is acceptable since we discover the schema on read. 3. For star queries, when the schema property `drill.is_full_schema` is set to `true`, we project only those columns indicated in the schema. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Allow hybrid model in the Row set-based scan framework > -- > > Key: DRILL-7011 > URL: https://issues.apache.org/jira/browse/DRILL-7011 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.15.0 >Reporter: Arina Ielchiieva >Assignee: Paul Rogers >Priority: Major > Fix For: 1.16.0 > > > As part of schema provisioning project we want to allow hybrid model for Row > set-based scan framework, namely to allow to pass custom schema metadata > which can be partial. > Currently schema provisioning has SchemaContainer class that contains the > following information (can be obtained from metastore, schema file, table > function): > 1. schema represented by org.apache.drill.exec.record.metadata.TupleMetadata > 2. properties represented by Map, can contain information if > schema is strict or partial (default is partial) etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
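The three projection rules proposed in the comment above can be sketched as a single predicate. All names here are illustrative, not Drill APIs; the sketch only captures the described decision logic.

```java
import java.util.Set;

public class SchemaProjection {
    // Sketch of the proposed rules:
    // 1. explicit projection list: always project the requested column;
    // 2. star query + strict schema (drill.is_full_schema = true): only
    //    columns present in the provided schema are projected;
    // 3. star query + partial schema (the default): discover columns on read.
    public static boolean projectColumn(String col, Set<String> schemaCols,
                                        boolean starQuery, boolean fullSchema) {
        if (!starQuery) {
            return true;                    // rule 1
        }
        if (fullSchema) {
            return schemaCols.contains(col); // rule 2: schema wins
        }
        return true;                         // rule 3: schema-on-read
    }

    public static void main(String[] args) {
        Set<String> schema = java.util.Collections.singleton("id");
        System.out.println(projectColumn("extra", schema, true, true));
    }
}
```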
[jira] [Commented] (DRILL-7049) REST API returns the toString of byte arrays (VARBINARY types)
[ https://issues.apache.org/jira/browse/DRILL-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800538#comment-16800538 ] ASF GitHub Bot commented on DRILL-7049: --- vdiravka commented on pull request #1672: DRILL-7049 return VARBINARY as a string with escaped non printable bytes URL: https://github.com/apache/drill/pull/1672#discussion_r268565353 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/util/ValueVectorElementFormatter.java ## @@ -52,28 +52,47 @@ public ValueVectorElementFormatter(OptionManager options) { * @return the formatted value, null if failed */ public String format(Object value, TypeProtos.MinorType minorType) { +boolean handled = false; + String str = null; switch (minorType) { case TIMESTAMP: if (value instanceof LocalDateTime) { - return format((LocalDateTime) value, + handled = true; + str = format((LocalDateTime) value, options.getString(ExecConstants.WEB_DISPLAY_FORMAT_TIMESTAMP), (v, p) -> v.format(getTimestampFormatter(p))); } +break; case DATE: if (value instanceof LocalDate) { - return format((LocalDate) value, + handled = true; + str = format((LocalDate) value, options.getString(ExecConstants.WEB_DISPLAY_FORMAT_DATE), (v, p) -> v.format(getDateFormatter(p))); } +break; case TIME: if (value instanceof LocalTime) { - return format((LocalTime) value, + handled = true; + str = format((LocalTime) value, options.getString(ExecConstants.WEB_DISPLAY_FORMAT_TIME), (v, p) -> v.format(getTimeFormatter(p))); } - default: -return value.toString(); +break; + case VARBINARY: +if (value instanceof byte[]) { + handled = true; + byte[] bytes = (byte[]) value; + str = org.apache.drill.common.util.DrillStringUtils.toBinaryString(bytes); +} +break; +} + +if (!handled) { Review comment: It looks like current code execution is the same as in your PR. But logic from PR is more complex: additional flag `handled`, breaks in switch statements... 
I think we can leave current code from Drill master and add your `case VARBINARY`. Seems it will be enough. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > REST API returns the toString of byte arrays (VARBINARY types) > -- > > Key: DRILL-7049 > URL: https://issues.apache.org/jira/browse/DRILL-7049 > Project: Apache Drill > Issue Type: Bug > Components: Server, Web Server >Affects Versions: 1.15.0 >Reporter: jean-claude >Priority: Minor > Fix For: 1.16.0 > > > Doing a query using the REST API will return VARBINARY columns as a Java byte > array hashcode instead of the actual data of the VARBINARY. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
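The behavior the PR title describes, returning VARBINARY as a string with escaped non-printable bytes, can be sketched as below. Drill's actual implementation is `DrillStringUtils.toBinaryString`, whose escape style may differ; this sketch simply keeps printable ASCII as-is and hex-escapes everything else.

```java
public class VarBinaryFormat {
    // Minimal sketch: printable ASCII passes through, all other bytes
    // become \xNN escapes. Not Drill's actual escaping scheme.
    public static String toEscapedString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (b >= 32 && b < 127) {
                sb.append((char) b);
            } else {
                sb.append(String.format("\\x%02X", b & 0xFF));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEscapedString(new byte[]{0x48, 0x69, 0x00}));
    }
}
```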
[jira] [Commented] (DRILL-7051) Upgrade jetty
[ https://issues.apache.org/jira/browse/DRILL-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800519#comment-16800519 ] ASF GitHub Bot commented on DRILL-7051: --- vdiravka commented on pull request #1681: DRILL-7051: Upgrade jetty URL: https://github.com/apache/drill/pull/1681#discussion_r268551237 ## File path: pom.xml ## @@ -85,6 +85,7 @@ 0.9.10 1.8.2 4.0.2 +9.4.15.v20190215 Review comment: Finally jetty 9.3 is chosen for Drill. Jetty dependencies are used in `java-exec pom.xml`, but I've left versions control in `dependencyManagement` block of root POM to avoid picking invalid jetty version by maven, in case when some libs will have other version. For instance we can't exclude jetty from `hadoop-common` and `hbase` dependencies. But they have different jetty minor versions: [9.3.24.v20180605](https://github.com/apache/hadoop/blob/trunk/hadoop-project/pom.xml#L38) for Hadoop [9.3.19.v20170502](https://github.com/apache/hbase/blob/rel/2.1.0/pom.xml#L1352) for HBase 2.1 and [9.3.25.v20180904](https://github.com/apache/hbase/blob/master/pom.xml#L1529) for master HBase version. I didn't find any API compatibility differences between these jetty minor versions (only 9.4 has it). Possibly in future we can consider shade Jetty version in Drill, [DRILL-7135](https://issues.apache.org/jira/browse/DRILL-7135). Not sure that is necessary for now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Upgrade jetty > -- > > Key: DRILL-7051 > URL: https://issues.apache.org/jira/browse/DRILL-7051 > Project: Apache Drill > Issue Type: Improvement > Components: Web Server >Affects Versions: 1.15.0 >Reporter: Veera Naranammalpuram >Assignee: Vitalii Diravka >Priority: Major > Fix For: 1.16.0 > > > Is Drill using a version of jetty web server that's really old? The jar's > suggest it's using jetty 9.1 that was built sometime in 2014? > {noformat} > -rw-r--r-- 1 veeranaranammalpuram staff 15988 Nov 20 2017 > jetty-continuation-9.1.1.v20140108.jar > -rw-r--r-- 1 veeranaranammalpuram staff 103288 Nov 20 2017 > jetty-http-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 101519 Nov 20 2017 > jetty-io-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 95906 Nov 20 2017 > jetty-security-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 401593 Nov 20 2017 > jetty-server-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 110992 Nov 20 2017 > jetty-servlet-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 119215 Nov 20 2017 > jetty-servlets-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 341683 Nov 20 2017 > jetty-util-9.1.5.v20140505.jar > -rw-r--r-- 1 veeranaranammalpuram staff 38707 Dec 21 15:42 > jetty-util-ajax-9.3.19.v20170502.jar > -rw-r--r-- 1 veeranaranammalpuram staff 111466 Nov 20 2017 > jetty-webapp-9.1.1.v20140108.jar > -rw-r--r-- 1 veeranaranammalpuram staff 41763 Nov 20 2017 > jetty-xml-9.1.1.v20140108.jar {noformat} > This version is shown as deprecated: > [https://www.eclipse.org/jetty/documentation/current/what-jetty-version.html#d0e203] > Opening this to upgrade jetty to the latest stable supported version. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
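The version-pinning arrangement described in the comment above, keeping the jetty version in the root POM's `dependencyManagement` block so Maven cannot pick a stray version from transitive dependencies, looks roughly like this. The version and artifact shown are illustrative of the pattern, not the exact entries in Drill's POM.

```xml
<!-- Root pom.xml: pin the jetty version once so modules and transitive
     dependencies (hadoop-common, hbase) resolve to a known 9.3.x line. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>9.3.25.v20180904</version> <!-- illustrative version -->
    </dependency>
  </dependencies>
</dependencyManagement>
```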
[jira] [Commented] (DRILL-7032) Ignore corrupt rows in a PCAP file
[ https://issues.apache.org/jira/browse/DRILL-7032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800459#comment-16800459 ] ASF GitHub Bot commented on DRILL-7032: --- arina-ielchiieva commented on pull request #1637: DRILL-7032: Ignore corrupt rows in a PCAP file URL: https://github.com/apache/drill/pull/1637#discussion_r268515281 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/pcap/decoder/Packet.java ## @@ -324,7 +333,12 @@ public int getDst_port() { byte[] data = null; if (packetLength >= payloadDataStart) { data = new byte[packetLength - payloadDataStart]; - System.arraycopy(raw, ipOffset + payloadDataStart, data, 0, data.length); + try { +System.arraycopy(raw, ipOffset + payloadDataStart, data, 0, data.length); + } catch (Exception e) { +isCorrupt = true; +logger.info("Error while parsing PCAP data: ", e.getMessage()); Review comment: I think log info will produce error for each corrupt row and log file can grow enormously. I guess this should be debug, you can also include trace for the full exception: ``` String message = "Error while parsing PCAP data: {}"; logger.debug(message, e.getMessage()); logger.trace(message, e); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Ignore corrupt rows in a PCAP file > -- > > Key: DRILL-7032 > URL: https://issues.apache.org/jira/browse/DRILL-7032 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Affects Versions: 1.15.0 > Environment: OS: Ubuntu 18.4 > Drill version: 1.15.0 > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) >Reporter: Giovanni Conte >Assignee: Charles Givre >Priority: Major > Fix For: 1.16.0 > > > It would be useful for Drill to have some ability to ignore corrupt rows in a > PCAP file instead of throwing a Java exception. > This is because there are many pcap files with corrupted lines, and this > functionality would avoid pre-fixing the packet captures (example > attached file). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
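The intent of the fix under review (detect an out-of-range payload copy on a truncated packet, mark the row corrupt, and log quietly rather than throw) can be sketched with an explicit bounds check. This is an illustration, not Drill's actual `Packet` code, and the bounds check is simplified.

```java
public class PcapPayload {
    // Sketch: instead of letting System.arraycopy throw on a truncated
    // packet, validate the range first. A null return stands in for
    // "corrupt row"; the caller would set isCorrupt and log at debug level.
    public static byte[] copyPayload(byte[] raw, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > raw.length) {
            return null; // corrupt row detected
        }
        byte[] data = new byte[length];
        System.arraycopy(raw, offset, data, 0, length);
        return data;
    }

    public static void main(String[] args) {
        // Truncated packet: 5 payload bytes requested, only 2 available
        System.out.println(copyPayload(new byte[]{1, 2, 3}, 1, 5) == null);
    }
}
```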
[jira] [Updated] (DRILL-6970) Issue with LogRegex format plugin where drillbuf was overflowing
[ https://issues.apache.org/jira/browse/DRILL-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva updated DRILL-6970: Labels: ready-to-commit (was: ) > Issue with LogRegex format plugin where drillbuf was overflowing > - > > Key: DRILL-6970 > URL: https://issues.apache.org/jira/browse/DRILL-6970 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.15.0 >Reporter: jean-claude >Assignee: jean-claude >Priority: Major > Labels: ready-to-commit > Fix For: 1.16.0 > > > The log format plugin does re-allocate the drillbuf when it fills up. You can > query small log files but larger ones will fail with this error: > 0: jdbc:drill:zk=local> select * from dfs.root.`/prog/test.log`; > Error: INTERNAL_ERROR ERROR: index: 32724, length: 108 (expected: range(0, > 32768)) > Fragment 0:0 > Please, refer to logs for more information. > > I'm running drill-embeded. The log storage plugin is configured like so > {code:java} > "log": { > "type": "logRegex", > "regex": "(.+)", > "extension": "log", > "maxErrors": 10, > "schema": [ > { > "fieldName": "line" > } > ] > }, > {code} > The log files is very simple > {code:java} > jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk > jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk > jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk > jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk > jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk > jdsaljfldaksjfldsajfldasjflkjdsfldsjfljsdalfk > ...{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)