[GitHub] kkhatua commented on issue #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on issue #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#issuecomment-460930334
Let me take a look at `WebUserConnection`. I want to make sure that we don't hold all the results indefinitely if we get a hint (the autoLimit) that the entire result set will not be consumed. From the description of the class, it looks like it might be able to do that.
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] cgivre commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
cgivre commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r254135268
## File path: contrib/format-syslog/src/main/resources/checkstyle-config.xml
## @@ -0,0 +1,40 @@
+http://www.puppycrawl.com/dtds/configuration_1_2.dtd";>
Review comment: Removed
[GitHub] cgivre commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
cgivre commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r254135160
## File path: contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatConfig.java
## @@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.syslog;
+
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Objects;
+import org.apache.drill.common.logical.FormatPluginConfig;
+
+import java.util.Arrays;
+import java.util.List;
+import java.util.ArrayList;
+
+@JsonTypeName("syslog")
+public class SyslogFormatConfig implements FormatPluginConfig {
+
+  public List<String> extensions;
+  public int maxErrors = 10;
+  public boolean flattenStructuredData = false;
+
+  public boolean getFlattenStructuredData() {
+    return flattenStructuredData;
+  }
+
+  public int getMaxErrors() {
+    return maxErrors;
+  }
+
+  public List<String> getExtensions() {
+    return extensions;
+  }
+
+  public void setExtensions(List<String> ext) {
+    this.extensions = ext;
+  }
+
+  public void setExtension(String ext) {
+    if (this.extensions == null) {
+      this.extensions = new ArrayList<>();
+    }
+    this.extensions.add(ext);
+  }
+
+  public void setMaxErrors(int errors) {
+    this.maxErrors = errors;
+  }
+
+  public void setFlattenStructuredData(boolean flattenErrors) {
+    this.flattenStructuredData = flattenErrors;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    if (this == obj) {
+      return true;
+    }
+    if (obj == null || getClass() != obj.getClass()) {
+      return false;
+    }
+    SyslogFormatConfig other = (SyslogFormatConfig) obj;
+    return Objects.equal(extensions, other.extensions) &&
+      Objects.equal(maxErrors, other.maxErrors) &&
+      Objects.equal(flattenStructuredData, other.flattenStructuredData);
+  }
+
+  @Override
+  public int hashCode() {
+    return Arrays.hashCode(new Object[]{maxErrors, flattenStructuredData, extensions});
+  }
+
+}
Review comment: I think I got them all this time.
[jira] [Created] (DRILL-7029) Move LogRegex Format Plugin to Contrib
Charles Givre created DRILL-7029:
Summary: Move LogRegex Format Plugin to Contrib
Key: DRILL-7029
URL: https://issues.apache.org/jira/browse/DRILL-7029
Project: Apache Drill
Issue Type: Task
Reporter: Charles Givre
Assignee: Charles Givre
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] cgivre commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
cgivre commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
URL: https://github.com/apache/drill/pull/1530#discussion_r254134574
## File path: contrib/format-syslog/README.md
## @@ -0,0 +1,41 @@
+# Syslog Format Plugin
Review comment: Hi @vdiravka, I meant `contrib`. Anyway, here's the link to the JIRA ticket: https://issues.apache.org/jira/browse/DRILL-7029.
[GitHub] kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r254118821
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java
## @@ -84,9 +85,14 @@
   private boolean closed = false;
   private DrillOperatorTable table;
-  public QueryContext(final UserSession session, final DrillbitContext drillbitContext, QueryId queryId) {
+  public QueryContext(UserSession session, DrillbitContext drillbitContext, QueryId queryId) {
+    this(session, drillbitContext, queryId, null);
+  }
+
+  public QueryContext(UserSession session, DrillbitContext drillbitContext, QueryId queryId, Integer autoLimit) {
Review comment: I'm effectively creating an additional constructor with a modified signature, so as to not change anything else. We can't go with the "query option" solution, so let me see if I can do without it.
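The overload-with-delegation pattern described above — keeping the old constructor and having it delegate to a fuller signature with a null default — leaves every existing call site untouched. A minimal standalone sketch (the class and field names below are simplified stand-ins, not Drill's actual QueryContext):

```java
// Hypothetical stand-in for QueryContext: the old constructor delegates to the
// new one, passing null to mean "no auto-limit", so existing callers compile unchanged.
public class QueryContextSketch {
  private final Integer autoLimitRowCount; // null = feature disabled

  public QueryContextSketch(String session) {
    this(session, null); // original signature, preserved for existing call sites
  }

  public QueryContextSketch(String session, Integer autoLimit) {
    this.autoLimitRowCount = autoLimit;
  }

  public boolean isAutoLimitEnabled() {
    return autoLimitRowCount != null;
  }

  public Integer getAutoLimitRowCount() {
    return autoLimitRowCount;
  }

  public static void main(String[] args) {
    System.out.println(new QueryContextSketch("s").isAutoLimitEnabled());       // false
    System.out.println(new QueryContextSketch("s", 1000).isAutoLimitEnabled()); // true
  }
}
```

Using a nullable `Integer` (rather than a sentinel `int`) keeps the "feature off" case unambiguous without touching the wire protocol.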
[GitHub] kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r254115477
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
## @@ -183,4 +183,20 @@
     return handler.getPlan(sqlNode);
   }
+
+  // Wrap with auto-limit
+  private static SqlNode wrapWithAutoLimit(SqlConverter parser, QueryContext context, String sql) {
+    SqlNode sqlNode = parser.parse(sql);
Review comment: This helps a lot! I've no knowledge of the Calcite SqlNodes, so this snippet helps.
[GitHub] kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r254106628
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/profile/ProfileWrapper.java
## @@ -150,6 +150,14 @@ private long tallyMajorFragmentCost(List majorFragments) {
     return globalProcessNanos;
   }
+
+  public boolean hasAutoLimit() {
+    return profile.hasAutoLimit();
+  }
+
+  public int getAutoLimit() {
+    return profile.getAutoLimit();
Review comment: My statement above... I don't think the query option is the right thing to do, since this is a WebUI interaction feature.
[GitHub] kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r254106698
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/user/UserWorker.java
## @@ -73,9 +73,13 @@ private static QueryId queryIdGenerator() {
   }
   public QueryId submitWork(UserClientConnection connection, RunQuery query) {
+    return submitWork(connection, query, null);
+  }
+
+  public QueryId submitWork(UserClientConnection connection, RunQuery query, Integer autoLimitRowCount) {
Review comment: Agreed... let me take a look at modifying RunQuery.
[GitHub] kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r254106312
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java
## @@ -136,15 +136,30 @@
   */
   public Foreman(final WorkerBee bee, final DrillbitContext drillbitContext, final UserClientConnection connection,
       final QueryId queryId, final RunQuery queryRequest) {
+    this(bee, drillbitContext, connection, queryId, queryRequest, null);
+  }
+
+  /**
+   * Constructor. Sets up the Foreman, but does not initiate any execution.
+   *
+   * @param bee work manager (runs fragments)
+   * @param drillbitContext drillbit context
+   * @param connection connection
+   * @param queryId the id for the query
+   * @param queryRequest the query to execute
+   * @param autoLimitRowCount the number of rows to limit in execution (used for WebRequests)
+   */
+  public Foreman(final WorkerBee bee, final DrillbitContext drillbitContext,
Review comment: I was reluctant to make too many changes to the protobuf, so I avoided modifying RunQuery. I can try looking at the work involved in changing the protobuf for this.
[GitHub] kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
kkhatua commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r254105360
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java
## @@ -273,6 +279,29 @@
   public RemoteFunctionRegistry getRemoteFunctionRegistry() {
     return drillbitContext.getRemoteFunctionRegistry();
   }
+
+  /**
+   * Check if auto-limiting of the result set is enabled
+   * @return true if auto-limit is enabled
+   */
+  public boolean isAutoLimitEnabled() {
+    return autoLimitRowCount != null;
+  }
+
+  /**
+   * Returns the maximum size of the auto-limited result set
+   * @return maximum size of the auto-limited result set
+   */
+  public Integer getAutoLimitRowCount() {
Review comment: The feature is specifically for WebUI-submitted queries. A query option doesn't make sense because WebUI queries are "session-less", and the only reason we want to wrap with a limit is to prevent the Drillbit from being overloaded by holding an entire result set that never gets completely consumed by the HTTP client.
[jira] [Created] (DRILL-7028) Reduce the planning time of queries on large Parquet tables with large metadata cache files
Venkata Jyothsna Donapati created DRILL-7028:
Summary: Reduce the planning time of queries on large Parquet tables with large metadata cache files
Key: DRILL-7028
URL: https://issues.apache.org/jira/browse/DRILL-7028
Project: Apache Drill
Issue Type: Improvement
Components: Metadata
Reporter: Venkata Jyothsna Donapati
Assignee: Venkata Jyothsna Donapati
Fix For: 1.16.0
If the Parquet table has a large number of small files, the metadata cache files grow large and the planner has to read a large metadata cache file, which adds planning-time overhead. Most of the query's execution time is then spent in the planning phase.
[jira] [Created] (DRILL-7027) TPCH Queries hit IOB when planner.enable_demux_exchange = true
Karthikeyan Manivannan created DRILL-7027:
Summary: TPCH Queries hit IOB when planner.enable_demux_exchange = true
Key: DRILL-7027
URL: https://issues.apache.org/jira/browse/DRILL-7027
Project: Apache Drill
Issue Type: Bug
Components: Execution - Flow
Reporter: Karthikeyan Manivannan
Assignee: Karthikeyan Manivannan
Running TPCH queries on the SF100 dataset, a few queries (13, 14, and 19) hit an IOB exception:
{code}
java.sql.SQLException: SYSTEM ERROR: IndexOutOfBoundsException: index 154
Fragment 7:0
Please, refer to logs for more information.
[Error Id: e312dc77-0cad-4bc0-b90e-fb0d477ef272 on ucs-node2.perf.lab:31010]
  at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:528)
  at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:632)
  at org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
  at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:151)
  at PipSQueak.fetchRows(PipSQueak.java:346)
  at PipSQueak.runTest(PipSQueak.java:113)
  at PipSQueak.main(PipSQueak.java:477)
Caused by: org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IndexOutOfBoundsException: index 154
Fragment 7:0
Please, refer to logs for more information.
[Error Id: e312dc77-0cad-4bc0-b90e-fb0d477ef272 on ucs-node2.perf.lab:31010]
  at org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
  at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
  at org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
  at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
  at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
  at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
  at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
  at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
  at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
  at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
  at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
  at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
  at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
  at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
  at io.netty.channel.nio.NioE
[GitHub] cgivre opened a new pull request #1635: DRILL-7021: HTTPD Throws NPE and Doesn't Recognize Timeformat
cgivre opened a new pull request #1635: DRILL-7021: HTTPD Throws NPE and Doesn't Recognize Timeformat
URL: https://github.com/apache/drill/pull/1635
This PR addresses several issues with the HTTPD format plugin.
1. Several fields were missing from the implementation, which caused an NPE if you included them explicitly in a query.
2. The format plugin was not parsing time stamps correctly or projecting the fields as timestamps.
3. The `request_user-agent` field has a dash in it which, if not escaped correctly, causes problems in queries.
4. The unit tests were not satisfactory.
5. The default format specified in the `bootstrap.json` file omitted the user agent string, which most HTTPD logs do include.
This PR fixes all these issues and updates the HTTPD log parser to version 5.2, which also accepts Nginx logs.
[GitHub] kfaraaz opened a new pull request #1634: DRILL-6979: Added autofocus attribute to username on login page, and to query textbox on Query tab
kfaraaz opened a new pull request #1634: DRILL-6979: Added autofocus attribute to username on login page, and to query textbox on Query tab
URL: https://github.com/apache/drill/pull/1634
@sohami @kkhatua Can you please review?
Apache Drill Hangout - 05 Feb, 2019
Hi Drillers,
The bi-weekly Apache Drill hangout is scheduled for today, Tuesday, Feb 5th, at 10 AM PST. The original plan is for Sorabh & Hanumath to talk about Resource Management. If there are any other topics or questions, feel free to reply or raise them during the hangout.
The hangout link: https://meet.google.com/yki-iqdf-tai?authuser=3
P.S. Sorry for the late notification.
[GitHub] vvysotskyi commented on issue #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on issue #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#issuecomment-460669449
Regarding this fix and the fix for DRILL-6050: I think allowing the web client to rewrite the query is not the best approach. A user may want to submit a query to see its profile, for example, but with this feature the profile will not correspond to the original query. Besides that, this fix makes a lot of server-side changes for the web client. It would be better to address this issue by limiting the rows obtained from the result set on the client side (see the [WebUserConnection.sendData()](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/WebUserConnection.java#L138) method).
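The client-side alternative proposed here — capping rows as batches arrive in `WebUserConnection.sendData()` rather than rewriting the SQL — boils down to simple batch accounting. A rough standalone sketch under that assumption; the `RowLimiter` class below is hypothetical and not the actual WebUserConnection code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep a running count of rows already delivered and
// trim each arriving batch so the total never exceeds the auto-limit.
public class RowLimiter {
  private final int autoLimit;   // maximum rows the web client should receive
  private int rowsDelivered = 0;

  public RowLimiter(int autoLimit) {
    this.autoLimit = autoLimit;
  }

  /** Returns at most (autoLimit - rowsDelivered) rows of the batch. */
  public <T> List<T> acceptBatch(List<T> batch) {
    int remaining = Math.max(0, autoLimit - rowsDelivered);
    int take = Math.min(remaining, batch.size());
    rowsDelivered += take;
    return new ArrayList<>(batch.subList(0, take));
  }

  public boolean limitReached() {
    return rowsDelivered >= autoLimit;
  }

  public static void main(String[] args) {
    RowLimiter limiter = new RowLimiter(3);
    System.out.println(limiter.acceptBatch(java.util.Arrays.asList(1, 2, 3, 4, 5))); // [1, 2, 3]
    System.out.println(limiter.limitReached()); // true
  }
}
```

Once `limitReached()` is true, the connection could stop buffering (or cancel the query) instead of holding the entire result set — the memory concern raised in the earlier comment about `WebUserConnection`.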
Re: [DISCUSS] Format plugins in contrib module
Absolutely agree with Arina. I think the core format plugins for Parquet, JSON, and CSV, TSV, PSV files (which are used for creating Drill tables) can be left in the current config file, and the rest should be factored out to separate config files, along with creating separate modules in the Drill *contrib* module. That way the process of creating new plugins will be more transparent.

Kind regards
Vitalii

On Tue, Feb 5, 2019 at 3:12 PM Charles Givre wrote:
> I’d concur with Arina’s suggestion. I do think this would be useful and make it easier to make plugins “pluggable”.
> In the meantime, should we recommend that developers of format plugins include their plugins in the bootstrap-storage-plugins.json? I was thinking also that we might want to have some guidelines for unit tests for format plugins. I’m doing some work on the HTTPD format plugin and found some issues which cause it to throw NPEs.
> — C
>
> > On Feb 5, 2019, at 06:40, Arina Yelchiyeva wrote:
> >
> > Hi all,
> >
> > Before, we were adding new formats / plugins into the exec module. Eventually we came to the point that the exec package size is growing, and plugin and format contributions are better separated out into a different module.
> > Now we have the contrib module where we add such contributions. Plugins are pluggable: they are added automatically by means of having a drill-module.conf file which points to the scanning packages.
> > Format plugins use the same approach; the only problem is that they are not added into bootstrap-storage-plugins.json. So when adding a new format plugin, in order for it to automatically appear in the Drill Web UI, the developer has to update the bootstrap file, which is in the exec module.
> > My suggestion is that we implement some functionality that would merge the format config with the bootstrap one. For example, each plugin would have to have a bootstrap-format.json file with the information about the plugin to which the format should be added (structure the same as in bootstrap-storage-plugins.json):
> > Example:
> >
> > {
> >   "storage": {
> >     dfs: {
> >       formats: {
> >         "psv" : {
> >           type: "msgpack",
> >           extensions: [ "mp" ]
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > Then during Drill start-up such bootstrap-format.json files will be merged with bootstrap-storage-plugins.json.
> >
> > Current open PRs for adding new format plugins:
> > Format plugin for LTSV files - https://github.com/apache/drill/pull/1627
> > SYSLOG (RFC-5424) Format Plugin - https://github.com/apache/drill/pull/1530
> > Msgpack format reader - https://github.com/apache/drill/pull/1500
> >
> > Any suggestions?
> >
> > Kind regards,
> > Arina
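The merge proposed in the quoted message — folding each plugin's bootstrap-format.json into bootstrap-storage-plugins.json at start-up — can be sketched with plain nested maps. Everything below is a hypothetical illustration; the real implementation would operate on the parsed JSON trees of the two bootstrap files:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of merging a plugin's bootstrap-format entries into the
// main bootstrap structure: storage -> plugin name -> formats -> format name.
public class BootstrapMerger {

  @SuppressWarnings("unchecked")
  public static void mergeFormats(Map<String, Object> bootstrap,
                                  String pluginName,
                                  Map<String, Object> newFormats) {
    Map<String, Object> storage =
        (Map<String, Object>) bootstrap.computeIfAbsent("storage", k -> new HashMap<>());
    Map<String, Object> plugin =
        (Map<String, Object>) storage.computeIfAbsent(pluginName, k -> new HashMap<>());
    Map<String, Object> formats =
        (Map<String, Object>) plugin.computeIfAbsent("formats", k -> new HashMap<>());
    formats.putAll(newFormats); // the plugin's bootstrap-format.json wins on name clashes
  }

  public static void main(String[] args) {
    Map<String, Object> bootstrap = new HashMap<>();   // stands in for bootstrap-storage-plugins.json
    Map<String, Object> syslogFormat = new HashMap<>(); // stands in for one bootstrap-format.json entry
    syslogFormat.put("type", "syslog");
    Map<String, Object> newFormats = new HashMap<>();
    newFormats.put("syslog", syslogFormat);
    mergeFormats(bootstrap, "dfs", newFormats);
    System.out.println(bootstrap); // nested structure: storage -> dfs -> formats -> syslog
  }
}
```

Using `computeIfAbsent` at each level means the merge works whether or not the target plugin or its `formats` section already exists in the bootstrap file.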
[GitHub] vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
URL: https://github.com/apache/drill/pull/1608#discussion_r253866957
## File path: exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/DrillSqlWorker.java
## @@ -183,4 +183,20 @@
     return handler.getPlan(sqlNode);
   }
+
+  // Wrap with auto-limit
+  private static SqlNode wrapWithAutoLimit(SqlConverter parser, QueryContext context, String sql) {
+    SqlNode sqlNode = parser.parse(sql);
Review comment: Please replace this code with the following to avoid parsing the query several times.
```suggestion
    SqlNode sqlNode = parser.parse(sql);
    int fetch = context.getAutoLimitRowCount();
    if (sqlNode.getKind().belongsTo(SqlKind.QUERY)) {
      if (sqlNode.getKind() == SqlKind.ORDER_BY) {
        // merge SqlOrderBy
        SqlOrderBy orderBy = (SqlOrderBy) sqlNode;
        if (orderBy.fetch != null) {
          fetch = Math.min(Integer.valueOf(orderBy.fetch.toString()), fetch);
        }
        sqlNode = new SqlOrderBy(orderBy.getParserPosition(), orderBy.query, orderBy.orderList, orderBy.offset,
            SqlLiteral.createExactNumeric(Integer.toString(fetch), SqlParserPos.ZERO));
      } else {
        sqlNode = new SqlOrderBy(SqlParserPos.ZERO, sqlNode, SqlNodeList.EMPTY, null,
            SqlLiteral.createExactNumeric(Integer.toString(fetch), SqlParserPos.ZERO));
      }
    }
```
Re: [DISCUSS] Format plugins in contrib module
I’d concur with Arina’s suggestion. I do think this would be useful and make it easier to make plugins “pluggable”.
In the meantime, should we recommend that developers of format plugins include their plugins in the bootstrap-storage-plugins.json? I was thinking also that we might want to have some guidelines for unit tests for format plugins. I’m doing some work on the HTTPD format plugin and found some issues which cause it to throw NPEs.
— C

> On Feb 5, 2019, at 06:40, Arina Yelchiyeva wrote:
>
> Hi all,
>
> Before, we were adding new formats / plugins into the exec module. Eventually we came to the point that the exec package size is growing, and plugin and format contributions are better separated out into a different module.
> Now we have the contrib module where we add such contributions. Plugins are pluggable: they are added automatically by means of having a drill-module.conf file which points to the scanning packages.
> Format plugins use the same approach; the only problem is that they are not added into bootstrap-storage-plugins.json. So when adding a new format plugin, in order for it to automatically appear in the Drill Web UI, the developer has to update the bootstrap file, which is in the exec module.
> My suggestion is that we implement some functionality that would merge the format config with the bootstrap one. For example, each plugin would have to have a bootstrap-format.json file with the information about the plugin to which the format should be added (structure the same as in bootstrap-storage-plugins.json):
> Example:
>
> {
>   "storage": {
>     dfs: {
>       formats: {
>         "psv" : {
>           type: "msgpack",
>           extensions: [ "mp" ]
>         }
>       }
>     }
>   }
> }
>
> Then during Drill start-up such bootstrap-format.json files will be merged with bootstrap-storage-plugins.json.
>
> Current open PRs for adding new format plugins:
> Format plugin for LTSV files - https://github.com/apache/drill/pull/1627
> SYSLOG (RFC-5424) Format Plugin - https://github.com/apache/drill/pull/1530
> Msgpack format reader - https://github.com/apache/drill/pull/1500
>
> Any suggestions?
>
> Kind regards,
> Arina
[jira] [Resolved] (DRILL-4813) Need better error message for table function when table does not exist
[ https://issues.apache.org/jira/browse/DRILL-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arina Ielchiieva resolved DRILL-4813.
-
Resolution: Fixed

> Need better error message for table function when table does not exist
> --
>
> Key: DRILL-4813
> URL: https://issues.apache.org/jira/browse/DRILL-4813
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 1.8.0
> Reporter: Krystal
> Priority: Major
> Fix For: 1.12.0
>
> Need a user-friendly error for the table function when the specified table does not exist. Example as shown below:
> select * from table(`drill-3149/blah.txt`(type=>'text', lineDelimiter=>'\r\n'));
> Error: VALIDATION ERROR: null
> SQL Query null
> [Error Id: 6a021d48-bbec-4617-9b53-94bc116dce76]
> (java.lang.NullPointerException) null
> org.apache.drill.exec.planner.logical.DrillTranslatableTable.getRowType():49
> org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl():68
> .
> .
> .
[jira] [Resolved] (DRILL-5673) NPE from planner when using a table function with an invalid table
[ https://issues.apache.org/jira/browse/DRILL-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arina Ielchiieva resolved DRILL-5673. - Resolution: Fixed > NPE from planner when using a table function with an invalid table > -- > > Key: DRILL-5673 > URL: https://issues.apache.org/jira/browse/DRILL-5673 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Paul Rogers >Priority: Major > Fix For: 1.12.0 > > > Create a CSV file, with headers and call it "data_3.csv." > Set up a storage plugin config with headers, delimiter of ",". Call it > "myws". Then read the file: > {code} > SELECT * FROM `dfs.myws`.`data_3.csv` > {code} > This query works. > Try the same with a table function: > {code} > SELECT * FROM table(dfs.myws.`data_3.csv` (type => 'text', fieldDelimiter => > ',' , extractHeader => true)) > {code} > This also works. > Now, let's use an incorrect name: > {code} > SELECT * FROM `dfs.myws`.`data_33.csv` > {code} > (Note the "33" in the name.) > This produces an error to the client: > Now the bug. Do the same thing with the table function: > {code} > SELECT * FROM table(dfs.myws.`data_33.csv` (type => 'text', fieldDelimiter => > ',' , extractHeader => true)) > {code} > This results in an NPE: > {code} > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: null > SQL Query null > [Error Id: cf151c28-9879-4ecc-893a-78d85a11c2f4 on 10.250.50.74:31010] > at > org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123) > ... 
> Caused by: java.lang.NullPointerException: null > at > org.apache.drill.exec.planner.logical.DrillTranslatableTable.getRowType(DrillTranslatableTable.java:49) > ~[classes/:na] > at > org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:68) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:86) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:883) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:869) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:2806) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:2791) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3014) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:86) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:883) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at > org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:869) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:210) > ~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21] > ... 
> {code} > This bug causes much user confusion as the user cannot immediately tell that > this is "user error" vs. something terribly wrong with Drill.
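The fix direction is to fail validation with a descriptive message instead of letting the null propagate out of `getRowType()`. A minimal, self-contained sketch of that pattern (the lookup table and class names here are hypothetical, not Drill's actual planner API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve a table name, failing with a descriptive
// message instead of a NullPointerException when the table is absent.
public class TableLookup {
    private static final Map<String, String> TABLES = new HashMap<>();
    static { TABLES.put("data_3.csv", "text"); }

    public static String resolveOrFail(String name) {
        String table = TABLES.get(name);
        if (table == null) {
            // Surface a validation error the user can act on, rather than an NPE.
            throw new IllegalArgumentException(
                "VALIDATION ERROR: table '" + name + "' does not exist");
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(resolveOrFail("data_3.csv"));   // found
        try {
            resolveOrFail("data_33.csv");                  // missing -> clear error
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```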
[GitHub] arina-ielchiieva commented on issue #1612: DRILL-6970: fix issue with logregex format plugin where drillbuf was overflowing
arina-ielchiieva commented on issue #1612: DRILL-6970: fix issue with logregex format plugin where drillbuf was overflowing URL: https://github.com/apache/drill/pull/1612#issuecomment-460618283 @jcmcote is there any update? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[DISCUSS] Format plugins in contrib module
Hi all, Previously we added new formats / plugins into the exec module. Eventually we reached the point where the exec package keeps growing, and it is better to separate plugin and format contributions out into a different module. Now we have the contrib module where we add such contributions. Plugins are pluggable: they are added automatically by means of a drill-module.conf file which points to the packages to scan. Format plugins use the same approach; the only problem is that they are not added to bootstrap-storage-plugins.json. So when adding a new format plugin, in order for it to automatically appear in the Drill Web UI, the developer has to update the bootstrap file, which is in the exec module. My suggestion is that we implement some functionality that would merge the format config with the bootstrap one. For example, each plugin would have to have a bootstrap-format.json file with the information about which plugin the format should be added to (same structure as in bootstrap-storage-plugins.json). Example:

{
  "storage": {
    dfs: {
      formats: {
        "msgpack": {
          type: "msgpack",
          extensions: [ "mp" ]
        }
      }
    }
  }
}

Then during Drill start up such bootstrap-format.json files would be merged with bootstrap-storage-plugins.json. Current open PRs for adding new format plugins:
Format plugin for LTSV files - https://github.com/apache/drill/pull/1627
SYSLOG (RFC-5424) Format Plugin - https://github.com/apache/drill/pull/1530
Msgpack format reader - https://github.com/apache/drill/pull/1500
Any suggestions? Kind regards, Arina
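The proposed merge step can be sketched with plain maps; this is only an illustration of the folding logic, assuming each fragment names the storage plugin its formats belong to (class and method names hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed startup step: fold the "formats" section
// of each bootstrap-format.json fragment into the matching storage plugin's
// "formats" section from bootstrap-storage-plugins.json.
public class BootstrapFormatMerger {

    // mainFormats: formats already present for a plugin (e.g. under "dfs").
    // fragmentFormats: formats contributed by one format plugin's fragment.
    // Existing entries win, so a fragment cannot override the main bootstrap.
    public static void mergeFormats(Map<String, Map<String, Object>> mainFormats,
                                    Map<String, Map<String, Object>> fragmentFormats) {
        for (Map.Entry<String, Map<String, Object>> e : fragmentFormats.entrySet()) {
            mainFormats.putIfAbsent(e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> dfsFormats = new HashMap<>();
        dfsFormats.put("csv", new HashMap<>());

        Map<String, Map<String, Object>> fragment = new HashMap<>();
        Map<String, Object> msgpack = new HashMap<>();
        msgpack.put("type", "msgpack");
        fragment.put("msgpack", msgpack);

        mergeFormats(dfsFormats, fragment);
        System.out.println(dfsFormats.keySet());  // contains both "csv" and "msgpack"
    }
}
```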
[GitHub] vdiravka commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
vdiravka commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253824507 ## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVFormatPlugin.java ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.ltsv; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.proto.UserBitShared; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.RecordReader; +import org.apache.drill.exec.store.RecordWriter; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin; +import org.apache.drill.exec.store.dfs.easy.EasyWriter; +import org.apache.drill.exec.store.dfs.easy.FileWork; +import org.apache.hadoop.conf.Configuration; + +import java.io.IOException; +import java.util.List; + +public class LTSVFormatPlugin extends EasyFormatPlugin { + +private static final boolean IS_COMPRESSIBLE = false; +private static final String DEFAULT_NAME = "ltsv"; +private LTSVFormatPluginConfig config; + +private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LTSVFormatPlugin.class); + +public LTSVFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig) { +this(name, context, fsConf, storageConfig, new LTSVFormatPluginConfig()); +} + +public LTSVFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, LTSVFormatPluginConfig formatPluginConfig) { +super(name, context, fsConf, config, formatPluginConfig, true, false, false, IS_COMPRESSIBLE, formatPluginConfig.getExtensions(), DEFAULT_NAME); +this.config = formatPluginConfig; +} + +@Override +public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, +List columns, String userName) throws ExecutionSetupException { +return new LTSVRecordReader(context, fileWork.getPath(), dfs, columns, config); +} + + +@Override +public int 
getReaderOperatorType() { +// TODO Is it correct?? +return UserBitShared.CoreOperatorType.JSON_SUB_SCAN_VALUE; Review comment: It is used for Drill RPC. You can find more info about it here: https://github.com/paul-rogers/drill/wiki/RPC-Protocol In your case it is necessary for determining the proper type of the reader for the `EasySubScan` operator, once the latter is transferred to some Drill fragment.
[GitHub] vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query URL: https://github.com/apache/drill/pull/1608#discussion_r253814230 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/Foreman.java ## @@ -136,15 +136,30 @@ */ public Foreman(final WorkerBee bee, final DrillbitContext drillbitContext, final UserClientConnection connection, final QueryId queryId, final RunQuery queryRequest) { +this(bee, drillbitContext, connection, queryId, queryRequest, null); + } + + /** + * Constructor. Sets up the Foreman, but does not initiate any execution. + * + * @param bee work manager (runs fragments) + * @param drillbitContext drillbit context + * @param connection connection + * @param queryId the id for the query + * @param queryRequest the query to execute + * @param autoLimitRowCount the number of rows to limit in execution (used for WebRequests) + */ + public Foreman(final WorkerBee bee, final DrillbitContext drillbitContext, Review comment: I'm not sure that changing the `Foreman` constructor signature or adding a new one is a good idea. Could `autoLimitRowCount` be set on the `RunQuery` object and then obtained where needed?
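The suggestion amounts to letting the hint travel on the request object instead of widening every constructor on the call path. A rough, self-contained sketch of that shape (class and method names hypothetical, not Drill's actual `RunQuery`):

```java
// Hypothetical sketch: instead of threading an extra autoLimitRowCount argument
// through Foreman/UserWorker constructors, the hint rides along on the request
// object and is read only where it is needed.
public class QueryRequest {
    private final String sql;
    private final Integer autoLimitRowCount;  // null means "no auto-limit"

    public QueryRequest(String sql, Integer autoLimitRowCount) {
        this.sql = sql;
        this.autoLimitRowCount = autoLimitRowCount;
    }

    public String getSql() { return sql; }
    public boolean isAutoLimitEnabled() { return autoLimitRowCount != null; }
    public int getAutoLimitRowCount() { return autoLimitRowCount; }

    // A consumer (e.g. the planner) wraps the query only when the hint is set.
    public static String maybeWrap(QueryRequest req) {
        if (req.isAutoLimitEnabled()) {
            return "SELECT * FROM (" + req.getSql() + ") LIMIT " + req.getAutoLimitRowCount();
        }
        return req.getSql();
    }

    public static void main(String[] args) {
        System.out.println(maybeWrap(new QueryRequest("SELECT a FROM t", 10)));
        System.out.println(maybeWrap(new QueryRequest("SELECT a FROM t", null)));
    }
}
```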
[GitHub] vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query URL: https://github.com/apache/drill/pull/1608#discussion_r253807881 ## File path: exec/java-exec/src/main/resources/rest/static/js/querySubmission.js ## @@ -53,7 +53,18 @@ function doSubmitQueryWithUserName() { //Wrap & Submit Query (invoked directly if impersonation is not enabled) function wrapAndSubmitQuery() { //Wrap if required -wrapQuery(); +var mustWrapWithLimit = $('input[name="forceLimit"]:checked').length > 0; +//Clear field when submitting if not mustWrapWithLimit +if (!mustWrapWithLimit) { + //Wipe out any numeric entry in the field before + $('#autoLimit').attr('value', ''); +} else { + let autoLimitValue=document.getElementById("autoLimit").value; + if (isNaN(autoLimitValue)) { +alert(autoLimitValue+ " is not a number. Please fill in a valid number"); Review comment: I disagree with leaving the alert as it is. Could the error message be shown in the same way as when a wrong storage plugin configuration is submitted? ![image](https://user-images.githubusercontent.com/20928429/52267594-177a9280-2942-11e9-8a55-c00353da8a81.png)
[GitHub] vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query URL: https://github.com/apache/drill/pull/1608#discussion_r253817340 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/work/user/UserWorker.java ## @@ -73,9 +73,13 @@ private static QueryId queryIdGenerator() { } public QueryId submitWork(UserClientConnection connection, RunQuery query) { +return submitWork(connection, query, null); + } + + public QueryId submitWork(UserClientConnection connection, RunQuery query, Integer autoLimitRowCount) { Review comment: If `RunQuery` contains `autoLimitRowCount`, there will be no need to add a new method.
[GitHub] vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query URL: https://github.com/apache/drill/pull/1608#discussion_r253814733 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ## @@ -84,9 +85,14 @@ private boolean closed = false; private DrillOperatorTable table; - public QueryContext(final UserSession session, final DrillbitContext drillbitContext, QueryId queryId) { + public QueryContext(UserSession session, DrillbitContext drillbitContext, QueryId queryId) { +this(session, drillbitContext, queryId, null); + } + + public QueryContext(UserSession session, DrillbitContext drillbitContext, QueryId queryId, Integer autoLimit) { Review comment: With a query option, there is no need to change the `QueryContext` constructor.
[GitHub] vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query
vvysotskyi commented on a change in pull request #1608: DRILL-6960: Auto Limit Wrapping should not apply to non-select query URL: https://github.com/apache/drill/pull/1608#discussion_r253798677 ## File path: exec/java-exec/src/main/java/org/apache/drill/exec/ops/QueryContext.java ## @@ -273,6 +279,29 @@ public RemoteFunctionRegistry getRemoteFunctionRegistry() { return drillbitContext.getRemoteFunctionRegistry(); } + /** + * Check if auto-limiting of resultset is enabled + * @return True if auto-limit is enabled + */ + public boolean isAutoLimitEnabled() { +return autoLimitRowCount != null; + } + + /** + * Returns the maximum size of auto-limited resultset + * @return Maximum size of auto-limited resultSet + */ + public Integer getAutoLimitRowCount() { Review comment: I don't think it is a good idea to expand the interface of `QueryContext` for such a minor thing. Please consider adding and using a query option instead. As an example, please see the fix for DRILL-6834.
[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin URL: https://github.com/apache/drill/pull/1530#discussion_r253767075 ## File path: contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogRecordReader.java ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.store.syslog; + +import com.google.common.base.Charsets; +import io.netty.buffer.DrillBuf; +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.expr.holders.VarCharHolder; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.ops.OperatorContext; +import org.apache.drill.exec.physical.impl.OutputMutator; +import org.apache.drill.exec.store.AbstractRecordReader; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.easy.FileWork; +import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter; +import org.apache.drill.exec.vector.complex.writer.BaseWriter; +import org.apache.hadoop.fs.Path; +import org.realityforge.jsyslog.message.StructuredDataParameter; +import org.realityforge.jsyslog.message.SyslogMessage; + +import java.io.BufferedReader; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.text.SimpleDateFormat; +import java.util.List; +import java.util.Map; +import java.util.Iterator; + +public class SyslogRecordReader extends AbstractRecordReader { + + private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(SyslogRecordReader.class); + private static final int MAX_RECORDS_PER_BATCH = 4096; + + private final DrillFileSystem fileSystem; + private final FileWork fileWork; + private final String userName; + private BufferedReader reader; + private DrillBuf buffer; + private VectorContainerWriter writer; + private SyslogFormatConfig config; + private int maxErrors; + private boolean flattenStructuredData; + private int errorCount; + private int lineCount; + private List projectedColumns; + private String line; + + private SimpleDateFormat df; + + public 
SyslogRecordReader(FragmentContext context, +DrillFileSystem fileSystem, +FileWork fileWork, +List columns, +String userName, +SyslogFormatConfig config) throws OutOfMemoryException { + +this.fileSystem = fileSystem; +this.fileWork = fileWork; +this.userName = userName; +this.config = config; +this.maxErrors = config.getMaxErrors(); +this.df = getValidDateObject("-MM-dd'T'HH:mm:ss.SSS'Z'"); +this.errorCount = 0; +this.buffer = context.getManagedBuffer(2048); Review comment: The constant for 4096 is created, but 2048 - the new value is introduced and used? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin URL: https://github.com/apache/drill/pull/1530#discussion_r253763788 ## File path: contrib/format-syslog/README.md ## @@ -0,0 +1,41 @@ +# Syslog Format Plugin Review comment: Once you create a Jira, please post its number here. I think you meant `contrib`. It is not a strict division, but there are core components in `java-exec`, and almost all storage/format plugins are in the `contrib` module.
[GitHub] vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin
vdiravka commented on a change in pull request #1530: DRILL-6582: SYSLOG (RFC-5424) Format Plugin URL: https://github.com/apache/drill/pull/1530#discussion_r253784583 ## File path: contrib/format-syslog/src/main/java/org/apache/drill/exec/store/syslog/SyslogFormatConfig.java ## @@ -0,0 +1,86 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.store.syslog; + +import com.fasterxml.jackson.annotation.JsonTypeName; +import com.google.common.base.Objects; +import org.apache.drill.common.logical.FormatPluginConfig; + +import java.util.Arrays; +import java.util.List; +import java.util.ArrayList; + +@JsonTypeName("syslog") +public class SyslogFormatConfig implements FormatPluginConfig { + + public List extensions; + public int maxErrors = 10; + public boolean flattenStructuredData = false; + + public boolean getFlattenStructuredData() { +return flattenStructuredData; + } + + public int getMaxErrors() { +return maxErrors; + } + + public List getExtensions() { +return extensions; + } + + public void setExtensions(List ext) { +this.extensions = ext; + } + + public void setExtension(String ext) { +if (this.extensions == null) { + this.extensions = new ArrayList(); +} +this.extensions.add(ext); + } + + public void setMaxErrors(int errors) { +this.maxErrors = errors; + } + + public void setFlattenStructuredData(boolean flattenErrors) { +this.flattenStructuredData = flattenErrors; + } + + @Override + public boolean equals(Object obj) { +if (this == obj) { + return true; +} +if (obj == null || getClass() != obj.getClass()) { + return false; +} +SyslogFormatConfig other = (SyslogFormatConfig) obj; +return Objects.equal(extensions, other.extensions) && +Objects.equal(maxErrors, other.maxErrors) && +Objects.equal(flattenStructuredData, other.flattenStructuredData); + } + + @Override + public int hashCode() { +return Arrays.hashCode(new Object[]{maxErrors, flattenStructuredData, extensions}); + } + +} Review comment: Looks like the new line is still absent (in some other files too). ```suggestion } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
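The quoted config mixes Guava's `Objects.equal` with a hand-built `Arrays.hashCode`; the same contract can be expressed with `java.util.Objects` alone. A self-contained sketch over the same three fields (the class name is hypothetical, not the actual Drill class):

```java
import java.util.List;
import java.util.Objects;

// Sketch of an equals/hashCode pair over the same three fields, using only
// java.util.Objects (equal objects must produce equal hash codes).
public class SyslogConfigSketch {
    public List<String> extensions;
    public int maxErrors = 10;
    public boolean flattenStructuredData = false;

    @Override
    public boolean equals(Object obj) {
        if (this == obj) { return true; }
        if (obj == null || getClass() != obj.getClass()) { return false; }
        SyslogConfigSketch other = (SyslogConfigSketch) obj;
        return maxErrors == other.maxErrors
            && flattenStructuredData == other.flattenStructuredData
            && Objects.equals(extensions, other.extensions);
    }

    @Override
    public int hashCode() {
        // Objects.hash covers all fields compared in equals, keeping the contract.
        return Objects.hash(maxErrors, flattenStructuredData, extensions);
    }
}
```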
[GitHub] shimamoto commented on issue #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on issue #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#issuecomment-460559287 > Also, IMHO, you should add the config to the default Drill config so that people can query `ltsv` files by default. Is it correct to add `bootstrap-storage-plugins.json` into format-ltsv/src/main/resources? I tried adding `bootstrap-storage-plugins.json`, but it didn't work. Like this. ``` { "storage": { dfs: { type: "ltsv", connection: "file:///", formats: { "ltsv": { "type": "ltsv", "extensions": [ "ltsv" ] }, enabled: true } } } ```
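For comparison, the bootstrap shape Drill's bundled bootstrap-storage-plugins.json uses keeps the plugin-level attributes (`type`, `connection`, `enabled`) as siblings of `formats`, with the plugin type `"file"` rather than the format name; in the attempt above, `type: "ltsv"` sits at the plugin level and `enabled` ended up inside `formats`. A fragment in the conventional shape would look roughly like:

```json
{
  "storage": {
    "dfs": {
      "type": "file",
      "connection": "file:///",
      "formats": {
        "ltsv": {
          "type": "ltsv",
          "extensions": ["ltsv"]
        }
      },
      "enabled": true
    }
  }
}
```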
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253772190 ## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVFormatPlugin.java ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.ltsv; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.proto.UserBitShared; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.RecordReader; +import org.apache.drill.exec.store.RecordWriter; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin; +import org.apache.drill.exec.store.dfs.easy.EasyWriter; +import org.apache.drill.exec.store.dfs.easy.FileWork; +import org.apache.hadoop.conf.Configuration; + +import java.io.IOException; +import java.util.List; + +public class LTSVFormatPlugin extends EasyFormatPlugin { + +private static final boolean IS_COMPRESSIBLE = false; +private static final String DEFAULT_NAME = "ltsv"; +private LTSVFormatPluginConfig config; + +private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LTSVFormatPlugin.class); + +public LTSVFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig) { +this(name, context, fsConf, storageConfig, new LTSVFormatPluginConfig()); +} + +public LTSVFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, LTSVFormatPluginConfig formatPluginConfig) { +super(name, context, fsConf, config, formatPluginConfig, true, false, false, IS_COMPRESSIBLE, formatPluginConfig.getExtensions(), DEFAULT_NAME); +this.config = formatPluginConfig; +} + +@Override +public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, +List columns, String userName) throws ExecutionSetupException { +return new LTSVRecordReader(context, fileWork.getPath(), dfs, columns, config); +} + + +@Override +public int 
getReaderOperatorType() { +// TODO Is it correct?? +return UserBitShared.CoreOperatorType.JSON_SUB_SCAN_VALUE; Review comment: Could you explain more about why I have to add the protobuf? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253772028 ## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVFormatPlugin.java ## @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.store.ltsv; + +import org.apache.drill.common.exceptions.ExecutionSetupException; +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.logical.StoragePluginConfig; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.proto.UserBitShared; +import org.apache.drill.exec.server.DrillbitContext; +import org.apache.drill.exec.store.RecordReader; +import org.apache.drill.exec.store.RecordWriter; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin; +import org.apache.drill.exec.store.dfs.easy.EasyWriter; +import org.apache.drill.exec.store.dfs.easy.FileWork; +import org.apache.hadoop.conf.Configuration; + +import java.io.IOException; +import java.util.List; + +public class LTSVFormatPlugin extends EasyFormatPlugin { + +private static final boolean IS_COMPRESSIBLE = false; +private static final String DEFAULT_NAME = "ltsv"; +private LTSVFormatPluginConfig config; + +private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LTSVFormatPlugin.class); + +public LTSVFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig storageConfig) { +this(name, context, fsConf, storageConfig, new LTSVFormatPluginConfig()); +} + +public LTSVFormatPlugin(String name, DrillbitContext context, Configuration fsConf, StoragePluginConfig config, LTSVFormatPluginConfig formatPluginConfig) { +super(name, context, fsConf, config, formatPluginConfig, true, false, false, IS_COMPRESSIBLE, formatPluginConfig.getExtensions(), DEFAULT_NAME); +this.config = formatPluginConfig; +} + +@Override +public RecordReader getRecordReader(FragmentContext context, DrillFileSystem dfs, FileWork fileWork, +List columns, String userName) throws ExecutionSetupException { +return new LTSVRecordReader(context, fileWork.getPath(), dfs, columns, config); +} + + +@Override +public int 
getReaderOperatorType() { +// TODO Is it correct?? +return UserBitShared.CoreOperatorType.JSON_SUB_SCAN_VALUE; +} + +@Override +public int getWriterOperatorType() { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253771968

## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVRecordReader.java ##

@@ -0,0 +1,127 @@

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.drill.exec.store.ltsv;

import io.netty.buffer.DrillBuf;
import org.apache.drill.common.exceptions.ExecutionSetupException;
import org.apache.drill.common.exceptions.UserException;
import org.apache.drill.common.expression.SchemaPath;
import org.apache.drill.exec.exception.OutOfMemoryException;
import org.apache.drill.exec.ops.FragmentContext;
import org.apache.drill.exec.ops.OperatorContext;
import org.apache.drill.exec.physical.impl.OutputMutator;
import org.apache.drill.exec.store.AbstractRecordReader;
import org.apache.drill.exec.store.dfs.DrillFileSystem;
import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
import org.apache.drill.exec.vector.complex.writer.BaseWriter;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.text.ParseException;
import java.util.List;

public class LTSVRecordReader extends AbstractRecordReader {

  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LTSVRecordReader.class);
  private static final int MAX_RECORDS_PER_BATCH = 8096;

  private String inputPath;
  private BufferedReader reader;
  private DrillBuf buffer;
  private VectorContainerWriter writer;
  private LTSVFormatPluginConfig config;
  private int lineCount;

  public LTSVRecordReader(FragmentContext fragmentContext, String inputPath, DrillFileSystem fileSystem,
      List<SchemaPath> columns, LTSVFormatPluginConfig config) throws OutOfMemoryException {
    try {
      FSDataInputStream fsStream = fileSystem.open(new Path(inputPath));
      this.inputPath = inputPath;
      this.lineCount = 0;
      this.reader = new BufferedReader(new InputStreamReader(fsStream.getWrappedStream(), "UTF-8"));
      this.config = config;
      this.buffer = fragmentContext.getManagedBuffer();
      setColumns(columns);
    } catch (IOException e) {
      logger.debug("LTSV Plugin: " + e.getMessage());
    }
  }

  public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
    this.writer = new VectorContainerWriter(output);
  }

  public int next() {
    this.writer.allocate();
    this.writer.reset();

    int recordCount = 0;

    try {
      BaseWriter.MapWriter map = this.writer.rootAsMap();
      String line = null;

      while (recordCount < MAX_RECORDS_PER_BATCH && (line = this.reader.readLine()) != null) {
        lineCount++;

        // Skip empty lines
        if (line.trim().length() == 0) {
          continue;
        }

        this.writer.setPosition(recordCount);
        map.start();

        String[] fields = line.split("\t");
        for (String field : fields) {
          int index = field.indexOf(":");
          if (index <= 0) {
            throw new ParseException(
                "Invalid LTSV format: " + inputPath + "\n" + lineCount + ":" + line, 0
            );
          }

          String fieldName = field.substring(0, index);
          String fieldValue = field.substring(index + 1);

          byte[] bytes = fieldValue.getBytes("UTF-8");
          this.buffer.setBytes(0, bytes, 0, bytes.length);
          map.varChar(fieldName).writeVarChar(0, bytes.length, buffer);

Review comment: done. Could you please check if my change makes sense?
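The heart of next() above is a plain tab-and-colon split. A minimal, self-contained sketch of that per-line logic (class and method names are ours, not part of the Drill plugin; the real reader writes each value into a VarChar vector rather than a map, and throws a checked ParseException instead of an unchecked one):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LtsvLineSketch {

  // Split a single LTSV line on tabs, then split each field at the FIRST
  // ':' into a label/value pair, as LTSVRecordReader.next() does. Values
  // may themselves contain ':' characters.
  public static Map<String, String> parseLine(String line) {
    Map<String, String> record = new LinkedHashMap<>();
    for (String field : line.split("\t")) {
      int index = field.indexOf(':');
      if (index <= 0) {
        // A field with no label is malformed; the reader rejects the line
        throw new IllegalArgumentException("Invalid LTSV field: " + field);
      }
      record.put(field.substring(0, index), field.substring(index + 1));
    }
    return record;
  }

  public static void main(String[] args) {
    // prints {host=127.0.0.1, status=200}
    System.out.println(parseLine("host:127.0.0.1\tstatus:200"));
  }
}
```

Note that splitting at the first colon only is what keeps values like URLs (`url:http://example.com`) intact.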
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253770309

## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVRecordReader.java ##

(The quoted diff context is identical to the previous comment, ending at this.writer.setValueCount(recordCount); the review comment itself is truncated in the archive.)
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253770219

## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVRecordReader.java ##

(The quoted diff context is identical to the previous comment, ending at this exception:)

          if (index <= 0) {
            throw new ParseException(
                "Invalid LTSV format: " + inputPath + "\n" + lineCount + ":" + line, 0
            );
          }

Review comment: This code is enclosed in a try block. When an exception occurs, it is caught and wrapped in a Drill one.
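The wrap-and-rethrow pattern the comment describes (a checked ParseException thrown deep in the loop, caught once at the edge of next(), and rewrapped in a single unchecked, user-facing exception) can be sketched in plain Java. DataReadError below is a hypothetical stand-in for Drill's UserException; it is not the plugin's actual code:

```java
import java.text.ParseException;

public class WrapSketch {

  // Hypothetical stand-in for Drill's UserException: one unchecked type
  // that carries the original cause up to the user.
  static class DataReadError extends RuntimeException {
    DataReadError(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Inner parsing step: throws the checked exception on a label-less field.
  static void requireLabel(String field) throws ParseException {
    if (field.indexOf(':') <= 0) {
      throw new ParseException("Invalid LTSV format: " + field, 0);
    }
  }

  // Edge of the read loop: the checked exception is caught in one place
  // and rewrapped, so callers only ever see the unchecked wrapper.
  public static String read(String field) {
    try {
      requireLabel(field);
      return field;
    } catch (ParseException e) {
      throw new DataReadError("Failure while reading LTSV data", e);
    }
  }
}
```

The design benefit is that every failure mode inside the loop surfaces through one exception type, with the parse position preserved in the cause chain.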
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r253769950

## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVFormatPluginConfig.java ##

@@ -0,0 +1,59 @@

/* Apache License 2.0 header, identical to the one above. */
package org.apache.drill.exec.store.ltsv;

import org.apache.drill.common.logical.FormatPluginConfig;
import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;

import com.fasterxml.jackson.annotation.JsonTypeName;
import com.fasterxml.jackson.annotation.JsonInclude;

import java.util.List;

@JsonTypeName("ltsv")
public class LTSVFormatPluginConfig implements FormatPluginConfig {
  public List<String> extensions;

  private static final List<String> DEFAULT_EXTS = ImmutableList.of("ltsv");

  @JsonInclude(JsonInclude.Include.NON_DEFAULT)
  public List<String> getExtensions() {
    if (extensions == null) {
      // when loading an old JSONFormatConfig that doesn't contain an "extensions" attribute
      return DEFAULT_EXTS;
    }
    return extensions;
  }

  @Override
  public int hashCode() {
    return 99;
  }

Review comment: done
[GitHub] shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files
shimamoto commented on a change in pull request #1627: DRILL-7014: Format plugin for LTSV files URL: https://github.com/apache/drill/pull/1627#discussion_r25376

## File path: contrib/format-ltsv/src/main/java/org/apache/drill/exec/store/ltsv/LTSVFormatPluginConfig.java ##

(The quoted diff context repeats the class shown in the previous comment, continuing through equals():)

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    } else if (obj == null) {
      return false;
    } else if (getClass() == obj.getClass()) {
      return true;

Review comment: done
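The flagged equals() returns true for any two instances of the same class, and hashCode() is the constant 99, so two LTSV configs with different extensions would compare equal. A hedged sketch of the field-based fix the reviewer asked for (class name is ours and the exact merged Drill code may differ, but the Objects-based pattern is standard):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class LtsvConfigSketch {
  // Mirrors the config's single "extensions" field from the PR.
  public List<String> extensions;

  public LtsvConfigSketch(List<String> extensions) {
    this.extensions = extensions;
  }

  // Hash derived from the field rather than a constant, keeping the
  // equals/hashCode contract: equal configs hash equally.
  @Override
  public int hashCode() {
    return Objects.hash(extensions);
  }

  // Equality compares the field values, not just the runtime class.
  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (obj == null || getClass() != obj.getClass()) {
      return false;
    }
    return Objects.equals(extensions, ((LtsvConfigSketch) obj).extensions);
  }

  public static void main(String[] args) {
    LtsvConfigSketch a = new LtsvConfigSketch(Arrays.asList("ltsv"));
    LtsvConfigSketch b = new LtsvConfigSketch(Arrays.asList("txt"));
    System.out.println(a.equals(b)); // prints false
  }
}
```

Correct equality matters here because Drill compares format configs to decide whether two plugin instances are interchangeable; a class-only equals() would silently reuse a config with the wrong extensions.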