[GitHub] [drill] arina-ielchiieva merged pull request #2059: DRILL-7704: Update Maven to 3.6.3
arina-ielchiieva merged pull request #2059: DRILL-7704: Update Maven to 3.6.3 URL: https://github.com/apache/drill/pull/2059 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [drill] arina-ielchiieva commented on issue #2059: DRILL-7704: Update Maven to 3.6.3
arina-ielchiieva commented on issue #2059: DRILL-7704: Update Maven to 3.6.3 URL: https://github.com/apache/drill/pull/2059#issuecomment-615120696 @vvysotskyi / @paul-rogers thanks for the code review.
[GitHub] [drill] arina-ielchiieva commented on issue #2054: DRILL-6168: Revise format plugin table functions
arina-ielchiieva commented on issue #2054: DRILL-6168: Revise format plugin table functions URL: https://github.com/apache/drill/pull/2054#issuecomment-615128763 @paul-rogers changes look good to me. I have run the Functional / Advanced tests and there are four failures. Please take a look and either make changes in the code or suggest how the tests should be updated.
```
Data Verification Failures:
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_2.q
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_1.q
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_4.q
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_13.q
```
Detailed output:
```
Data Verification Failures:

Query: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_2.q
select * from table(`table_function/cr_lf.csv`(type=>'text', lineDelimiter=>'\r\n'))
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_2.e
Expected number of rows: 4
Actual number of rows from Drill: 4
Number of matching rows: 0
Number of rows missing: 4
Number of rows unexpected: 4
These rows are not expected (first 10):
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
These rows are missing (first 10):
["3,eee,"] (1 occurence(s))
["1,aaa,bbb"] (1 occurence(s))
["2,ccc,ddd"] (1 occurence(s))
["4,fff,ggg"] (1 occurence(s))

Query: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_1.q
select columns[0] from table(`table_function/cr_lf.csv`(type=>'text', lineDelimiter=>'\r\n'))
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_1.e
Expected number of rows: 4
Actual number of rows from Drill: 4
Number of matching rows: 0
Number of rows missing: 4
Number of rows unexpected: 4
These rows are not expected (first 10):
1
2
3
4
These rows are missing (first 10):
3,eee, (1 occurence(s))
1,aaa,bbb (1 occurence(s))
2,ccc,ddd (1 occurence(s))
4,fff,ggg (1 occurence(s))

Query: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_4.q
select * from table(`table_function/lf_cr.tsv`(type=>'text', lineDelimiter=>'\n\r'))
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_4.e
Expected number of rows: 4
Actual number of rows from Drill: 4
Number of matching rows: 0
Number of rows missing: 4
Number of rows unexpected: 4
These rows are not expected (first 10):
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
These rows are missing (first 10):
["3\teee\t"] (1 occurence(s))
["2\tccc\tddd"] (1 occurence(s))
["1\taaa\tbbb"] (1 occurence(s))
["4\tfff\tggg"] (1 occurence(s))

Query: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_13.q
select * from table(`table_function/chinese.txt`(type=>'text',lineDelimiter=>'电脑坏了'))
Baseline: /root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_13.e
Expected number of rows: 4
Actual number of rows from Drill: 4
Number of matching rows: 0
Number of rows missing: 4
Number of rows unexpected: 4
These rows are not expected (first 10):
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
These rows are missing (first 10):
["3,eee,"] (1 occurence(s))
["1,aaa,bbb"] (1 occurence(s))
["2,ccc,ddd"] (1 occurence(s))
["4,fff,ggg"] (1 occurence(s))
```
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Functional/table_function/positive/data
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Datasources/table_function
[GitHub] [drill] arina-ielchiieva commented on issue #2058: DRILL-7703: Support for 3+D arrays in EVF JSON loader
arina-ielchiieva commented on issue #2058: DRILL-7703: Support for 3+D arrays in EVF JSON loader URL: https://github.com/apache/drill/pull/2058#issuecomment-615181989 +1, LGTM. Additionally ran Functional / Advanced tests, all passed.
[NOTICE] Maven 3.6.3
Hi all, Starting from Drill 1.18.0 (and current master from commit 20ad3c9 [1]), the Drill build will require Maven 3.6.3; otherwise the build will fail. Please make sure you have Maven 3.6.3 installed on your environments. [1] https://github.com/apache/drill/commit/20ad3c9837e9ada149c246fc7a4ac1fe02de6fe8 Kind regards, Arina
[jira] [Created] (DRILL-7705) Update jQuery and Bootstrap libraries
Anton Gozhiy created DRILL-7705:
---
Summary: Update jQuery and Bootstrap libraries
Key: DRILL-7705
URL: https://issues.apache.org/jira/browse/DRILL-7705
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Anton Gozhiy
Assignee: Anton Gozhiy
Fix For: 1.18.0

There are some vulnerabilities present in the jQuery and Bootstrap libraries used in Drill:
* jQuery before 3.4.0, as used in Drupal, Backdrop CMS, and other products, mishandles jQuery.extend(true, {}, ...) because of Object.prototype pollution. If an unsanitized source object contained an enumerable __proto__ property, it could extend the native Object.prototype.
* In Bootstrap before 4.1.2, XSS is possible in the collapse data-parent attribute.
* In Bootstrap before 4.1.2, XSS is possible in the data-container property of tooltip.
* In Bootstrap before 3.4.0, XSS is possible in the affix configuration target property.
* In Bootstrap before 3.4.1 and 4.3.x before 4.3.1, XSS is possible in the tooltip or popover data-template attribute.

The following updates are suggested to fix them:
* jQuery: 3.2.1 -> 3.5.0
* Bootstrap: 3.1.1 -> 4.4.1
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (DRILL-7706) Drill RDBMS Metastore
Arina Ielchiieva created DRILL-7706:
---
Summary: Drill RDBMS Metastore
Key: DRILL-7706
URL: https://issues.apache.org/jira/browse/DRILL-7706
Project: Apache Drill
Issue Type: New Feature
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Fix For: 1.18.0

Currently Drill has only one Metastore implementation, based on Iceberg tables. Iceberg tables are file-based storage that supports concurrent writes / reads but is required to be placed on a distributed file system. This Jira aims to implement a Drill RDBMS Metastore, which will store Drill Metastore metadata in the database of the user's choice. Currently, the PostgreSQL and MySQL databases are supported; others might work as well, but no testing was done. Also, out of the box, for demonstration / testing purposes, Drill will set up a SQLite file-based embedded database, but this is only applicable to Drill in embedded mode.
[jira] [Created] (DRILL-7707) Unable to analyze table metadata if it resides in a non-writable workspace
Arina Ielchiieva created DRILL-7707:
---
Summary: Unable to analyze table metadata if it resides in a non-writable workspace
Key: DRILL-7707
URL: https://issues.apache.org/jira/browse/DRILL-7707
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva

Unable to analyze table metadata if it resides in a non-writable workspace:
{noformat}
apache drill> analyze table cp.`employee.json` refresh metadata;
Error: VALIDATION ERROR: Unable to create or drop objects. Schema [cp] is immutable.
{noformat}
Stacktrace:
{noformat}
[Error Id: b7f233cd-f090-491e-a487-5fc4c25444a4 ]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
at org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchemaInternal(SchemaUtilites.java:230)
at org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchema(SchemaUtilites.java:208)
at org.apache.drill.exec.planner.sql.handlers.DrillTableInfo.getTableInfoHolder(DrillTableInfo.java:101)
at org.apache.drill.exec.planner.sql.handlers.MetastoreAnalyzeTableHandler.getPlan(MetastoreAnalyzeTableHandler.java:108)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
{noformat}
[GitHub] [drill] arina-ielchiieva opened a new pull request #2060: DRILL-7706: Implement Drill RDBMS Metastore
arina-ielchiieva opened a new pull request #2060: DRILL-7706: Implement Drill RDBMS Metastore URL: https://github.com/apache/drill/pull/2060
# [DRILL-7706](https://issues.apache.org/jira/browse/DRILL-7706): Implement Drill RDBMS Metastore
## Description
Currently Drill has only one Metastore implementation, based on Iceberg tables. Iceberg tables are file-based storage that supports concurrent writes / reads but is required to be placed on a distributed file system. This PR implements a Drill RDBMS Metastore, which stores Drill Metastore metadata in the database of the user's choice. Currently, the PostgreSQL and MySQL databases are supported; others might work as well, but no testing was done. Also, out of the box, for demonstration / testing purposes, Drill will set up a SQLite file-based embedded database, but this is only applicable to Drill in embedded mode.
1. Fixed an issue with non-deterministic execution of batch update / delete statements; they are now executed in the same order in which they were added.
2. Abstracted the Metastore common test classes to be used by different Metastore implementations.
3. Added drill-metastore-override-example.conf with an example of Drill Metastore configuration.
4. Replaced the list of metadata types which must be passed during read / write operations with a set, to avoid possible duplicates.
5. Added the RDBMS Metastore implementation, README.md, and unit tests.
## Documentation
An RDBMS Metastore section should be added: http://drill.apache.org/docs/using-drill-metastore/
## Testing
Ran all unit tests, Functional & Advanced. Tested manually with SQLite, PostgreSQL, MySQL.
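For readers following along, a configuration override for the RDBMS Metastore might look roughly like the sketch below. Every key name here is an assumption for illustration only; the drill-metastore-override-example.conf added in this PR is the authoritative reference.

```
# Hypothetical drill-metastore-override.conf sketch; the key names are
# assumptions -- consult drill-metastore-override-example.conf from the PR
# for the actual configuration keys.
drill.metastore: {
  implementation.class: "org.apache.drill.metastore.rdbms.RdbmsMetastore",
  rdbms.data_source: {
    driver: "org.postgresql.Driver",
    url: "jdbc:postgresql://localhost:5432/drill_metastore",
    username: "drill",
    password: "drill"
  }
}
```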
[DISCUSS]: Masking Creds in Query Plans
Hello all, I was thinking about this: if a user were to execute an EXPLAIN PLAN FOR query, they get a lot of information about the storage plugin, including in some cases creds. The example below shows a query plan for the JDBC storage plugin. As you can see, the user creds are right there. I'm wondering: would it be advisable or possible to mask the creds in query plans so that users can't access this information? If masking isn't an option, is there some other way to prevent users from seeing this information? In a multi-tenant environment, it seems like a rather large security hole. Thanks, -- C
{
  "head" : {
    "version" : 1,
    "generator" : { "type" : "ExplainHandler", "info" : "" },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "hasResourcePlan" : false,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "jdbc-scan",
    "@id" : 5,
    "sql" : "SELECT *\nFROM `stats`.`batting`",
    "columns" : [ "`playerID`", "`yearID`", "`stint`", "`teamID`", "`lgID`", "`G`", "`AB`", "`R`", "`H`", "`2B`", "`3B`", "`HR`", "`RBI`", "`SB`", "`CS`", "`BB`", "`SO`", "`IBB`", "`HBP`", "`SH`", "`SF`", "`GIDP`" ],
    "config" : {
      "type" : "jdbc",
      "driver" : "com.mysql.cj.jdbc.Driver",
      "url" : "jdbc:mysql://localhost:3306/?serverTimezone=EST5EDT",
      "username" : "",
      "password" : "",
      "caseInsensitiveTableNames" : false,
      "sourceParameters" : { },
      "enabled" : true
    },
    "userName" : "",
    "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 100.0 }
  }, {
    "pop" : "limit",
    "@id" : 4,
    "child" : 5,
    "first" : 0,
    "last" : 10,
    "initialAllocation" : 100,
    "maxAllocation" : 100,
    "cost" : { "memoryCost" : 1.6777216E7, "outputRowCount" : 10.0 }
  }, {
    "pop" : "limit",
    "@id" : 3,
Re: [NOTICE] Maven 3.6.3
Hi Arina, Thanks for keeping us up to date! As it turns out, I use Ubuntu (Linux Mint) for development. Maven is installed as a package using apt-get. Packages can lag behind a bit. The latest maven available via apt-get is 3.6.0. It is a nuisance to install a new version outside the package manager. I changed the Maven version in the root pom.xml to 3.6.0 and the build seemed to work. Any reason we need the absolute latest version rather than just 3.6.0 or later? The workaround for now is to manually edit the pom.xml file on each checkout, then revert the change before commit. Can we maybe adjust the "official" version instead? Thanks, - Paul On Friday, April 17, 2020, 5:09:49 AM PDT, Arina Ielchiieva wrote: Hi all, Starting from Drill 1.18.0 (and current master from commit 20ad3c9 [1]), Drill build will require Maven 3.6.3, otherwise build will fail. Please make sure you have Maven 3.6.3 installed on your environments. [1] https://github.com/apache/drill/commit/20ad3c9837e9ada149c246fc7a4ac1fe02de6fe8 Kind regards, Arina
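For context, a minimum Maven version is typically pinned with the maven-enforcer-plugin's requireMavenVersion rule, so the relaxation Paul describes would amount to something like the sketch below. Whether Drill's root pom.xml uses exactly this mechanism is an assumption; the rule and the version-range syntax are standard Maven.

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-maven</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <requireMavenVersion>
            <!-- "[3.6.0,)" accepts Maven 3.6.0 or later instead of pinning 3.6.3 -->
            <version>[3.6.0,)</version>
          </requireMavenVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>
```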
[GitHub] [drill] paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore
paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore URL: https://github.com/apache/drill/pull/2060#issuecomment-615394559 @arina-ielchiieva, this is great! I wonder, will this PR handle a particular use case that I'm seeing? The metastore works for files. We are seeing more cases where users want to read other data sources: HTTP, Mongo, whatever. These sources deliver JSON, which can be ambiguous. We want to use a provided schema to resolve ambiguities. (That work is ongoing in updating the JSON reader, etc.) So, we need a place to store the provided schema. Recall that, for files, we put the provided schema in the same folder as the data. For the HTTP plugin, say, there is no directory to place the files. So, can we use the DB-backed metastore? Will there be a standard way for a plugin to map its concept of a table to an entry in the metastore which can hold the provided schema?
[GitHub] [drill] paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore
paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore URL: https://github.com/apache/drill/pull/2060#issuecomment-615395578 @arina-ielchiieva, another related question. We currently use ZK to store things like plugin configs, dynamic UDFs, etc. Charles just asked about the problems storing credentials that way. I wonder if the DB support here is (or can be) generalized to allow a variety of stores (the way we have multiple "pstores" for ZK). If so, then, over time, we could implement alternative, DB-based "pstores" for plugins, UDFs, security credentials, and more.
[GitHub] [drill] paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions
paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions URL: https://github.com/apache/drill/pull/2054#issuecomment-615408581 @arina-ielchiieva, thanks much for running the tests! It looks like these failures are due to verifying the incorrect prior behavior, where the default field delimiter was the newline (same as the line delimiter), not the comma. This PR changes the default to comma. Let's consider an example. For this query:
```
select * from table(`table_function/cr_lf.csv`(type=>'text', lineDelimiter=>'\r\n'));
```
With this input:
```
1,aaa,bbb
2,ccc,ddd
3,eee,
4,fff,ggg
```
We currently expect this output, because the old default field delimiter is a newline (same as the line delimiter):
```
["1,aaa,bbb"]
["2,ccc,ddd"]
["3,eee,"]
["4,fff,ggg"]
```
The correct expected results, with a default field delimiter of comma, are:
```
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
```
Will investigate how to fix the tests.
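The delimiter change can be reproduced outside Drill. The sketch below (plain Python, not Drill code) contrasts the two behaviors on the same \r\n-terminated input:

```python
import csv
import io

# The cr_lf.csv test input, with \r\n line endings.
data = "1,aaa,bbb\r\n2,ccc,ddd\r\n3,eee,\r\n4,fff,ggg\r\n"

# New behavior: comma is the default field delimiter, so each record
# splits into three columns.
new_rows = list(csv.reader(io.StringIO(data), delimiter=","))
print(new_rows[2])  # ['3', 'eee', '']

# Old behavior: the field delimiter defaulted to the line delimiter, so
# each whole record came back as a single column value.
old_rows = [[line] for line in data.split("\r\n") if line]
print(old_rows[2])  # ['3,eee,']
```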
[jira] [Created] (DRILL-7708) Downgrade maven from 3.6.3 to 3.6.0
Paul Rogers created DRILL-7708:
---
Summary: Downgrade maven from 3.6.3 to 3.6.0
Key: DRILL-7708
URL: https://issues.apache.org/jira/browse/DRILL-7708
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.18.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Fix For: 1.18.0

DRILL-7704 upgraded Drill's Maven version to 3.6.3. As it turns out, I use Ubuntu (Linux Mint) for development. Maven is installed as a package using apt-get, and packages can lag behind a bit. The latest Maven available via apt-get is 3.6.0. It is a nuisance to install a new version outside the package manager. I changed the Maven version in the root pom.xml to 3.6.0 and the build seemed to work. Any reason we need the absolute latest version rather than just 3.6.0 or later? The workaround for now is to manually edit the pom.xml file on each checkout, then revert the change before commit. This ticket requests adjusting the "official" version to 3.6.0.
[GitHub] [drill] paul-rogers opened a new pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0
paul-rogers opened a new pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0 URL: https://github.com/apache/drill/pull/2061 # [DRILL-7708](https://issues.apache.org/jira/browse/DRILL-7708): Downgrade maven from 3.6.3 to 3.6.0 ## Description Thanks to @arina-ielchiieva for recently upgrading Drill's Maven version to 3.6.3. As it turns out, I use Ubuntu (Linux Mint) for development. Maven is installed as a package using apt-get. Packages can lag behind a bit. The latest maven available via apt-get is 3.6.0. It is a nuisance to install a new version outside the package manager. I changed the Maven version in the root pom.xml to 3.6.0 and the build seemed to work. This PR adjusts the required Maven version to 3.6.0. ## Documentation Maven 3.6.0 is required to build Drill. ## Testing Ran the full build.
[GitHub] [drill] vvysotskyi commented on issue #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0
vvysotskyi commented on issue #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0 URL: https://github.com/apache/drill/pull/2061#issuecomment-615434184 I'm not sure that aligning the Maven version to the version supported by the system package manager is a good approach. For example, people who use Ubuntu 16.04 LTS only had [Maven 3.3.9](https://packages.ubuntu.com/xenial/maven) available, but people who already use Ubuntu 20.04 can install [Maven 3.6.3](https://packages.ubuntu.com/focal/maven). I can't remember a time when I didn't have to install it manually from the binary archive on CentOS 6.X and some 7.X versions. For macOS, brew already provides Maven 3.6.3. And having a newer Maven version allows using newer plugin versions with their new features. As a side point, Apache Spark also uses Maven 3.6.3: https://github.com/apache/spark/blob/master/pom.xml#L118. For your case, I would recommend either adding the new release mirror `deb http://cz.archive.ubuntu.com/ubuntu focal main universe` to `/etc/apt/sources.list` and installing the newer version using the apt package manager, or downloading the package from [there](https://packages.ubuntu.com/focal/all/maven/download) and installing it using `dpkg -i`. I have tried the first option on Ubuntu 18.04 and it worked fine for me.
[GitHub] [drill] paul-rogers closed pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0
paul-rogers closed pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0 URL: https://github.com/apache/drill/pull/2061
[GitHub] [drill] paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions
paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions URL: https://github.com/apache/drill/pull/2054#issuecomment-615470669 A similar analysis applies to the other queries. For `drill-3149_1`:
```
select columns[0] from table(`table_function/cr_lf.csv`(type=>'text', lineDelimiter=>'\r\n'))
```
Expected (confusing: each line is one field, `columns[0]`):
```
1,aaa,bbb
2,ccc,ddd
3,eee,
4,fff,ggg
```
New results with the comma as the field delimiter rather than the newline:
```
1
2
3
4
```
Next query, `drill-3149_4`:
```
select * from table(`table_function/lf_cr.tsv`(type=>'text', lineDelimiter=>'\n\r'))
```
Expected:
```
["1\taaa\tbbb"]
["2\tccc\tddd"]
["3\teee\t"]
["4\tfff\tggg"]
```
Actual:
```
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
```
Notice how the expected results are wrong. We are using a tsv (tab-separated) file, but we expect the query to treat the tab as a normal character. With this PR, we change only the line delimiter, not the field delimiter, which seems more accurate. Finally, `drill-3149_13`:
```
select * from table(`table_function/chinese.txt`(type=>'text',lineDelimiter=>'电脑坏了'))
```
Expected (all columns in a single field):
```
["1,aaa,bbb"]
["2,ccc,ddd"]
["3,eee,"]
["4,fff,ggg"]
```
With this PR we get the correct results:
```
["1","aaa","bbb"]
["2","ccc","ddd"]
["3","eee",""]
["4","fff","ggg"]
```
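The lf_cr.tsv case can be sketched the same way in plain Python (not Drill code), splitting records on the unusual '\n\r' delimiter and fields on the tab:

```python
# The lf_cr.tsv test input: tab-separated fields, "\n\r" between records.
data = "1\taaa\tbbb\n\r2\tccc\tddd\n\r3\teee\t\n\r4\tfff\tggg"

# With this PR: only the line delimiter is customized; the tab remains
# the field delimiter, so each record splits into columns.
new_rows = [record.split("\t") for record in data.split("\n\r")]
print(new_rows[2])  # ['3', 'eee', '']

# Old baseline: the tab was treated as ordinary text, leaving each
# record as a single column.
old_rows = [[record] for record in data.split("\n\r")]
print(old_rows[2])  # ['3\teee\t']
```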
[GitHub] [drill] paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions
paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions URL: https://github.com/apache/drill/pull/2054#issuecomment-615472724 @arina-ielchiieva, the above analysis shows that this PR produces the correct results; the prior code and tests verified incorrect results. So, let's do this:
1. Approve and commit this PR.
2. Immediately update the four faulty expected-results (`.e`) files in the test framework.
3. Rerun the selected tests to verify the test fixes.
Note that we cannot reverse the order: if we change the tests first, all runs except this PR's will fail. It would be ideal to have two sets of results: one before this PR, one after. Or, disable the four tests for commits before this PR. But the test framework does not seem to have version awareness. Unfortunately, I'm not yet set up to run the test framework. I can, however, fix the result files. Should I do a PR against the test framework?
Re: [DISCUSS]: Masking Creds in Query Plans
Hi Charles, Excellent point. The problem is deeper. Drill serializes plugin configs into the query plan which it sends to each worker (Drillbit). Why? To avoid race conditions if you start a query, then change the plugin config, and thus different nodes see different versions of the config. Masking can't happen in the execution plan or the plan won't work. (I hope your password is not actually "***".) So, masking would have to happen in logs and in the EXPLAIN PLAN FOR output. This would, in turn, require code that understands each config well enough to make a copy with the credentials masked, so we can then serialize the copied plan to JSON. (Or, we'd have to edit the JSON after it is generated.) Both are pretty ugly and not very secure. What we need is some kind of "vault" interface: the config holds a key into a vault, Drill itself has been given the vault credentials, and the vault returns the actual credential value. As a security guy yourself, what would you recommend as our target? Should we create a generic API? Is there some system common enough on Hadoop systems that we should target as our reference implementation? Also, can you perhaps file a JIRA ticket for this issue? Thanks, - Paul On Friday, April 17, 2020, 7:34:32 AM PDT, Charles Givre wrote: Hello all, I was thinking about this, if a user were to execute an EXPLAIN PLAN FOR query, they get a lot of information about the storage plugin, including in some cases creds. The example below shows a query plan for the JDBC storage plugin. As you can see, the user creds are right there I'm wondering would it be advisable or possible to mask the creds in query plans so that users can't access this information? If masking it isn't an option, is there some other way to prevent users from seeing this information? In a multi-tenant environment, it seems like a rather large security hole.
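To make the masking option concrete, here is a minimal sketch (plain Python, not Drill code; the field names are taken from the JDBC plan shown earlier in the thread) of masking credentials in a copied plan before rendering EXPLAIN output, while leaving the original plan intact for the Drillbits:

```python
import json

# Field names considered sensitive; taken from the JDBC config in the
# example plan above.
SENSITIVE_KEYS = {"username", "password"}

def mask_credentials(node):
    """Return a masked deep copy of a plan fragment; the original is
    untouched, so the unmasked plan can still be sent to the workers."""
    if isinstance(node, dict):
        return {
            key: "*****" if key in SENSITIVE_KEYS else mask_credentials(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [mask_credentials(item) for item in node]
    return node

plan = {
    "pop": "jdbc-scan",
    "config": {
        "type": "jdbc",
        "url": "jdbc:mysql://localhost:3306/",
        "username": "drill",
        "password": "secret",
    },
}

masked = mask_credentials(plan)
print(json.dumps(masked, indent=2))
```

As the thread notes, this per-config masking is fragile; a vault-style indirection, where the stored config holds only a lookup key, avoids ever serializing the secret at all.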
[GitHub] [drill] paul-rogers merged pull request #2058: DRILL-7703: Support for 3+D arrays in EVF JSON loader
paul-rogers merged pull request #2058: DRILL-7703: Support for 3+D arrays in EVF JSON loader URL: https://github.com/apache/drill/pull/2058
[jira] [Created] (DRILL-7709) CTAS as CSV creates files which the "csv" plugin can't read
Paul Rogers created DRILL-7709:
---
Summary: CTAS as CSV creates files which the "csv" plugin can't read
Key: DRILL-7709
URL: https://issues.apache.org/jira/browse/DRILL-7709
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Paul Rogers

Change the output format to CSV and create a table:
{noformat}
ALTER SESSION SET `store.format` = 'csv';
CREATE TABLE foo AS ...
{noformat}
You will end up with a directory "foo" that contains a CSV file: "0_0_0.csv". Now, try to query that file:
{noformat}
SELECT * FROM foo
{noformat}
The query will fail, or return incorrect results, because in Drill the "csv" read format is CSV *without* headers, but on write, "csv" is CSV *with* headers. The (very messy) workaround is to manually rename all the files to use the ".csvh" suffix, or to create a separate storage plugin config for that target with a new "csv" format plugin that does not have headers. Expected: if I create a file in Drill, I should be able to immediately read that file back without extra hokey-pokey.
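The mismatch is easy to sketch in plain Python (not Drill code): write CSV with a header row, then read the same bytes back as headerless CSV, and the header comes back as a data row:

```python
import csv
import io

# Write the way Drill's "csv" store.format does: with a header row.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "name"])  # header row emitted on write
writer.writerow(["1", "bob"])

# Read the way Drill's "csv" format plugin does: headerless.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows[0])  # ['id', 'name'] -- the header is mistaken for a data row
```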