[GitHub] [drill] arina-ielchiieva merged pull request #2059: DRILL-7704: Update Maven to 3.6.3

2020-04-17 Thread GitBox
arina-ielchiieva merged pull request #2059: DRILL-7704: Update Maven to 3.6.3
URL: https://github.com/apache/drill/pull/2059
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [drill] arina-ielchiieva commented on issue #2059: DRILL-7704: Update Maven to 3.6.3

2020-04-17 Thread GitBox
arina-ielchiieva commented on issue #2059: DRILL-7704: Update Maven to 3.6.3
URL: https://github.com/apache/drill/pull/2059#issuecomment-615120696
 
 
   @vvysotskyi / @paul-rogers thanks for the code review.




[GitHub] [drill] arina-ielchiieva commented on issue #2054: DRILL-6168: Revise format plugin table functions

2020-04-17 Thread GitBox
arina-ielchiieva commented on issue #2054: DRILL-6168: Revise format plugin 
table functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615128763
 
 
   @paul-rogers changes look good to me. I have run the Functional / Advanced 
tests and there are four failures. Please take a look and either change the 
code or suggest how the tests should be updated.
   
   ```
   Data Verification Failures:
   
   
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_2.q
   
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_1.q
   
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_4.q
   
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_13.q
   ```
   
   Detailed output:
   ```
   Data Verification Failures:
   
   Query: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_2.q
   select * from table(`table_function/cr_lf.csv`(type=>'text', 
lineDelimiter=>'\r\n'))
   
   Baseline: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_2.e
Expected number of rows: 4
   Actual number of rows from Drill: 4
Number of matching rows: 0
 Number of rows missing: 4
  Number of rows unexpected: 4
   
   These rows are not expected (first 10):
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   
   These rows are missing (first 10):
   ["3,eee,"] (1 occurence(s))
   ["1,aaa,bbb"] (1 occurence(s))
   ["2,ccc,ddd"] (1 occurence(s))
   ["4,fff,ggg"] (1 occurence(s))
   Query: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_1.q
   select columns[0] from table(`table_function/cr_lf.csv`(type=>'text', 
lineDelimiter=>'\r\n'))
   
   Baseline: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_1.e
Expected number of rows: 4
   Actual number of rows from Drill: 4
Number of matching rows: 0
 Number of rows missing: 4
  Number of rows unexpected: 4
   
   These rows are not expected (first 10):
   1
   2
   3
   4
   
   These rows are missing (first 10):
   3,eee, (1 occurence(s))
   1,aaa,bbb (1 occurence(s))
   2,ccc,ddd (1 occurence(s))
   4,fff,ggg (1 occurence(s))
   Query: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_4.q
   select * from table(`table_function/lf_cr.tsv`(type=>'text', 
lineDelimiter=>'\n\r'))
   
   Baseline: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_4.e
Expected number of rows: 4
   Actual number of rows from Drill: 4
Number of matching rows: 0
 Number of rows missing: 4
  Number of rows unexpected: 4
   
   These rows are not expected (first 10):
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   
   These rows are missing (first 10):
   ["3\teee\t"] (1 occurence(s))
   ["2\tccc\tddd"] (1 occurence(s))
   ["1\taaa\tbbb"] (1 occurence(s))
   ["4\tfff\tggg"] (1 occurence(s))
   Query: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_13.q
   select * from 
table(`table_function/chinese.txt`(type=>'text',lineDelimiter=>'电脑坏了'))
   
   Baseline: 
/root/drillAutomation/drill-test-framework/framework/resources/Functional/table_function/positive/data/drill-3149_13.e
Expected number of rows: 4
   Actual number of rows from Drill: 4
Number of matching rows: 0
 Number of rows missing: 4
  Number of rows unexpected: 4
   
   These rows are not expected (first 10):
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   
   These rows are missing (first 10):
   ["3,eee,"] (1 occurence(s))
   ["1,aaa,bbb"] (1 occurence(s))
   ["2,ccc,ddd"] (1 occurence(s))
   ["4,fff,ggg"] (1 occurence(s))
   ```
   
   
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Functional/table_function/positive/data
   
https://github.com/mapr/drill-test-framework/tree/master/framework/resources/Datasources/table_function
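
The counts in the detailed output above (matching, missing, unexpected) behave like a multiset comparison between baseline rows and actual rows. A minimal Python sketch of that idea, assuming rows are compared as whole strings; `compare_rows` is a hypothetical helper, not the test framework's actual code:

```python
from collections import Counter

def compare_rows(expected, actual):
    """Compare two row lists as multisets (hypothetical helper, not framework code)."""
    exp, act = Counter(expected), Counter(actual)
    matching = sum((exp & act).values())   # rows present in both lists
    missing = exp - act                    # expected but not returned
    unexpected = act - exp                 # returned but not expected
    return matching, missing, unexpected

expected = ["1,aaa,bbb", "2,ccc,ddd", "3,eee,", "4,fff,ggg"]  # baseline for drill-3149_1
actual = ["1", "2", "3", "4"]                                 # rows Drill returned

matching, missing, unexpected = compare_rows(expected, actual)
print(matching, sum(missing.values()), sum(unexpected.values()))
```

With the `drill-3149_1` data this reports no matching rows, four missing and four unexpected, which is the pattern seen in all four failures.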




[GitHub] [drill] arina-ielchiieva commented on issue #2058: DRILL-7703: Support for 3+D arrays in EVF JSON loader

2020-04-17 Thread GitBox
arina-ielchiieva commented on issue #2058: DRILL-7703: Support for 3+D arrays 
in EVF JSON loader
URL: https://github.com/apache/drill/pull/2058#issuecomment-615181989
 
 
   +1, LGTM.
   Additionally ran Functional / Advanced tests, all passed.




[NOTICE] Maven 3.6.3

2020-04-17 Thread Arina Ielchiieva
Hi all,

Starting with Drill 1.18.0 (and on current master as of commit 20ad3c9 [1]), 
the Drill build requires Maven 3.6.3; otherwise the build will fail.
Please make sure you have Maven 3.6.3 installed in your environments.

[1] 
https://github.com/apache/drill/commit/20ad3c9837e9ada149c246fc7a4ac1fe02de6fe8

Kind regards,
Arina

[jira] [Created] (DRILL-7705) Update jQuery and Bootstrap libraries

2020-04-17 Thread Anton Gozhiy (Jira)
Anton Gozhiy created DRILL-7705:
---

 Summary: Update jQuery and Bootstrap libraries
 Key: DRILL-7705
 URL: https://issues.apache.org/jira/browse/DRILL-7705
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.17.0
Reporter: Anton Gozhiy
Assignee: Anton Gozhiy
 Fix For: 1.18.0


There are some vulnerabilities present in jQuery and Bootstrap libraries used 
in Drill:
* jQuery before 3.4.0, as used in Drupal, Backdrop CMS, and other products, 
mishandles jQuery.extend(true, {}, ...) because of Object.prototype pollution. 
If an unsanitized source object contained an enumerable __proto__ property, it 
could extend the native Object.prototype.
* In Bootstrap before 4.1.2, XSS is possible in the collapse data-parent 
attribute.
* In Bootstrap before 4.1.2, XSS is possible in the data-container property of 
tooltip.
* In Bootstrap before 3.4.0, XSS is possible in the affix configuration target 
property.
* In Bootstrap before 3.4.1 and 4.3.x before 4.3.1, XSS is possible in the 
tooltip or popover data-template attribute.

The following update is suggested to fix them:
* jQuery: 3.2.1 -> 3.5.0
* Bootstrap: 3.1.1 -> 4.4.1



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7706) Drill RDBMS Metastore

2020-04-17 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7706:
---

 Summary: Drill RDBMS Metastore
 Key: DRILL-7706
 URL: https://issues.apache.org/jira/browse/DRILL-7706
 Project: Apache Drill
  Issue Type: New Feature
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.18.0


Currently Drill has only one Metastore implementation, which is based on 
Iceberg tables. Iceberg tables are file-based storage that supports concurrent 
writes / reads but must be placed on a distributed file system.

This Jira aims to implement a Drill RDBMS Metastore, which will store Drill 
Metastore metadata in the database of the user's choice. Currently, PostgreSQL 
and MySQL databases are supported; others might work as well, but no testing 
was done. Also, out of the box, for demonstration / testing purposes, Drill 
will set up a SQLite file-based embedded database, but this is only applicable 
for Drill in embedded mode.





[jira] [Created] (DRILL-7707) Unable to analyze table metadata is it resides in non-writable workspace

2020-04-17 Thread Arina Ielchiieva (Jira)
Arina Ielchiieva created DRILL-7707:
---

 Summary: Unable to analyze table metadata is it resides in 
non-writable workspace
 Key: DRILL-7707
 URL: https://issues.apache.org/jira/browse/DRILL-7707
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Arina Ielchiieva


Unable to analyze table metadata if it resides in a non-writable workspace:

{noformat}
apache drill> analyze table cp.`employee.json` refresh metadata;
Error: VALIDATION ERROR: Unable to create or drop objects. Schema [cp] is 
immutable.
{noformat}

Stacktrace:
{noformat}
[Error Id: b7f233cd-f090-491e-a487-5fc4c25444a4 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
at 
org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchemaInternal(SchemaUtilites.java:230)
at 
org.apache.drill.exec.planner.sql.SchemaUtilites.resolveToDrillSchema(SchemaUtilites.java:208)
at 
org.apache.drill.exec.planner.sql.handlers.DrillTableInfo.getTableInfoHolder(DrillTableInfo.java:101)
at 
org.apache.drill.exec.planner.sql.handlers.MetastoreAnalyzeTableHandler.getPlan(MetastoreAnalyzeTableHandler.java:108)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
{noformat}






[GitHub] [drill] arina-ielchiieva opened a new pull request #2060: DRILL-7706: Implement Drill RDBMS Metastore

2020-04-17 Thread GitBox
arina-ielchiieva opened a new pull request #2060: DRILL-7706: Implement Drill 
RDBMS Metastore
URL: https://github.com/apache/drill/pull/2060
 
 
   # [DRILL-7706](https://issues.apache.org/jira/browse/DRILL-7706): Implement 
Drill RDBMS Metastore
   
   ## Description
   
   Currently Drill has only one Metastore implementation, which is based on 
Iceberg tables. Iceberg tables are file-based storage that supports concurrent 
writes / reads but must be placed on a distributed file system.
   
   This PR aims to implement a Drill RDBMS Metastore, which will store Drill 
Metastore metadata in the database of the user's choice. Currently, PostgreSQL 
and MySQL databases are supported; others might work as well, but no testing 
was done. Also, out of the box, for demonstration / testing purposes, Drill 
will set up a SQLite file-based embedded database, but this is only applicable 
for Drill in embedded mode.
   
   1. Fixed the issue with nondeterministic execution of batch update / delete 
statements; they are now executed in the same order as they were added.
   2. Abstracted Metastore common test classes to be used by different 
Metastore implementations.
   3. Added drill-metastore-override-example.conf with an example of Drill 
Metastore configuration.
   4. Replaced the list of metadata types that must be passed during 
read / write operations with a set to avoid possible duplicates.
   5. Added the RDBMS Metastore implementation, README.md and unit tests.
   
   ## Documentation
   RDBMS Metastore section should be added: 
http://drill.apache.org/docs/using-drill-metastore/
   
   ## Testing
   Ran all unit tests, Functional & Advanced. Tested manually with SQLite, 
PostgreSQL, MySQL.
   




[DISCUSS]: Masking Creds in Query Plans

2020-04-17 Thread Charles Givre
Hello all, 
I was thinking about this: if a user executes an EXPLAIN PLAN FOR query, they 
get a lot of information about the storage plugin, including, in some cases, 
creds.
The example below shows a query plan for the JDBC storage plugin. As you can 
see, the user creds are right there.

I'm wondering whether it would be advisable or possible to mask the creds in 
query plans so that users can't access this information. If masking isn't an 
option, is there some other way to prevent users from seeing this information? 
In a multi-tenant environment, it seems like a rather large security hole. 
Thanks,
-- C


{
  "head" : {
    "version" : 1,
    "generator" : {
      "type" : "ExplainHandler",
      "info" : ""
    },
    "type" : "APACHE_DRILL_PHYSICAL",
    "options" : [ ],
    "queue" : 0,
    "hasResourcePlan" : false,
    "resultMode" : "EXEC"
  },
  "graph" : [ {
    "pop" : "jdbc-scan",
    "@id" : 5,
    "sql" : "SELECT *\nFROM `stats`.`batting`",
    "columns" : [ "`playerID`", "`yearID`", "`stint`", "`teamID`", "`lgID`", 
"`G`", "`AB`", "`R`", "`H`", "`2B`", "`3B`", "`HR`", "`RBI`", "`SB`", "`CS`", 
"`BB`", "`SO`", "`IBB`", "`HBP`", "`SH`", "`SF`", "`GIDP`" ],
    "config" : {
      "type" : "jdbc",
      "driver" : "com.mysql.cj.jdbc.Driver",
      "url" : "jdbc:mysql://localhost:3306/?serverTimezone=EST5EDT",
      "username" : "",
      "password" : "",
      "caseInsensitiveTableNames" : false,
      "sourceParameters" : { },
      "enabled" : true
    },
    "userName" : "",
    "cost" : {
      "memoryCost" : 1.6777216E7,
      "outputRowCount" : 100.0
    }
  }, {
    "pop" : "limit",
    "@id" : 4,
    "child" : 5,
    "first" : 0,
    "last" : 10,
    "initialAllocation" : 100,
    "maxAllocation" : 100,
    "cost" : {
      "memoryCost" : 1.6777216E7,
      "outputRowCount" : 10.0
    }
  }, {
    "pop" : "limit",
    "@id" : 3,




Re: [NOTICE] Maven 3.6.3

2020-04-17 Thread Paul Rogers
Hi Arina,

Thanks for keeping us up to date!

As it turns out, I use Ubuntu (Linux Mint) for development. Maven is installed 
as a package using apt-get. Packages can lag behind a bit. The latest maven 
available via apt-get is 3.6.0.

It is a nuisance to install a new version outside the package manager. I 
changed the Maven version in the root pom.xml to 3.6.0 and the build seemed to 
work. Any reason we need the absolute latest version rather than just 3.6.0 or 
later?

The workaround for now is to manually edit the pom.xml file on each checkout, 
then revert the change before commit. Can we maybe adjust the "official" 
version instead?


Thanks,
- Paul
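
The question above is whether the build should demand exactly the newest Maven or accept "3.6.0 or later". A minimal Python sketch of a minimum-version check, purely to illustrate the difference; `meets_minimum` is a hypothetical helper and says nothing about how the Drill build actually enforces its Maven version:

```python
def parse_version(text):
    """Turn '3.6.3' into (3, 6, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def meets_minimum(installed, required):
    """True when the installed version is the required version or later."""
    return parse_version(installed) >= parse_version(required)

print(meets_minimum("3.6.0", "3.6.3"))  # apt-get Maven vs. the new requirement
print(meets_minimum("3.6.0", "3.6.0"))  # same Maven vs. a relaxed requirement
print(meets_minimum("3.6.3", "3.6.0"))  # newer Maven still passes a relaxed one
```

Comparing integer tuples rather than raw strings avoids surprises such as "3.10.0" sorting before "3.6.0".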

 


[GitHub] [drill] paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore

2020-04-17 Thread GitBox
paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS 
Metastore
URL: https://github.com/apache/drill/pull/2060#issuecomment-615394559
 
 
   @arina-ielchiieva, this is great! I wonder, will this PR handle a particular 
use case that I'm seeing?
   
   The metastore works for files. We are seeing more cases where users want to 
read other data sources: HTTP, Mongo, whatever. These sources deliver JSON, 
which can be ambiguous. We want to use a provided schema to resolve 
ambiguities. (That work is ongoing in updating the JSON reader, etc.) So, we 
need a place to store the provided schema.
   
   Recall that, for files, we put the provided schema in the same folder as the 
data. For the HTTP plugin, say, there is no directory to place the files.
   
   So, can we use the DB-backed metastore? Will there be a standard way for a 
plugin to map its concept of a table to an entry in the metastore which can 
hold the provided schema?




[GitHub] [drill] paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS Metastore

2020-04-17 Thread GitBox
paul-rogers commented on issue #2060: DRILL-7706: Implement Drill RDBMS 
Metastore
URL: https://github.com/apache/drill/pull/2060#issuecomment-615395578
 
 
   @arina-ielchiieva, another related question. We currently use ZK to store 
things like plugin configs, dynamic UDFs, etc. Charles just asked about the 
problems with storing credentials that way.
   
   I wonder if the DB support here is (or can be) generalized to allow a 
variety of stores (the way we have multiple "pstores" for ZK.) If so, then, 
over time, we could implement alternative, DB-based "pstores" for plugins, 
UDFs, security credentials and more.




[GitHub] [drill] paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions

2020-04-17 Thread GitBox
paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table 
functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615408581
 
 
   @arina-ielchiieva, thanks much for running the tests! Looks like these 
failures are due to the tests verifying the incorrect prior behavior, in which 
the default field delimiter was newline (same as the line delimiter), not 
comma. This PR changes the default to comma.
   
   Let's consider an example.  For this query:
   
   ```
   select * from table(`table_function/cr_lf.csv`(type=>'text', 
lineDelimiter=>'\r\n'));
   ```
   
   With this input:
   
   ```
   1,aaa,bbb
   2,ccc,ddd
   3,eee,
   4,fff,ggg
   ```
   
   We currently expect this output because the old default field delimiter is a 
newline (same as the line delimiter):
   
   ```
   ["1,aaa,bbb"]
   ["2,ccc,ddd"]
   ["3,eee,"]
   ["4,fff,ggg"]
   ```
   
   The correct expected results, with a default field delimiter of comma, are:
   
   ```
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   ```
   
   Will investigate how to fix the tests.
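
To make the difference concrete, here is a minimal Python sketch of splitting text first on a line delimiter and then on a field delimiter. `split_rows` is a hypothetical helper for illustration, not Drill's text reader; the old behavior is modeled by using the newline as the field delimiter, as described above:

```python
def split_rows(text, line_delimiter, field_delimiter):
    """Split text into rows, then each row into fields (illustrative only)."""
    rows = [line for line in text.split(line_delimiter) if line]
    return [row.split(field_delimiter) for row in rows]

data = "1,aaa,bbb\r\n2,ccc,ddd\r\n3,eee,\r\n4,fff,ggg\r\n"

# Old default: the field delimiter is a newline, which never occurs inside a
# row once the rows are split on "\r\n", so each row stays in a single field.
old = split_rows(data, "\r\n", "\n")

# New default: the field delimiter is a comma.
new = split_rows(data, "\r\n", ",")

print(old)
print(new)
```

Under these assumptions, `old` holds one field per row and `new` holds three, mirroring the two result sets shown above.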




[jira] [Created] (DRILL-7708) Downgrade maven from 3.6.3 to 3.6.0

2020-04-17 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7708:
--

 Summary: Downgrade maven from 3.6.3 to 3.6.0
 Key: DRILL-7708
 URL: https://issues.apache.org/jira/browse/DRILL-7708
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.18.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.18.0


DRILL-7704 upgraded Drill's Maven version to 3.6.3.


As it turns out, I use Ubuntu (Linux Mint) for development. Maven is installed 
as a package using apt-get. Packages can lag behind a bit. The latest maven 
available via apt-get is 3.6.0.


It is a nuisance to install a new version outside the package manager. I 
changed the Maven version in the root pom.xml to 3.6.0 and the build seemed to 
work. Any reason we need the absolute latest version rather than just 3.6.0 or 
later?


The workaround for now is to manually edit the pom.xml file on each checkout, 
then revert the change before commit. This ticket requests to adjust the 
"official" version to 3.6.0.





[GitHub] [drill] paul-rogers opened a new pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0

2020-04-17 Thread GitBox
paul-rogers opened a new pull request #2061: DRILL-7708: Downgrade maven from 
3.6.3 to 3.6.0
URL: https://github.com/apache/drill/pull/2061
 
 
   # [DRILL-7708](https://issues.apache.org/jira/browse/DRILL-7708): Downgrade 
maven from 3.6.3 to 3.6.0
   
   ## Description
   
   Thanks to @arina-ielchiieva for recently upgrading Drill's Maven version to 
3.6.3.
   
   As it turns out, I use Ubuntu (Linux Mint) for development. Maven is 
installed as a package using apt-get. Packages can lag behind a bit. The latest 
maven available via apt-get is 3.6.0.
   
   It is a nuisance to install a new version outside the package manager. I 
changed the Maven version in the root pom.xml to 3.6.0 and the build seemed to 
work. This PR adjusts the required Maven version to 3.6.0.
   
   ## Documentation
   
   Maven 3.6.0 is required to build Drill.
   
   ## Testing
   
   Ran the full build.




[GitHub] [drill] vvysotskyi commented on issue #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0

2020-04-17 Thread GitBox
vvysotskyi commented on issue #2061: DRILL-7708: Downgrade maven from 3.6.3 to 
3.6.0
URL: https://github.com/apache/drill/pull/2061#issuecomment-615434184
 
 
   I'm not sure that aligning the Maven version with the version supported by 
the system package manager is a good approach.
   For example, for people who use Ubuntu 16.04 LTS, only [Maven 
3.3.9](https://packages.ubuntu.com/xenial/maven) was available, but people who 
already use Ubuntu 20.04 can install [Maven 
3.6.3](https://packages.ubuntu.com/focal/maven). I don't remember a time when I 
didn't have to install it manually from the binary archive on CentOS 6.X and 
some 7.X versions. For macOS, brew already provides Maven 3.6.3.
   
   But having a newer Maven version allows using newer plugin versions with 
their new features.
   
   As a side point, Apache Spark also uses Maven 3.6.3: 
https://github.com/apache/spark/blob/master/pom.xml#L118.
   
   For your case, I would recommend either adding the new release mirror `deb 
http://cz.archive.ubuntu.com/ubuntu focal main universe` to 
`/etc/apt/sources.list` and installing the newer version using the apt package 
manager, or downloading the package from 
[there](https://packages.ubuntu.com/focal/all/maven/download) and installing it 
using `dpkg -i`.
   I have tried the first option on Ubuntu 18.04 and it worked fine for me.




[GitHub] [drill] paul-rogers closed pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 to 3.6.0

2020-04-17 Thread GitBox
paul-rogers closed pull request #2061: DRILL-7708: Downgrade maven from 3.6.3 
to 3.6.0
URL: https://github.com/apache/drill/pull/2061
 
 
   




[GitHub] [drill] paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions

2020-04-17 Thread GitBox
paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table 
functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615470669
 
 
   A similar analysis applies to the other queries. For `drill-3149_1`:
   
   ```
   select columns[0] from table(`table_function/cr_lf.csv`(type=>'text', 
lineDelimiter=>'\r\n'))
   ```
   
   Expected (confusing: each line is one field, `columns[0]`):
   
   ```
   1,aaa,bbb
   2,ccc,ddd
   3,eee,
   4,fff,ggg
   ```
   
   New results with comma as the field delimiter rather than newline:
   
   ```
   1
   2
   3
   4
   ```
   
   Next query, `drill-3149_4`:
   
   ```
   select * from table(`table_function/lf_cr.tsv`(type=>'text', 
lineDelimiter=>'\n\r'))
   ```
   
   Expected:
   
   ```
   ["1\taaa\tbbb"]
   ["2\tccc\tddd"]
   ["3\teee\t"]
   ["4\tfff\tggg"]
   ```
   
   Actual:
   
   ```
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   ```
   
   Notice how the expected results are wrong. We are using a TSV 
(tab-separated values) file, but we expect the query to treat the tab as a 
normal character. With this PR, we change only the line delimiter, not the 
field delimiter, which seems more accurate.
   
   Finally, `drill-3149_13`:
   
   ```
   select * from 
table(`table_function/chinese.txt`(type=>'text',lineDelimiter=>'电脑坏了'))
   ```
   
   Expected (all columns in a single field):
   
   ```
   ["1,aaa,bbb"]
   ["2,ccc,ddd"]
   ["3,eee,"]
   ["4,fff,ggg"]
   ```
   
   With this PR we get the correct results:
   
   ```
   ["1","aaa","bbb"]
   ["2","ccc","ddd"]
   ["3","eee",""]
   ["4","fff","ggg"]
   ```
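
The `drill-3149_4` case suggests a simple way to think about the new behavior: properties given in the table function replace the corresponding format settings, and everything else is left alone. A minimal Python sketch of that idea; the `tsv_defaults` dictionary, the `fieldDelimiter` key, and `apply_table_function` are assumptions for illustration, not Drill's actual configuration classes:

```python
def apply_table_function(format_defaults, overrides):
    """Return a copy of the format settings with only the given properties replaced."""
    merged = dict(format_defaults)
    merged.update(overrides)
    return merged

# Hypothetical settings for a tab-separated format.
tsv_defaults = {"fieldDelimiter": "\t", "lineDelimiter": "\n"}

# The query sets only lineDelimiter, so the tab field delimiter is kept.
effective = apply_table_function(tsv_defaults, {"lineDelimiter": "\n\r"})

row = "1\taaa\tbbb"
print(effective)
print(row.split(effective["fieldDelimiter"]))
```

Because the field delimiter stays a tab, the sample row splits into three fields instead of remaining one string.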




[GitHub] [drill] paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table functions

2020-04-17 Thread GitBox
paul-rogers commented on issue #2054: DRILL-6168: Revise format plugin table 
functions
URL: https://github.com/apache/drill/pull/2054#issuecomment-615472724
 
 
   @arina-ielchiieva, the above analysis shows that this PR provides the 
correct results; the prior code and tests verified incorrect results. So, let's 
do this:
   
   1. Approve and commit this PR.
   2. Immediately update the four faulty expected-results (`.e`) files in the 
test framework.
   3. Rerun the selected tests to verify the test fixes.
   
   Note that we cannot reverse the order: if we change the tests first, all 
runs except this PR will fail.
   
   It would be ideal to have two sets of results: one before this PR, one 
after. Or, disable the four tests for commits before this PR. But, the test 
framework does not seem to have version awareness.
   
   Unfortunately, I'm not yet set up to run the test framework. I can, however, 
fix the result files. Should I do a PR against the test framework? 




Re: [DISCUSS]: Masking Creds in Query Plans

2020-04-17 Thread Paul Rogers
Hi Charles,

Excellent point. The problem is deeper. Drill serializes plugin configs in the 
query plan, which it sends to each worker (Drillbit). Why? To avoid race 
conditions: if you start a query and then change the plugin config, different 
nodes could otherwise see different versions of the config.

Masking can't happen in the execution plan or the plan won't work. (I hope your 
password is not actually "***".) So, masking would have to happen in logs 
and in the EXPLAIN PLAN FOR output. This would, in turn, require that we have 
code that understands each config well enough to make a copy of the config with 
the credentials masked so we can then serialize the copied plan to JSON. (Or, 
we'd have to edit the JSON after it is generated.) Both are pretty ugly and not 
very secure.
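
The "copy with credentials masked" option could look roughly like the following Python sketch. The set of sensitive key names is an assumption taken from the JDBC example, the sample values are made up, and `mask_credentials` is a hypothetical helper; a real solution would need to understand each plugin's config:

```python
import copy
import json

SENSITIVE_KEYS = {"username", "password"}  # assumed key names, from the JDBC example

def mask_credentials(node):
    """Return a deep copy of a plan fragment with sensitive values replaced."""
    if isinstance(node, dict):
        return {
            key: "***" if key in SENSITIVE_KEYS else mask_credentials(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [mask_credentials(item) for item in node]
    return copy.deepcopy(node)

# Made-up plan fragment; the credential values are placeholders.
plan = {
    "pop": "jdbc-scan",
    "config": {"type": "jdbc", "username": "user1", "password": "secret", "enabled": True},
}

masked = mask_credentials(plan)
print(json.dumps(masked, indent=2))  # the copy that would go to EXPLAIN output or logs
print(plan["config"]["password"])    # the executable plan is left untouched
```

A fixed key list like this is exactly the fragile part: any plugin that names its secret differently would leak it.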

What we need is some kind of "vault" interface: a config which is a key into a 
vault where Drill itself has been given the key, and the vault returns the 
actual credential value. As a security guy yourself, what would you recommend 
as our target? Should we create a generic API? Is there some system common 
enough on Hadoop systems that we should target that as our reference 
implementation? Also, can you perhaps file a JIRA ticket for this issue?
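
One possible shape for such a "vault" interface, sketched in Python. Every name here (`SecretVault`, `InMemoryVault`, `resolve`, the `vault:` prefix) is hypothetical and only illustrates the idea of storing a key in the config and resolving it at execution time:

```python
from abc import ABC, abstractmethod

class SecretVault(ABC):
    """Hypothetical generic API: maps a key stored in a config to the real secret."""

    @abstractmethod
    def get_secret(self, key: str) -> str:
        ...

class InMemoryVault(SecretVault):
    """Toy implementation for illustration; a real one would call an external service."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def get_secret(self, key):
        return self._secrets[key]

def resolve(config, vault, prefix="vault:"):
    """Return a copy of config with 'vault:<key>' references replaced by secrets."""
    return {
        name: (vault.get_secret(value[len(prefix):])
               if isinstance(value, str) and value.startswith(prefix)
               else value)
        for name, value in config.items()
    }

vault = InMemoryVault({"mysql/password": "secret"})
stored = {"type": "jdbc", "password": "vault:mysql/password"}  # what a plan would show

print(stored["password"])
print(resolve(stored, vault)["password"])
```

In this arrangement the serialized plan carries only the reference, so neither EXPLAIN output nor logs would contain the credential itself.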

Thanks,
- Paul

 


  

[GitHub] [drill] paul-rogers merged pull request #2058: DRILL-7703: Support for 3+D arrays in EVF JSON loader

2020-04-17 Thread GitBox
paul-rogers merged pull request #2058: DRILL-7703: Support for 3+D arrays in 
EVF JSON loader
URL: https://github.com/apache/drill/pull/2058
 
 
   




[jira] [Created] (DRILL-7709) CTAS as CSV creates files which the "csv" plugin can't read

2020-04-17 Thread Paul Rogers (Jira)
Paul Rogers created DRILL-7709:
--

 Summary: CTAS as CSV creates files which the "csv" plugin can't 
read
 Key: DRILL-7709
 URL: https://issues.apache.org/jira/browse/DRILL-7709
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Paul Rogers


Change the output format to CSV and create a CSV file:
{noformat}
ALTER SESSION SET `store.format` = 'csv';
CREATE TABLE foo AS ...
 {noformat}

You will end up with a directory "foo" that contains a CSV file: "0_0_0.csv". 
Now, try to query that file:

{noformat}
SELECT * FROM foo
{noformat}

The query will fail, or return incorrect results, because in Drill, the "csv" 
read format is CSV *without* headers. But, on write, "csv" is CSV *with* 
headers.

The (very messy) workaround is to manually rename all the files to use the 
".csvh" suffix, or to create a separate storage plugin config for that target 
with a new "csv" format plugin that does not have headers.

Expected: if I create a file in Drill, I should be able to read that file 
immediately, without extra hokey-pokey.
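
The mismatch can be reproduced outside Drill. A minimal Python sketch using the standard `csv` module, with made-up sample data: reading a file that has a header row as if it had none turns the header into a data row.

```python
import csv
import io

written = "id,name\n1,aaa\n2,bbb\n"  # made-up file contents: CSV *with* a header row

# Read as CSV without headers: the header line comes back as data.
no_header_rows = list(csv.reader(io.StringIO(written)))

# Read as CSV with headers: the first line names the columns.
header_rows = list(csv.DictReader(io.StringIO(written)))

print(no_header_rows)
print(header_rows)
```

The headerless read yields three rows, the first being the column names, while the header-aware read yields the two data rows keyed by column name.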
 


