[jira] [Commented] (DRILL-4143) REFRESH TABLE METADATA - Permission Issues with metadata files
[ https://issues.apache.org/jira/browse/DRILL-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294439#comment-15294439 ]

ASF GitHub Bot commented on DRILL-4143:
---------------------------------------

Github user chunhui-shi closed the pull request at:

    https://github.com/apache/drill/pull/470

> REFRESH TABLE METADATA - Permission Issues with metadata files
> --------------------------------------------------------------
>
>                 Key: DRILL-4143
>                 URL: https://issues.apache.org/jira/browse/DRILL-4143
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.3.0, 1.4.0
>            Reporter: John Omernik
>            Assignee: Chunhui Shi
>              Labels: Metadata, Parquet, Permissions
>             Fix For: Future
>
> Summary of refresh-metadata issues confirmed by two different users on the Drill User mailing list. (Title: REFRESH TABLE METADATA - Access Denied)
> This issue pertains to table metadata and revolves around user authentication.
> Basically, when the drillbits are running as one user and the data is owned by another user, there can be access-denied errors on subsequent queries after issuing a REFRESH TABLE METADATA command.
> To troubleshoot what is actually happening, I turned on MapR Auditing (a handy feature) and found that when I run a query that gives me access denied (my query is select count(1) from testtable), per MapR the user I am logged in as (dataowner) is trying to do a create operation on the .drill.parquet_metadata file, and it fails with status 17. Per Keys at MapR, "status 17 means errno 17 which means EEXIST. Looks like Drill is trying to create a file that already exists." This seems to indicate that Drill is trying to recreate the .drill.parquet_metadata on each select as the dataowner user, but the permissions (seen below) don't allow it.
> Here are the steps to reproduce:
> Enable authentication.
> Run all drillbits in the cluster as "drillbituser", and have the files owned by "dataowner".
> Note that the root of the table has permissions drwxrwxr-x, but as Drill loads each partition it creates them as drwxr-xr-x (all with dataowner:dataowner ownership). That may be part of it too: the default permissions when creating a table? Another note: in my setup, drillbituser is in the group for dataowner, so it should always have read access.
> # Authenticated as dataowner (this user should have full permissions to all the data)
> Enter username for jdbc:drill:zk=zknode1:5181: dataowner
> Enter password for jdbc:drill:zk=zknode1:5181: **
> 0: jdbc:drill:zk=zknode1> use dfs.dev;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | Default schema changed to [dfs.dev]  |
> +-------+--------------------------------------+
> 1 row selected (0.307 seconds)
> # The query works fine with no table metadata
> 0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
> +-----------+
> |  EXPR$0   |
> +-----------+
> | 24565203  |
> +-----------+
> 1 row selected (3.392 seconds)
> # Refresh of metadata works with no errors
> 0: jdbc:drill:zk=zknode1> refresh table metadata `testtable`;
> +-------+-----------------------------------------------------+
> |  ok   |                       summary                       |
> +-------+-----------------------------------------------------+
> | true  | Successfully updated metadata for table testtable.  |
> +-------+-----------------------------------------------------+
> 1 row selected (5.767 seconds)
>
> # Running the same query again returns an access-denied error.
> 0: jdbc:drill:zk=zknode1> select count(1) from `testtable`;
> Error: SYSTEM ERROR: IOException: 2127.7646.2950962
> /data/dev/testtable/2015-11-12/.drill.parquet_metadata (Permission denied)
>
> [Error Id: 7bfce2e7-f78d-4fba-b047-f4c85b471de4 on node1:31010] (state=,code=0)
>
> # Note how all the files are owned by drillbituser. Per discussion on the list, this is normal.
>
> $ find ./ -type f -name ".drill.parquet_metadata" -exec ls -ls {} \;
> 726 -rwxr-xr-x 1 drillbituser drillbituser 742837 Nov 30 14:27 ./2015-11-12/.drill.parquet_metadata
> 583 -rwxr-xr-x 1 drillbituser drillbituser 596146 Nov 30 14:27 ./2015-11-29/.drill.parquet_metadata
> 756 -rwxr-xr-x 1 drillbituser drillbituser 773811 Nov 30 14:27 ./2015-11-11/.drill.parquet_metadata
> 763 -rwxr-xr-x 1 drillbituser drillbituser 780829 Nov 30 14:27 ./2015-11-04/.drill.parquet_metadata
> 632 -rwxr-xr-x 1 drillbituser drillbituser 646851 Nov 30 14:27 ./2015-11-08/.drill.parquet_metadata
> 845 -rwxr-xr-x 1 drillbituser drillbituser 864421 Nov 30 14:27 ./2015-11-05/.drill.parquet_metadata
> 771 -rwxr-xr-x 1 drillbituser
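The EEXIST failure above suggests the metadata cache is being rewritten on every select even when it is already current, which only works if the querying user can write the file. A minimal sketch of the obvious guard — regenerate the cache only when the table directory changed after the cache was written — using plain java.nio.file rather than Drill's actual Hadoop FileSystem code (the class and method names here are hypothetical, not Drill's):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

public class MetadataCacheCheck {
    // Hypothetical helper: decide whether .drill.parquet_metadata needs regeneration.
    // Returns false when the cache file exists and is at least as new as the table
    // directory, so a plain SELECT never attempts a create/overwrite as the query user.
    public static boolean needsRefresh(Path tableDir, Path cacheFile) throws IOException {
        if (!Files.exists(cacheFile)) {
            return true;  // no cache yet: only REFRESH TABLE METADATA should build it
        }
        FileTime dirTime = Files.getLastModifiedTime(tableDir);
        FileTime cacheTime = Files.getLastModifiedTime(cacheFile);
        return cacheTime.compareTo(dirTime) < 0;  // stale if dir changed after cache write
    }
}
```

With a check like this, a read-only query run by dataowner would skip the create entirely and just read the cache drillbituser wrote.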
[jira] [Commented] (DRILL-4143) REFRESH TABLE METADATA - Permission Issues with metadata files
[ https://issues.apache.org/jira/browse/DRILL-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294362#comment-15294362 ]

ASF GitHub Bot commented on DRILL-4143:
---------------------------------------

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/470#issuecomment-220730748

    Committed in 3d92d2829

> REFRESH TABLE METADATA - Permission Issues with metadata files
> Key: DRILL-4143
> (full issue description quoted in the previous message)
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294360#comment-15294360 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 closed the pull request at:

    https://github.com/apache/drill/pull/504

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 rows:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion is applied as UTF8 format, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> +----+
> | x  |
> +----+
> +----+
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling in the ProjectRecordBatch for JSON. The output schema is not known for this until run time, and the ComplexWriter in the Project relies on seeing the input data to determine the output schema - this could be a MapVector or ListVector etc.
> If the input data has 0 rows due to a filter condition, we should at least produce a default output schema, e.g. an empty MapVector? Need to decide a good default. Note that the CONVERT_FROM(x, 'json') could occur on 2 branches of a UNION-ALL, and if one input is empty while the other side is not, it may still cause incompatibility.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
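The invariant the error message enforces — a downstream operator must see OK_NEW_SCHEMA before it ever sees NONE — can be illustrated outside Drill with a tiny state machine. This is a sketch of the protocol being fixed, not Drill's actual RecordBatch code; all names here are hypothetical:

```java
import java.util.Iterator;

public class SchemaFirstProtocol {
    enum Outcome { OK_NEW_SCHEMA, OK, NONE }

    // Wraps an upstream sequence of outcomes and guarantees the first thing a
    // consumer sees is OK_NEW_SCHEMA, even when the upstream is immediately NONE
    // (the empty-input case DRILL-4679 fixes in ProjectRecordBatch).
    static class SchemaGuard {
        private final Iterator<Outcome> upstream;
        private boolean first = true;
        private boolean wasNone = false;

        SchemaGuard(Iterator<Outcome> upstream) { this.upstream = upstream; }

        Outcome next() {
            if (wasNone) {
                return Outcome.NONE;  // never call upstream again after NONE
            }
            Outcome out = upstream.hasNext() ? upstream.next() : Outcome.NONE;
            if (first && out == Outcome.NONE) {
                // Upstream finished before producing any schema: synthesize an
                // empty schema batch now and report NONE on the following call.
                first = false;
                wasNone = true;
                return Outcome.OK_NEW_SCHEMA;
            }
            first = false;
            return out;
        }
    }
}
```

The `wasNone` flag mirrors the one added in the PR below: once NONE has been observed, the operator must not call next() on its child again.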
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294359#comment-15294359 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/504#issuecomment-220730151

    Committed in 3d92d2829.

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> Key: DRILL-4679
> (full issue description quoted in an earlier message)
[jira] [Updated] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krystal updated DRILL-4571:
---------------------------
    Attachment: drillbit_ui.log
                drillbit_download.log.gz

Log files

> Add link to local Drill logs from the web UI
> --------------------------------------------
>
>                 Key: DRILL-4571
>                 URL: https://issues.apache.org/jira/browse/DRILL-4571
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.6.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.7.0
>
>         Attachments: display_log.JPG, drillbit_download.log.gz, drillbit_ui.log, log_list.JPG
>
> Now we have a link to the profile from the web UI.
> It will be handy for users to have a link to local logs as well.
[jira] [Commented] (DRILL-4571) Add link to local Drill logs from the web UI
[ https://issues.apache.org/jira/browse/DRILL-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294169#comment-15294169 ]

Krystal commented on DRILL-4571:
--------------------------------

Hi Arina,
In verifying this feature, I see some issues as described below:
1. When the log files are downloaded from the UI, the name of the downloaded file is "download". We should save the file with the same name as the log file (i.e. drillbit.log).
2. For the Chrome browser, the content of the drillbit.queries.json file is displayed all on 1 line, making it very hard to read.
3. The last 1 lines of the log file displayed in the web UI do not match the log file itself. For your reference, I downloaded the full log from the web UI (this matches the actual log file). I also copied the content of the same log file as shown in the web UI. Doing a diff between the 2 files shows that many lines were skipped in the web UI. Attached are these 2 log files for your reference. I have 3 drillbits running.

git.commit.id.abbrev=09b2627

> Add link to local Drill logs from the web UI
> Key: DRILL-4571
> (full issue description quoted in the previous message)
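Issue 1 above (every download saved as "download") is typically fixed by sending a Content-Disposition header carrying the real log name. A minimal sketch of building that header value; the helper class and method names are mine, not the code in the PR:

```java
public class LogDownloadHeader {
    // Hypothetical helper: build the Content-Disposition value so the browser
    // saves the download under the log's own name (e.g. drillbit.log) instead
    // of the generic "download". Quotes and backslashes in the name are escaped
    // as required for a quoted-string (RFC 6266 / RFC 2616).
    public static String contentDispositionFor(String logFileName) {
        String safe = logFileName.replace("\\", "\\\\").replace("\"", "\\\"");
        return "attachment; filename=\"" + safe + "\"";
    }
}
```

The server would set this with something like `response.setHeader("Content-Disposition", contentDispositionFor("drillbit.log"))` before streaming the file.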
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294142#comment-15294142 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/504#discussion_r64103528

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -146,6 +159,27 @@ protected IterOutcome doWork() {
           if (next == IterOutcome.OUT_OF_MEMORY) {
             outOfMemory = true;
             return next;
    +      } else if (next == IterOutcome.NONE) {
    +        // since this is first batch and we already got a NONE, need to set up the schema
    +
    +        //allocate vv in the allocationVectors.
    +        for (final ValueVector v : this.allocationVectors) {
    --- End diff --

    Updated PR to address review comment regarding doAlloc().

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> Key: DRILL-4679
> (full issue description quoted in an earlier message)
[jira] [Commented] (DRILL-4607) Add a split function that allows to separate string by a delimiter
[ https://issues.apache.org/jira/browse/DRILL-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294126#comment-15294126 ]

ASF GitHub Bot commented on DRILL-4607:
---------------------------------------

GitHub user aaas24 opened a pull request:

    https://github.com/apache/drill/pull/506

    DRILL-4607: Add a split function that allows to separate string by a delimiter

    This patch allows to apply a split function by providing a string and a delimiter. Addressed the review comments from Sudheesh.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aaas24/drill DRILL-4607

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/506.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #506

commit c3bdde43f79c58e94e2fcdf32e782209486e8cca
Author: Alicia Alvarez
Date:   2016-04-15T18:07:47Z

    DRILL-4607: Add a split function that allows to separate string by a delimiter

> Add a split function that allows to separate string by a delimiter
> ------------------------------------------------------------------
>
>                 Key: DRILL-4607
>                 URL: https://issues.apache.org/jira/browse/DRILL-4607
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>            Reporter: Alicia Alvarez
>
> Ex: Let's say I have records in a CSV file with the following schema:
> {noformat}
> user_name,friend_list_separated_by_a_delimiter,other_fields
> ali,sam;adi;tom,45,...
> {noformat}
> I want to run a query which returns the friend list fields as a repeated value:
> {noformat}
> select user_name, split(friend_list, ';') friends from userdata;
> {noformat}
> This should return the records in the following format:
> {noformat}
> -------------------------------
> | user_name | friends         |
> -------------------------------
> | ali       | [sam, adi, tom] |
> -------------------------------
> {noformat}
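The requested behaviour — `split('sam;adi;tom', ';')` producing the repeated value `[sam, adi, tom]` — corresponds to the plain Java below. This is only a sketch of the intended semantics; the actual Drill UDF in the PR is implemented against Drill's value-vector APIs:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class SplitSketch {
    // Split a delimited field into its parts. The delimiter is taken literally:
    // String.split expects a regex, so metacharacters like ';' or '|' must be
    // quoted with Pattern.quote before splitting.
    public static List<String> split(String field, String delimiter) {
        return Arrays.asList(field.split(Pattern.quote(delimiter)));
    }
}
```

In the query from the issue, each element of the returned list would become one element of the repeated `friends` column.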
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294108#comment-15294108 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/504#discussion_r64101122

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -136,6 +145,10 @@ public VectorContainer getOutgoingContainer() {
       @Override
       protected IterOutcome doWork() {
    +    if (wasNone) {
    +      return IterOutcome.NONE;
    +    }
    +
         int incomingRecordCount = incoming.getRecordCount();
         if (first && incomingRecordCount == 0) {
    --- End diff --

    Actually, if the first batch was non-empty, the new changes wouldn't apply because of the following check:
    if (first && incomingRecordCount == 0) { ... }
    Then if the next incoming batch is empty, it should continue to work since we have already produced the schema from the first batch. On the other hand, if the first batch is empty and we see a NONE iterator outcome, we want to make sure that a schema is produced but at the same time not call next(), since a NONE outcome has already been seen.

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> Key: DRILL-4679
> (full issue description quoted in an earlier message)
[jira] [Commented] (DRILL-4607) Add a split function that allows to separate string by a delimiter
[ https://issues.apache.org/jira/browse/DRILL-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294103#comment-15294103 ]

ASF GitHub Bot commented on DRILL-4607:
---------------------------------------

Github user aaas24 closed the pull request at:

    https://github.com/apache/drill/pull/481

> Add a split function that allows to separate string by a delimiter
> Key: DRILL-4607
> (full issue description quoted in an earlier message)
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293978#comment-15293978 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/504#discussion_r64091394

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -146,6 +159,27 @@ protected IterOutcome doWork() {
           if (next == IterOutcome.OUT_OF_MEMORY) {
             outOfMemory = true;
             return next;
    +      } else if (next == IterOutcome.NONE) {
    +        // since this is first batch and we already got a NONE, need to set up the schema
    +
    +        //allocate vv in the allocationVectors.
    +        for (final ValueVector v : this.allocationVectors) {
    --- End diff --

    The doAlloc() was calling incoming.getRecordCount(), which would fail for empty batches, so I did not use it. But I can modify doAlloc to take the count parameter and have everyone call the modified version.

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> Key: DRILL-4679
> (full issue description quoted in an earlier message)
[jira] [Closed] (DRILL-4523) Disallow using loopback address in distributed mode
[ https://issues.apache.org/jira/browse/DRILL-4523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krystal closed DRILL-4523.
--------------------------

git.commit.id.abbrev=09b2627
Verified that bug is fixed.

> Disallow using loopback address in distributed mode
> ---------------------------------------------------
>
>                 Key: DRILL-4523
>                 URL: https://issues.apache.org/jira/browse/DRILL-4523
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 1.6.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>             Fix For: 1.7.0
>
> If we enable debug for org.apache.drill.exec.coord.zk in logback.xml, we only get the hostname and ports information. For example:
> {code}
> 2015-11-04 19:47:02,927 [ServiceCache-0] DEBUG o.a.d.e.c.zk.ZKClusterCoordinator - Cache changed, updating.
> 2015-11-04 19:47:02,932 [ServiceCache-0] DEBUG o.a.d.e.c.zk.ZKClusterCoordinator - Active drillbit set changed. Now includes 2 total bits. New active drillbits:
> h3.poc.com:31010:31011:31012
> h2.poc.com:31010:31011:31012
> {code}
> We need to know the IP address of each hostname to do further troubleshooting. Imagine if any drillbit registers itself as "localhost.localdomain" in ZooKeeper; we will never know where it comes from. Enabling IP address tracking can help this case.
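The improvement in the title — refusing to start in distributed mode when the drillbit would register a loopback address — can be sketched with the standard InetAddress API. The method name and message are hypothetical, not Drill's actual startup code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackCheck {
    // In distributed mode, a drillbit registering 127.0.0.1 / localhost in
    // ZooKeeper is unreachable from the other nodes, so reject such a
    // hostname early, at startup, instead of failing queries later.
    public static void validateHost(String host, boolean distributedMode) throws UnknownHostException {
        if (distributedMode && InetAddress.getByName(host).isLoopbackAddress()) {
            throw new IllegalStateException(
                "Drillbit is configured with the loopback address '" + host
                + "'; this is not allowed in distributed mode");
        }
    }
}
```

Embedded (single-node) mode would skip the check, since a loopback address is perfectly usable there.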
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293598#comment-15293598 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user jinfengni commented on the pull request:

    https://github.com/apache/drill/pull/504#issuecomment-220645472

    LGTM. +1

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> Key: DRILL-4679
> (full issue description quoted in an earlier message)
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293596#comment-15293596 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/504#discussion_r64064412

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -136,6 +145,10 @@ public VectorContainer getOutgoingContainer() {
       @Override
       protected IterOutcome doWork() {
    +    if (wasNone) {
    +      return IterOutcome.NONE;
    +    }
    +
         int incomingRecordCount = incoming.getRecordCount();
         if (first && incomingRecordCount == 0) {
    --- End diff --

    The new logic handles the case of Project's first outgoing batch. I am not sure whether Drill works properly when the first batch gets data and builds the schema, but a later incoming batch contains an empty result. We may treat that as a separate issue for further investigation.

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 rows:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion is applied as UTF8 format, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> ++
> | x |
> ++
> ++
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling for JSON in the ProjectRecordBatch. The output schema is not known until run time, and the ComplexWriter in the Project relies on seeing the input data to determine the output schema - this could be a MapVector or ListVector etc.
> If the input data has 0 rows due to a filter condition, we should at least produce a default output schema, e.g. an empty MapVector? We need to decide on a good default. Note that CONVERT_FROM(x, 'json') could occur on 2 branches of a UNION-ALL, and if one input is empty while the other side is not, it may still cause incompatibility.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
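The `wasNone` guard in the diff above can be illustrated with a minimal, self-contained sketch. The enum and class below are simplified stand-ins written for this example, not Drill's actual `IterOutcome` or `ProjectRecordBatch`:

```java
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for illustration only; not Drill's real classes.
enum IterOutcome { OK_NEW_SCHEMA, OK, NONE }

class ToyProjectBatch {
    private boolean first = true;
    private boolean wasNone = false; // set once upstream returns NONE before any data
    private final Iterator<IterOutcome> upstream;

    ToyProjectBatch(List<IterOutcome> upstreamOutcomes) {
        this.upstream = upstreamOutcomes.iterator();
    }

    IterOutcome next() {
        // The guard from the diff: after the one empty-schema batch has been
        // emitted, report NONE on every later call instead of re-reading upstream.
        if (wasNone) {
            return IterOutcome.NONE;
        }
        IterOutcome incoming = upstream.hasNext() ? upstream.next() : IterOutcome.NONE;
        if (first && incoming == IterOutcome.NONE) {
            first = false;
            wasNone = true;
            // Set up an empty output schema and announce it first, satisfying the
            // "OK_NEW_SCHEMA before NONE" contract that the reported
            // IllegalStateException complains about.
            return IterOutcome.OK_NEW_SCHEMA;
        }
        first = false;
        return incoming;
    }
}

public class WasNoneDemo {
    public static void main(String[] args) {
        // Upstream filter produced zero rows: the very first outcome is NONE.
        ToyProjectBatch batch = new ToyProjectBatch(List.of(IterOutcome.NONE));
        System.out.println(batch.next()); // OK_NEW_SCHEMA
        System.out.println(batch.next()); // NONE
    }
}
```

The key point of the fix is that the empty-schema batch is emitted exactly once; without the guard, a second call would fall through and violate the iterator contract again.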
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293579#comment-15293579 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user jinfengni commented on a diff in the pull request:

    https://github.com/apache/drill/pull/504#discussion_r64063077

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -146,6 +159,27 @@ protected IterOutcome doWork() {
           if (next == IterOutcome.OUT_OF_MEMORY) {
             outOfMemory = true;
             return next;
    +      } else if (next == IterOutcome.NONE) {
    +        // since this is the first batch and we already got a NONE, need to set up the schema
    +
    +        // allocate vv in the allocationVectors.
    +        for (final ValueVector v : this.allocationVectors) {
    --- End diff --

    The allocation logic could reuse the existing method doAlloc().

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 rows:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion is applied as UTF8 format, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> ++
> | x |
> ++
> ++
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling for JSON in the ProjectRecordBatch. The output schema is not known until run time, and the ComplexWriter in the Project relies on seeing the input data to determine the output schema - this could be a MapVector or ListVector etc.
> If the input data has 0 rows due to a filter condition, we should at least produce a default output schema, e.g. an empty MapVector? We need to decide on a good default. Note that CONVERT_FROM(x, 'json') could occur on 2 branches of a UNION-ALL, and if one input is empty while the other side is not, it may still cause incompatibility.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
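The allocation loop the review comment refers to follows a common "allocate every output vector, then mark the batch as carrying zero rows" pattern, with the suggestion being to centralize the loop in the existing helper rather than open-coding it. A self-contained toy version (`ToyVector` and this `doAlloc` are illustrative stand-ins, not Drill's `ValueVector` or its actual `doAlloc()` method):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a value vector; not Drill's ValueVector.
class ToyVector {
    boolean allocated = false;
    int valueCount = -1;

    void allocateNew() { allocated = true; }
    void setValueCount(int count) { valueCount = count; }
}

public class EmptyBatchDemo {
    // Mirrors the reviewer's suggestion: one helper owns the allocation loop,
    // so every call site (first batch, empty batch, normal batch) shares it.
    static void doAlloc(List<ToyVector> allocationVectors) {
        for (ToyVector v : allocationVectors) {
            v.allocateNew();
        }
    }

    // Building an "empty" outgoing batch: allocate the vectors that define the
    // schema, then record that the batch carries zero rows.
    static void setUpEmptySchema(List<ToyVector> allocationVectors) {
        doAlloc(allocationVectors);
        for (ToyVector v : allocationVectors) {
            v.setValueCount(0);
        }
    }

    public static void main(String[] args) {
        List<ToyVector> vectors = new ArrayList<>();
        vectors.add(new ToyVector());
        vectors.add(new ToyVector());
        setUpEmptySchema(vectors);
        System.out.println(vectors.get(0).allocated);  // true
        System.out.println(vectors.get(0).valueCount); // 0
    }
}
```

The schema-bearing but row-free batch is what lets downstream operators see OK_NEW_SCHEMA before NONE even when the filter eliminated every row.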
[jira] [Commented] (DRILL-4689) Need to support conversion from TIMESTAMP type to TIME type
[ https://issues.apache.org/jira/browse/DRILL-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15293474#comment-15293474 ]

Julian Hyde commented on DRILL-4689:
------------------------------------

I support converting TIMESTAMP to TIME, but a TIME literal needs to be in the correct format, regardless of what PostgreSQL does.

> Need to support conversion from TIMESTAMP type to TIME type
> -----------------------------------------------------------
>
>                 Key: DRILL-4689
>                 URL: https://issues.apache.org/jira/browse/DRILL-4689
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.7.0
>        Environment: CentOS cluster
>            Reporter: Khurram Faraaz
>
> According to the ISO/IEC 9075-2 standard, conversion from the TIMESTAMP type to the TIME type is allowed and supported. This does not seem to work on Drill 1.7.0:
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> values(TIME '2050-2-3 10:11:12.1000');
> Error: PARSE ERROR: Illegal TIME literal '2050-2-3 10:11:12.1000': not in format 'HH:mm:ss'
> SQL Query values(TIME '2050-2-3 10:11:12.1000')
>           ^
> [Error Id: 77168fe0-760f-4384-a7c6-682241675348 on centos-03.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp> values(cast('2050-2-3 10:11:12.1000' as time));
> Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "2050-2-3 10:11:12.1000" is malformed at "50-2-3 10:11:12.1000"
> Fragment 0:0
> [Error Id: 5168dfe6-b5e5-4ce0-8570-02ea74da6367 on centos-03.qa.lab:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.tmp>
> {noformat}
> The above two expressions are supported on Postgres 9.3:
> {noformat}
> postgres=# values(TIME '2050-2-3 10:11:12.1000');
>   column1
> ------------
>  10:11:12.1
> (1 row)
> postgres=# values(cast('2050-2-3 10:11:12.1000' as time));
>   column1
> ------------
>  10:11:12.1
> (1 row)
> postgres=#
> {noformat}

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
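The semantics being requested (discard the date portion of a timestamp and keep only the time of day) can be sketched outside Drill with the JDK's java.time API. This is only an illustration of the expected CAST behavior, not Drill code; the formatter pattern is an assumption normalized to accept the bug report's single-digit month and day:

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

public class TimestampToTimeDemo {
    public static void main(String[] args) {
        // Parse the timestamp value from the bug report. A lenient pattern is
        // needed because the strict ISO parser rejects '2050-2-3' as-is.
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-M-d HH:mm:ss.SSSS");
        LocalDateTime ts = LocalDateTime.parse("2050-2-3 10:11:12.1000", fmt);

        // CAST(ts AS TIME): keep only the time-of-day component.
        LocalTime t = ts.toLocalTime();
        System.out.println(t); // 10:11:12.100
    }
}
```

This matches the Postgres result quoted below (10:11:12.1, i.e. 100 ms into the second), while Julian's point stands that a TIME *literal* must still be written as a bare time, not a timestamp string.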
[jira] [Created] (DRILL-4689) Need to support conversion from TIMESTAMP type to TIME type
Khurram Faraaz created DRILL-4689:
-------------------------------------

             Summary: Need to support conversion from TIMESTAMP type to TIME type
                 Key: DRILL-4689
                 URL: https://issues.apache.org/jira/browse/DRILL-4689
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types
    Affects Versions: 1.7.0
         Environment: CentOS cluster
            Reporter: Khurram Faraaz

According to the ISO/IEC 9075-2 standard, conversion from the TIMESTAMP type to the TIME type is allowed and supported. This does not seem to work on Drill 1.7.0:

{noformat}
0: jdbc:drill:schema=dfs.tmp> values(TIME '2050-2-3 10:11:12.1000');
Error: PARSE ERROR: Illegal TIME literal '2050-2-3 10:11:12.1000': not in format 'HH:mm:ss'
SQL Query values(TIME '2050-2-3 10:11:12.1000')
          ^
[Error Id: 77168fe0-760f-4384-a7c6-682241675348 on centos-03.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp> values(cast('2050-2-3 10:11:12.1000' as time));
Error: SYSTEM ERROR: IllegalArgumentException: Invalid format: "2050-2-3 10:11:12.1000" is malformed at "50-2-3 10:11:12.1000"
Fragment 0:0
[Error Id: 5168dfe6-b5e5-4ce0-8570-02ea74da6367 on centos-03.qa.lab:31010] (state=,code=0)
0: jdbc:drill:schema=dfs.tmp>
{noformat}

The above two expressions are supported on Postgres 9.3:

{noformat}
postgres=# values(TIME '2050-2-3 10:11:12.1000');
  column1
------------
 10:11:12.1
(1 row)
postgres=# values(cast('2050-2-3 10:11:12.1000' as time));
  column1
------------
 10:11:12.1
(1 row)
postgres=#
{noformat}

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-4679:
------------------------------

    Assignee: Jinfeng Ni  (was: Aman Sinha)

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Jinfeng Ni
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 rows:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion is applied as UTF8 format, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> ++
> | x |
> ++
> ++
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling for JSON in the ProjectRecordBatch. The output schema is not known until run time, and the ComplexWriter in the Project relies on seeing the input data to determine the output schema - this could be a MapVector or ListVector etc.
> If the input data has 0 rows due to a filter condition, we should at least produce a default output schema, e.g. an empty MapVector? We need to decide on a good default. Note that CONVERT_FROM(x, 'json') could occur on 2 branches of a UNION-ALL, and if one input is empty while the other side is not, it may still cause incompatibility.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4679) CONVERT_FROM() json format fails if 0 rows are received from upstream operator
[ https://issues.apache.org/jira/browse/DRILL-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292784#comment-15292784 ]

ASF GitHub Bot commented on DRILL-4679:
---------------------------------------

Github user amansinha100 commented on the pull request:

    https://github.com/apache/drill/pull/504#issuecomment-220525522

    @jinfengni could you please review?

> CONVERT_FROM() json format fails if 0 rows are received from upstream operator
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-4679
>                 URL: https://issues.apache.org/jira/browse/DRILL-4679
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> CONVERT_FROM() json format fails as below if the underlying Filter produces 0 rows:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'json') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> Error: SYSTEM ERROR: IllegalStateException: next() returned NONE without first returning OK_NEW_SCHEMA [#16, ProjectRecordBatch]
> Fragment 0:0
> {noformat}
> If the conversion is applied as UTF8 format, the same query succeeds:
> {noformat}
> 0: jdbc:drill:zk=local> select convert_from('{"abc":"xyz"}', 'utf8') as x from cp.`tpch/region.parquet` where r_regionkey = ;
> ++
> | x |
> ++
> ++
> No rows selected (0.241 seconds)
> {noformat}
> The reason for this is the special handling for JSON in the ProjectRecordBatch. The output schema is not known until run time, and the ComplexWriter in the Project relies on seeing the input data to determine the output schema - this could be a MapVector or ListVector etc.
> If the input data has 0 rows due to a filter condition, we should at least produce a default output schema, e.g. an empty MapVector? We need to decide on a good default. Note that CONVERT_FROM(x, 'json') could occur on 2 branches of a UNION-ALL, and if one input is empty while the other side is not, it may still cause incompatibility.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)