[jira] [Updated] (HIVE-25575) Add support for JWT authentication
[ https://issues.apache.org/jira/browse/HIVE-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia updated HIVE-25575:
--
Description:
It would be good to support JWT auth mechanism in hive. In order to implement it, we would need the following -

On HS2 side -
1. Accept JWT in Authorization: Bearer header.
2. Fetch JWKS from a public endpoint to verify JWT signature, to start with we can fetch on HS2 start up.
3. Verify JWT Signature.

On JDBC Client side -
1. Hive jdbc client should be able to accept jwt in JDBC url. (will add more details)
2. Client should also be able to pick up JWT from an env var if it's defined.

was:
It would be good to support JWT auth mechanism in hive. In order to implement it, we would need the following -

On HS2 side -
1. Accept JWT in Authorization: Bearer header.
2. Fetch JWKS from a public endpoint to verify JWT signature, to start with we can fetch on HS2 start up.
3. Verify JWT Signature.

On JDBC Client side -
1. Hive jdbc client should be able to accept jwt in JDBC url. (will add more details)
2. Client should also be able to pick up JWT from a env var if it's defined.

> Add support for JWT authentication
> --
>
> Key: HIVE-25575
> URL: https://issues.apache.org/jira/browse/HIVE-25575
> Project: Hive
> Issue Type: New Feature
> Components: HiveServer2, JDBC
> Affects Versions: 4.0.0
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
-- This message was sent by Atlassian Jira (v8.3.4#803005)
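The HS2-side flow described in the ticket — pull the token out of the Authorization header, then use the JOSE header's key id to pick the verification key from the fetched JWKS — can be sketched roughly as below. This is an illustrative sketch only, not Hive's actual implementation: the class and method names are made up, and real signature verification would use the JWKS key material rather than stop at the header decode shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Illustrative sketch of the HS2-side steps (hypothetical names).
public class BearerJwtSketch {

    // Step 1: accept the JWT from an "Authorization: Bearer <token>" header value.
    static Optional<String> extractBearerToken(String authorizationHeader) {
        String prefix = "Bearer ";
        if (authorizationHeader == null
                || !authorizationHeader.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return Optional.empty();
        }
        return Optional.of(authorizationHeader.substring(prefix.length()).trim());
    }

    // Steps 2-3 begin here: decode the JOSE header (the first base64url segment
    // of the compact JWT) so the "kid" it carries can select the verification
    // key in the JWKS fetched at HS2 startup.
    static String decodeJoseHeader(String jwt) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("not a signed compact JWT");
        }
        return new String(Base64.getUrlDecoder().decode(parts[0]), StandardCharsets.UTF_8);
    }
}
```

On the client side, the same compact token could be read from the JDBC URL or from an environment variable (e.g. via System.getenv) before being placed in the header.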
[jira] [Assigned] (HIVE-25575) Add support for JWT authentication
[ https://issues.apache.org/jira/browse/HIVE-25575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shubham Chaurasia reassigned HIVE-25575:
--

> Add support for JWT authentication
> --
>
> Key: HIVE-25575
> URL: https://issues.apache.org/jira/browse/HIVE-25575
> Project: Hive
> Issue Type: New Feature
> Components: HiveServer2, JDBC
> Affects Versions: 4.0.0
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25449) datediff() gives wrong output when run in a tez task with some non-UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401456#comment-17401456 ]

Shubham Chaurasia commented on HIVE-25449:
--
[~abstractdog] [~klcopp] Can you please review?

> datediff() gives wrong output when run in a tez task with some non-UTC timezone
> --
>
> Key: HIVE-25449
> URL: https://issues.apache.org/jira/browse/HIVE-25449
> Project: Hive
> Issue Type: Bug
> Components: UDF
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Repro (thanks Qiaosong Dong) -
> Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}}
> {code}
> create external table test_dt(id string, dt date);
> insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07');
> select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id;
> +------+
> | _c0  |
> +------+
> | 6    |
> | 7    |
> +------+
> {code}
> Expected output -
> {code}
> +------+
> | _c0  |
> +------+
> | 5    |
> | 6    |
> +------+
> {code}
> *Cause*
> This happens because, in the {{VectorUDFDateDiffColScalar}} class:
> 1. For the second argument (scalar), we use {{java.text.SimpleDateFormat}} to parse the date string, which interprets it in the local timezone.
> 2. For the first column, we get a column vector which represents the date as an epoch day. This is always in UTC.
> *Solution*
> We need to check the other variants of the datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC.
>
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -      date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime());
> -      baseDate = DateWritableV2.dateToDays(date);
> +      org.apache.hadoop.hive.common.type.Date hiveDate
> +          = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8"));
> +      date.setTime(hiveDate.toEpochMilli());
> +      baseDate = hiveDate.toEpochDay();
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
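The off-by-one described above can be reproduced outside Hive: parsing a date string with SimpleDateFormat under a GMT+8 default timezone and then truncating epoch milliseconds to whole days lands one day behind the UTC epoch day that the date column vector carries. A minimal standalone sketch (not Hive code; method names are illustrative):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.util.TimeZone;

public class DateDiffSketch {

    // The buggy path: parse in the JVM default timezone, then truncate
    // epoch milliseconds to whole days (mirrors SimpleDateFormat followed
    // by a millis-to-days conversion). Under GMT+8, local midnight falls
    // on the *previous* UTC day, so the division loses a day.
    static long localParsedEpochDay(String date) throws ParseException {
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
        return formatter.parse(date).getTime() / 86_400_000L;
    }

    // The timezone-safe path: interpret the date string as a plain date in
    // UTC, matching the epoch-day encoding of the date column vector.
    static long utcEpochDay(String date) {
        return LocalDate.parse(date).toEpochDay();
    }

    public static void main(String[] args) throws ParseException {
        // Simulates -Duser.timezone=GMT+8 in tez.task.launch.cmd-opts.
        TimeZone.setDefault(TimeZone.getTimeZone("GMT+8"));
        System.out.println(localParsedEpochDay("2021-07-01")); // one day behind
        System.out.println(utcEpochDay("2021-07-01"));
    }
}
```

With the base date one day too small, every datediff result comes out one day too large, which is exactly the 6/7-instead-of-5/6 output in the repro.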
[jira] [Updated] (HIVE-25449) datediff() gives wrong output when run in a tez task with some non-UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25449: - Summary: datediff() gives wrong output when run in a tez task with some non-UTC timezone (was: datediff() gives wrong output when we add some non UTC timezone to tez.task.launch.cmd-opts) > datediff() gives wrong output when run in a tez task with some non-UTC > timezone > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be in local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. > *Solution* > We need to check other variants of datediff UDFs as well and change the > parsing mechanism to always interpret date strings in UTC. > > I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. 
> {code} > - date.setTime(formatter.parse(new String(bytesValue, > "UTF-8")).getTime()); > - baseDate = DateWritableV2.dateToDays(date); > + org.apache.hadoop.hive.common.type.Date hiveDate > + = org.apache.hadoop.hive.common.type.Date.valueOf(new > String(bytesValue, "UTF-8")); > + date.setTime(hiveDate.toEpochMilli()); > + baseDate = hiveDate.toEpochDay(); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25449) datediff() gives wrong output when we add some non UTC timezone to tez.task.launch.cmd-opts
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25449: - Summary: datediff() gives wrong output when we add some non UTC timezone to tez.task.launch.cmd-opts (was: datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone) > datediff() gives wrong output when we add some non UTC timezone to > tez.task.launch.cmd-opts > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be in local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. > *Solution* > We need to check other variants of datediff UDFs as well and change the > parsing mechanism to always interpret date strings in UTC. > > I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. 
> {code} > - date.setTime(formatter.parse(new String(bytesValue, > "UTF-8")).getTime()); > - baseDate = DateWritableV2.dateToDays(date); > + org.apache.hadoop.hive.common.type.Date hiveDate > + = org.apache.hadoop.hive.common.type.Date.valueOf(new > String(bytesValue, "UTF-8")); > + date.setTime(hiveDate.toEpochMilli()); > + baseDate = hiveDate.toEpochDay(); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25449) datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25449: - Description: Repro (thanks Qiaosong Dong) - Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be in local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} was: Repro (thanks Qiaosong Dong) - Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. 
For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} > datediff() gives wrong output when we set tez.task.launch.cmd-opts to some > non UTC timezone > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be in local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. 
> *Solution*
> We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC.
>
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -      date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime());
> -      baseDate = DateWritableV2.dateToDays(date);
> +      org.apache.hadoop.hive.common.type.Date hiveDate
> +          = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8"));
> +      date.setTime(hiveDate.toEpochMilli());
> +      baseDate = hiveDate.toEpochDay();
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25449) datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25449: - Description: Repro (thanks Qiaosong Dong) - Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} was: Repro (thanks Qiaosong Dong) - 1. Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. 
For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} > datediff() gives wrong output when we set tez.task.launch.cmd-opts to some > non UTC timezone > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. 
> *Solution*
> We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC.
>
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -      date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime());
> -      baseDate = DateWritableV2.dateToDays(date);
> +      org.apache.hadoop.hive.common.type.Date hiveDate
> +          = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8"));
> +      date.setTime(hiveDate.toEpochMilli());
> +      baseDate = hiveDate.toEpochDay();
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25449) datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25449: - Description: Repro (thanks Qiaosong Dong) - 1. Add -Duser.timezone=GMT+8 to {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} was: Repro (thanks Qiaosong Dong) - {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. 
For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} > datediff() gives wrong output when we set tez.task.launch.cmd-opts to some > non UTC timezone > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > 1. Add -Duser.timezone=GMT+8 to > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. 
> *Solution*
> We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC.
>
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -      date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime());
> -      baseDate = DateWritableV2.dateToDays(date);
> +      org.apache.hadoop.hive.common.type.Date hiveDate
> +          = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8"));
> +      date.setTime(hiveDate.toEpochMilli());
> +      baseDate = hiveDate.toEpochDay();
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25449) datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25449: - Description: Repro (thanks Qiaosong Dong) - 1. Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} was: Repro (thanks Qiaosong Dong) - 1. Add -Duser.timezone=GMT+8 to {code} create external table test_dt(id string, dt date); insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt on dt1.id = dt.id; +--+ | _c0 | +--+ | 6| | 7| +--+ {code} Expected output - {code} +--+ | _c0 | +--+ | 5| | 6| +--+ {code} *Cause* This happens because in {{VectorUDFDateDiffColScalar}} class 1. 
For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to parse the date strings which interprets it to be local timezone. 2. For first column we get a column vector which represents the date as epoch day. This is always in UTC. *Solution* We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC. I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. {code} - date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime()); - baseDate = DateWritableV2.dateToDays(date); + org.apache.hadoop.hive.common.type.Date hiveDate + = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8")); + date.setTime(hiveDate.toEpochMilli()); + baseDate = hiveDate.toEpochDay(); {code} > datediff() gives wrong output when we set tez.task.launch.cmd-opts to some > non UTC timezone > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > 1. Add -Duser.timezone=GMT+8 to {{tez.task.launch.cmd-opts}} > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. 
> *Solution*
> We need to check other variants of datediff UDFs as well and change the parsing mechanism to always interpret date strings in UTC.
>
> I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue.
> {code}
> -      date.setTime(formatter.parse(new String(bytesValue, "UTF-8")).getTime());
> -      baseDate = DateWritableV2.dateToDays(date);
> +      org.apache.hadoop.hive.common.type.Date hiveDate
> +          = org.apache.hadoop.hive.common.type.Date.valueOf(new String(bytesValue, "UTF-8"));
> +      date.setTime(hiveDate.toEpochMilli());
> +      baseDate = hiveDate.toEpochDay();
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25449) datediff() gives wrong output when we set tez.task.launch.cmd-opts to some non UTC timezone
[ https://issues.apache.org/jira/browse/HIVE-25449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-25449: > datediff() gives wrong output when we set tez.task.launch.cmd-opts to some > non UTC timezone > --- > > Key: HIVE-25449 > URL: https://issues.apache.org/jira/browse/HIVE-25449 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Repro (thanks Qiaosong Dong) - > {code} > create external table test_dt(id string, dt date); > insert into test_dt values('11', '2021-07-06'), ('22', '2021-07-07'); > select datediff(dt1.dt, '2021-07-01') from test_dt dt1 left join test_dt dt > on dt1.id = dt.id; > +--+ > | _c0 | > +--+ > | 6| > | 7| > +--+ > {code} > Expected output - > {code} > +--+ > | _c0 | > +--+ > | 5| > | 6| > +--+ > {code} > *Cause* > This happens because in {{VectorUDFDateDiffColScalar}} class > 1. For second argument(scalar) , we use {{java.text.SimpleDateFormat}} to > parse the date strings which interprets it to be local timezone. > 2. For first column we get a column vector which represents the date as epoch > day. This is always in UTC. > *Solution* > We need to check other variants of datediff UDFs as well and change the > parsing mechanism to always interpret date strings in UTC. > > I did a quick change in {{VectorUDFDateDiffColScalar}} which fixes the issue. > {code} > - date.setTime(formatter.parse(new String(bytesValue, > "UTF-8")).getTime()); > - baseDate = DateWritableV2.dateToDays(date); > + org.apache.hadoop.hive.common.type.Date hiveDate > + = org.apache.hadoop.hive.common.type.Date.valueOf(new > String(bytesValue, "UTF-8")); > + date.setTime(hiveDate.toEpochMilli()); > + baseDate = hiveDate.toEpochDay(); > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17369256#comment-17369256 ] Shubham Chaurasia commented on HIVE-25243: -- Thanks for the review and merge [~maheshk114] > Llap external client - Handle nested values when the parent struct is null > -- > > Key: HIVE-25243 > URL: https://issues.apache.org/jira/browse/HIVE-25243 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Consider the following table in text format - > {code} > +---+ > | c8 | > +---+ > | NULL | > | {"r":null,"s":null,"t":null} | > | {"r":"a","s":9,"t":2.2} | > +---+ > {code} > When we query above table via llap external client, it throws following > exception - > {code:java} > Caused by: java.lang.NullPointerException: src > at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33) > at > io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199) > at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486) > at > io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) > at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933) > at > org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191) > at > org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290) > 
at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135) > {code} > Created a test to repro it - > {code:java} > /** > * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while > testing LLAP external client flow. > * The aim of turning off LLAP IO is - > * when we create table through this test, LLAP caches them and returns the > same > * when we do a read query, due to this we miss some code paths which may > have been hit otherwise. > */ > public class TestMiniLlapVectorArrowWithLlapIODisabled extends > BaseJdbcWithMiniLlap { > @BeforeClass > public static void beforeTest() throws Exception { > HiveConf conf = defaultConf(); > conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true); > > conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, > true); > conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false"); > BaseJdbcWithMiniLlap.beforeTest(conf); > } > @Override > protected InputFormat getInputFormat() { > //For unit testing, no harm in hard-coding allocator ceiling to > LONG.MAX_VALUE > return new LlapArrowRowInputFormat(Long.MAX_VALUE); > } > @Test > public void testNullsInStructFields() throws Exception { > createDataTypesTable("datatypes"); > RowCollector2 rowCollector = new RowCollector2(); > // c8 struct > String query = "select c8 from datatypes"; > int rowCount = processQuery(query, 1, rowCollector); > assertEquals(3, rowCount); > } > } > {code} > Cause - As we see in the table above, first row of the table is NULL, and > correspondingly we get {{structVector.isNull[i]=true}} in arrow serializer > but we don't get {{isNull[i]=true}} for the fields of struct. 
And later the code goes on to set such fields in the arrow vector, and we see the above exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
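The cause described above (a null parent struct whose child fields are not themselves marked null) can be illustrated with a minimal sketch. This is not the actual Hive patch; the class and method names are hypothetical, and plain boolean arrays stand in for Arrow validity buffers.

```java
// Hypothetical sketch: before writing a struct's child fields into Arrow
// vectors, a null parent row must force the child slot to null as well,
// even when the upstream vector did not flag it.
public class StructNullPropagation {
    // parentIsNull: per-row validity of the struct column
    // childIsNull:  per-row validity of one child field, as produced upstream
    static boolean[] effectiveChildNulls(boolean[] parentIsNull, boolean[] childIsNull) {
        boolean[] out = new boolean[parentIsNull.length];
        for (int i = 0; i < parentIsNull.length; i++) {
            // A null parent struct implies the child value is null too;
            // skipping the value write avoids the NPE seen in the stack trace.
            out[i] = parentIsNull[i] || childIsNull[i];
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] parent = {true, false, false};  // row 0 is the NULL struct row
        boolean[] child  = {false, true, false};  // upstream missed row 0
        boolean[] eff = effectiveChildNulls(parent, child);
        System.out.println(eff[0] + " " + eff[1] + " " + eff[2]); // true true false
    }
}
```

With this guard in place, the serializer would mark row 0's child slots null instead of attempting a `setSafe` on absent data.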
[jira] [Resolved] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia resolved HIVE-25243. -- Fix Version/s: 4.0.0 Resolution: Fixed > Llap external client - Handle nested values when the parent struct is null > -- > > Key: HIVE-25243 > URL: https://issues.apache.org/jira/browse/HIVE-25243 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > Consider the following table in text format - > {code} > +---+ > | c8 | > +---+ > | NULL | > | {"r":null,"s":null,"t":null} | > | {"r":"a","s":9,"t":2.2} | > +---+ > {code} > When we query above table via llap external client, it throws following > exception - > {code:java} > Caused by: java.lang.NullPointerException: src > at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33) > at > io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199) > at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486) > at > io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) > at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933) > at > org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191) > at > org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290) > at > 
org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135) > {code} > Created a test to repro it - > {code:java} > /** > * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while > testing LLAP external client flow. > * The aim of turning off LLAP IO is - > * when we create table through this test, LLAP caches them and returns the > same > * when we do a read query, due to this we miss some code paths which may > have been hit otherwise. > */ > public class TestMiniLlapVectorArrowWithLlapIODisabled extends > BaseJdbcWithMiniLlap { > @BeforeClass > public static void beforeTest() throws Exception { > HiveConf conf = defaultConf(); > conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true); > > conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, > true); > conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false"); > BaseJdbcWithMiniLlap.beforeTest(conf); > } > @Override > protected InputFormat getInputFormat() { > //For unit testing, no harm in hard-coding allocator ceiling to > LONG.MAX_VALUE > return new LlapArrowRowInputFormat(Long.MAX_VALUE); > } > @Test > public void testNullsInStructFields() throws Exception { > createDataTypesTable("datatypes"); > RowCollector2 rowCollector = new RowCollector2(); > // c8 struct > String query = "select c8 from datatypes"; > int rowCount = processQuery(query, 1, rowCollector); > assertEquals(3, rowCount); > } > } > {code} > Cause - As we see in the table above, first row of the table is NULL, and > correspondingly we get {{structVector.isNull[i]=true}} in arrow serializer > but we don't get {{isNull[i]=true}} for the fields of struct. 
And later the code goes on to set such fields in the arrow vector, and we see the above exception.
[jira] [Work started] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-25243 started by Shubham Chaurasia. > Llap external client - Handle nested values when the parent struct is null > -- > > Key: HIVE-25243 > URL: https://issues.apache.org/jira/browse/HIVE-25243 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > Consider the following table in text format - > {code} > +---+ > | c8 | > +---+ > | NULL | > | {"r":null,"s":null,"t":null} | > | {"r":"a","s":9,"t":2.2} | > +---+ > {code} > When we query above table via llap external client, it throws following > exception - > {code:java} > Caused by: java.lang.NullPointerException: src > at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33) > at > io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199) > at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486) > at > io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) > at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933) > at > org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191) > at > org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359) > at > 
org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135) > {code} > Created a test to repro it - > {code:java} > /** > * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while > testing LLAP external client flow. > * The aim of turning off LLAP IO is - > * when we create table through this test, LLAP caches them and returns the > same > * when we do a read query, due to this we miss some code paths which may > have been hit otherwise. > */ > public class TestMiniLlapVectorArrowWithLlapIODisabled extends > BaseJdbcWithMiniLlap { > @BeforeClass > public static void beforeTest() throws Exception { > HiveConf conf = defaultConf(); > conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true); > > conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, > true); > conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false"); > BaseJdbcWithMiniLlap.beforeTest(conf); > } > @Override > protected InputFormat getInputFormat() { > //For unit testing, no harm in hard-coding allocator ceiling to > LONG.MAX_VALUE > return new LlapArrowRowInputFormat(Long.MAX_VALUE); > } > @Test > public void testNullsInStructFields() throws Exception { > createDataTypesTable("datatypes"); > RowCollector2 rowCollector = new RowCollector2(); > // c8 struct > String query = "select c8 from datatypes"; > int rowCount = processQuery(query, 1, rowCollector); > assertEquals(3, rowCount); > } > } > {code} > Cause - As we see in the table above, first row of the table is NULL, and > correspondingly we get {{structVector.isNull[i]=true}} in arrow serializer > but we don't get {{isNull[i]=true}} for the fields of struct. And later the > code goes for setting such fields in arrow vector and we see above exception. 
[jira] [Updated] (HIVE-25243) Llap external client - Handle nested values when the parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25243: - Summary: Llap external client - Handle nested values when the parent struct is null (was: Llap external client - Handle nested values when parent struct is null) > Llap external client - Handle nested values when the parent struct is null > -- > > Key: HIVE-25243 > URL: https://issues.apache.org/jira/browse/HIVE-25243 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Consider the following table in text format - > {code} > +---+ > | c8 | > +---+ > | NULL | > | {"r":null,"s":null,"t":null} | > | {"r":"a","s":9,"t":2.2} | > +---+ > {code} > When we query above table via llap external client, it throws following > exception - > {code:java} > Caused by: java.lang.NullPointerException: src > at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33) > at > io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199) > at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486) > at > io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) > at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933) > at > org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191) > at > org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290) > at > 
org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135) > {code} > Created a test to repro it - > {code:java} > /** > * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while > testing LLAP external client flow. > * The aim of turning off LLAP IO is - > * when we create table through this test, LLAP caches them and returns the > same > * when we do a read query, due to this we miss some code paths which may > have been hit otherwise. > */ > public class TestMiniLlapVectorArrowWithLlapIODisabled extends > BaseJdbcWithMiniLlap { > @BeforeClass > public static void beforeTest() throws Exception { > HiveConf conf = defaultConf(); > conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true); > > conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, > true); > conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false"); > BaseJdbcWithMiniLlap.beforeTest(conf); > } > @Override > protected InputFormat getInputFormat() { > //For unit testing, no harm in hard-coding allocator ceiling to > LONG.MAX_VALUE > return new LlapArrowRowInputFormat(Long.MAX_VALUE); > } > @Test > public void testNullsInStructFields() throws Exception { > createDataTypesTable("datatypes"); > RowCollector2 rowCollector = new RowCollector2(); > // c8 struct > String query = "select c8 from datatypes"; > int rowCount = processQuery(query, 1, rowCollector); > assertEquals(3, rowCount); > } > } > {code} > Cause - As we see in the table above, first row of the table is NULL, and > correspondingly we get {{structVector.isNull[i]=true}} in arrow serializer > but we don't get {{isNull[i]=true}} for the fields of struct. 
And later the code goes on to set such fields in the arrow vector, and we see the above exception.
[jira] [Updated] (HIVE-25243) Llap external client - Handle nested values when parent struct is null
[ https://issues.apache.org/jira/browse/HIVE-25243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25243: - Summary: Llap external client - Handle nested values when parent struct is null (was: Llap external client - Handle nested null values in struct vector in arrow serializer) > Llap external client - Handle nested values when parent struct is null > -- > > Key: HIVE-25243 > URL: https://issues.apache.org/jira/browse/HIVE-25243 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Consider the following table in text format - > {code} > +---+ > | c8 | > +---+ > | NULL | > | {"r":null,"s":null,"t":null} | > | {"r":"a","s":9,"t":2.2} | > +---+ > {code} > When we query above table via llap external client, it throws following > exception - > {code:java} > Caused by: java.lang.NullPointerException: src > at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33) > at > io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199) > at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486) > at > io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) > at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933) > at > org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191) > at > org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290) > at > 
org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135) > {code} > Created a test to repro it - > {code:java} > /** > * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while > testing LLAP external client flow. > * The aim of turning off LLAP IO is - > * when we create table through this test, LLAP caches them and returns the > same > * when we do a read query, due to this we miss some code paths which may > have been hit otherwise. > */ > public class TestMiniLlapVectorArrowWithLlapIODisabled extends > BaseJdbcWithMiniLlap { > @BeforeClass > public static void beforeTest() throws Exception { > HiveConf conf = defaultConf(); > conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true); > > conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, > true); > conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false"); > BaseJdbcWithMiniLlap.beforeTest(conf); > } > @Override > protected InputFormat getInputFormat() { > //For unit testing, no harm in hard-coding allocator ceiling to > LONG.MAX_VALUE > return new LlapArrowRowInputFormat(Long.MAX_VALUE); > } > @Test > public void testNullsInStructFields() throws Exception { > createDataTypesTable("datatypes"); > RowCollector2 rowCollector = new RowCollector2(); > // c8 struct > String query = "select c8 from datatypes"; > int rowCount = processQuery(query, 1, rowCollector); > assertEquals(3, rowCount); > } > } > {code} > Cause - As we see in the table above, first row of the table is NULL, and > correspondingly we get {{structVector.isNull[i]=true}} in arrow serializer > but we don't get {{isNull[i]=true}} for the fields of struct. 
And later the code goes on to set such fields in the arrow vector, and we see the above exception.
[jira] [Assigned] (HIVE-25243) Llap external client - Handle nested null values in struct vector in arrow serializer
[ https://issues.apache.org/jira/browse/HIVE-25243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-25243: > Llap external client - Handle nested null values in struct vector in arrow > serializer > - > > Key: HIVE-25243 > URL: https://issues.apache.org/jira/browse/HIVE-25243 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Consider the following table in text format - > {code} > +---+ > | c8 | > +---+ > | NULL | > | {"r":null,"s":null,"t":null} | > | {"r":"a","s":9,"t":2.2} | > +---+ > {code} > When we query above table via llap external client, it throws following > exception - > {code:java} > Caused by: java.lang.NullPointerException: src > at io.netty.util.internal.ObjectUtil.checkNotNull(ObjectUtil.java:33) > at > io.netty.buffer.UnsafeByteBufUtil.setBytes(UnsafeByteBufUtil.java:537) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:199) > at io.netty.buffer.WrappedByteBuf.setBytes(WrappedByteBuf.java:486) > at > io.netty.buffer.UnsafeDirectLittleEndian.setBytes(UnsafeDirectLittleEndian.java:34) > at io.netty.buffer.ArrowBuf.setBytes(ArrowBuf.java:933) > at > org.apache.arrow.vector.BaseVariableWidthVector.setBytes(BaseVariableWidthVector.java:1191) > at > org.apache.arrow.vector.BaseVariableWidthVector.setSafe(BaseVariableWidthVector.java:1026) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.lambda$static$15(Serializer.java:834) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeGeneric(Serializer.java:777) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writePrimitive(Serializer.java:581) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:290) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:359) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:296) > at > 
org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:213) > at > org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:135) > {code} > Created a test to repro it - > {code:java} > /** > * TestMiniLlapVectorArrowWithLlapIODisabled - turns off llap io while > testing LLAP external client flow. > * The aim of turning off LLAP IO is - > * when we create table through this test, LLAP caches them and returns the > same > * when we do a read query, due to this we miss some code paths which may > have been hit otherwise. > */ > public class TestMiniLlapVectorArrowWithLlapIODisabled extends > BaseJdbcWithMiniLlap { > @BeforeClass > public static void beforeTest() throws Exception { > HiveConf conf = defaultConf(); > conf.setBoolVar(ConfVars.LLAP_OUTPUT_FORMAT_ARROW, true); > > conf.setBoolVar(ConfVars.HIVE_VECTORIZATION_FILESINK_ARROW_NATIVE_ENABLED, > true); > conf.set(ConfVars.LLAP_IO_ENABLED.varname, "false"); > BaseJdbcWithMiniLlap.beforeTest(conf); > } > @Override > protected InputFormat getInputFormat() { > //For unit testing, no harm in hard-coding allocator ceiling to > LONG.MAX_VALUE > return new LlapArrowRowInputFormat(Long.MAX_VALUE); > } > @Test > public void testNullsInStructFields() throws Exception { > createDataTypesTable("datatypes"); > RowCollector2 rowCollector = new RowCollector2(); > // c8 struct > String query = "select c8 from datatypes"; > int rowCount = processQuery(query, 1, rowCollector); > assertEquals(3, rowCount); > } > } > {code} > Cause - As we see in the table above, first row of the table is NULL, and > correspondingly we get {{structVector.isNull[i]=true}} in arrow serializer > but we don't get {{isNull[i]=true}} for the fields of struct. And later the > code goes for setting such fields in arrow vector and we see above exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25159) Remove support for ordered results in llap external client library
[ https://issues.apache.org/jira/browse/HIVE-25159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-25159: - Attachment: HIVE-25159.01.patch > Remove support for ordered results in llap external client library > -- > > Key: HIVE-25159 > URL: https://issues.apache.org/jira/browse/HIVE-25159 > Project: Hive > Issue Type: Bug > Components: Clients, Hive >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-25159.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Currently when querying via llap external client framework, in case of order > by queries - > 1. Due to the fact that spark-llap used to wrap actual query in a subquery as > mentioned in [HIVE-19794|https://issues.apache.org/jira/browse/HIVE-19794] > a) We had to detect order by like - > {code} > orderByQuery = plan.getQueryProperties().hasOrderBy() || > plan.getQueryProperties().hasOuterOrderBy(); > {code} > Due to this we recently saw an exception like below for one of the queries > that did not have an outer order by (It was having an order by in a subquery) > {code} > org.apache.hive.service.cli.HiveSQLException: java.io.IOException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.lang.IllegalStateException: Requested to generate single split. Paths > and fileStatuses are expected to be 1. Got paths: 1 fileStatuses: 7 > {code} > b) Also we had to disable following optimization - > {code} > HiveConf.setBoolVar(conf, ConfVars.HIVE_REMOVE_ORDERBY_IN_SUBQUERY, false); > {code} > 2. By default we have > {{hive.llap.external.splits.order.by.force.single.split=true}} which forces > us to generate single split leading to performance bottleneck. > We should remove ordering support altogether from llap external client repo > and let clients handle it at their end. -- This message was sent by Atlassian Jira (v8.3.4#803005)
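If ordering support is removed from the llap external client library as proposed, callers would re-sort fetched rows themselves. A minimal sketch of that client-side responsibility, using plain string rows (the class and method names are hypothetical, not part of any Hive API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: with ORDER BY no longer honored by the llap external
// client splits, the caller collects all fetched rows and sorts them locally.
public class ClientSideOrdering {
    static List<String[]> sortRows(List<String[]> rows, int keyCol) {
        List<String[]> copy = new ArrayList<>(rows); // leave fetched batch untouched
        copy.sort(Comparator.comparing((String[] r) -> r[keyCol]));
        return copy;
    }

    public static void main(String[] args) {
        List<String[]> fetched = Arrays.asList(
            new String[]{"b", "2"},
            new String[]{"a", "1"});
        System.out.println(sortRows(fetched, 0).get(0)[0]); // a
    }
}
```

This also sidesteps the single-split bottleneck: splits can be read in parallel and merged or sorted at the client.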
[jira] [Assigned] (HIVE-25159) Remove support for ordered results in llap external client library
[ https://issues.apache.org/jira/browse/HIVE-25159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-25159: > Remove support for ordered results in llap external client library > -- > > Key: HIVE-25159 > URL: https://issues.apache.org/jira/browse/HIVE-25159 > Project: Hive > Issue Type: Bug > Components: Clients, Hive >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Currently when querying via llap external client framework, in case of order > by queries - > 1. Due to the fact that spark-llap used to wrap actual query in a subquery as > mentioned in [HIVE-19794|https://issues.apache.org/jira/browse/HIVE-19794] > a) We had to detect order by like - > {code} > orderByQuery = plan.getQueryProperties().hasOrderBy() || > plan.getQueryProperties().hasOuterOrderBy(); > {code} > Due to this we recently saw an exception like below for one of the queries > that did not have an outer order by (It was having an order by in a subquery) > {code} > org.apache.hive.service.cli.HiveSQLException: java.io.IOException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > java.lang.IllegalStateException: Requested to generate single split. Paths > and fileStatuses are expected to be 1. Got paths: 1 fileStatuses: 7 > {code} > b) Also we had to disable following optimization - > {code} > HiveConf.setBoolVar(conf, ConfVars.HIVE_REMOVE_ORDERBY_IN_SUBQUERY, false); > {code} > 2. By default we have > {{hive.llap.external.splits.order.by.force.single.split=true}} which forces > us to generate single split leading to performance bottleneck. > We should remove ordering support altogether from llap external client repo > and let clients handle it at their end. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24563) Check if we can interchange client and server sides for umbilical for external client flow
[ https://issues.apache.org/jira/browse/HIVE-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24563: - Description: Currently we open three tcp connections when llap external client communicates to llap. {noformat} llap-ext-client ... llap connection1: client ...>>... server (RPC for submitting fragments - say t1, t2, t3. llap-ext-client initiates connection) connection2: client ...>>... server (for reading the output of t1, t2, t3. llap-ext-client initiates connection) connection3: umbilical server ...<<... client (RPC for status updates/heartbeat of t1, t2, t3. llap Daemon initiates connection) {noformat} connection3 starts a umbilical(RPC) server at the client side to which llap daemon keeps sending the task statuses / heartbeats and node heartbeats. *The Problem* In cloud based deployment, we need to open tcp traffic. 1. For connection1 and connection2, we need to open incoming tcp traffic on the machines running llap from client. 2. For connection3, we need to open incoming tcp traffic on the machines where llap-ext-client is running, from llap daemon. Here clients also need to worry about opening traffic(from llap) at their end. *Possible Solution* This jira is to evaluate the possibility of interchanging Umbilical server and client sides i.e. umbilical server will run in llap only and llap-ext-client will act as client and initiate the connection. We can have umbilical address in llap splits (when get_splits is called by external client) which the client can later connect to. cc [~prasanth_j] [~harishjp] was: Currently we open three tcp connections when llap external client communicates to llap. {noformat} llap-ext-client ... llap connection1: client ...>>... server (RPC for submitting fragments - say t1, t2, t3. llap-ext-client initiates connection) connection2: client ...>>... server (for reading the output of t1, t2, t3. llap-ext-client initiates connection) connection3: umbilical server ...<<... 
client (RPC for status updates/heartbeat of t1, t2, t3. llap Daemon initiates connection) {noformat} connection3 starts a umbilical(RPC) server at the client side to which llap daemon keeps sending the task statuses / heartbeats and node heartbeats. *The Problem* In cloud based deployment, we need to open tcp traffic. 1. For connection1 and connection2, we need to open incoming tcp traffic on the machines running llap from client. 2. For connection3, we need to open incoming tcp traffic on the machines where llap-ext-client is running, from llap daemon. Here clients also need to worry about opening traffic(from llap) at their end. *Possible Solution* This jira is to evaluate the possibility of interchanging Umbilical server and client sides i.e. umbilical server will run in llap only and llap-ext-client will act as client and initiate the connection. We can have umbilical address in llap splits (when get_splits is called by external client) which the client can later connect to. cc [~prasanth_j] > Check if we can interchange client and server sides for umbilical for > external client flow > -- > > Key: HIVE-24563 > URL: https://issues.apache.org/jira/browse/HIVE-24563 > Project: Hive > Issue Type: Sub-task > Components: Hive, llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Currently we open three tcp connections when llap external client > communicates to llap. > {noformat} >llap-ext-client ... llap > connection1: client ...>>... server > (RPC for submitting fragments - say t1, t2, t3. llap-ext-client initiates > connection) > connection2: client ...>>... server > (for reading the output of t1, t2, t3. llap-ext-client initiates connection) > connection3: umbilical server ...<<... client > (RPC for status updates/heartbeat of t1, t2, t3. 
llap Daemon initiates > connection) > {noformat} > connection3 starts a umbilical(RPC) server at the client side to which llap > daemon keeps sending the task statuses / heartbeats and node heartbeats. > *The Problem* > In cloud based deployment, we need to open tcp traffic. > 1. For connection1 and connection2, we need to open incoming tcp traffic on > the machines running llap from client. > 2. For connection3, we need to open incoming tcp traffic on the machines > where llap-ext-client is running, from llap daemon. > Here clients also need to worry about opening traffic(from llap) at their > end. > *Possible Solution* >
[jira] [Updated] (HIVE-24563) Check if we can interchange client and server sides for umbilical for external client flow
[ https://issues.apache.org/jira/browse/HIVE-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24563: - Description: Currently we open three TCP connections when the LLAP external client communicates with LLAP.
{noformat}
             llap-ext-client ... llap
connection1: client ...>>... server    (RPC for submitting fragments - say t1, t2, t3; llap-ext-client initiates the connection)
connection2: client ...>>... server    (for reading the output of t1, t2, t3; llap-ext-client initiates the connection)
connection3: umbilical server ...<<... client    (RPC for status updates/heartbeats of t1, t2, t3; the LLAP daemon initiates the connection)
{noformat}
connection3 starts an umbilical (RPC) server on the client side, to which the LLAP daemon keeps sending task statuses/heartbeats and node heartbeats.
*The Problem*
In a cloud-based deployment, we need to open up TCP traffic:
1. For connection1 and connection2, we need to allow incoming TCP traffic from the client on the machines running LLAP.
2. For connection3, we need to allow incoming TCP traffic from the LLAP daemon on the machines where llap-ext-client is running. So clients also have to worry about opening traffic (from LLAP) at their end.
*Possible Solution*
This jira is to evaluate the possibility of interchanging the umbilical server and client sides, i.e. the umbilical server would run only inside LLAP, and llap-ext-client would act as the client and initiate the connection. We can include the umbilical address in the llap splits (when get_splits is called by the external client), which the client can later connect to. cc [~prasanth_j]
> Check if we can interchange client and server sides for umbilical for external client flow
> --
> Key: HIVE-24563
> URL: https://issues.apache.org/jira/browse/HIVE-24563
> Project: Hive
> Issue Type: Sub-task
> Components: Hive, llap
> Reporter: Shubham Chaurasia
> Assignee: Shubham Chaurasia
> Priority: Major
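The proposed reversal can be sketched with plain Java sockets. This is only an illustrative stand-in (the class and method names below, such as {{UmbilicalSketch}} and {{heartbeatOnce}}, are hypothetical, not Hive's actual umbilical/RPC classes): the "daemon" side owns the listening socket, the address that would travel inside the llap splits is captured, and the external client makes the only outbound connection, over which the daemon then pushes a task status.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class UmbilicalSketch {
    public static String heartbeatOnce() throws IOException {
        // "LLAP daemon" side: the umbilical server now lives inside LLAP.
        try (ServerSocket umbilical = new ServerSocket(0)) {
            // The address that would be embedded in the llap splits returned by
            // get_splits, so the external client knows where to connect.
            InetSocketAddress addrInSplit =
                    new InetSocketAddress("127.0.0.1", umbilical.getLocalPort());

            // "llap-ext-client" side: initiates the connection (outbound only,
            // so no inbound firewall rule is needed on the client machine).
            try (Socket client = new Socket()) {
                client.connect(addrInSplit, 1000);
                try (Socket daemonSide = umbilical.accept()) {
                    // Daemon pushes a task status over the client-initiated channel.
                    new PrintWriter(daemonSide.getOutputStream(), true)
                            .println("heartbeat t1");
                    return new BufferedReader(
                            new InputStreamReader(client.getInputStream())).readLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(heartbeatOnce());
    }
}
```

The point the sketch illustrates: only the connection direction changes; the daemon still originates the status/heartbeat traffic once the channel exists.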
[jira] [Assigned] (HIVE-24563) Check if we can interchange client and server sides for umbilical for external client flow
[ https://issues.apache.org/jira/browse/HIVE-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24563:
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24138: Assignee: Ayush Saxena (was: Shubham Chaurasia)
> Llap external client flow is broken due to netty shading
> --
> Key: HIVE-24138
> URL: https://issues.apache.org/jira/browse/HIVE-24138
> Project: Hive
> Issue Type: Bug
> Components: llap
> Reporter: Shubham Chaurasia
> Assignee: Ayush Saxena
> Priority: Critical
>
> We shaded netty in hive-exec in https://issues.apache.org/jira/browse/HIVE-23073
> This breaks the LLAP external client flow on the LLAP daemon side.
> LLAP daemon stacktrace:
> {code}
> 2020-09-09T18:22:13,413 INFO [TezTR-222977_4_0_0_0_0 (497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning writer for: attempt_497418324441977_0004_0_00_00_0
> 2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 (497418324441977_0004_0_00_00_0)] tez.MapRecordSource: java.lang.NoSuchMethodError: org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;
> 	at org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96)
> 	at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74)
> 	at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57)
> 	at org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89)
> 	at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88)
> 	at org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130)
> 	at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102)
> 	at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85)
> 	at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46)
> 	at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137)
> 	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
> 	at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
> 	at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
> 	at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842)
> 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
> 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
> 	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
> 	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
> 	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
> 	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> 	at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
> The Arrow method signature mismatch happens mainly because Arrow contains some classes which are packaged under {{io.netty.buffer.*}}:
> {code}
> io.netty.buffer.ArrowBuf
> io.netty.buffer.ExpandableByteBuf
> io.netty.buffer.LargeBuffer
> io.netty.buffer.MutableWrappedByteBuf
> io.netty.buffer.PooledByteBufAllocatorL
> io.netty.buffer.UnsafeDirectLittleEndian
> {code}
> Since we have relocated netty, these classes have also been relocated to {{org.apache.hive.io.netty.buffer.*}}, causing the {{NoSuchMethodError}}.
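Why the relocation breaks Arrow: the JVM links a call by its full method descriptor, return type included, so relocating {{ArrowBuf}} turns {{buffer(I)Lio/netty/buffer/ArrowBuf;}} into {{buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;}}, and Arrow bytecode compiled against the original descriptor fails to link. A minimal sketch with stand-in classes (the nested {{ArrowBuf}}/{{BufferAllocator}} here are illustrative, not Arrow's real ones):

```java
import java.lang.reflect.Method;

public class DescriptorDemo {
    // Stand-ins for Arrow's types; names are illustrative only.
    static class ArrowBuf {}
    static class BufferAllocator {
        ArrowBuf buffer(int size) { return new ArrowBuf(); }
    }

    // Returns the runtime name of buffer()'s return type. If a shading tool
    // rewrote ArrowBuf's package, this name - and hence the JVM-level method
    // descriptor - would change, so callers compiled against the original
    // descriptor would fail at link time with NoSuchMethodError.
    public static String bufferReturnTypeName() throws NoSuchMethodException {
        Method m = BufferAllocator.class.getDeclaredMethod("buffer", int.class);
        return m.getReturnType().getName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(bufferReturnTypeName());
    }
}
```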
[jira] [Commented] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194217#comment-17194217 ] Shubham Chaurasia commented on HIVE-24138: [~abstractdog] [~thejas] [~ashutoshc] Should we try to upgrade to [hadoop-3.1.4, which is already on 4.1.48.Final|https://github.com/apache/hadoop/blob/rel/release-3.1.4/hadoop-project/pom.xml#L790], and remove the netty shading? cc [~anishek] [~ayushtkn]
[jira] [Commented] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193571#comment-17193571 ] Shubham Chaurasia commented on HIVE-24138: [~abstractdog] [~thejas] [~ashutoshc] Any suggestions on how to proceed on this?
[jira] [Assigned] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24138: Assignee: Shubham Chaurasia
[jira] [Updated] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24138: - Description: We shaded netty in hive-exec in https://issues.apache.org/jira/browse/HIVE-23073. This breaks the LLAP external client flow on the LLAP daemon side with {{java.lang.NoSuchMethodError: org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;}} (full LLAP daemon stacktrace quoted earlier in this thread). The Arrow method signature mismatch happens mainly because Arrow contains some classes packaged under {{io.netty.buffer.*}} ({{ArrowBuf}}, {{ExpandableByteBuf}}, {{LargeBuffer}}, {{MutableWrappedByteBuf}}, {{PooledByteBufAllocatorL}}, {{UnsafeDirectLittleEndian}}); since we have relocated netty, these classes have also been relocated to {{org.apache.hive.io.netty.buffer.*}}, causing the {{NoSuchMethodError}}. cc [~anishek] [~thejas] [~abstractdog] [~irashid] [~bruce.robbins]
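For reference, a hypothetical shade-plugin fragment (a sketch assuming the Maven Shade plugin's relocation excludes; not necessarily the fix Hive adopted) that would leave Arrow's netty-packaged classes unrelocated:

```xml
<relocation>
  <pattern>io.netty</pattern>
  <shadedPattern>org.apache.hive.io.netty</shadedPattern>
  <excludes>
    <!-- Arrow ships these classes inside io.netty.buffer; excluding them keeps
         Arrow's method descriptors (e.g. buffer(I)Lio/netty/buffer/ArrowBuf;) intact. -->
    <exclude>io.netty.buffer.ArrowBuf</exclude>
    <exclude>io.netty.buffer.ExpandableByteBuf</exclude>
    <exclude>io.netty.buffer.LargeBuffer</exclude>
    <exclude>io.netty.buffer.MutableWrappedByteBuf</exclude>
    <exclude>io.netty.buffer.PooledByteBufAllocatorL</exclude>
    <exclude>io.netty.buffer.UnsafeDirectLittleEndian</exclude>
  </excludes>
</relocation>
```

Note that excluding only these classes may still fail at runtime, since they extend and reference other netty buffer classes that would remain relocated; removing the netty shading entirely after a hadoop upgrade, as suggested in the comments above, sidesteps that.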
[jira] [Updated] (HIVE-24138) Llap external client flow is broken due to netty shading
[ https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24138: - Description: We shaded netty in hive-exec in - https://issues.apache.org/jira/browse/HIVE-23073 This breaks LLAP external client flow on LLAP daemon side - {code} 2020-09-09T18:22:13,413 INFO [TezTR-222977_4_0_0_0_0 (497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning writer for: attempt_497418324441977_0004_0_00_00_0 2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 (497418324441977_0004_0_00_00_0)] tez.MapRecordSource: java.lang.NoSuchMethodError: org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf; at org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96) at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74) at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57) at org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89) at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88) at org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130) at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102) at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85) at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46) at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158) at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172) at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809) at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} Arrow method signature mismatch mainly happens due to the fact that arrow contains some classes which are packaged under {{io.netty.buffer.*}} - {code} io.netty.buffer.ArrowBuf io.netty.buffer.ExpandableByteBuf 
io.netty.buffer.LargeBuffer io.netty.buffer.MutableWrappedByteBuf io.netty.buffer.PooledByteBufAllocatorL io.netty.buffer.UnsafeDirectLittleEndian {code} Since we have relocated netty, these classes have also been relocated to {{org.apache.hive.io.netty.buffer.*}}, causing the {{NoSuchMethodError}}.
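One way to avoid the mismatch is to exclude Arrow's netty-packaged classes from the relocation, so their signatures keep referring to {{io.netty.buffer.ArrowBuf}}. A hypothetical maven-shade-plugin fragment sketching this (the class list comes from the description above; the surrounding plugin configuration is assumed, not taken from the HIVE-23073 patch):

```xml
<!-- Hypothetical sketch: relocate netty but leave Arrow's io.netty.buffer.* classes alone -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>io.netty</pattern>
        <shadedPattern>org.apache.hive.io.netty</shadedPattern>
        <excludes>
          <!-- Arrow ships these under io.netty.buffer; relocating them changes the
               return type of BufferAllocator.buffer(int) and breaks binary compatibility -->
          <exclude>io.netty.buffer.ArrowBuf</exclude>
          <exclude>io.netty.buffer.ExpandableByteBuf</exclude>
          <exclude>io.netty.buffer.LargeBuffer</exclude>
          <exclude>io.netty.buffer.MutableWrappedByteBuf</exclude>
          <exclude>io.netty.buffer.PooledByteBufAllocatorL</exclude>
          <exclude>io.netty.buffer.UnsafeDirectLittleEndian</exclude>
        </excludes>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```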
[jira] [Updated] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24059: - Attachment: HIVE-24059.01.patch > Llap external client - Initial changes for running in cloud environment > --- > > Key: HIVE-24059 > URL: https://issues.apache.org/jira/browse/HIVE-24059 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-24059.01.patch > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Please see problem description in > https://issues.apache.org/jira/browse/HIVE-24058 > Initial changes include - > 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) > side. > 2. Opening additional RPC port in LLAP Daemon. > 3. JWT Based authentication on this port. > cc [~prasanth_j] [~jdere] [~anishek] [~thejas] -- This message was sent by Atlassian Jira (v8.3.4#803005)
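The JWT-based authentication mentioned in item 3 can be illustrated with a minimal, self-contained sketch. This is not the Hive implementation: it verifies a shared-secret HS256 token rather than the JWKS/RSA flow, and the class and method names are invented for illustration only.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical sketch: sign and verify the signature part of a compact-serialized
// HS256 JWT (header.payload.signature). A real deployment would verify RS256
// signatures against keys fetched from a JWKS endpoint.
public class JwtSketch {
  private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

  public static String signHs256(String headerJson, String payloadJson, byte[] secret) {
    String signingInput = B64.encodeToString(headerJson.getBytes(StandardCharsets.UTF_8))
        + "." + B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
    return signingInput + "." + B64.encodeToString(hmac(signingInput, secret));
  }

  public static boolean verifyHs256(String jwt, byte[] secret) {
    String[] parts = jwt.split("\\.");
    if (parts.length != 3) {
      return false; // not a compact-serialized JWT
    }
    byte[] expected = hmac(parts[0] + "." + parts[1], secret);
    byte[] provided = Base64.getUrlDecoder().decode(parts[2]);
    // Constant-time comparison of the two signatures.
    return MessageDigest.isEqual(expected, provided);
  }

  private static byte[] hmac(String input, byte[] secret) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(new SecretKeySpec(secret, "HmacSHA256"));
      return mac.doFinal(input.getBytes(StandardCharsets.US_ASCII));
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

On the daemon side, the token would be read from the first message on the new RPC port and rejected before any split is accepted.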
[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183964#comment-17183964 ] Shubham Chaurasia commented on HIVE-24059: -- Fixed tests, all green now - http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-1418/3/pipeline
[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182746#comment-17182746 ] Shubham Chaurasia commented on HIVE-24059: -- [~prasanth_j] [~jdere] Can you please review?
[jira] [Updated] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-24059: - Description: Please see problem description in https://issues.apache.org/jira/browse/HIVE-24058 Initial changes include - 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) side. 2. Opening additional RPC port in LLAP Daemon. 3. JWT Based authentication on this port. cc [~prasanth_j] [~jdere] [~anishek] [~thejas] was: Please see problem description in https://issues.apache.org/jira/browse/HIVE-24058 Initial changes include - 1. Moving LLAP discovery logic from client side to server (HS2 / get_splits) side. 2. Opening additional RPC port in LLAP Daemon. 3. JWT Based authentication on this port.
[jira] [Commented] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17182744#comment-17182744 ] Shubham Chaurasia commented on HIVE-24059: -- This patch uses two env variables - {{IS_CLOUD_DEPLOYMENT}} - whether HS2 and LLAP are running in a cloud env. {{PUBLIC_HOSTNAME}} - public hostname which can be reached from outside the cloud. Both these variables need to be set on HS2 and LLAP machines for this patch to work correctly.
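The host-resolution logic implied by these two variables can be sketched as below. The class and method names are hypothetical, and the fallback behavior is an assumption, not taken from the patch.

```java
import java.util.Map;

// Hypothetical sketch: pick the externally reachable hostname when running in a
// cloud deployment, based on the two environment variables described above.
public class CloudHostResolver {
  public static String resolveAdvertisedHost(Map<String, String> env, String internalHost) {
    boolean cloud = Boolean.parseBoolean(env.getOrDefault("IS_CLOUD_DEPLOYMENT", "false"));
    String publicHost = env.get("PUBLIC_HOSTNAME");
    if (cloud && publicHost != null && !publicHost.isEmpty()) {
      return publicHost; // advertise the address reachable from outside the VPC
    }
    return internalHost; // non-cloud (or misconfigured) deployments keep the internal name
  }
}
```

In production the map would simply be `System.getenv()`; it is a parameter here only so the logic is testable.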
[jira] [Assigned] (HIVE-24059) Llap external client - Initial changes for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24059:
[jira] [Assigned] (HIVE-24058) Llap external client - Enhancements for running in cloud environment
[ https://issues.apache.org/jira/browse/HIVE-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-24058: > Llap external client - Enhancements for running in cloud environment > > > Key: HIVE-24058 > URL: https://issues.apache.org/jira/browse/HIVE-24058 > Project: Hive > Issue Type: Task > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > When we query using the llap external client library, the following happens currently > - > 1. We first need to get splits using > [LlapBaseInputFormat#getSplits()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L226], > this just needs the Hive server JDBC url. > 2. We then submit those splits to llap and obtain a record reader to read data > using > [LlapBaseInputFormat#getRecordReader()|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L140]. > In this step we need the following at client side - > - {{hive.zookeeper.quorum}} > - {{hive.llap.daemon.service.hosts}} > We need to connect to zk to discover llap daemons. > 3. The record reader so obtained needs to [initiate a TCP connection from client > to LLAP Daemon to submit the > split|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L185]. > 4. It also needs to [initiate another TCP connection from client to the output > format port in LLAP Daemon to read the > data|https://github.com/apache/hive/blob/rel/release-3.1.2/llap-ext-client/src/java/org/apache/hadoop/hive/llap/LlapBaseInputFormat.java#L201]. > In cloud based deployments, we may not be able to make direct connections to > the zk registry and LLAP daemons from the client as it might run outside the VPC. > For 2, we can move the daemon discovery logic to the get_splits UDF itself, which will > run in HS2. > For scenarios like 3 and 4, we can expose additional ports on LLAP with > an additional auth mechanism.
[jira] [Commented] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17168663#comment-17168663 ] Shubham Chaurasia commented on HIVE-23339: -- Thanks for the review and commit [~mgergely]. Closing it. Note - It changes the API in {{HiveAuthorizationProvider}} from {code:java} public void authorize(Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException; {code} to {code:java} void authorizeDbLevelOperations(Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv, Collection inputs, Collection outputs) throws HiveException, AuthorizationException; {code} > SBA does not check permissions for DB location specified in Create or Alter > database query > -- > > Key: HIVE-23339 > URL: https://issues.apache.org/jira/browse/HIVE-23339 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.0, 4.0.0 >Reporter: Riju Trivedi >Assignee: Shubham Chaurasia >Priority: Critical > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-23339.01.patch, HIVE-23339.02.patch, > HIVE-23339.03.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > With doAs=true and the StorageBasedAuthorization provider, create database with a > specific location succeeds even if the user doesn't have access to that path. > > {code:java} > hadoop fs -ls -d /tmp/cannot_write > drwx------ - hive hadoop 0 2020-04-01 22:53 /tmp/cannot_write > create a database under /tmp/cannot_write. We would expect it to fail, but it is > actually created successfully with "hive" as the owner: > rtrivedi@bdp01:~> beeline -e "create database rtrivedi_1 location > '/tmp/cannot_write/rtrivedi_1'" > INFO : OK > No rows affected (0.116 seconds) > hive@hpchdd2e:~> hadoop fs -ls /tmp/cannot_write > Found 1 items > drwx------ - hive hadoop 0 2020-04-01 23:05 /tmp/cannot_write/rtrivedi_1 > {code} >
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Resolution: Fixed Status: Resolved (was: Patch Available)
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Fix Version/s: 4.0.0
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Affects Version/s: 4.0.0
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Attachment: HIVE-23339.03.patch
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create or Alter database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Summary: SBA does not check permissions for DB location specified in Create or Alter database query (was: SBA does not check permissions for DB location specified in Create database query)
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Attachment: HIVE-23339.02.patch
[jira] [Commented] (HIVE-23339) SBA does not check permissions for DB location specified in Create database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104347#comment-17104347 ] Shubham Chaurasia commented on HIVE-23339: -- Thanks for the pointers [~rtrivedi12]. Thanks for the review [~mgergely]. Based on our discussion, I agree that it would be cleaner to have an API with authorizer inputs and outputs rather than passing the properties in HiveConf as the current patch does. For context, currently we have the below API in {{HiveAuthorizationProvider}} {code:java} public void authorize(Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv) throws HiveException, AuthorizationException; {code} Now in {{StorageBasedAuthorizationProvider}} we need some additional information, in this case the custom location of the database from the 'CREATE DATABASE' query. The current patch achieves this by passing the location via HiveConf. To be able to pass inputs and outputs explicitly we would need something like below - {code:java} public void authorize(Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv, Set inputs, Set outputs) throws HiveException, AuthorizationException; {code} But since {{HiveAuthorizationProvider}} is a public/pluggable interface, I am not sure about modifying it. [~hashutosh] [~thejas] [~mgergely] Does the above API look correct? How do we usually modify authorizer APIs (or any public API) in hive? Do we have a doc/guideline for this?
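On evolving a public interface without breaking third-party implementations: one common Java pattern (an illustration of the general technique, not what this patch does) is to add the richer overload as a default method that delegates to the old one, so existing implementors keep compiling. The types below are simplified stand-ins for Hive's Privilege/Entity classes.

```java
import java.util.Collection;

// Hypothetical sketch of interface evolution via a default method.
interface AuthProvider {
  void authorize(String[] readPriv, String[] writePriv);

  // New, richer overload: old implementations inherit this default and keep working.
  default void authorize(String[] readPriv, String[] writePriv,
                         Collection<String> inputs, Collection<String> outputs) {
    authorize(readPriv, writePriv); // ignore the extra context by default
  }
}

class LegacyProvider implements AuthProvider {
  boolean called = false;

  @Override
  public void authorize(String[] readPriv, String[] writePriv) {
    called = true; // legacy logic, unaware of inputs/outputs
  }
}
```

Callers can move to the four-argument overload while legacy providers continue to satisfy the interface unchanged.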
[jira] [Updated] (HIVE-23339) SBA does not check permissions for DB location specified in Create database query
[ https://issues.apache.org/jira/browse/HIVE-23339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23339: - Attachment: HIVE-23339.01.patch Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-23230) "get_splits" udf ignores limit constraint while creating splits
[ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091203#comment-17091203 ] Shubham Chaurasia commented on HIVE-23230: -- [~adeshrao] HIVE-23230.2.patch looks good to me for fixing the limit issue, however these test failures seem related; all of them use get_splits(). I cannot access the test report links above. Could you please check these locally? And also reattach the same patch again. cc [~sankarh] > "get_splits" udf ignores limit constraint while creating splits > --- > > Key: HIVE-23230 > URL: https://issues.apache.org/jira/browse/HIVE-23230 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.1.0 >Reporter: Adesh Kumar Rao >Assignee: Adesh Kumar Rao >Priority: Major > Attachments: HIVE-23230.1.patch, HIVE-23230.2.patch, HIVE-23230.patch > > > Issue: Running the query {noformat}select * from <table> limit n{noformat} > from spark via the hive warehouse connector may return more rows than "n". > This happens because the "get_splits" udf creates splits ignoring the limit > constraint. These splits, when submitted to multiple llap daemons, will return > "n" rows each. > How to reproduce: Needs spark-shell, hive-warehouse-connector and hive on > llap with more than 1 llap daemon running. > run below commands via beeline to create and populate the table > > {noformat} > create table test (id int); > insert into table test values (1); > insert into table test values (2); > insert into table test values (3); > insert into table test values (4); > insert into table test values (5); > insert into table test values (6); > insert into table test values (7); > delete from test where id = 7;{noformat} > now running below query via spark-shell > {noformat} > import com.hortonworks.hwc.HiveWarehouseSession > val hive = HiveWarehouseSession.session(spark).build() > hive.executeQuery("select * from test limit 1").show() > {noformat} > will return more than 1 row.
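The fix direction implied by the bug - making split generation respect the limit - amounts to stopping once the rows covered by the selected splits reach the limit. A hypothetical sketch of that idea (class and method names invented; not the actual get_splits code, which works on real split objects rather than row counts):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: given per-split row counts, keep only as many splits as
// are needed to cover at least `limit` rows, instead of returning all of them.
public class SplitLimiter {
  public static List<Long> limitSplits(List<Long> splitRowCounts, long limit) {
    List<Long> chosen = new ArrayList<>();
    long covered = 0;
    for (long rows : splitRowCounts) {
      if (covered >= limit) {
        break; // enough rows already; further splits would only produce extra output
      }
      chosen.add(rows);
      covered += rows;
    }
    return chosen;
  }
}
```

With this shape, three daemons each holding 3 rows would serve a `limit 1` query from a single split instead of returning 3 rows apiece.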
[jira] [Updated] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
[ https://issues.apache.org/jira/browse/HIVE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22842: - Attachment: HIVE-22842.05.patch > Timestamp/date vectors in Arrow serializer should use correct calendar for > value representation > --- > > Key: HIVE-22842 > URL: https://issues.apache.org/jira/browse/HIVE-22842 > Project: Hive > Issue Type: Improvement >Reporter: Jesus Camacho Rodriguez >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22842.01.patch, HIVE-22842.02.patch, > HIVE-22842.03.patch, HIVE-22842.04.patch, HIVE-22842.05.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
[ https://issues.apache.org/jira/browse/HIVE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22842: - Attachment: HIVE-22842.04.patch
[jira] [Updated] (HIVE-23070) LLAP external client does not propagate orc confs to LLAP
[ https://issues.apache.org/jira/browse/HIVE-23070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23070: - Summary: LLAP external client does not propagate orc confs to LLAP (was: Llap external client does not propagate orc confs to LLAP) > LLAP external client does not propagate orc confs to LLAP > - > > Key: HIVE-23070 > URL: https://issues.apache.org/jira/browse/HIVE-23070 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > When we query through the llap external client, orc confs are not propagated > to (or not respected by) LLAP. > I was trying to pass the conf {{orc.proleptic.gregorian.default}} while > reading data but it was not taking effect.
[jira] [Commented] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
[ https://issues.apache.org/jira/browse/HIVE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065507#comment-17065507 ] Shubham Chaurasia commented on HIVE-22842: -- [~jcamachorodriguez] Thanks for the review. Added tests with combinations of date and timestamp for both new and legacy files for orc, parquet and avro. Also opened - https://issues.apache.org/jira/browse/HIVE-23070 > Timestamp/date vectors in Arrow serializer should use correct calendar for > value representation > --- > > Key: HIVE-22842 > URL: https://issues.apache.org/jira/browse/HIVE-22842 > Project: Hive > Issue Type: Improvement >Reporter: Jesus Camacho Rodriguez >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22842.01.patch, HIVE-22842.02.patch, > HIVE-22842.03.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23070) Llap external client does not propagate orc confs to LLAP
[ https://issues.apache.org/jira/browse/HIVE-23070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-23070: > Llap external client does not propagate orc confs to LLAP > - > > Key: HIVE-23070 > URL: https://issues.apache.org/jira/browse/HIVE-23070 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > When we query through the LLAP external client, orc confs are not propagated > to (or not respected by) LLAP. > I was trying to pass the conf {{orc.proleptic.gregorian.default}} while > reading data, but it was not taking effect. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
[ https://issues.apache.org/jira/browse/HIVE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22842: - Attachment: HIVE-22842.03.patch > Timestamp/date vectors in Arrow serializer should use correct calendar for > value representation > --- > > Key: HIVE-22842 > URL: https://issues.apache.org/jira/browse/HIVE-22842 > Project: Hive > Issue Type: Improvement >Reporter: Jesus Camacho Rodriguez >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22842.01.patch, HIVE-22842.02.patch, > HIVE-22842.03.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers
[ https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23034: - Attachment: HIVE-23034.01.patch Status: Patch Available (was: Open) > Arrow serializer should not keep the reference of arrow offset and validity > buffers > --- > > Key: HIVE-23034 > URL: https://issues.apache.org/jira/browse/HIVE-23034 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23034.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, a part of writeList() method in arrow serializer is implemented > like - > {code:java} > final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer(); > int nextOffset = 0; > for (int rowIndex = 0; rowIndex < size; rowIndex++) { > int selectedIndex = rowIndex; > if (vectorizedRowBatch.selectedInUse) { > selectedIndex = vectorizedRowBatch.selected[rowIndex]; > } > if (hiveVector.isNull[selectedIndex]) { > offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset); > } else { > offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset); > nextOffset += (int) hiveVector.lengths[selectedIndex]; > arrowVector.setNotNull(rowIndex); > } > } > offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset); > {code} > 1) Here we obtain a reference to {{final ArrowBuf offsetBuffer = > arrowVector.getOffsetBuffer();}} and keep updating the arrow vector and > offset vector. > Problem - > {{arrowVector.setNotNull(rowIndex)}} keeps checking the index and reallocates > the offset and validity buffers when a threshold is crossed, updates the > references internally and also releases the old buffers (which decrements the > buffer reference count). Now the reference which we obtained in 1) becomes > obsolete. 
Furthermore, if we try to read or write the old buffer, we see - > {code:java} > Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0 > at > io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413) > at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131) > at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162) > at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205) > {code} > > Solution - > This can be fixed by fetching the buffers ( > {{arrowVector.getOffsetBuffer()}} ) each time we want to update them. > In our internal tests this is seen very frequently on arrow 0.8.0 but not on > 0.10.0; it should be handled the same way for 0.10.0 too, as it does the same > thing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
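The stale-reference failure described in HIVE-23034 can be reproduced in miniature without Arrow. In the sketch below, {{GrowableVector}} and its methods are illustrative stand-ins (assumptions, not Hive or Arrow APIs) for an Arrow vector whose {{setNotNull()}} may reallocate its internal buffer; caching the buffer reference across such calls leaves writes in storage the vector has already replaced (Arrow additionally releases the old buffer, hence the refCnt: 0 error above).

```java
// Plain-Java model of the bug (all names here are illustrative assumptions,
// not Arrow/Hive APIs): holding one buffer reference across calls that may
// reallocate the underlying storage means later writes land in a dead copy.
import java.util.Arrays;

class GrowableVector {
    private int[] offsetBuffer = new int[4];

    // Like arrowVector.getOffsetBuffer(): returns the *current* buffer.
    int[] getOffsetBuffer() { return offsetBuffer; }

    // Like arrowVector.setNotNull(): reallocates once the index crosses
    // capacity, swapping the internal reference to a new, larger buffer.
    void setNotNull(int index) {
        int capacity = offsetBuffer.length;
        if (index >= capacity) {
            while (index >= capacity) capacity *= 2;
            offsetBuffer = Arrays.copyOf(offsetBuffer, capacity);
        }
    }
}

public class StaleBufferDemo {
    public static void main(String[] args) {
        // Buggy pattern: hold one reference for the whole loop.
        GrowableVector buggy = new GrowableVector();
        int[] cached = buggy.getOffsetBuffer();
        for (int i = 0; i < 8; i++) {
            buggy.setNotNull(i);             // may swap the internal buffer
            cached[i % cached.length] = i;   // writes land in the old one
        }

        // Fixed pattern (the HIVE-23034 solution): re-fetch on every write.
        GrowableVector fixed = new GrowableVector();
        for (int i = 0; i < 8; i++) {
            fixed.setNotNull(i);
            fixed.getOffsetBuffer()[i] = i;  // always the live buffer
        }

        System.out.println("buggy lost writes: " + (buggy.getOffsetBuffer()[7] != 7));
        System.out.println("fixed kept writes: " + (fixed.getOffsetBuffer()[7] == 7));
    }
}
```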
[jira] [Updated] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
[ https://issues.apache.org/jira/browse/HIVE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22842: - Attachment: HIVE-22842.02.patch > Timestamp/date vectors in Arrow serializer should use correct calendar for > value representation > --- > > Key: HIVE-22842 > URL: https://issues.apache.org/jira/browse/HIVE-22842 > Project: Hive > Issue Type: Improvement >Reporter: Jesus Camacho Rodriguez >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22842.01.patch, HIVE-22842.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers
[ https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-23034: > Arrow serializer should not keep the reference of arrow offset and validity > buffers > --- > > Key: HIVE-23034 > URL: https://issues.apache.org/jira/browse/HIVE-23034 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Currently, a part of writeList() method in arrow serializer is implemented > like - > {code:java} > final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer(); > int nextOffset = 0; > for (int rowIndex = 0; rowIndex < size; rowIndex++) { > int selectedIndex = rowIndex; > if (vectorizedRowBatch.selectedInUse) { > selectedIndex = vectorizedRowBatch.selected[rowIndex]; > } > if (hiveVector.isNull[selectedIndex]) { > offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset); > } else { > offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset); > nextOffset += (int) hiveVector.lengths[selectedIndex]; > arrowVector.setNotNull(rowIndex); > } > } > offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset); > {code} > 1) Here we obtain a reference to {{final ArrowBuf offsetBuffer = > arrowVector.getOffsetBuffer();}} and keep updating the arrow vector and > offset vector. > Problem - > {{arrowVector.setNotNull(rowIndex)}} keeps checking the index and reallocates > the offset and validity buffers when a threshold is crossed, updates the > references internally and also releases the old buffers (which decrements the > buffer reference count). Now the reference which we obtained in 1) becomes > obsolete. 
Furthermore, if we try to read or write the old buffer, we see - > {code:java} > Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0 > at > io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413) > at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131) > at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162) > at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285) > at > org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205) > {code} > > Solution - > This can be fixed by fetching the buffers ( > {{arrowVector.getOffsetBuffer()}} ) each time we want to update them. > In our internal tests this is seen very frequently on arrow 0.8.0 but not on > 0.10.0; it should be handled the same way for 0.10.0 too, as it does the same > thing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector
[ https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23022: - Attachment: HIVE-23022.01.patch Status: Patch Available (was: Open) > Arrow deserializer should ensure size of hive vector equal to arrow vector > -- > > Key: HIVE-23022 > URL: https://issues.apache.org/jira/browse/HIVE-23022 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-23022.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in > some cases does not set the size of hive vector correctly. Size of hive > vector should be set at least equal to arrow vector to be able to read > (accommodate) it fully. > Following exception can be seen when we try to read (using > {{LlapArrowRowInputFormat}} ) some table which contains complex types (struct > nested in array to be specific) and number of rows in table is more than > default (1024) batch/vector size. > {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) > at > org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) > ... 23 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
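The idea behind the HIVE-23022 fix — grow the hive-side vector to at least the arrow vector's size before copying values out — can be sketched with plain arrays. {{SimpleColumnVector}} below is an illustrative stand-in (an assumption, not Hive code); the real fix operates on Hive's ColumnVector inside {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}}.

```java
// Hedged sketch of the fix's shape (class and method names are illustrative
// assumptions, not Hive APIs).
import java.util.Arrays;

class SimpleColumnVector {
    long[] vector;
    boolean[] isNull;

    SimpleColumnVector(int size) {
        vector = new long[size];
        isNull = new boolean[size];
    }

    // Grow the backing arrays so at least `size` values fit.
    void ensureSize(int size) {
        if (vector.length < size) {
            vector = Arrays.copyOf(vector, size);
            isNull = Arrays.copyOf(isNull, size);
        }
    }
}

public class DeserializeSketch {
    // Without the ensureSize call, this throws ArrayIndexOutOfBoundsException
    // as soon as the arrow batch holds more rows than the hive vector (1024
    // by default) -- the symptom in the stack trace above.
    static void read(long[] arrowValues, SimpleColumnVector hiveVector) {
        hiveVector.ensureSize(arrowValues.length); // the HIVE-23022 idea
        for (int i = 0; i < arrowValues.length; i++) {
            hiveVector.vector[i] = arrowValues[i];
        }
    }

    public static void main(String[] args) {
        long[] arrowBatch = new long[1500];          // more than 1024 rows
        SimpleColumnVector hive = new SimpleColumnVector(1024);
        read(arrowBatch, hive);                      // succeeds after resize
        System.out.println(hive.vector.length);
    }
}
```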
[jira] [Updated] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector
[ https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23022: - Description: Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in some cases does not set the size of hive vector correctly. Size of hive vector should be set at least equal to arrow vector to be able to read (accommodate) it fully. Following exception can be seen when we try to read (using {{LlapArrowRowInputFormat}} ) some table which contains complex types (struct nested in array to be specific) and number of rows in table is more than default (1024) batch/vector size. {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) ... 23 more {code} was: Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in some cases does not set the size of hive vector correctly. Size of hive vector should be set at least equal to arrow vector to be able to read (accommodate) it fully. Following exception can be seen when we try to read some table which contains complex types (struct nested in array to be specific) and number of rows in table is more than default (1024) batch/vector size. 
{code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) ... 23 more {code} > Arrow deserializer should ensure size of hive vector equal to arrow vector > -- > > Key: HIVE-23022 > URL: https://issues.apache.org/jira/browse/HIVE-23022 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in > some cases does not set the size of hive vector correctly. Size of hive > vector should be set at least equal to arrow vector to be able to read > (accommodate) it fully. > Following exception can be seen when we try to read (using > {{LlapArrowRowInputFormat}} ) some table which contains complex types (struct > nested in array to be specific) and number of rows in table is more than > default (1024) batch/vector size. 
> {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) > at > org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) > ... 23 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector
[ https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23022: - Description: Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in some cases does not set the size of hive vector correctly. Size of hive vector should be set at least equal to arrow vector to be able to read (accommodate) it fully. Following exception can be seen when we try to read some table which contains complex types (struct nested in array to be specific) and number of rows in table is more than default (1024) batch/vector size. {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) ... 23 more {code} was: Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in some cases does not set the size of hive vector correctly. Size of hive vector should be set at least equal to arrow vector to be able to read (accommodate) it fully. Following exception can be seen when we try to read some table which contains complex types (struct nested in list to be specific) and number of rows in table is more than default (1024) batch/vector size. 
{code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) ... 23 more {code} > Arrow deserializer should ensure size of hive vector equal to arrow vector > -- > > Key: HIVE-23022 > URL: https://issues.apache.org/jira/browse/HIVE-23022 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in > some cases does not set the size of hive vector correctly. Size of hive > vector should be set at least equal to arrow vector to be able to read > (accommodate) it fully. > Following exception can be seen when we try to read some table which contains > complex types (struct nested in array to be specific) and number of rows in > table is more than default (1024) batch/vector size. 
> {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) > at > org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) > ... 23 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector
[ https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-23022: - Description: Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in some cases does not set the size of hive vector correctly. Size of hive vector should be set at least equal to arrow vector to be able to read (accommodate) it fully. Following exception can be seen when we try to read some table which contains complex types (struct nested in list to be specific) and number of rows in table is more than default (1024) batch/vector size. {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) ... 23 more {code} was: Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in some cases does not set the size of hive vector correctly. Size of hive vector should be set at least equal to arrow vector to be able to read (accommodate) it fully. Following exception can be seen when we try to read some table which contains complex types (struct nested in list to be specific) and table size is more than default (1024) batch/vector size. 
{code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) at org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) at org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) ... 23 more {code} > Arrow deserializer should ensure size of hive vector equal to arrow vector > -- > > Key: HIVE-23022 > URL: https://issues.apache.org/jira/browse/HIVE-23022 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in > some cases does not set the size of hive vector correctly. Size of hive > vector should be set at least equal to arrow vector to be able to read > (accommodate) it fully. > Following exception can be seen when we try to read some table which contains > complex types (struct nested in list to be specific) and number of rows in > table is more than default (1024) batch/vector size. 
> {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) > at > org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) > ... 23 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23022) Arrow deserializer should ensure size of hive vector equal to arrow vector
[ https://issues.apache.org/jira/browse/HIVE-23022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-23022: > Arrow deserializer should ensure size of hive vector equal to arrow vector > -- > > Key: HIVE-23022 > URL: https://issues.apache.org/jira/browse/HIVE-23022 > Project: Hive > Issue Type: Bug > Components: llap, Serializers/Deserializers >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Arrow deserializer - {{org.apache.hadoop.hive.ql.io.arrow.Deserializer}} in > some cases does not set the size of hive vector correctly. Size of hive > vector should be set at least equal to arrow vector to be able to read > (accommodate) it fully. > Following exception can be seen when we try to read some table which contains > complex types (struct nested in list to be specific) and table size is more > than default (1024) batch/vector size. > {code:java} > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024 > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readStruct(Deserializer.java:440) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:143) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.readList(Deserializer.java:394) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.read(Deserializer.java:137) > at > org.apache.hadoop.hive.ql.io.arrow.Deserializer.deserialize(Deserializer.java:122) > at > org.apache.hadoop.hive.ql.io.arrow.ArrowColumnarBatchSerDe.deserialize(ArrowColumnarBatchSerDe.java:284) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:75) > ... 23 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22842) Timestamp/date vectors in Arrow serializer should use correct calendar for value representation
[ https://issues.apache.org/jira/browse/HIVE-22842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22842: - Attachment: HIVE-22842.01.patch Status: Patch Available (was: Open) > Timestamp/date vectors in Arrow serializer should use correct calendar for > value representation > --- > > Key: HIVE-22842 > URL: https://issues.apache.org/jira/browse/HIVE-22842 > Project: Hive > Issue Type: Improvement >Reporter: Jesus Camacho Rodriguez >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22842.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader
[ https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053402#comment-17053402 ] Shubham Chaurasia commented on HIVE-22973: -- Thanks [~jdere] > Handle 0 length batches in LlapArrowRowRecordReader > --- > > Key: HIVE-22973 > URL: https://issues.apache.org/jira/browse/HIVE-22973 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22973.01.patch, HIVE-22973.02.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/HIVE-22856, we allowed > {{LlapArrowBatchRecordReader}} to permit 0-length arrow batches. > {{LlapArrowRowRecordReader}}, which is a wrapper over > {{LlapArrowBatchRecordReader}}, should also handle this. > On one of the systems (cannot be reproduced easily) where we were running > test {{TestJdbcWithMiniLlapVectorArrow}}, we saw the following exception - > {code:java} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.173 s <<< > FAILURE! - in org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow > testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow) > Time elapsed: 6.476 s <<< ERROR! 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:80) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:41) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:540) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:504) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:236) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:77) > ... 13 more > {code} > cc [~maheshk114] [~jdere] -- This message was sent by Atlassian Jira (v8.3.4#803005)
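The HIVE-22973 guard's shape can be sketched without the real classes. Below, {{BatchSkippingReader}} is an analogy (an assumption, not the actual LlapArrowRowRecordReader code): a row reader wrapping a batch reader must loop past batches of length 0 instead of indexing into them, since HIVE-22856 made empty batches legal.

```java
// Hedged sketch: all names are illustrative stand-ins, not Hive classes.
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class BatchSkippingReader {
    private final Iterator<int[]> batches; // each int[] models one row batch
    private int[] current = new int[0];
    private int rowIndex = 0;

    BatchSkippingReader(Iterator<int[]> batches) { this.batches = batches; }

    // Returns the next row value, or null at end of stream.
    Integer next() {
        // Keep fetching until a non-empty batch arrives; indexing into a
        // 0-length batch is what produced the
        // ArrayIndexOutOfBoundsException above.
        while (rowIndex >= current.length) {
            if (!batches.hasNext()) return null;
            current = batches.next();
            rowIndex = 0;
        }
        return current[rowIndex++];
    }
}

public class ZeroLengthBatchDemo {
    public static void main(String[] args) {
        List<int[]> stream = Arrays.asList(
                new int[] {1, 2}, new int[0], new int[] {3}); // empty batch mid-stream
        BatchSkippingReader reader = new BatchSkippingReader(stream.iterator());
        StringBuilder rows = new StringBuilder();
        Integer v;
        while ((v = reader.next()) != null) rows.append(v);
        System.out.println(rows); // all rows survive the empty batch
    }
}
```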
[jira] [Commented] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader
[ https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17050962#comment-17050962 ] Shubham Chaurasia commented on HIVE-22973: -- [~maheshk114] [~jdere] Can you please review? > Handle 0 length batches in LlapArrowRowRecordReader > --- > > Key: HIVE-22973 > URL: https://issues.apache.org/jira/browse/HIVE-22973 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22973.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/HIVE-22856, we allowed > {{LlapArrowBatchRecordReader}} to permit 0-length arrow batches. > {{LlapArrowRowRecordReader}}, which is a wrapper over > {{LlapArrowBatchRecordReader}}, should also handle this. > On one of the systems (cannot be reproduced easily) where we were running > test {{TestJdbcWithMiniLlapVectorArrow}}, we saw the following exception - > {code:java} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.173 s <<< > FAILURE! - in org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow > testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow) > Time elapsed: 6.476 s <<< ERROR! 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:80) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:41) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:540) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:504) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:236) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:77) > ... 13 more > {code} > cc [~maheshk114] [~jdere] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader
[ https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22973: - Description: In https://issues.apache.org/jira/browse/HIVE-22856, we allowed {{LlapArrowBatchRecordReader}} to permit 0 length arrow batches. {{LlapArrowRowRecordReader}} which is a wrapper over {{LlapArrowBatchRecordReader}} should also handle this. On one of the systems (cannot be reproduced easily) where we were running test {{TestJdbcWithMiniLlapVectorArrow}}, we saw following exception - {code:java} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.173 s <<< FAILURE! - in org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow) Time elapsed: 6.476 s <<< ERROR! java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:80) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:41) at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:540) at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:504) at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:236) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:77) ... 13 more {code} cc [~maheshk114] [~jdere] was: In https://issues.apache.org/jira/browse/HIVE-22856, we allowed {{LlapArrowBatchRecordReader}} to permit 0 length arrow batches. {{LlapArrowRowRecordReader}} which is a wrapper over {{LlapArrowBatchRecordReader}} should also handle this. On one of the systems (cannot be reproduced easily) where we were running test {{TestJdbcWithMiniLlapVectorArrow}}, we saw following exception - {code:java} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.173 s <<< FAILURE! - in org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow) Time elapsed: 6.476 s <<< ERROR! 
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:80) at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:41) at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:540) at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:504) at org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:236) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:77) ... 13 more {code} > Handle 0 length batches in LlapArrowRowRecordReader > --- > > Key: HIVE-22973 > URL: https://issues.apache.org/jira/browse/HIVE-22973 > Project: Hi
[jira] [Updated] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader
[ https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22973: - Attachment: HIVE-22973.01.patch Status: Patch Available (was: Open) > Handle 0 length batches in LlapArrowRowRecordReader > --- > > Key: HIVE-22973 > URL: https://issues.apache.org/jira/browse/HIVE-22973 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22973.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/HIVE-22856, we allowed > {{LlapArrowBatchRecordReader}} to permit 0 length arrow batches. > {{LlapArrowRowRecordReader}} which is a wrapper over > {{LlapArrowBatchRecordReader}} should also handle this. > On one of the systems (cannot be reproduced easily) where we were running > test {{TestJdbcWithMiniLlapVectorArrow}}, we saw following exception - > {code:java} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.173 s <<< > FAILURE! - in org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow > testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow) > Time elapsed: 6.476 s <<< ERROR! 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:80) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:41) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:540) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:504) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:236) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:77) > ... 13 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22973) Handle 0 length batches in LlapArrowRowRecordReader
[ https://issues.apache.org/jira/browse/HIVE-22973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-22973: > Handle 0 length batches in LlapArrowRowRecordReader > --- > > Key: HIVE-22973 > URL: https://issues.apache.org/jira/browse/HIVE-22973 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > In https://issues.apache.org/jira/browse/HIVE-22856, we allowed > {{LlapArrowBatchRecordReader}} to permit 0 length arrow batches. > {{LlapArrowRowRecordReader}} which is a wrapper over > {{LlapArrowBatchRecordReader}} should also handle this. > On one of the systems (cannot be reproduced easily) where we were running > test {{TestJdbcWithMiniLlapVectorArrow}}, we saw following exception - > {code:java} > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 30.173 s <<< > FAILURE! - in org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow > testLlapInputFormatEndToEnd(org.apache.hive.jdbc.TestJdbcWithMiniLlapVectorArrow) > Time elapsed: 6.476 s <<< ERROR! 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:80) > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:41) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:540) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.processQuery(BaseJdbcWithMiniLlap.java:504) > at > org.apache.hive.jdbc.BaseJdbcWithMiniLlap.testLlapInputFormatEndToEnd(BaseJdbcWithMiniLlap.java:236) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74) > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.hadoop.hive.llap.LlapArrowRowRecordReader.next(LlapArrowRowRecordReader.java:77) > ... 13 more > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048777#comment-17048777 ] Shubham Chaurasia commented on HIVE-22840: -- [~jcamachorodriguez] Oh, you already committed. Thanks! > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22840.03.patch, HIVE-22840.04.patch, > HIVE-22840.05.patch, HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-22405 added support for the proleptic calendar. It uses Java's > SimpleDateFormat/Calendar APIs, which are not thread-safe and cause races in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley]
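[Editor's note] For illustration of the underlying hazard: SimpleDateFormat and Calendar carry mutable internal state, so sharing one instance across threads races exactly as described. A common remedy, shown here as a sketch (not the actual HIVE-22840 patch, and with hypothetical class/method names), is one formatter instance per thread via ThreadLocal:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Sketch only: each thread lazily gets its own SimpleDateFormat, so
// concurrent format() calls never touch the same mutable formatter state.
class ThreadSafeFormatterSketch {
    private static final ThreadLocal<SimpleDateFormat> FORMATTER =
        ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            f.setTimeZone(TimeZone.getTimeZone("UTC"));
            return f;
        });

    static String format(long epochMillis) {
        // No synchronization needed: the formatter is thread-confined.
        return FORMATTER.get().format(new Date(epochMillis));
    }
}
```

java.time's DateTimeFormatter is immutable and thread-safe, so it is the other standard way out; ThreadLocal is the smaller change when the SimpleDateFormat-based code must stay.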
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048776#comment-17048776 ] Shubham Chaurasia commented on HIVE-22840: -- [~jcamachorodriguez] Tests are all green now. > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-22840.03.patch, HIVE-22840.04.patch, > HIVE-22840.05.patch, HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 20m > Remaining Estimate: 0h > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17048774#comment-17048774 ] Shubham Chaurasia commented on HIVE-22903: -- [~rameshkumar] Tests are all green now. Can you please have a look at the patch? > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, > HIVE-22903.03.patch, HIVE-22903.04.patch, HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > The vectorized row_number() implementation resets the row number when a constant > expression is passed in the partition clause. > Repro query: > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators when > only a partition clause is present. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check whether the entire partition clause is a constant > expression; if so, we should not call > {{groupBatches.resetEvaluators()}}
[jira] [Updated] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22903: - Attachment: HIVE-22903.04.patch > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, > HIVE-22903.03.patch, HIVE-22903.04.patch, HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.05.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22840.03.patch, HIVE-22840.04.patch, > HIVE-22840.05.patch, HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22903: - Attachment: HIVE-22903.03.patch > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, > HIVE-22903.03.patch, HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047201#comment-17047201 ] Shubham Chaurasia commented on HIVE-22903: -- Attaching the patch again as the tests didn't trigger. > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, > HIVE-22903.03.patch, HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046793#comment-17046793 ] Shubham Chaurasia commented on HIVE-22903: -- [~rameshkumar] Thanks for the suggestions. Yes, it was not related to constants. It's related to batch size. Resetting evaluator only when isLastGroupBatch=true fixed all the cases. Fixed https://issues.apache.org/jira/browse/HIVE-22909 as well. I uploaded a new patch with this approach. {code:java} if (!isPartitionOrderBy) { // To keep the row counting correct, don't reset for row_number evaluator if it's not a isLastGroupBatch if (!isLastGroupBatch && isRowNumberFunction()) { return; } groupBatches.resetEvaluators(); } {code} However I think this can be safely generalized for all the functions like - {code:java} if (!isPartitionOrderBy && isLastGroupBatch) { groupBatches.resetEvaluators(); } {code} Will give this a try tomorrow. > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, > HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. 
> if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
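[Editor's note] The isLastGroupBatch fix discussed in the comment above can be illustrated with a minimal counter. The names here are hypothetical stand-ins, not Hive's actual VectorPTF classes: the row counter must survive across the ~1024-row batches of one partition and reset only once the group's last batch has been processed.

```java
// Hypothetical sketch of a row_number() evaluator. Resetting after every
// batch (the old behavior) restarts numbering at 1025; resetting only on
// the group's last batch keeps numbering continuous within a partition.
class RowNumberEvaluatorSketch {
    private int rowNumber = 1;

    /** Assigns consecutive row numbers to one batch of the current group. */
    int[] evaluateBatch(int batchSize) {
        int[] out = new int[batchSize];
        for (int i = 0; i < batchSize; i++) {
            out[i] = rowNumber++;
        }
        return out;
    }

    /** Mirrors the proposed rule: reset only once the group has ended. */
    void maybeReset(boolean isLastGroupBatch) {
        if (isLastGroupBatch) {
            rowNumber = 1;
        }
    }
}
```

With this rule a second batch of the same partition continues from where the first left off, and numbering restarts only when a new partition begins, which is the behavior the repro query expects.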
[jira] [Updated] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22903: - Attachment: HIVE-22903.02.patch > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.02.patch, > HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.04.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22840.03.patch, HIVE-22840.04.patch, > HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.03.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22840.03.patch, HIVE-22840.1.patch, > HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043154#comment-17043154 ] Shubham Chaurasia commented on HIVE-22840: -- [~jcamachorodriguez] Oh sorry, created a PR - https://github.com/apache/hive/pull/922 Latest patch was - https://issues.apache.org/jira/secure/attachment/12993999/HIVE-22840.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > Time Spent: 10m > Remaining Estimate: 0h > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041011#comment-17041011 ] Shubham Chaurasia commented on HIVE-22840: -- [~abstractdog] [~jcamachorodriguez] Can you please review ? Moved {{CalendarUtils}} from hive-common to storage-api to prevent cyclic dependency (hive-common already depends on storage-api). > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch, HIVE-22840.2.patch, HIVE-22840.patch > > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
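The HIVE-22840 race arises because SimpleDateFormat (and its internal Calendar) is mutable and not thread-safe, so sharing one instance across formatter threads corrupts state and produces the NumberFormatException/ArrayIndexOutOfBoundsException shown above. A minimal standalone sketch of one common mitigation, giving each thread its own formatter via ThreadLocal. This is an illustration of the failure mode and a generic fix, not the actual storage-api patch; the class and pattern here are made up for the demo:

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadSafeFormatting {
    // SimpleDateFormat is documented as not synchronized; a single shared
    // instance mutates internal Calendar fields during format/parse, which is
    // how concurrent callers end up with garbage like ".821582E.821582E44".
    // ThreadLocal gives every thread a private instance instead.
    private static final ThreadLocal<SimpleDateFormat> FMT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String format(long epochMillis) {
        return FMT.get().format(new Date(epochMillis));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            results.add(pool.submit(() -> format(0L)));
        }
        String expected = format(0L);
        for (Future<String> f : results) {
            if (!f.get().equals(expected)) throw new AssertionError("race detected");
        }
        pool.shutdown();
        System.out.println("all consistent");
    }
}
```

An alternative with the same effect is switching to java.time.format.DateTimeFormatter, which is immutable and therefore safe to share.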
[jira] [Commented] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040666#comment-17040666 ] Shubham Chaurasia commented on HIVE-22903: -- [~rameshkumar] Thanks a lot for the review. {quote} We should probably loop through the groupBatches and skip reseting if it is a row_number and a constant(And probably this might fix https://issues.apache.org/jira/browse/HIVE-22909 too). {quote} Sorry, I could not understand this. Currently, it's like {code:java} if (!isPartitionOrderBy && !skipResetEvaluatorsForRowNumber) { groupBatches.resetEvaluators(); } {code} Does looping through groupBatches (evaluators?) mean something like {code:java} public void resetEvaluators() { for (VectorPTFEvaluatorBase evaluator : evaluators) { if (!isPartitionOrderBy && !skipResetEvaluatorsForRowNumber) { evaluator.resetEvaluator(); } } } {code} I was confused because these flags, isPartitionOrderBy and skipResetEvaluatorsForRowNumber, are common to all the evaluators and would not change for a particular evaluator. > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. 
> This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
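The effect debated above can be sketched with a simplified standalone model. These classes are hypothetical stand-ins, not Hive's actual VectorPTFOperator/VectorPTFGroupBatches: when the partition clause is a constant expression every row belongs to one partition, so resetting the evaluator at each batch boundary (the buggy path) restarts the numbering after every 1024-row batch:

```java
import java.util.ArrayList;
import java.util.List;

public class RowNumberReset {
    // Hypothetical minimal evaluator, mirroring the idea of a row_number()
    // evaluator that keeps a running counter.
    static class RowNumberEvaluator {
        private int rowNumber = 0;
        int next() { return ++rowNumber; }
        void reset() { rowNumber = 0; }
    }

    // Returns the last row number emitted in each batch. With
    // resetBetweenBatches=true (the HIVE-22903 bug for constant partition
    // clauses), numbering restarts every batch; with false, it is continuous.
    static List<Integer> lastOfEachBatch(int batches, int batchSize,
                                         boolean resetBetweenBatches) {
        RowNumberEvaluator eval = new RowNumberEvaluator();
        List<Integer> last = new ArrayList<>();
        for (int b = 0; b < batches; b++) {
            int n = 0;
            for (int r = 0; r < batchSize; r++) n = eval.next();
            last.add(n);
            if (resetBetweenBatches) eval.reset(); // wrong when partition is constant
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(lastOfEachBatch(3, 1024, true));  // [1024, 1024, 1024]
        System.out.println(lastOfEachBatch(3, 1024, false)); // [1024, 2048, 3072]
    }
}
```

This is why the fix skips the reset when the whole partition clause is constant, rather than looping per evaluator with flags that are identical for all of them.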
[jira] [Updated] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22903: - Attachment: HIVE-22903.patch > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch, HIVE-22903.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039686#comment-17039686 ] Shubham Chaurasia commented on HIVE-22903: -- Found one more bug with row_number() while testing this one. Keeping it separate: https://issues.apache.org/jira/browse/HIVE-22909 as it's an entirely different thing. > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22909) Vectorized row_number() returns incorrect results in case it is called multiple times with different constant expressions
[ https://issues.apache.org/jira/browse/HIVE-22909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22909: - Summary: Vectorized row_number() returns incorrect results in case it is called multiple times with different constant expressions (was: Vectorized row_number() returns incorrect results in case it is called multiple times with different constant expression) > Vectorized row_number() returns incorrect results in case it is called > multiple times with different constant expressions > - > > Key: HIVE-22909 > URL: https://issues.apache.org/jira/browse/HIVE-22909 > Project: Hive > Issue Type: Bug >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Vectorized row_number() returns incorrect results in case it is called > multiple times in the same query with different constant expressions. > Example > {code} > select row_number() over(partition by 1) r1, row_number() over(partition by > 2) r2, t from over10k_n8 limit 1100; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22909) Vectorized row_number() returns incorrect results in case it is called multiple times with different constant expression
[ https://issues.apache.org/jira/browse/HIVE-22909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-22909: > Vectorized row_number() returns incorrect results in case it is called > multiple times with different constant expression > > > Key: HIVE-22909 > URL: https://issues.apache.org/jira/browse/HIVE-22909 > Project: Hive > Issue Type: Bug >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Vectorized row_number() returns incorrect results in case it is called > multiple times in the same query with different constant expressions. > Example > {code} > select row_number() over(partition by 1) r1, row_number() over(partition by > 2) r2, t from over10k_n8 limit 1100; > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22903: - Attachment: HIVE-22903.01.patch Status: Patch Available (was: Open) > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: pull-request-available > Attachments: HIVE-22903.01.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22903) Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause
[ https://issues.apache.org/jira/browse/HIVE-22903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-22903: > Vectorized row_number() resets the row number after one batch in case of > constant expression in partition clause > > > Key: HIVE-22903 > URL: https://issues.apache.org/jira/browse/HIVE-22903 > Project: Hive > Issue Type: Bug > Components: UDF, Vectorization >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > Vectorized row number implementation resets the row number when constant > expression is passed in partition clause. > Repro Query > {code} > select row_number() over(partition by 1) r1, t from over10k_n8; > Or > select row_number() over() r1, t from over10k_n8; > {code} > where table over10k_n8 contains more than 1024 records. > This happens because currently in VectorPTFOperator, we reset evaluators if > only partition clause is there. > {code:java} > // If we are only processing a PARTITION BY, reset our evaluators. > if (!isPartitionOrderBy) { > groupBatches.resetEvaluators(); > } > {code} > To resolve, we should also check if the entire partition clause is a constant > expression, if it is so then we should not do > {{groupBatches.resetEvaluators()}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService
[ https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17034096#comment-17034096 ] Shubham Chaurasia commented on HIVE-20312: -- Thanks [~jdere] > Allow arrow clients to use their own BufferAllocator with > LlapOutputFormatService > - > > Key: HIVE-20312 > URL: https://issues.apache.org/jira/browse/HIVE-20312 > Project: Hive > Issue Type: Improvement >Reporter: Eric Wohlstadter >Assignee: Eric Wohlstadter >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch, > HIVE-20312.3.patch > > > Clients should be able to provide their own BufferAllocator to > LlapBaseInputFormat if allocator operations depend on client-side logic. For > example, clients may want to manage the allocator hierarchy per client-side > task, thread, etc.. > Currently the client is forced to use one global RootAllocator per process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-20312) Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService
[ https://issues.apache.org/jira/browse/HIVE-20312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033757#comment-17033757 ] Shubham Chaurasia commented on HIVE-20312: -- [~jdere] [~maheshk114] Looks like this was not merged. Can you please have a look and merge ? cc [~anishek] [~thejas] > Allow arrow clients to use their own BufferAllocator with > LlapOutputFormatService > - > > Key: HIVE-20312 > URL: https://issues.apache.org/jira/browse/HIVE-20312 > Project: Hive > Issue Type: Improvement >Reporter: Eric Wohlstadter >Assignee: Eric Wohlstadter >Priority: Major > Attachments: HIVE-20312.1.patch, HIVE-20312.2.patch, > HIVE-20312.3.patch > > > Clients should be able to provide their own BufferAllocator to > LlapBaseInputFormat if allocator operations depend on client-side logic. For > example, clients may want to manage the allocator hierarchy per client-side > task, thread, etc.. > Currently the client is forced to use one global RootAllocator per process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.2.patch > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch, HIVE-22840.2.patch > > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032114#comment-17032114 ] Shubham Chaurasia edited comment on HIVE-22840 at 2/7/20 6:16 AM: -- HIVE-22840.1.patch / HIVE-22840.2.patch depend on CalendarUtils class introduced in HIVE-22589. For now I have just added it. I will rebase the patch once HIVE-22589 is merged. cc [~jcamachorodriguez] was (Author: shubhamchaurasia): HIVE-22840.1.patch depends on CalendarUtils class introduced in HIVE-22589. For now I have just added it. I will rebase the patch once HIVE-22589 is merged. cc [~jcamachorodriguez] > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch, HIVE-22840.2.patch > > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Description: HIVE-22405 added support for proleptic calendar. It uses java's SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in some scenarios. As a result of those race conditions, we see some exceptions like {code:java} 1) java.lang.NumberFormatException: For input string: "" OR java.lang.NumberFormatException: For input string: ".821582E.821582E44" OR 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 at sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) {code} This issue is to address those thread-safety issues/race conditions. cc [~jcamachorodriguez] [~abstractdog] [~omalley] was: HIVE-22405 added support for proleptic calendar. It uses java's SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in some scenarios. As a result of those race conditions, we see some exceptions like {code:java} 1) java.lang.NumberFormatException: For input string: "" OR java.lang.NumberFormatException: For input string: ".821582E.821582E44" OR 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 at sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) {code} This issue is to address those thread-safety issues/race conditions. cc [~jcamachorodriguez] [~abstractdog] [~omalley] > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch > > > HIVE-22405 added support for proleptic calendar. 
It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" > OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Description: HIVE-22405 added support for proleptic calendar. It uses java's SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in some scenarios. As a result of those race conditions, we see some exceptions like {code:java} 1) java.lang.NumberFormatException: For input string: "" OR java.lang.NumberFormatException: For input string: ".821582E.821582E44" OR 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 at sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) at java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) {code} This issue is to address those thread-safety issues/race conditions. cc [~jcamachorodriguez] [~abstractdog] [~omalley] was: HIVE-22405 added support for proleptic calendar. It uses java's SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in some scenarios. This issue is to address those thread-safety issues/race conditions. > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch > > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. 
> As a result of those race conditions, we see some exceptions like > {code:java} > 1) java.lang.NumberFormatException: For input string: "" OR > java.lang.NumberFormatException: For input string: ".821582E.821582E44" > OR > 2) Caused by: java.lang.ArrayIndexOutOfBoundsException: -5325980 > at > sun.util.calendar.BaseCalendar.getCalendarDateFromFixedDate(BaseCalendar.java:453) > at > java.util.GregorianCalendar.computeFields(GregorianCalendar.java:2397) > {code} > This issue is to address those thread-safety issues/race conditions. > cc [~jcamachorodriguez] [~abstractdog] [~omalley] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Description: HIVE-22405 added support for proleptic calendar. It uses java's SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in some scenarios. This issue is to address those thread-safety issues/race conditions. > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch > > > HIVE-22405 added support for proleptic calendar. It uses java's > SimpleDateFormat/Calendar APIs which are not thread-safe and cause race in > some scenarios. > This issue is to address those thread-safety issues/race conditions. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Component/s: storage-api > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032114#comment-17032114 ] Shubham Chaurasia commented on HIVE-22840: -- HIVE-22840.1.patch depends on CalendarUtils class introduced in HIVE-22589. For now I have just added it. I will rebase the patch once HIVE-22589 is merged. cc [~jcamachorodriguez] > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-22840: - Attachment: HIVE-22840.1.patch Status: Patch Available (was: Open) > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > Attachments: HIVE-22840.1.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-22840) Race condition in formatters of TimestampColumnVector and DateColumnVector
[ https://issues.apache.org/jira/browse/HIVE-22840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-22840: Assignee: Shubham Chaurasia > Race condition in formatters of TimestampColumnVector and DateColumnVector > --- > > Key: HIVE-22840 > URL: https://issues.apache.org/jira/browse/HIVE-22840 > Project: Hive > Issue Type: Bug >Reporter: László Bodor >Assignee: Shubham Chaurasia >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-21641) Llap external client returns decimal columns in different precision/scale as compared to beeline
[ https://issues.apache.org/jira/browse/HIVE-21641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16982168#comment-16982168 ] Shubham Chaurasia commented on HIVE-21641: -- Thanks [~kgyrtkirk] [~jcamachorodriguez] I have opened https://issues.apache.org/jira/browse/HIVE-22541 > Llap external client returns decimal columns in different precision/scale as > compared to beeline > > > Key: HIVE-21641 > URL: https://issues.apache.org/jira/browse/HIVE-21641 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.1 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: Branch3Candidate, pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21641.1.patch, HIVE-21641.2.patch, > HIVE-21641.3.patch, HIVE-21641.4.patch, HIVE-21641.5.branch-3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Llap external client gives different precision/scale as compared to when the > query is executed beeline. Consider the following results: > Query: > {code} > select avg(ss_ext_sales_price) my_avg from store_sales; > {code} > Result from Beeline > {code} > ++ > | my_avg | > ++ > | 37.8923531030581611189434 | > ++ > {code} > Result from Llap external client > {code} > +-+ > | my_avg| > +-+ > |37.892353| > +-+ > {code} > > This is due to Driver(beeline path) calls > [analyzeInternal()|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L328] > for getting result set schema which initializes > [resultSchema|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L333] > after some more transformations as compared to llap-ext-client which calls > [genLogicalPlan()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java#L561] > Replacing {{genLogicalPlan()}} by {{analyze()}} resolves this. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
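The HIVE-21641 symptom above is a schema problem, not a computation problem: the llap-ext-client path obtained a result-set schema with a narrower decimal scale than the fully analyzed plan reports, so the client cut the value down. A standalone BigDecimal sketch of that effect (illustrative only; Hive uses its own HiveDecimal, and the scale-6 figure and truncating rounding mode here are assumptions chosen to reproduce the observed output):

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // The "true" average, at the precision/scale beeline reported.
        BigDecimal avg = new BigDecimal("37.8923531030581611189434");

        // If the schema handed to the client declares scale 6, the client
        // narrows the value, matching what the llap external client printed.
        BigDecimal narrowed = avg.setScale(6, RoundingMode.DOWN);
        System.out.println(narrowed); // 37.892353

        // With the scale from the fully analyzed plan, nothing is lost.
        System.out.println(avg.setScale(22, RoundingMode.DOWN));
    }
}
```

Hence the fix of driving schema derivation through analyze() (and, per HIVE-22541, eventually making genLogicalPlan() report the correct precision/scale itself).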
[jira] [Assigned] (HIVE-22541) Inconsistent decimal precision/scale of resultset schema in analyzer.genLogicalPlan() as compared to analyzer.analyze()
[ https://issues.apache.org/jira/browse/HIVE-22541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia reassigned HIVE-22541: > Inconsistent decimal precision/scale of resultset schema in > analyzer.genLogicalPlan() as compared to analyzer.analyze() > --- > > Key: HIVE-22541 > URL: https://issues.apache.org/jira/browse/HIVE-22541 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 4.0.0 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > > https://issues.apache.org/jira/browse/HIVE-21641 handles decimal > scale/precision inconsistencies when we query using llap external client. > [HIVE-21641 > changes|https://issues.apache.org/jira/secure/attachment/12968006/HIVE-21641.4.patch] > {{analyzer.genLogicalPlan(ast)}} to {{analyzer.analyze(ast, ctx)}} to handle > this. However we should fix {{analyzer.genLogicalPlan(ast)}} to return > correct decimal precision/scale. > Please see > [this|https://issues.apache.org/jira/browse/HIVE-21641?focusedCommentId=16981513&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16981513] > and > [this|https://issues.apache.org/jira/browse/HIVE-21641?focusedCommentId=16982053&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16982053] > comment for more. > cc [~jcamachorodriguez] [~kgyrtkirk] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-21641) Llap external client returns decimal columns in different precision/scale as compared to beeline
[ https://issues.apache.org/jira/browse/HIVE-21641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chaurasia updated HIVE-21641: - Resolution: Fixed Status: Resolved (was: Patch Available) > Llap external client returns decimal columns in different precision/scale as > compared to beeline > > > Key: HIVE-21641 > URL: https://issues.apache.org/jira/browse/HIVE-21641 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.1 >Reporter: Shubham Chaurasia >Assignee: Shubham Chaurasia >Priority: Major > Labels: Branch3Candidate, pull-request-available > Fix For: 4.0.0 > > Attachments: HIVE-21641.1.patch, HIVE-21641.2.patch, > HIVE-21641.3.patch, HIVE-21641.4.patch, HIVE-21641.5.branch-3.patch > > Time Spent: 10m > Remaining Estimate: 0h > > Llap external client gives different precision/scale as compared to when the > query is executed beeline. Consider the following results: > Query: > {code} > select avg(ss_ext_sales_price) my_avg from store_sales; > {code} > Result from Beeline > {code} > ++ > | my_avg | > ++ > | 37.8923531030581611189434 | > ++ > {code} > Result from Llap external client > {code} > +-+ > | my_avg| > +-+ > |37.892353| > +-+ > {code} > > This is due to Driver(beeline path) calls > [analyzeInternal()|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java#L328] > for getting result set schema which initializes > [resultSchema|https://github.com/apache/hive/blob/rel/release-3.1.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L333] > after some more transformations as compared to llap-ext-client which calls > [genLogicalPlan()|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseUtils.java#L561] > Replacing {{genLogicalPlan()}} by {{analyze()}} resolves this. -- This message was sent by Atlassian Jira (v8.3.4#803005)