[jira] [Assigned] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23757: Assignee: (was: Attila Magyar) > Pushing TopN Key operator through MAPJOIN > - > > Key: HIVE-23757 > URL: https://issues.apache.org/jira/browse/HIVE-23757 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable
[ https://issues.apache.org/jira/browse/HIVE-25519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-25519. -- Resolution: Invalid Wrong project. > Knox homepage service UI links missing when CM intermittently unavailable > - > > Key: HIVE-25519 > URL: https://issues.apache.org/jira/browse/HIVE-25519 > Project: Hive > Issue Type: Task >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable
[ https://issues.apache.org/jira/browse/HIVE-25519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-25519: > Knox homepage service UI links missing when CM intermittently unavailable > - > > Key: HIVE-25519 > URL: https://issues.apache.org/jira/browse/HIVE-25519 > Project: Hive > Issue Type: Task >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-22960) Approximate TopN Key Operator
[ https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-22960. -- Resolution: Won't Fix > Approximate TopN Key Operator > - > > Key: HIVE-22960 > URL: https://issues.apache.org/jira/browse/HIVE-22960 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png > > > "Different from other operators, the top n operator demonstrates notable > "long tail" characteristics that make it distinct from operators like > join, group by, etc.: it will saturate very quickly. Updates are pretty frequent > at the beginning and then diverge to a very slow update frequency. > The approximation can be implemented in two ways: one way is to stop the > array/heap updates after a certain percentage of the data has been read, for > example 10% or 20%, if we know the table size. The other way is to set a > frequency threshold for the array/heap updates. After the threshold is met, > stop the top n processing" > [~rzhappy] > !Screen Shot 2020-03-02 at 4.55.46 PM.png|width=688,height=468! > Y: number of updates in every 100msec -- This message was sent by Atlassian Jira (v8.3.4#803005)
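The second strategy quoted above (stop maintaining the top-n structure once updates become rare) can be sketched as follows. This is an illustrative Python model, not Hive's actual TopNKeyOperator; the `stale_limit` knob is a hypothetical stand-in for the proposed update-frequency threshold.

```python
import heapq

def approx_top_n(stream, n, stale_limit=1000):
    """Keep the n largest values from `stream`, but stop processing
    once the heap has gone `stale_limit` consecutive reads without an
    update -- the "long tail" cut-off described in the ticket."""
    heap = []   # min-heap holding the current top-n candidates
    stale = 0   # reads since the heap last changed
    for value in stream:
        if stale >= stale_limit:
            break                        # updates became rare: give up early
        if len(heap) < n:
            heapq.heappush(heap, value)  # still filling the heap
            stale = 0
        elif value > heap[0]:
            heapq.heapreplace(heap, value)  # new top-n candidate
            stale = 0
        else:
            stale += 1                   # no update on this read
    return sorted(heap, reverse=True)
```

Because only rows beyond the cut-off are dropped, the result is exact on the prefix it saw, which is why this is an approximation of TopN rather than a correct implementation.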
[jira] [Resolved] (HIVE-24338) HPL/SQL missing features
[ https://issues.apache.org/jira/browse/HIVE-24338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-24338. -- Resolution: Fixed > HPL/SQL missing features > > > Key: HIVE-24338 > URL: https://issues.apache.org/jira/browse/HIVE-24338 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some features which are supported by Oracle's PL/SQL but not by > HPL/SQL. This Jira is for prioritizing them and investigating the > feasibility of implementing them. > * ForAll syntax like: ForAll j in i..j save exceptions > * Bulk collect: Fetch cursor Bulk Collect Into list Limit n; > * Type declaration: Type T_cab is TABLE of > * TABLE datatype > * GOTO and LABEL > * Global variables like $$PLSQL_UNIT and others > * Named parameters func(name1 => value1, name2 => value2); > * Built-in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24427) HPL/SQL improvements
[ https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-24427. -- Resolution: Fixed > HPL/SQL improvements > > > Key: HIVE-24427 > URL: https://issues.apache.org/jira/browse/HIVE-24427 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: epic > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-25242. -- Resolution: Fixed > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 1h > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are > vectorized through the vectorized adaptor. > Queries like this one perform very slowly because concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat UDF so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
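The "chosen" mode decision described above can be modeled as a simple allow-list check. The set contents and function names below are assumptions for illustration only, not Hive's actual internals; the patch's effect is modeled by including concat in the allow-list.

```python
# Hypothetical allow-list for "chosen" mode; in Hive the real list
# lives in the vectorizer, and the patch adds concat to it.
CHOSEN_UDFS = {"to_date", "concat"}

def use_vectorized_adaptor(udf_name: str, mode: str) -> bool:
    """Decide whether a UDF is run through the vectorized adaptor,
    mirroring hive.vectorized.adaptor.usage.mode semantics."""
    if mode == "all":
        return True          # every adaptable UDF is vectorized
    if mode == "none":
        return False         # the adaptor is disabled entirely
    # mode == "chosen": only allow-listed UDFs go through the adaptor
    return udf_name in CHOSEN_UDFS
```

With concat absent from the list, the query above falls back to row-mode execution for the whole predicate, which explains the slowdown.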
[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-25242: - Description: If hive.vectorized.adaptor.usage.mode is set to chosen only certain UDFS are vectorized through the vectorized adaptor. Queries like this one, performs very slowly because the concat is not chosen to be vectorized. {code:java} select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) between to_date('2018-12-01') and to_date('2021-03-01'); {code} The patch whitelists the concat udf so that it uses the vectorized adaptor in chosen mode. was: If hive.vectorized.adaptor.usage.mode is set to chosen only certain UDFS are vectorized through the vectorized adaptor. Queries like this one, performs very slowly because the concat is not chosen to be vectorized. {code:java} select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) between to_date('2018-12-01') and to_date('2021-03-01'); {code} > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > If hive.vectorized.adaptor.usage.mode is set to chosen only certain UDFS are > vectorized through the vectorized adaptor. > Queries like this one, performs very slowly because the concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} > The patch whitelists the concat udf so that it uses the vectorized adaptor in > chosen mode. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-25242: - Description: If hive.vectorized.adaptor.usage.mode is set to chosen only certain UDFS are vectorized through the vectorized adaptor. Queries like this one, performs very slowly because the concat is not chosen to be vectorized. {code:java} select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) between to_date('2018-12-01') and to_date('2021-03-01'); {code} > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > If hive.vectorized.adaptor.usage.mode is set to chosen only certain UDFS are > vectorized through the vectorized adaptor. > Queries like this one, performs very slowly because the concat is not chosen > to be vectorized. > {code:java} > select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) > between to_date('2018-12-01') and to_date('2021-03-01'); {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen
[ https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-25242: - Environment: (was: If hive.vectorized.adaptor.usage.mode is set to chosen only certain UDFS are vectorized through the vectorized adaptor. Queries like this one, performs very slowly because the concat is not chosen to be vectorized. {code:java} select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) between to_date('2018-12-01') and to_date('2021-03-01'); {code}) > Query performs extremely slow with hive.vectorized.adaptor.usage.mode = > chosen > --- > > Key: HIVE-25242 > URL: https://issues.apache.org/jira/browse/HIVE-25242 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25223) Select with limit returns no rows on non native table
[ https://issues.apache.org/jira/browse/HIVE-25223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-25223: > Select with limit returns no rows on non native table > - > > Key: HIVE-25223 > URL: https://issues.apache.org/jira/browse/HIVE-25223 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > Steps to reproduce: > {code:java} > CREATE EXTERNAL TABLE hht (key string, value int) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") > TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" > = "hht"); > insert into hht select uuid(), cast((rand() * 100) as int); > insert into hht select uuid(), cast((rand() * 100) as int) from hht; > insert into hht select uuid(), cast((rand() * 100) as int) from hht; > insert into hht select uuid(), cast((rand() * 100) as int) from hht; > insert into hht select uuid(), cast((rand() * 100) as int) from hht; > insert into hht select uuid(), cast((rand() * 100) as int) from hht; > insert into hht select uuid(), cast((rand() * 100) as int) from hht; > set hive.fetch.task.conversion=none; > select * from hht limit 10; > +--++ > | hht.key | hht.value | > +--++ > +--++ > No rows selected (5.22 seconds) {code} > > This is caused by GlobalLimitOptimizer. The table directory is always empty > with a non-native table since the data is not managed by Hive (but by HBase > in this case). > The optimizer scans the directory and sets the file list to an empty list. -- This message was sent by Atlassian Jira (v8.3.4#803005)
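The root cause above suggests a guard: directory-based LIMIT pruning must be skipped for non-native tables, whose warehouse directory is always empty because the data lives in the storage handler. The sketch below is a hypothetical model of that guard; `table` and `list_dir` stand in for Hive's internal table descriptor and filesystem listing, not real APIs.

```python
def limit_prune_files(table, list_dir):
    """Pick the files a LIMIT query should scan.

    For a non-native table (e.g. HBase-backed) the warehouse directory
    holds no data files, so returning its (empty) listing would make
    the query return zero rows. Returning None signals the caller to
    fall back to a full scan through the storage handler instead."""
    if not table.get("is_native", True):
        return None                       # skip the optimization entirely
    return list_dir(table["location"])    # safe for Hive-managed data
```

The actual fix belongs in GlobalLimitOptimizer; the point of the model is that an empty listing and "do not prune" must be distinguishable outcomes.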
[jira] [Resolved] (HIVE-25033) HPL/SQL thrift call fails when returning null
[ https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-25033. -- Resolution: Fixed > HPL/SQL thrift call fails when returning null > - > > Key: HIVE-25033 > URL: https://issues.apache.org/jira/browse/HIVE-25033 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25033) HPL/SQL thrift call fails when returning null
[ https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-25033: > HPL/SQL thrift call fails when returning null > - > > Key: HIVE-25033 > URL: https://issues.apache.org/jira/browse/HIVE-25033 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
[ https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-25004. -- Resolution: Fixed > HPL/SQL subsequent statements are failing after typing a malformed input in > beeline > --- > > Key: HIVE-25004 > URL: https://issues.apache.org/jira/browse/HIVE-25004 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > An error signal is stuck after evaluating the first expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
[ https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24997: - Resolution: Fixed Status: Resolved (was: Patch Available) > HPL/SQL udf doesn't work in tez container mode > -- > > Key: HIVE-24997 > URL: https://issues.apache.org/jira/browse/HIVE-24997 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Since HIVE-24230 it assumes the UDF is evaluated on HS2 which is not true > in general. The SessionState is only available at compile time evaluation but > later on a new interpreter should be instantiated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24427) HPL/SQL improvements
[ https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24427 started by Attila Magyar. > HPL/SQL improvements > > > Key: HIVE-24427 > URL: https://issues.apache.org/jira/browse/HIVE-24427 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: epic > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
[ https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-25004: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > HPL/SQL subsequent statements are failing after typing a malformed input in > beeline > --- > > Key: HIVE-25004 > URL: https://issues.apache.org/jira/browse/HIVE-25004 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > An error signal is stuck after evaluating the first expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
[ https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24997: - Status: Patch Available (was: Open) > HPL/SQL udf doesn't work in tez container mode > -- > > Key: HIVE-24997 > URL: https://issues.apache.org/jira/browse/HIVE-24997 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Since HIVE-24230 it assumes the UDF is evaluated on HS2 which is not true > in general. The SessionState is only available at compile time evaluation but > later on a new interpreter should be instantiated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
[ https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-25004: Assignee: Attila Magyar > HPL/SQL subsequent statements are failing after typing a malformed input in > beeline > --- > > Key: HIVE-25004 > URL: https://issues.apache.org/jira/browse/HIVE-25004 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > An error signal is stuck after evaluating the first expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline
[ https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-25004: - Description: An error signal is stuck after evaluating the first expression. > HPL/SQL subsequent statements are failing after typing a malformed input in > beeline > --- > > Key: HIVE-25004 > URL: https://issues.apache.org/jira/browse/HIVE-25004 > Project: Hive > Issue Type: Bug > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > An error signal is stuck after evaluating the first expression. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24383) Add Table type to HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-24383. -- Resolution: Fixed > Add Table type to HPL/SQL > - > > Key: HIVE-24383 > URL: https://issues.apache.org/jira/browse/HIVE-24383 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
[ https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24997: - Description: Since HIVE-24230, the code assumes the UDF is evaluated on HS2, which is not true in general. The SessionState is only available during compile-time evaluation; later on, a new interpreter should be instantiated. > HPL/SQL udf doesn't work in tez container mode > -- > > Key: HIVE-24997 > URL: https://issues.apache.org/jira/browse/HIVE-24997 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > Since HIVE-24230, the code assumes the UDF is evaluated on HS2, which is not > true in general. The SessionState is only available during compile-time > evaluation; later on, a new interpreter should be instantiated. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode
[ https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24997: > HPL/SQL udf doesn't work in tez container mode > -- > > Key: HIVE-24997 > URL: https://issues.apache.org/jira/browse/HIVE-24997 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24315) Improve validation and error handling in HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-24315. -- Resolution: Fixed > Improve validation and error handling in HPL/SQL > - > > Key: HIVE-24315 > URL: https://issues.apache.org/jira/browse/HIVE-24315 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > There are some known issues that need to be fixed. For example, it seems that > the arity of a function is not checked when calling it, and the same is true for > parameter types. Calling an undefined function evaluates to null, and > sometimes incorrect syntax is silently ignored. > In cases like this a helpful error message would be expected, though we > should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
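The kind of validation this ticket asks for can be illustrated with a tiny call-resolution model. The registry layout below is an assumption for illustration, not HPL/SQL's actual symbol table; the point is that undefined names and wrong arity raise descriptive errors instead of silently evaluating to null.

```python
def call_function(registry, name, args):
    """Resolve and invoke a user-defined function with validation.

    `registry` maps a function name to an (arity, callable) pair --
    a hypothetical stand-in for an interpreter's function table."""
    if name not in registry:
        # Instead of evaluating to null, fail with a helpful message.
        raise NameError(f"function '{name}' is not defined")
    arity, fn = registry[name]
    if len(args) != arity:
        # Arity mismatch is reported up front, not discovered later.
        raise TypeError(f"'{name}' expects {arity} argument(s), got {len(args)}")
    return fn(*args)
```

Parameter-type checks would slot in between the arity check and the call, comparing declared and actual types the same way.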
[jira] [Assigned] (HIVE-24838) Reduce FS creation in Warehouse::getDnsPath for object stores
[ https://issues.apache.org/jira/browse/HIVE-24838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24838: Assignee: Attila Magyar > Reduce FS creation in Warehouse::getDnsPath for object stores > - > > Key: HIVE-24838 > URL: https://issues.apache.org/jira/browse/HIVE-24838 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Major > Attachments: Screenshot 2021-03-02 at 11.09.01 AM.png > > > [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java#L143] > > Warehouse::getDnsPath gets invoked from multiple places (e.g. getDatabase() > etc.). In certain cases like dynamic partition loads, a lot of FS > instantiation calls can be avoided for object stores. > It would be good to check for blob storage and, if so, avoid FS creation. > [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java#L33] > > !Screenshot 2021-03-02 at 11.09.01 AM.png|width=372,height=296! -- This message was sent by Atlassian Jira (v8.3.4#803005)
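The proposed shortcut can be sketched as a scheme check before path qualification. This is a hedged model, not Hive's code: the scheme list is illustrative (BlobStorageUtils keys off a configurable list), and `make_fs` is a hypothetical stand-in for the expensive FileSystem instantiation.

```python
from urllib.parse import urlparse

# Illustrative object-store schemes; the real list is configuration-driven.
BLOB_SCHEMES = {"s3a", "s3n", "abfs", "abfss", "gs", "wasb"}

def qualify_path(path_uri, make_fs):
    """Return a fully qualified path for `path_uri`.

    An object-store URI already carries its scheme and authority, so
    the FileSystem object (created by `make_fs`, modeling
    Path.getFileSystem) is only needed to fill in defaults for
    scheme-less paths such as plain HDFS warehouse locations."""
    if urlparse(path_uri).scheme in BLOB_SCHEMES:
        return path_uri                   # skip FS creation entirely
    fs = make_fs(path_uri)                # expensive: resolves scheme/host
    return fs.make_qualified(path_uri)
```

On a dynamic-partition load that calls getDnsPath once per partition, skipping `make_fs` for blob paths removes the hot allocation the attached profile screenshot points at.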
[jira] [Updated] (HIVE-24315) Improve validation and error handling in HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24315: - Summary: Improve validation and error handling in HPL/SQL (was: Improve validation and semantic analysis in HPL/SQL ) > Improve validation and error handling in HPL/SQL > - > > Key: HIVE-24315 > URL: https://issues.apache.org/jira/browse/HIVE-24315 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some known issues that need to be fixed. For example it seems that > arity of a function is not checked when calling it, and same is true for > parameter types. Calling an undefined function is evaluated to null and > sometimes it seems that incorrect syntax is silently ignored. > In cases like this a helpful error message would be expected, thought we > should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24813) thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS
[ https://issues.apache.org/jira/browse/HIVE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24813: > thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS > > > Key: HIVE-24813 > URL: https://issues.apache.org/jira/browse/HIVE-24813 > Project: Hive > Issue Type: Bug > Components: Standalone Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > {code:java} > [ERROR] > /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:[2145,34] > cannot find symbol > [ERROR] symbol: variable TABLE_IS_CTAS > [ERROR] location: class org.apache.hadoop.hive.metastore.HMSHandler > [ERROR] > /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java:[591,58] > cannot find symbol > [ERROR] symbol: variable TABLE_IS_CTAS > [ERROR] location: class > org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer > [ERROR] -> [Help 1] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24715: - Resolution: Fixed Status: Resolved (was: Patch Available) > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Bucket Id range increase.pdf > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24715: - Attachment: Bucket Id range increase.pdf > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: Bucket Id range increase.pdf > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24715: - Status: Patch Available (was: Open) > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277208#comment-17277208 ] Attila Magyar edited comment on HIVE-24584 at 2/2/21, 3:39 PM: --- Hi Syed, thanks for trying it out. I think it should have worked that way. Anyway I created a test that reproduces the issue, depending on what value you set for IS_METASTORE_REMOTE. {code:java} $ cd itests/hive-unit $ mvn test -Dtest=TestMsck $ grep "Failed to deserialize the expression" target/tmp/log/hive.log {code} The test doesn't fail in any case (that's a testing issue) but you can see the Kryo related error in the log when you run it with IS_METASTORE_REMOTE=true. [^msckrepro.patch] was (Author: amagyar): Hi Syed, thanks for trying it out. I think it should have worked that way. Anyway I created a test that reproduces the issue, depending on what value you set for IS_METASTORE_REMOTE. {code:java} $ cd itests/hive-unit $ mvn test -Dtest=TestMsck $ grep "Failed to deserialize the expression" target/tmp/log/hive.log {code} The test doesn't fail in any case but you can see (that's a testing issue) but you can see the Kry related error when you run it with IS_METASTORE_REMOTE=true. [^msckrepro.patch] > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: msckrepro.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". 
> {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24584: - Attachment: msckrepro.patch > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: msckrepro.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". > {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277208#comment-17277208 ] Attila Magyar commented on HIVE-24584: -- Hi Syed, thanks for trying it out. I think it should have worked that way. Anyway, I created a test that reproduces the issue, depending on what value you set for IS_METASTORE_REMOTE. {code:java} $ cd itests/hive-unit $ mvn test -Dtest=TestMsck $ grep "Failed to deserialize the expression" target/tmp/log/hive.log {code} The test doesn't fail in either case (that's a testing issue), but you can see the Kryo-related error when you run it with IS_METASTORE_REMOTE=true. [^msckrepro.patch] > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Attachments: msckrepro.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". 
> {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424 ] Attila Magyar edited comment on HIVE-24715 at 2/1/21, 6:02 PM: --- Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by max_statement_id+1. This way compaction will be able to merge the same buckets that belong to different statements. The change is backward compatible with the prior implementation, whereas simply increasing the range would not be. was (Author: amagyar): Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. 
Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by statement_id+1. This way compaction will be able to merge the same buckets that belong to different statements. > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
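The bit layout quoted in the comment above, and the proposed overflow of the bucket id into the statement id, can be sketched as follows. This is an illustrative model only: the class, the method names, and the exact handling of the statement id are assumptions for this sketch, not Hive's actual BucketCodec implementation.

```java
// Sketch of the "bucket" property layout described above:
//   bits 31-29: version code, bit 28: reserved, bits 27-16: bucket ID.
// The overflow scheme and all names here are illustrative assumptions.
public class BucketOverflowSketch {
    static final int VERSION = 1;
    static final int BUCKET_BITS = 12;
    static final int MAX_BUCKET = (1 << BUCKET_BITS) - 1; // 4095

    // Pack the version and bucket id into the documented bit positions.
    static int encode(int bucketId) {
        return (VERSION << 29) | (bucketId << 16);
    }

    static int decodeBucketId(int prop) {
        return (prop >> 16) & MAX_BUCKET;
    }

    // Proposed overflow: task 4096 wraps around to bucket_0 and is
    // attributed to the next statement id, so compaction can still
    // merge the same buckets across statements.
    static int[] bucketAndStatement(int taskId, int statementId) {
        return new int[] { taskId % (MAX_BUCKET + 1),
                           statementId + taskId / (MAX_BUCKET + 1) };
    }

    public static void main(String[] args) {
        int[] r = bucketAndStatement(4096, 0);
        System.out.println("bucket=" + r[0] + " statement=" + r[1]); // bucket=0 statement=1
    }
}
```

Under this scheme an existing property value still decodes to the same bucket id, which is why the change stays backward compatible while widening the 12-bit field would not be.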
[jira] [Comment Edited] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424 ] Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM: --- Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. See TEZ-4271 for more context. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by statement_id+1. This way compaction will be able to merge the same buckets that belong to different statements. was (Author: amagyar): Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by statement_id+1. 
This way compaction will be able to merge the same buckets that belong to different statements. > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424 ] Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM: --- Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by statement_id+1. This way compaction will be able to merge the same buckets that belong to different statements. was (Author: amagyar): Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. See TEZ-4271 for more context. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by statement_id+1. 
This way compaction will be able to merge the same buckets that belong to different statements. > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424 ] Attila Magyar commented on HIVE-24715: -- Currently the bucketId field is stored in 12 bits. When TEZ starts more than 4095 tasks, it overflows. {code:java} * Represents format of "bucket" property in Hive 3.0. * top 3 bits - version code. * next 1 bit - reserved for future * next 12 bits - the bucket ID * next 4 bits reserved for future {code} Simply increasing the range would have an undesired effect on compaction efficiency. If hundreds of thousands of tasks are started, we would end up with hundreds of thousands of files, and since compaction works across statement ids it wouldn't merge those. Instead of increasing the range, the proposed solution is to let the bucket id overflow into the statement id, so that the 4096th bucket will be bucket_0 and it will look like it was created by statement_id+1. This way compaction will be able to merge the same buckets that belong to different statements. > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24715) Increase bucketId range
[ https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24715: > Increase bucketId range > --- > > Key: HIVE-24715 > URL: https://issues.apache.org/jira/browse/HIVE-24715 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276184#comment-17276184 ] Attila Magyar commented on HIVE-24584: -- Hi [~srahman], did you manage to reproduce it? > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". > {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HIVE-24696) Drop procedure and drop package syntax for HPLSQL
[ https://issues.apache.org/jira/browse/HIVE-24696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar resolved HIVE-24696. -- Assignee: Attila Magyar Resolution: Fixed Fixed as part of HIVE-24346. > Drop procedure and drop package syntax for HPLSQL > - > > Key: HIVE-24696 > URL: https://issues.apache.org/jira/browse/HIVE-24696 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273439#comment-17273439 ] Attila Magyar commented on HIVE-24584: -- [~srahman] If you have a Hive with a remote, non-embedded HMS then: 1.) create external table t1 (c1 int, c2 int) partitioned by (c3 int) location 'hdfs:///warehouse/tablespace/external/hive/t1'; 2.) insert into t1 partition(c3=1) values (1,1); insert into t1 partition(c3=2) values (2,2); insert into t1 partition(c3=3) values (3,3); 3.) hdfs dfs -rm -r hdfs:///warehouse/tablespace/external/hive/t1/c3=3 4.) msck repair table t1 sync partitions; > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". 
> {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263996#comment-17263996 ] Attila Magyar commented on HIVE-24584: -- Hi [~srahman], Thanks for the input. My understanding is that PartitionExpressionForMetastore is the default value of "metastore.expression.proxy" (In HiveConf.java/MetaStoreConf.java). Msck attempts to override this by creating a HiveMetaStoreClient with a modified config object. However unless HS2 and HMS are running inside the same process (or Msck is called within HMS via the periodically running PartitionManagementTask) this doesn't work. In case of a remote HMS, Msck should have called msc.setMetaConf() or something that modifies the config via thrift. > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". 
> {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory
[ https://issues.apache.org/jira/browse/HIVE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24625: - Description: MetastoreDefaultTransformer in HMS converts a managed non-transactional table to an external table. MoveTask still uses the managed path when loading the data, resulting in an always empty table. {code:java} create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from other;{code} After the conversion the table location points to an external directory: Location: | hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1 Move task uses the managed location: {code:java} INFO : Moving data to directory hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000 {code} > CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect > directory > - > > Key: HIVE-24625 > URL: https://issues.apache.org/jira/browse/HIVE-24625 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > MetastoreDefaultTransformer in HMS converts a managed non-transactional table > to an external table. MoveTask still uses the managed path when loading the > data, resulting in an always empty table. 
> {code:java} > create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from > other;{code} > After the conversion the table location points to an external directory: > Location: | > hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1 > Move task uses the managed location: > {code:java} > INFO : Moving data to directory > hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from > hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory
[ https://issues.apache.org/jira/browse/HIVE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24625: > CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect > directory > - > > Key: HIVE-24625 > URL: https://issues.apache.org/jira/browse/HIVE-24625 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258905#comment-17258905 ] Attila Magyar commented on HIVE-24584: -- cc: [~srahman] > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The following exception is coming when running "msck repair table t1 sync > partitions". > {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-24584 started by Attila Magyar. > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > The following exception is coming when running "msck repair table t1 sync > partitions". > {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair
[ https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24584: > IndexOutOfBoundsException from Kryo when running msck repair > > > Key: HIVE-24584 > URL: https://issues.apache.org/jira/browse/HIVE-24584 > Project: Hive > Issue Type: Bug >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > The following exception is coming when running "msck repair table t1 sync > partitions". > {code:java} > java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] > at > 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) > [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
[ https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258298#comment-17258298 ] Attila Magyar commented on HIVE-23851: -- Hey [~srahman], it looks like the same exception can be thrown from filterPartitionsByExpr() as well. This patch addresses only convertExprToFilter(). {code:java} java.lang.IndexOutOfBoundsException: Index: 97, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232] at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232] at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116) [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88) [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code} I see the original issue 
was fixed by skipping the deserialization and returning the exprBytes as string when the deserialization would fail. But filterPartitionsByExpr returns a boolean. Would returning false be an acceptable fix when deserialization fails in filterPartitionsByExpr? cc: [~kgyrtkirk] > MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions > > > Key: HIVE-23851 > URL: https://issues.apache.org/jira/browse/HIVE-23851 > Project: Hive > Issue Type: Bug >Affects Versions: 4.0.0 >Reporter: Syed Shameerur Rahman >Assignee: Syed Shameerur Rahman >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > *Steps to reproduce:* > # Create external table > # Run msck command to sync all the partitions with metastore > # Remove one of the partition path > # Run msck repair with partition filtering > *Stack Trace:* > {code:java} > 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] > ppr.PartitionExpressionForMetastore: Failed to deserialize the expression > java.lang.IndexOutOfBoundsException: Index: 110, Size: 0 > at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192] > at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192] > at > org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) > 
~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) > ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) > [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.metastore.PartF
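The open question in the comment above — what filterPartitionsByExpr() should do when Kryo deserialization of the filter expression fails — could be sketched roughly as below. All class and method names here are illustrative stand-ins, not Hive's actual API; the real change would live in PartitionExpressionForMetastore.

```java
import java.util.List;

// Illustrative sketch of the proposed fallback: if the serialized filter
// expression cannot be deserialized, return false (treat the filter as a
// no-op) instead of letting the IndexOutOfBoundsException escape.
public class SafeExprFilter {
    // Stand-in deserializer: throws on empty/corrupt input, as Kryo does here.
    static Object deserializeExpr(byte[] exprBytes) {
        if (exprBytes == null || exprBytes.length == 0) {
            throw new IndexOutOfBoundsException("Index: 0, Size: 0");
        }
        return new Object(); // pretend this is the deserialized expression
    }

    // Proposed behaviour: swallow the deserialization failure and report
    // that no partitions were filtered, leaving the list untouched.
    static boolean filterPartitionsByExpr(byte[] exprBytes, List<String> partitionNames) {
        final Object expr;
        try {
            expr = deserializeExpr(exprBytes);
        } catch (RuntimeException e) {
            return false; // deserialization failed; caller sees "nothing filtered"
        }
        // ...a real implementation would evaluate expr against partitionNames...
        return true;
    }
}
```

Whether "nothing filtered" is semantically safe here, versus failing the MSCK command outright, is exactly the question the comment raises.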
[jira] [Updated] (HIVE-24383) Add Table type to HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24383: - Parent: HIVE-24427 Issue Type: Sub-task (was: Improvement) > Add Table type to HPL/SQL > - > > Key: HIVE-24383 > URL: https://issues.apache.org/jira/browse/HIVE-24383 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24315: - Parent: HIVE-24427 Issue Type: Sub-task (was: Improvement) > Improve validation and semantic analysis in HPL/SQL > > > Key: HIVE-24315 > URL: https://issues.apache.org/jira/browse/HIVE-24315 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some known issues that need to be fixed. For example, it seems that > the arity of a function is not checked when calling it, and the same is true for > parameter types. Calling an undefined function evaluates to null, and > sometimes incorrect syntax seems to be silently ignored. > In cases like these a helpful error message would be expected, though we > should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
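The missing arity check described in the ticket could look roughly like the sketch below. The class and method names are invented for illustration and do not match HPL/SQL's actual interpreter code; the point is to compare declared parameter counts against call-site argument counts and to report undefined functions loudly instead of evaluating them to null.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of call-site validation for a procedural interpreter:
// look up the declared arity of the called function and fail with a clear
// message on mismatch or on calls to undefined functions.
class ArityChecker {
    // function name -> number of formal parameters declared for it
    private final Map<String, Integer> declaredArity;

    ArityChecker(Map<String, Integer> declaredArity) {
        this.declaredArity = declaredArity;
    }

    String validateCall(String name, List<Object> args) {
        Integer expected = declaredArity.get(name);
        if (expected == null) {
            return "error: call to undefined function " + name;
        }
        if (expected != args.size()) {
            return "error: " + name + " expects " + expected
                + " argument(s), got " + args.size();
        }
        return "ok";
    }
}
```

A real implementation would also check parameter types and hook into the grammar so that malformed syntax is rejected rather than ignored, while keeping PL/SQL-compatible semantics where they differ.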
[jira] [Updated] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24346: - Parent: HIVE-24427 Issue Type: Sub-task (was: New Feature) > Store HPL/SQL packages into HMS > --- > > Key: HIVE-24346 > URL: https://issues.apache.org/jira/browse/HIVE-24346 > Project: Hive > Issue Type: Sub-task > Components: hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24217: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Sub-task > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Attachments: HPL_SQL storedproc HMS storage.pdf > > Time Spent: 4h 50m > Remaining Estimate: 0h > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24230: - Parent: HIVE-24427 Issue Type: Sub-task (was: Bug) > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example, > one might want to use a third-party SQL tool to run selects on stored > procedure (or rather function, in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution. > The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. > > To make it easier to implement, we keep things separated internally at > first, by introducing a Hive session-level JDBC parameter. > {code:java} > jdbc:hive2://localhost:1/default;hplsqlMode=true {code} > > The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user > can create and call stored procedures. HPL/SQL allows you to write any kind of > procedural statement at the top level.
This patch doesn't limit this, but it > might be better to eventually restrict what statements are allowed outside of > stored procedures. > > Since HPL/SQL and Hive are running in the same process, there is no need to use > the JDBC driver between them. The patch adds an abstraction with 2 different > implementations: one for executing queries over JDBC (keeping the existing > behaviour) and another for directly calling Hive's compiler. In HPL/SQL > mode the latter is used. > Internally, a new operation (HplSqlOperation) and operation type > (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but > use the HPL/SQL interpreter to execute arbitrary scripts. This operation > might spawn new SQLOperations. > For example, consider the following statement: > {code:java} > FOR i in 1..10 LOOP > SELECT * FROM table > END LOOP;{code} > We send this to Beeline while we're in HPL/SQL mode. Hive will create an HPL/SQL > interpreter and store it in the session state. A new HplSqlOperation is > created to run the script on the interpreter. > HPL/SQL knows how to execute the FOR loop, but it will call Hive to run the > SELECT expression. The HplSqlOperation is notified when the select reads a > row and accumulates the rows into a RowSet (memory consumption needs to be > considered here), which can be retrieved via Thrift from the client side. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
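The executor abstraction the ticket describes — a JDBC round-trip versus a direct call into Hive's compiler, selected by the hplsqlMode session parameter — can be sketched as below. The interface and class names are invented for illustration; the actual patch's types may differ.

```java
// Illustrative sketch of the two-implementation abstraction: in hplsqlMode
// the in-process compiler path is chosen; otherwise the JDBC path is kept.
interface QueryExecutor {
    String execute(String sql);
}

class JdbcQueryExecutor implements QueryExecutor {
    public String execute(String sql) {
        // A real implementation would go through the Hive JDBC driver here.
        return "jdbc:" + sql;
    }
}

class DirectCompilerExecutor implements QueryExecutor {
    public String execute(String sql) {
        // A real implementation would hand the statement to Hive's compiler in-process.
        return "compiler:" + sql;
    }
}

class ExecutorFactory {
    // hplsqlMode comes from the session-level JDBC parameter described in the ticket.
    static QueryExecutor forSession(boolean hplsqlMode) {
        return hplsqlMode ? new DirectCompilerExecutor() : new JdbcQueryExecutor();
    }
}
```

Keeping both implementations behind one interface preserves the existing JDBC behaviour while letting HPL/SQL mode skip the driver entirely.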
[jira] [Assigned] (HIVE-24427) HPL/SQL improvements
[ https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24427: > HPL/SQL improvements > > > Key: HIVE-24427 > URL: https://issues.apache.org/jira/browse/HIVE-24427 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24383) Add Table type to HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24383: > Add Table type to HPL/SQL > - > > Key: HIVE-24383 > URL: https://issues.apache.org/jira/browse/HIVE-24383 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24346) Store HPL/SQL packages into HMS
[ https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24346: > Store HPL/SQL packages into HMS > --- > > Key: HIVE-24346 > URL: https://issues.apache.org/jira/browse/HIVE-24346 > Project: Hive > Issue Type: New Feature > Components: hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24338) HPL/SQL missing features
[ https://issues.apache.org/jira/browse/HIVE-24338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223661#comment-17223661 ] Attila Magyar commented on HIVE-24338: -- ForAll is mainly an optimization to avoid sending statements line by line to the DB. Named parameters are not that widely used, as far as I know. Goto and label are mostly used for error handling. Bulk collect can be useful for selecting into an array, but it requires a Table or Array type and type declarations first. > HPL/SQL missing features > > > Key: HIVE-24338 > URL: https://issues.apache.org/jira/browse/HIVE-24338 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some features which are supported by Oracle's PL/SQL but not by > HPL/SQL. This Jira is to prioritize them and investigate the > feasibility of the implementation. > * ForAll syntax like: ForAll j in i..j save exceptions > * Bulk collect: Fetch cursor Bulk Collect Into list Limit n; > * Type declaration: Type T_cab is TABLE of > * TABLE datatype > * GOTO and LABEL > * Global variables like $$PLSQL_UNIT and others > * Named parameters: func(name1 => value1, name2 => value2); > * Built-in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24338) HPL/SQL missing features
[ https://issues.apache.org/jira/browse/HIVE-24338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24338: > HPL/SQL missing features > > > Key: HIVE-24338 > URL: https://issues.apache.org/jira/browse/HIVE-24338 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some features which are supported by Oracle's PL/SQL but not by > HPL/SQL. This Jira is to prioritize them and investigate the > feasibility of the implementation. > * ForAll syntax like: ForAll j in i..j save exceptions > * Bulk collect: Fetch cursor Bulk Collect Into list Limit n; > * Type declaration: Type T_cab is TABLE of > * TABLE datatype > * GOTO and LABEL > * Global variables like $$PLSQL_UNIT and others > * Named parameters: func(name1 => value1, name2 => value2); > * Built-in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24230: - Description: HPL/SQL is a standalone command line program that can store and load scripts from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function, in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI. To make it easier to implement, we keep things separated internally at first, by introducing a Hive session-level JDBC parameter. {code:java} jdbc:hive2://localhost:1/default;hplsqlMode=true {code} The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user can create and call stored procedures. HPL/SQL allows you to write any kind of procedural statement at the top level. This patch doesn't limit this, but it might be better to eventually restrict what statements are allowed outside of stored procedures. Since HPL/SQL and Hive are running in the same process, there is no need to use the JDBC driver between them. The patch adds an abstraction with 2 different implementations: one for executing queries over JDBC (keeping the existing behaviour) and another for directly calling Hive's compiler.
In HPL/SQL mode the latter is used. Internally, a new operation (HplSqlOperation) and operation type (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but use the HPL/SQL interpreter to execute arbitrary scripts. This operation might spawn new SQLOperations. For example, consider the following statement: {code:java} FOR i in 1..10 LOOP SELECT * FROM table END LOOP;{code} We send this to Beeline while we're in HPL/SQL mode. Hive will create an HPL/SQL interpreter and store it in the session state. A new HplSqlOperation is created to run the script on the interpreter. HPL/SQL knows how to execute the FOR loop, but it will call Hive to run the SELECT expression. The HplSqlOperation is notified when the select reads a row and accumulates the rows into a RowSet (memory consumption needs to be considered here), which can be retrieved via Thrift from the client side. was: HPL/SQL is a standalone command line program that can store and load scripts from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function, in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI.
To make it easier to implement, we keep things separated internally at first, by introducing a Hive session-level JDBC parameter. {code:java} jdbc:hive2://localhost:1/default;hplsqlMode=true {code} The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user can create and call stored procedures. HPL/SQL allows you to write any kind of procedural statement at the top level. This patch doesn't limit this, but it might be better to eventually restrict what statements are allowed outside of stored procedures. Since HPL/SQL and Hive are running in the same process, there is no need to use the JDBC driver between them. The patch adds an abstraction with 2 different implementations: one for executing queries over JDBC (keeping the existing behaviour) and another for directly calling Hive's compiler. In HPL/SQL mode the latter is used. Internally, a new operation (HplSqlOperation) and operation type (PROCEDURAL_SQL) were added, which work similarly to the SQLOpe
[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24230: - Description: HPL/SQL is a standalone command line program that can store and load scripts from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function, in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI. To make it easier to implement, we keep things separated internally at first, by introducing a Hive session-level JDBC parameter. {code:java} jdbc:hive2://localhost:1/default;hplsqlMode=true {code} The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user can create and call stored procedures. HPL/SQL allows you to write any kind of procedural statement at the top level. This patch doesn't limit this, but it might be better to eventually restrict what statements are allowed outside of stored procedures. Since HPL/SQL and Hive are running in the same process, there is no need to use the JDBC driver between them. The patch adds an abstraction with 2 different implementations: one for executing queries over JDBC (keeping the existing behaviour) and another for directly calling Hive's compiler.
In HPL/SQL mode the latter is used. Internally, a new operation (HplSqlOperation) and operation type (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but use the HPL/SQL interpreter to execute arbitrary scripts. This operation might spawn new SQLOperations. For example, consider the following statement: {code:java} FOR i in 1..10 LOOP SELECT * FROM table END LOOP;{code} We send this to Beeline while we're in HPL/SQL mode. Hive will create an HPL/SQL interpreter and store it in the session state. A new HplSqlOperation is created to run the script on the interpreter. HPL/SQL knows how to execute the FOR loop, but it will call Hive to run the SELECT expression. The HplSqlOperation is notified when the select reads a row and accumulates the rows into a RowSet which can be retrieved via Thrift from the client side. was: HPL/SQL is a standalone command line program that can store and load scripts from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function, in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI. To make it easier to implement, we keep things separated internally at first, by introducing a Hive session-level JDBC parameter.
{code:java} jdbc:hive2://localhost:1/default;hplsqlMode=true {code} The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user can create and call stored procedures. HPL/SQL allows you to write any kind of procedural statement at the top level. This patch doesn't limit this, but it might be better to eventually restrict what statements are allowed outside of stored procedures. Since HPL/SQL and Hive are running in the same process, there is no need to use the JDBC driver between them. The patch adds an abstraction with 2 different implementations: one for executing queries over JDBC (keeping the existing behaviour) and another for directly calling Hive's compiler. In HPL/SQL mode the latter is used. Internally, a new operation (HplSqlOperation) and operation type (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but use the HPL/SQL interpreter to
[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24230: - Description: HPL/SQL is a standalone command line program that can store and load scripts from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function, in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI. To make it easier to implement, we keep things separated internally at first, by introducing a Hive session-level JDBC parameter. {code:java} jdbc:hive2://localhost:1/default;hplsqlMode=true {code} The hplsqlMode parameter indicates that we are in procedural SQL mode, where the user can create and call stored procedures. HPL/SQL allows you to write any kind of procedural statement at the top level. This patch doesn't limit this, but it might be better to eventually restrict what statements are allowed outside of stored procedures. Since HPL/SQL and Hive are running in the same process, there is no need to use the JDBC driver between them. The patch adds an abstraction with 2 different implementations: one for executing queries over JDBC (keeping the existing behaviour) and another for directly calling Hive's compiler.
In HPL/SQL mode the latter is used. Internally, a new operation (HplSqlOperation) and operation type (PROCEDURAL_SQL) were added, which work similarly to SQLOperation but use the HPL/SQL interpreter to execute arbitrary scripts. This operation might spawn new SQLOperations. For example, consider the following statement: {code:java} FOR i in 1..10 LOOP SELECT * FROM table END LOOP;{code} We send this to Beeline while we're in HPL/SQL mode. Hive will create an HPL/SQL interpreter and store it in the session state. A new HplSqlOperation is created to run the script on the interpreter. HPL/SQL knows how to execute the FOR loop, but it will call Hive to run the SELECT expression. The HplSqlOperation is notified when the select reads a row and accumulates the rows into a RowSet which can be retrieved via Thrift from the client side. was: HPL/SQL is a standalone command line program that can store and load scripts from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL depends on Hive and not the other way around. Changing the dependency order between HPL/SQL and HiveServer would open up some possibilities which are currently not feasible to implement. For example, one might want to use a third-party SQL tool to run selects on stored procedure (or rather function, in this case) outputs. {code:java} SELECT * from myStoredProcedure(1, 2); {code} HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not work with the current architecture. Another important factor is performance. Declarative SQL commands are sent to Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC and use HiveServer’s internal API for compilation and execution. The third factor is that existing tools like Beeline or Hue cannot be used with HPL/SQL since it has its own, separate CLI.
> Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Bug > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example, > one might want to use a third-party SQL tool to run selects on stored > procedure (or rather function, in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current archit
[jira] [Assigned] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL
[ https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24315: > Improve validation and semantic analysis in HPL/SQL > > > Key: HIVE-24315 > URL: https://issues.apache.org/jira/browse/HIVE-24315 > Project: Hive > Issue Type: Improvement > Components: hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > There are some known issues that need to be fixed. For example, it seems that > the arity of a function is not checked when calling it, and the same is true for > parameter types. Calling an undefined function evaluates to null, and > sometimes incorrect syntax seems to be silently ignored. > In cases like these a helpful error message would be expected, though we > should also consider how PL/SQL works and maintain compatibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208111#comment-17208111 ] Attila Magyar commented on HIVE-24230: -- cc: [~kgyrtkirk] > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Bug > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example, > one might want to use a third-party SQL tool to run selects on stored > procedure (or rather function, in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution. > The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24230) Integrate HPL/SQL into HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24230: > Integrate HPL/SQL into HiveServer2 > -- > > Key: HIVE-24230 > URL: https://issues.apache.org/jira/browse/HIVE-24230 > Project: Hive > Issue Type: Bug > Components: HiveServer2, hpl/sql >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > HPL/SQL is a standalone command line program that can store and load scripts > from text files, or from the Hive Metastore (since HIVE-24217). Currently HPL/SQL > depends on Hive and not the other way around. > Changing the dependency order between HPL/SQL and HiveServer would open up > some possibilities which are currently not feasible to implement. For example, > one might want to use a third-party SQL tool to run selects on stored > procedure (or rather function, in this case) outputs. > {code:java} > SELECT * from myStoredProcedure(1, 2); {code} > HPL/SQL doesn’t have a JDBC interface and it’s not a daemon, so this would not > work with the current architecture. > Another important factor is performance. Declarative SQL commands are sent to > Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC > and use HiveServer’s internal API for compilation and execution. > The third factor is that existing tools like Beeline or Hue cannot be used > with HPL/SQL since it has its own, separate CLI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24217: - Description: HPL/SQL procedures are currently stored in text files. The goal of this Jira is to implement a Metastore backend for storing and loading these procedures. See the attached design for more information. was:HPL/SQL procedures are currently stored in text files. The goal of this Jira is to implement a Metastore backend for storing and loading these procedures. > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Attachments: HPL_SQL storedproc HMS storage.pdf > > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24217: - Description: HPL/SQL procedures are currently stored in text files. The goal of this Jira is to implement a Metastore backend for storing and loading these procedures. This is an incremental step towards having fully capable stored procedures in Hive. See the attached design for more information. was: HPL/SQL procedures are currently stored in text files. The goal of this Jira is to implement a Metastore backend for storing and loading these procedures. See the attached design for more information. > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Attachments: HPL_SQL storedproc HMS storage.pdf > > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. > This is an incremental step towards having fully capable stored procedures in > Hive. > > See the attached design for more information. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-24217: - Attachment: HPL_SQL storedproc HMS storage.pdf > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Attachments: HPL_SQL storedproc HMS storage.pdf > > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures
[ https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24217: > HMS storage backend for HPL/SQL stored procedures > - > > Key: HIVE-24217 > URL: https://issues.apache.org/jira/browse/HIVE-24217 > Project: Hive > Issue Type: Bug > Components: Hive, hpl/sql, Metastore >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > HPL/SQL procedures are currently stored in text files. The goal of this Jira > is to implement a Metastore backend for storing and loading these procedures. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection
[ https://issues.apache.org/jira/browse/HIVE-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-24149: > HiveStreamingConnection doesn't close HMS connection > > > Key: HIVE-24149 > URL: https://issues.apache.org/jira/browse/HIVE-24149 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > There are 3 HMS connections used by HiveStreamingConnection: one for transactions, > one for heartbeats, and one for notifications. The close method only closes the > first two, leaving the last one open, which eventually overloads HMS until it > becomes unresponsive. -- This message was sent by Atlassian Jira (v8.3.4#803005)
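The leak described above is the classic partial-cleanup pattern: a connection object owns three clients but its close path releases only two. A minimal sketch of the shape of the fix, with hypothetical class names standing in for the real metastore client:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for a Hive Metastore client; counts open instances so the
// leak (or its absence) is observable.
class MetaStoreClient implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();
    MetaStoreClient() { OPEN.incrementAndGet(); }
    @Override public void close() { OPEN.decrementAndGet(); }
}

// Sketch of HiveStreamingConnection's ownership: three clients, one each
// for transactions, heartbeats, and notifications.
class StreamingConnection implements AutoCloseable {
    private final MetaStoreClient txnClient = new MetaStoreClient();
    private final MetaStoreClient heartbeatClient = new MetaStoreClient();
    private final MetaStoreClient notificationClient = new MetaStoreClient();

    @Override public void close() {
        txnClient.close();
        heartbeatClient.close();
        notificationClient.close();   // the client the bug left open
    }
}
```

Each leaked client holds a server-side connection, so under a steady stream of short-lived streaming connections the unreleased notification clients accumulate until HMS runs out of capacity.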
[jira] [Assigned] (HIVE-23957) Limit followed by TopNKey improvement
[ https://issues.apache.org/jira/browse/HIVE-23957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23957: > Limit followed by TopNKey improvement > - > > Key: HIVE-23957 > URL: https://issues.apache.org/jira/browse/HIVE-23957 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > The Limit + TopNKey pushdown might result in a Limit operator followed by a TNK > in the physical plan. This likely makes the TNK unnecessary in cases like > this. We need to investigate if/when we can remove the TNK. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23723: - Attachment: (was: HIVE-23723.1.patch) > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23937) Take null ordering into consideration when pushing TNK through inner joins
[ https://issues.apache.org/jira/browse/HIVE-23937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23937: > Take null ordering into consideration when pushing TNK through inner joins > -- > > Key: HIVE-23937 > URL: https://issues.apache.org/jira/browse/HIVE-23937 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23817) Pushing TopN Key operator through PKFK inner joins
[ https://issues.apache.org/jira/browse/HIVE-23817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23817: > Pushing TopN Key operator through PKFK inner joins > -- > > Key: HIVE-23817 > URL: https://issues.apache.org/jira/browse/HIVE-23817 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > > If there is a primary key-foreign key relationship between the tables, we can > push the TopN Key operator through the join. -- This message was sent by Atlassian Jira (v8.3.4#803005)
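For context on what is being pushed: a TopN Key operator keeps a bounded heap of the n smallest keys seen so far and forwards a row only if its key can still rank in the top n, so rows are dropped as early as possible. The intuition behind the PK-FK case is that such a join neither invents nor loses key values on the FK side, so the filter gives the same answer below the join as above it. A simplified integer-keyed sketch (not Hive's actual operator):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Bounded top-n key filter: admits a row iff its key may still be among
// the n smallest keys observed so far.
class TopNKeyFilter {
    private final int n;
    // max-heap: the head is the largest key currently retained
    private final PriorityQueue<Integer> heap =
        new PriorityQueue<>(Comparator.reverseOrder());

    TopNKeyFilter(int n) { this.n = n; }

    /** Returns true if the row should be forwarded downstream. */
    boolean admits(int key) {
        if (heap.size() < n) { heap.add(key); return true; }
        if (key < heap.peek()) { heap.poll(); heap.add(key); return true; }
        return false;   // key cannot be in the top n; drop the row early
    }
}
```

Pushing this below the join means the join (and any shuffle feeding it) processes only rows that can still contribute to the final top n.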
[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23757: - Attachment: (was: HIVE-23757.1.patch) > Pushing TopN Key operator through MAPJOIN > - > > Key: HIVE-23757 > URL: https://issues.apache.org/jira/browse/HIVE-23757 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 20m > Remaining Estimate: 0h > > So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144739#comment-17144739 ] Attila Magyar commented on HIVE-23757: -- cc: [~kkasa], [~jcamacho] > Pushing TopN Key operator through MAPJOIN > - > > Key: HIVE-23757 > URL: https://issues.apache.org/jira/browse/HIVE-23757 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23757.1.patch > > > So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23757: - Status: Patch Available (was: Open) > Pushing TopN Key operator through MAPJOIN > - > > Key: HIVE-23757 > URL: https://issues.apache.org/jira/browse/HIVE-23757 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23757.1.patch > > > So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23757: - Attachment: HIVE-23757.1.patch > Pushing TopN Key operator through MAPJOIN > - > > Key: HIVE-23757 > URL: https://issues.apache.org/jira/browse/HIVE-23757 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23757.1.patch > > > So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23757) Pushing TopN Key operator through MAPJOIN
[ https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23757: > Pushing TopN Key operator through MAPJOIN > - > > Key: HIVE-23757 > URL: https://issues.apache.org/jira/browse/HIVE-23757 > Project: Hive > Issue Type: Improvement >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23757.1.patch > > > So far only MERGEJOIN + JOIN cases are handled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142863#comment-17142863 ] Attila Magyar edited comment on HIVE-23723 at 6/23/20, 5:14 PM: [~jcamachorodriguez], ??Concerning your patch, it seems you are removing the original limit on top of the left outer join? Note that you cannot remove it : If you have 5 input rows on the left side, you know the LOJ will produce at least 5 rows, however you cannot guarantee the join will produce 5 rows at most.?? Got it, that should indeed be kept. However, the reason why additional reducers are introduced by the limittranspose implementation is not fully clear to me. Do you think we should drop this patch, since it's already implemented by limittranspose, and focus on tweaking the existing implementation? cc: [~ashutoshc] was (Author: amagyar): [~jcamachorodriguez], ??Concerning your patch, it seems you are removing the original limit on top of the left outer join? Note that you cannot remove it : If you have 5 input rows on the left side, you know the LOJ will produce at least 5 rows, however you cannot guarantee the join will produce 5 rows at most.?? Got it, that should be kept indeed. However reason why additional reducers are introduced by the limittranspose implementation is not fully clear to me. Do you think we should drop this patch as it's already implemented by the limittranspose, and focus on tweaking the existing implementation? > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142863#comment-17142863 ] Attila Magyar commented on HIVE-23723: -- [~jcamachorodriguez], ??Concerning your patch, it seems you are removing the original limit on top of the left outer join? Note that you cannot remove it : If you have 5 input rows on the left side, you know the LOJ will produce at least 5 rows, however you cannot guarantee the join will produce 5 rows at most.?? Got it, that should indeed be kept. However, the reason why additional reducers are introduced by the limittranspose implementation is not fully clear to me. Do you think we should drop this patch, since it's already implemented by limittranspose, and focus on tweaking the existing implementation? > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140588#comment-17140588 ] Attila Magyar commented on HIVE-23723: -- [~jcamachorodriguez], thanks for letting me know, I hadn't realized that. Any idea why it is disabled by default? Also, the plan looks different with limittranspose, and I'm not sure why. There are 3 Limit operators. The first one is what was pushed through the LOJ, but there are 2 others in Reducer 2.
{code:java}
explain SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key = src2.key) LIMIT 5;
{code}
{code:java}
PREHOOK: query: explain SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key = src2.key) LIMIT 5
PREHOOK: type: QUERY
PREHOOK: Input: default@src
A masked pattern was here
POSTHOOK: query: explain SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key = src2.key) LIMIT 5
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
A masked pattern was here
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Tez
      A masked pattern was here
      Edges:
        Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
        Reducer 3 <- Map 4 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
      A masked pattern was here
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: src1
                  Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: key (type: string)
                    outputColumnNames: _col0
                    Statistics: Num rows: 500 Data size: 43500 Basic stats: COMPLETE Column stats: COMPLETE
                    Limit
                      Number of rows: 5
                      Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        null sort order:
                        sort order:
                        Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
                        TopN Hash Memory Usage: 0.3
                        value expressions: _col0 (type: string)
            Execution mode: vectorized, llap
            LLAP IO: no inputs
        Map 4
            Map Operator Tree:
                TableScan
                  alias: src2
                  filterExpr: key is not null (type: boolean)
                  Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        null sort order: z
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 500 Data size: 89000 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: string)
            Execution mode: vectorized, llap
            LLAP IO: no inputs
        Reducer 2
            Execution mode: vectorized, llap
            Reduce Operator Tree:
              Limit
                Number of rows: 5
                Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: VALUE._col0 (type: string)
                  outputColumnNames: _col0
                  Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
                  Limit
                    Number of rows: 5
                    Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      null sort order: z
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE Column stats: COMPLETE
        Reducer 3
            Execution mode: llap
            Reduce Operator Tree:
              Merge Join Operator
                condition map:
                     Left Outer Join 0 to 1
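The quoted review point in this thread — that a limit pushed below a left outer join cannot replace the limit above it — can be seen with a tiny joined dataset: limiting the left side to 5 rows bounds the left input, but a single left key that matches several right rows still multiplies the output past 5. A toy illustration (hypothetical helper, not Hive code):

```java
import java.util.ArrayList;
import java.util.List;

class LojDemo {
    /** Left outer join on equal integer keys; an unmatched row pairs with -1 (standing in for NULL). */
    static List<int[]> leftOuterJoin(List<Integer> left, List<Integer> right) {
        List<int[]> out = new ArrayList<>();
        for (int l : left) {
            boolean matched = false;
            for (int r : right) {
                if (l == r) { out.add(new int[]{l, r}); matched = true; }
            }
            if (!matched) out.add(new int[]{l, -1});   // preserve the left row
        }
        return out;
    }
}
```

Every left row produces at least one output row (the LOJ yields at least 5 rows from 5 left rows), but duplicated join keys on the right can push the count above 5, so the top-level Limit must stay.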
[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23723: - Status: Patch Available (was: Open) > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23723: - Attachment: HIVE-23723.1.patch > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23723.1.patch > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23723) Limit operator pushdown through LOJ
[ https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23723: > Limit operator pushdown through LOJ > --- > > Key: HIVE-23723 > URL: https://issues.apache.org/jira/browse/HIVE-23723 > Project: Hive > Issue Type: Improvement > Components: Hive >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > > Limit operator (without an order by) can be pushed through SELECTS and LEFT > OUTER JOINs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted
[ https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129639#comment-17129639 ] Attila Magyar commented on HIVE-22687: -- I reproduced this issue by putting a sleep between the worker and slot node creation and submitting a query between those two events. After I applied the patch I was no longer able to reproduce it, so this seems to be a viable fix to me. cc: [~ashutoshc] [~prasanth_j] > Query hangs indefinitely if LLAP daemon registers after the query is submitted > -- > > Key: HIVE-22687 > URL: https://issues.apache.org/jira/browse/HIVE-22687 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.1.0 >Reporter: Himanshu Mishra >Assignee: Himanshu Mishra >Priority: Major > Attachments: HIVE-22687.01.patch, HIVE-22687.02.patch > > > If a query is submitted and no LLAP daemon is running, it waits for 1 minute > and times out with error {{SERVICE_UNAVAILABLE}}. > While waiting, if a new LLAP Daemon starts, the timeout is cancelled, > but the tasks do not get scheduled either. As a result, the query hangs > indefinitely. > This is due to the race condition where the LLAP Daemon first registers the LLAP > instance at {{.../workers/worker-}}, and afterwards registers > {{.../workers/slot-}}. In the gap between the two, Tez AM gets notified of the > worker zk node and, while processing it, checks if the slot zk node is present; if > not, it rejects the LLAP Daemon. The error in Tez AM is: > {code:java} > [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for > 8ebfdc45-0382-4757-9416-52898885af90{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
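The race above comes from the daemon publishing its worker znode before its slot znode, so the AM can observe a worker whose slot does not exist yet. One style of fix is for the scheduler to re-check the registry instead of rejecting the daemon on the first miss. A simplified sketch with an in-memory stand-in for the ZooKeeper registry (all names hypothetical):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class SlotAwareScheduler {
    // stand-in for the znodes under the LLAP registry path
    private final Set<String> znodes = ConcurrentHashMap.newKeySet();

    void register(String znode) { znodes.add(znode); }

    /** Accept the worker only once its matching slot znode exists, re-checking a bounded number of times. */
    boolean acceptWorker(String workerId, int maxRetries) {
        for (int i = 0; i <= maxRetries; i++) {
            if (znodes.contains("slot-" + workerId)) return true;
            // in a real scheduler: back off briefly before re-reading ZK,
            // or subscribe to a watch on the slot path
        }
        return false;   // slot never appeared; reject for real
    }
}
```

The key property is that a worker seen in the gap between the two registrations is retried rather than permanently rejected, which closes the window that left the query hanging.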
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Status: Open (was: Patch Available) > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 > AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Attachment: (was: HIVE-23277.2.patch) > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 > AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Attachment: HIVE-23277.2.patch > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot > 2020-04-23 at 11.27.42 AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Status: Patch Available (was: Open) > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot > 2020-04-23 at 11.27.42 AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Attachment: (was: HIVE-23277.2.patch) > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot > 2020-04-23 at 11.27.42 AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Attachment: HIVE-23277.2.patch > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot > 2020-04-23 at 11.27.42 AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Status: Open (was: Patch Available) > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot > 2020-04-23 at 11.27.42 AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124953#comment-17124953 ] Attila Magyar edited comment on HIVE-23277 at 6/3/20, 1:31 PM: --- Hey [~rajesh.balamohan], I made a patch for this, where json serialization happens on the logWriter's thread. The event is only built partially up front with a json object (not the serialized string) and the conversion happens right before writing out the event. However, events like this take up more space in memory than before. About twice as much. The queue has a default max capacity of 64 so this might not be a problem. {code:java} ./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.838Z hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log Writer 0"] XXX size with serialized JSON: 392288 ./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.833Z hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log Writer 0"] XXX with JSON object: 779536{code} How significant do you think the speed improvement is? Is it worth it? Based on my own measurements the JSON serialization wasn't that slow with the queries I used (about 10-15 ms). was (Author: amagyar): Hey [~rajesh.balamohan], I made a patch for this, where json serialization happens on the logWriter's thread. The event is only built partially up front with a json object (not the serialized string) and the conversion happens right before writing out the event. However events like this takes up more space in memory as before. About twice as much. The queue has a default max capacity of 64 so this might not be a problem. 
{code:java} ./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.838Z hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log Writer 0"] XXX size with serialized JSON: 392288 ./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.833Z hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log Writer 0"] XXX with JSON object: 779536{code} > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 > AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124953#comment-17124953 ] Attila Magyar commented on HIVE-23277: -- Hey [~rajesh.balamohan], I made a patch for this, where json serialization happens on the logWriter's thread. The event is only built partially up front with a json object (not the serialized string) and the conversion happens right before writing out the event. However, events like this take up more space in memory than before. About twice as much. The queue has a default max capacity of 64 so this might not be a problem. {code:java} ./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.838Z hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log Writer 0"] XXX size with serialized JSON: 392288 ./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.833Z hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log Writer 0"] XXX with JSON object: 779536{code} > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 > AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
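The change discussed in this thread boils down to moving serialization off the query path: the hook thread enqueues the raw event object, and the single writer thread converts it to JSON just before writing. A simplified sketch of that handoff, with hypothetical names and a naive string-based serializer standing in for the real conversion; the capacity of 64 mirrors the default queue bound mentioned in the comment:

```java
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.Collectors;

class ProtoLogWriter {
    // bounded queue of raw (unserialized) events; roughly twice the memory
    // per entry compared to queueing the serialized string, per the comment
    private final BlockingQueue<Map<String, String>> queue =
        new ArrayBlockingQueue<>(64);

    /** Called from the query thread: cheap, no serialization here. */
    boolean offerEvent(Map<String, String> event) { return queue.offer(event); }

    /** Called on the writer thread: serialize lazily, right before writing. */
    String drainOneAsJson() {
        try {
            Map<String, String> event = queue.take();
            return event.entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

The trade-off the thread debates falls out directly: the query thread saves the serialization cost (measured at roughly 10-15 ms per query), while each queued entry holds the larger unserialized form, which is bounded by the queue capacity.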
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Status: Patch Available (was: Open) > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 > AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23277: - Attachment: HIVE-23277.1.patch > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 > AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread
[ https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar reassigned HIVE-23277: Assignee: Attila Magyar > HiveProtoLogger should carry out JSON conversion in its own thread > -- > > Key: HIVE-23277 > URL: https://issues.apache.org/jira/browse/HIVE-23277 > Project: Hive > Issue Type: Improvement >Reporter: Rajesh Balamohan >Assignee: Attila Magyar >Priority: Minor > Attachments: Screenshot 2020-04-23 at 11.27.42 AM.png > > > !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA
[ https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123732#comment-17123732 ] Attila Magyar commented on HIVE-16220: -- Do you need to run multiple CREATE TABLE statements repeatedly to reproduce this, or is one enough? > Memory leak when creating a table using location and NameNode in HA > --- > > Key: HIVE-16220 > URL: https://issues.apache.org/jira/browse/HIVE-16220 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.2.1, 3.0.0 > Environment: HDP-2.4.0.0 > HDP-3.1.0.0 >Reporter: Angel Alvarez Pascua >Priority: Major > > The following simple DDL > CREATE TABLE `test`(`field` varchar(1)) LOCATION > 'hdfs://benderHA/apps/hive/warehouse/test' > ends up generating a huge memory leak in the HiveServer2 service. > After two weeks without a restart, the service stops suddenly because of > OutOfMemory errors. > This only happens when the NameNode is in HA; otherwise, nothing (oddly) happens. If the LOCATION clause is not > present, everything is also fine. > It seems multiple instances of Hadoop Configuration are created when we're > in an HA environment: > > 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" > occupy 350.263.816 (81,66%) bytes. These instances are referenced from one > instance of "java.util.HashMap$Node[]", > loaded by "" > > 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by > "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" > occupy 699.901.416 (87,32%) bytes. These instances are referenced from one > instance of "java.util.HashMap$Node[]", > loaded by "" -- This message was sent by Atlassian Jira (v8.3.4#803005)
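The heap-dump evidence above (thousands of Configuration copies retained by one HashMap$Node[]) has the shape of a cache whose keys never match. A minimal sketch of that failure mode, with entirely hypothetical names (this is not Hadoop's actual FileSystem cache code), shows how a bad cache key turns every lookup into a miss and grows the map without bound:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a cache-key bug that produces unbounded growth: if the key
// object differs on every call, computeIfAbsent never finds a hit and
// each lookup inserts a new entry -- the retained-size pattern seen in
// the heap dump above.
public class LeakySketch {
    static final Map<Object, Object> CACHE = new ConcurrentHashMap<>();

    // Broken: a fresh key object per call never equals any previous key,
    // so the map accumulates one entry per lookup.
    static Object lookupBroken(String uri) {
        return CACHE.computeIfAbsent(new Object(), k -> "conf-for-" + uri);
    }

    // Fixed: key by the stable URI string, so repeated lookups for the
    // same filesystem reuse one cached entry.
    static Object lookupFixed(String uri) {
        return CACHE.computeIfAbsent(uri, k -> "conf-for-" + uri);
    }
}
```

Whether the real leak is in the HA failover-proxy path or elsewhere would need confirmation from the dump's reference chain; the sketch only illustrates why a per-call key (or a key missing equals/hashCode) reproduces the observed symptom.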
[jira] [Updated] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure
[ https://issues.apache.org/jira/browse/HIVE-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23580: - Status: Patch Available (was: Open) > deleteOnExit set is not cleaned up, causing memory pressure > --- > > Key: HIVE-23580 > URL: https://issues.apache.org/jira/browse/HIVE-23580 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23580.1.patch > > > removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure
[ https://issues.apache.org/jira/browse/HIVE-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Magyar updated HIVE-23580: - Attachment: (was: HIVE-23580.1.patch) > deleteOnExit set is not cleaned up, causing memory pressure > --- > > Key: HIVE-23580 > URL: https://issues.apache.org/jira/browse/HIVE-23580 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 4.0.0 >Reporter: Attila Magyar >Assignee: Attila Magyar >Priority: Major > Fix For: 4.0.0 > > Attachments: HIVE-23580.1.patch > > > removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear -- This message was sent by Atlassian Jira (v8.3.4#803005)
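The bookkeeping bug described in HIVE-23580 can be sketched as follows. The names here are hypothetical (this is not Hive's Context class, and the actual directory deletion is elided): the point is that a path registered for delete-on-exit must also be deregistered when the scratch directory is removed early, or the tracking set retains an entry per query for the life of the process.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the deleteOnExit bookkeeping pattern: the set grows on
// createScratchDir and must shrink again on early removal, mirroring
// what cancelDeleteOnExit() is supposed to do.
public class ScratchDirTracker {
    private final Set<String> deleteOnExit = new HashSet<>();

    public void createScratchDir(String path) {
        deleteOnExit.add(path); // schedule cleanup at shutdown
    }

    // Buggy variant: removes the directory but leaks the tracking entry,
    // so the set grows by one per query until shutdown.
    public void removeScratchDirLeaky(String path) {
        // ... delete the directory itself ...
    }

    // Fixed variant: also drops the entry, the equivalent of calling
    // cancelDeleteOnExit() from the context's clear path.
    public void removeScratchDir(String path) {
        // ... delete the directory itself ...
        deleteOnExit.remove(path);
    }

    public int pending() {
        return deleteOnExit.size();
    }
}
```

On a long-lived HiveServer2 the leaky variant is pure memory pressure: the paths are tiny individually, but the set only ever grows, which matches the "not cleaned up" symptom in the issue title.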