[jira] [Assigned] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2022-10-21 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23757:


Assignee: (was: Attila Magyar)

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable

2021-09-14 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-25519.
--
Resolution: Invalid

Wrong project.

> Knox homepage service UI links missing when CM intermittently unavailable
> -
>
> Key: HIVE-25519
> URL: https://issues.apache.org/jira/browse/HIVE-25519
> Project: Hive
>  Issue Type: Task
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25519) Knox homepage service UI links missing when CM intermittently unavailable

2021-09-14 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-25519:



> Knox homepage service UI links missing when CM intermittently unavailable
> -
>
> Key: HIVE-25519
> URL: https://issues.apache.org/jira/browse/HIVE-25519
> Project: Hive
>  Issue Type: Task
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-22960) Approximate TopN Key Operator

2021-07-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-22960.
--
Resolution: Won't Fix

> Approximate TopN Key Operator
> -
>
> Key: HIVE-22960
> URL: https://issues.apache.org/jira/browse/HIVE-22960
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: Screen Shot 2020-03-02 at 4.55.46 PM.png
>
>
> "Different from other operators, top n operator demonstrates the notable 
> “long tail” characteristics which makes it distinct from other operators like 
> join, group by and etc. will saturate very quickly. Update is pretty frequent 
> at the beginning and then diverges to a very slow update frequently.
> The approximation can be implemented in two ways: one way is to stop the 
> array/heap update after certain percentage of the data is been read, for 
> example, 10% or 20%, if we know the table size. The other way is to set a 
> frequency threshold of the array/heap update. After the threshold is met, 
> then stop the top n processing"
> [~rzhappy]
> !Screen Shot 2020-03-02 at 4.55.46 PM.png|width=688,height=468!
> Y: number of updates in every 100msec
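A minimal Java sketch of the second idea in the quote (freeze the heap once the update rate drops); the class, thresholds, and method names are illustrative only, not the actual Hive TopNKey operator code.
{code:java}
import java.util.PriorityQueue;

/** Illustrative approximate top-n keeper; not the actual Hive TopNKeyOperator. */
class ApproximateTopN {
  private final PriorityQueue<Long> heap = new PriorityQueue<>(); // min-heap of the n largest keys seen
  private final int n;
  private final double minUpdateRatio;  // e.g. 0.001: freeze once <0.1% of rows still change the heap
  private long seen = 0, updates = 0;
  private boolean frozen = false;

  ApproximateTopN(int n, double minUpdateRatio) {
    this.n = n;
    this.minUpdateRatio = minUpdateRatio;
  }

  /** Returns true if the row may belong to the top n; once frozen, everything passes through. */
  boolean offer(long key) {
    seen++;
    if (frozen) {
      return true;
    }
    boolean updated = false;
    if (heap.size() < n) {
      heap.add(key);
      updated = true;
    } else if (key > heap.peek()) {
      heap.poll();
      heap.add(key);
      updated = true;
    }
    if (updated) {
      updates++;
    }
    // "Long tail": when the cumulative update ratio gets low enough, further
    // heap maintenance buys little, so stop the top-n processing.
    if (seen >= 10_000 && (double) updates / seen < minUpdateRatio) {
      frozen = true;
    }
    return updated;
  }
}
{code}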



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24338) HPL/SQL missing features

2021-07-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-24338.
--
Resolution: Fixed

> HPL/SQL missing features
> 
>
> Key: HIVE-24338
> URL: https://issues.apache.org/jira/browse/HIVE-24338
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some features which are supported by Oracle's PL/SQL but not by 
> HPL/SQL. This Jira is about prioritizing them and investigating the 
> feasibility of implementing them.
>  * ForAll syntax like: ForAll j in i..j save exceptions
>  * Bulk collect: Fetch cursor Bulk Collect Into list Limit n;
>  * Type declaration: Type T_cab is TABLE of
>  * TABLE datatype
>  * GOTO and LABEL
>  * Global variables like $$PLSQL_UNIT and others
>  * Named parameters: func(name1 => value1, name2 => value2);
>  * Built-in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate
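For the last item, a hedged sketch of how one-argument built-in string functions could be registered in an interpreter dispatch table; the class and method names are made up for illustration and do not reflect the real HPL/SQL sources.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative registry for built-in string functions; not the real HPL/SQL code. */
class BuiltinFunctions {
  private final Map<String, Function<String, String>> oneArg = new HashMap<>();

  BuiltinFunctions() {
    oneArg.put("ltrim", s -> s.replaceAll("^\\s+", ""));
    oneArg.put("rtrim", s -> s.replaceAll("\\s+$", ""));
    oneArg.put("upper", String::toUpperCase);
  }

  String call(String name, String arg) {
    Function<String, String> f = oneArg.get(name.toLowerCase());
    if (f == null) {
      // Instead of silently evaluating to null, fail loudly on unknown built-ins.
      throw new IllegalArgumentException("Undefined built-in function: " + name);
    }
    return f.apply(arg);
  }
}
{code}
A real implementation would also cover trunc, lpad, to_date and sysdate with proper argument and type handling.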



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24427) HPL/SQL improvements

2021-07-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-24427.
--
Resolution: Fixed

> HPL/SQL improvements
> 
>
> Key: HIVE-24427
> URL: https://issues.apache.org/jira/browse/HIVE-24427
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: epic
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-24 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-25242.
--
Resolution: Fixed

>  Query performs extremely slow with hive.vectorized.adaptor.usage.mode = 
> chosen
> ---
>
> Key: HIVE-25242
> URL: https://issues.apache.org/jira/browse/HIVE-25242
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
> vectorized through the vectorized adaptor.
> Queries like the one below perform very slowly because concat is not chosen 
> to be vectorized.
> {code:java}
> select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
> between to_date('2018-12-01') and to_date('2021-03-01');  {code}
> The patch whitelists the concat udf so that it uses the vectorized adaptor in 
> chosen mode.
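A hedged sketch of what a "chosen"-mode whitelist check could look like; the class, method, and constant names are illustrative and the UDF list shown is an assumption, not the actual patch.
{code:java}
import java.util.Set;

/** Illustrative "chosen" mode check; not the actual Vectorizer code. */
class VectorAdaptorChooser {
  // UDFs allowed to go through the vectorized adaptor when the usage mode is "chosen".
  private static final Set<String> CHOSEN_UDFS = Set.of(
      "concat",          // hypothetical addition so the query above vectorizes
      "to_date");

  static boolean useVectorAdaptor(String usageMode, String udfName) {
    if ("all".equalsIgnoreCase(usageMode)) {
      return true;
    }
    if ("none".equalsIgnoreCase(usageMode)) {
      return false;
    }
    // "chosen": only a curated list of UDFs is adapted.
    return CHOSEN_UDFS.contains(udfName.toLowerCase());
  }
}
{code}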



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-21 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-25242:
-
Description: 
If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
vectorized through the vectorized adaptor.

Queries like the one below perform very slowly because concat is not chosen to 
be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code}
The patch whitelists the concat udf so that it uses the vectorized adaptor in 
chosen mode.

  was:
If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
vectorized through the vectorized adaptor.

Queries like the one below perform very slowly because concat is not chosen to 
be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code}


>  Query performs extremely slow with hive.vectorized.adaptor.usage.mode = 
> chosen
> ---
>
> Key: HIVE-25242
> URL: https://issues.apache.org/jira/browse/HIVE-25242
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
> vectorized through the vectorized adaptor.
> Queries like the one below perform very slowly because concat is not chosen 
> to be vectorized.
> {code:java}
> select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
> between to_date('2018-12-01') and to_date('2021-03-01');  {code}
> The patch whitelists the concat udf so that it uses the vectorized adaptor in 
> chosen mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-14 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-25242:
-
Description: 
If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
vectorized through the vectorized adaptor.

Queries like the one below perform very slowly because concat is not chosen to 
be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code}

>  Query performs extremely slow with hive.vectorized.adaptor.usage.mode = 
> chosen
> ---
>
> Key: HIVE-25242
> URL: https://issues.apache.org/jira/browse/HIVE-25242
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> If hive.vectorized.adaptor.usage.mode is set to chosen, only certain UDFs are 
> vectorized through the vectorized adaptor.
> Queries like the one below perform very slowly because concat is not chosen 
> to be vectorized.
> {code:java}
> select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
> between to_date('2018-12-01') and to_date('2021-03-01');  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25242) Query performs extremely slow with hive.vectorized.adaptor.usage.mode = chosen

2021-06-14 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-25242:
-
Environment: (was: If hive.vectorized.adaptor.usage.mode is set to 
chosen only certain UDFS are vectorized through the vectorized adaptor.

Queries like this one, performs very slowly because the concat is not chosen to 
be vectorized.
{code:java}
select count(*) from tbl where to_date(concat(year, '-', month, '-', day)) 
between to_date('2018-12-01') and to_date('2021-03-01');  {code})

>  Query performs extremely slow with hive.vectorized.adaptor.usage.mode = 
> chosen
> ---
>
> Key: HIVE-25242
> URL: https://issues.apache.org/jira/browse/HIVE-25242
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25223) Select with limit returns no rows on non native table

2021-06-09 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-25223:



> Select with limit returns no rows on non native table
> -
>
> Key: HIVE-25223
> URL: https://issues.apache.org/jira/browse/HIVE-25223
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> Steps to reproduce:
> {code:java}
> CREATE EXTERNAL TABLE hht (key string, value int) 
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
> TBLPROPERTIES ("hbase.table.name" = "hht", "hbase.mapred.output.outputtable" 
> = "hht");
> insert into hht select uuid(), cast((rand() * 100) as int);
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
> insert into hht select uuid(), cast((rand() * 100) as int) from hht;
>  set hive.fetch.task.conversion=none;
>  select * from hht limit 10;
> +--++
> | hht.key  | hht.value  |
> +--++
> +--++
> No rows selected (5.22 seconds) {code}
>  
> This is caused by GlobalLimitOptimizer. The table directory is always empty 
> for a non-native table since the data is not managed by Hive (but by HBase in 
> this case).
> The optimizer scans the directory and sets the file list to an empty list.
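A minimal sketch of the kind of guard implied above (skip the global limit optimization for non-native tables); the type and method names are illustrative stand-ins, not the actual GlobalLimitOptimizer code.
{code:java}
/** Illustrative guard; not the actual GlobalLimitOptimizer code. */
class GlobalLimitGuard {
  /** Minimal stand-in for the table metadata consulted by the optimizer. */
  interface TableMeta {
    boolean isNonNative();
  }

  static boolean mayApplyGlobalLimit(TableMeta table) {
    // Non-native tables (e.g. HBase-backed) keep no files under the Hive table
    // directory, so sampling that directory would produce an empty input list.
    return !table.isNonNative();
  }
}
{code}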



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-25033.
--
Resolution: Fixed

> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25033) HPL/SQL thrift call fails when returning null

2021-04-20 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-25033:



> HPL/SQL thrift call fails when returning null
> -
>
> Key: HIVE-25033
> URL: https://issues.apache.org/jira/browse/HIVE-25033
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-14 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-25004.
--
Resolution: Fixed

> HPL/SQL subsequent statements are failing after typing a malformed input in 
> beeline
> ---
>
> Key: HIVE-25004
> URL: https://issues.apache.org/jira/browse/HIVE-25004
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> An error signal is stuck after evaluating the first expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-14 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24997:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Since HIVE-24230 it is assumed that the UDF is evaluated on HS2, which is not 
> true in general. The SessionState is only available at compile-time evaluation, 
> but later on a new interpreter should be instantiated.
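A hedged sketch of the lazy-initialization idea (build the interpreter where the UDF actually runs instead of assuming HS2 state is present); all names below are illustrative, not the real HPL/SQL UDF class.
{code:java}
/** Illustrative lazy interpreter setup inside a UDF; not the real HPL/SQL UDF class. */
class HplSqlUdfSketch {
  private transient Interpreter interpreter;   // rebuilt per task/container, never serialized

  Object evaluate(String functionName, Object[] args) {
    if (interpreter == null) {
      // No SessionState here when running inside a Tez container, so build a
      // fresh interpreter instead of relying on compile-time HS2 state.
      interpreter = new Interpreter();
    }
    return interpreter.callFunction(functionName, args);
  }

  /** Minimal stand-in for the HPL/SQL interpreter. */
  static class Interpreter {
    Object callFunction(String name, Object[] args) {
      return null; // placeholder: dispatch to the stored function body
    }
  }
}
{code}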



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24427) HPL/SQL improvements

2021-04-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24427 started by Attila Magyar.

> HPL/SQL improvements
> 
>
> Key: HIVE-24427
> URL: https://issues.apache.org/jira/browse/HIVE-24427
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: epic
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-25004:
-
Parent: HIVE-24427
Issue Type: Sub-task  (was: Bug)

> HPL/SQL subsequent statements are failing after typing a malformed input in 
> beeline
> ---
>
> Key: HIVE-25004
> URL: https://issues.apache.org/jira/browse/HIVE-25004
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> An error signal is stuck after evaluating the first expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24997:
-
Status: Patch Available  (was: Open)

> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Since HIVE-24230 it is assumed that the UDF is evaluated on HS2, which is not 
> true in general. The SessionState is only available at compile-time evaluation, 
> but later on a new interpreter should be instantiated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-25004:


Assignee: Attila Magyar

> HPL/SQL subsequent statements are failing after typing a malformed input in 
> beeline
> ---
>
> Key: HIVE-25004
> URL: https://issues.apache.org/jira/browse/HIVE-25004
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> An error signal is stuck after evaluating the first expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25004) HPL/SQL subsequent statements are failing after typing a malformed input in beeline

2021-04-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-25004:
-
Description: An error signal is stuck after evaluating the first expression.

> HPL/SQL subsequent statements are failing after typing a malformed input in 
> beeline
> ---
>
> Key: HIVE-25004
> URL: https://issues.apache.org/jira/browse/HIVE-25004
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> An error signal is stuck after evaluating the first expression.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24383) Add Table type to HPL/SQL

2021-04-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-24383.
--
Resolution: Fixed

> Add Table type to HPL/SQL
> -
>
> Key: HIVE-24383
> URL: https://issues.apache.org/jira/browse/HIVE-24383
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-09 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24997:
-
Description: Since HIVE-24230 it is assumed that the UDF is evaluated on HS2, 
which is not true in general. The SessionState is only available at compile-time 
evaluation, but later on a new interpreter should be instantiated.

> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> Since HIVE-24230 it is assumed that the UDF is evaluated on HS2, which is not 
> true in general. The SessionState is only available at compile-time evaluation, 
> but later on a new interpreter should be instantiated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24997) HPL/SQL udf doesn't work in tez container mode

2021-04-09 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24997:



> HPL/SQL udf doesn't work in tez container mode
> --
>
> Key: HIVE-24997
> URL: https://issues.apache.org/jira/browse/HIVE-24997
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24315) Improve validation and error handling in HPL/SQL

2021-03-24 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-24315.
--
Resolution: Fixed

> Improve validation and error handling in HPL/SQL 
> -
>
> Key: HIVE-24315
> URL: https://issues.apache.org/jira/browse/HIVE-24315
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> There are some known issues that need to be fixed. For example, it seems that 
> the arity of a function is not checked when calling it, and the same is true for 
> parameter types. Calling an undefined function evaluates to null, and 
> sometimes incorrect syntax seems to be silently ignored. 
> In cases like this a helpful error message would be expected, though we 
> should also consider how PL/SQL works and maintain compatibility.
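A minimal sketch of the arity check mentioned above, with illustrative names rather than the actual HPL/SQL analyzer code:
{code:java}
import java.util.List;

/** Illustrative call-site validation; not the real HPL/SQL checker. */
class CallValidator {
  static void checkCall(String funcName, int declaredArity, List<Object> actualArgs) {
    if (actualArgs.size() != declaredArity) {
      // Fail with a helpful message instead of silently evaluating to null.
      throw new IllegalArgumentException(
          "Wrong number of arguments for " + funcName + ": expected "
              + declaredArity + ", got " + actualArgs.size());
    }
  }
}
{code}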



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24838) Reduce FS creation in Warehouse::getDnsPath for object stores

2021-03-17 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24838:


Assignee: Attila Magyar

> Reduce FS creation in Warehouse::getDnsPath for object stores
> -
>
> Key: HIVE-24838
> URL: https://issues.apache.org/jira/browse/HIVE-24838
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Major
> Attachments: Screenshot 2021-03-02 at 11.09.01 AM.png
>
>
> [https://github.com/apache/hive/blob/master/standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/Warehouse.java#L143]
>  
> Warehouse::getDnsPath gets invoked from multiple places (e.g. getDatabase() 
> etc.). In certain cases, like dynamic partition loads, a lot of FS 
> instantiation calls can be avoided for object stores.
> It would be good to check for blob storage paths, and if so, it should be 
> possible to avoid FS creation.
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/BlobStorageUtils.java#L33]
>  
> !Screenshot 2021-03-02 at 11.09.01 AM.png|width=372,height=296!
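A hedged sketch of the suggested shortcut; the scheme list and helper names are assumptions for illustration (the real check would go through Hive's BlobStorageUtils and configuration), not the final patch.
{code:java}
import java.net.URI;
import java.util.Set;

/** Illustrative shortcut; not the actual Warehouse.getDnsPath() implementation. */
class DnsPathShortcut {
  // Schemes treated as object stores; the real list lives in Hive configuration.
  private static final Set<String> BLOB_SCHEMES = Set.of("s3a", "gs", "abfs", "abfss");

  /**
   * Object-store URIs are already fully qualified and have no host to canonicalize,
   * so the FileSystem instantiation normally done for DNS resolution can be skipped.
   */
  static URI resolve(URI path) {
    if (path.getScheme() != null && BLOB_SCHEMES.contains(path.getScheme().toLowerCase())) {
      return path;                            // skip FileSystem.get(...) entirely
    }
    return canonicalizeViaFileSystem(path);   // fall back to the existing behavior
  }

  private static URI canonicalizeViaFileSystem(URI path) {
    return path; // placeholder for the existing makeQualified()-style logic
  }
}
{code}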



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24315) Improve validation and error handling in HPL/SQL

2021-03-10 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24315:
-
Summary: Improve validation and error handling in HPL/SQL   (was: Improve 
validation and semantic analysis in HPL/SQL )

> Improve validation and error handling in HPL/SQL 
> -
>
> Key: HIVE-24315
> URL: https://issues.apache.org/jira/browse/HIVE-24315
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some known issues that need to be fixed. For example it seems that 
> arity of a function is not checked when calling it, and same is true for 
> parameter types. Calling an undefined function is evaluated to null and 
> sometimes it seems that incorrect syntax is silently ignored. 
> In cases like this a helpful error message would be expected, thought we 
> should also consider how PL/SQL works and maintain compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24813) thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS

2021-02-23 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24813:



> thrift regeneration is failing with cannot find symbol TABLE_IS_CTAS
> 
>
> Key: HIVE-24813
> URL: https://issues.apache.org/jira/browse/HIVE-24813
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> {code:java}
> [ERROR] 
> /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java:[2145,34]
>  cannot find symbol
> [ERROR]   symbol:   variable TABLE_IS_CTAS
> [ERROR]   location: class org.apache.hadoop.hive.metastore.HMSHandler
> [ERROR] 
> /Users/amagyar/development/hive/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetastoreDefaultTransformer.java:[591,58]
>  cannot find symbol
> [ERROR]   symbol:   variable TABLE_IS_CTAS
> [ERROR]   location: class 
> org.apache.hadoop.hive.metastore.MetastoreDefaultTransformer
> [ERROR] -> [Help 1] {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24715) Increase bucketId range

2021-02-18 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24715:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Bucket Id range increase.pdf
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24715) Increase bucketId range

2021-02-15 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24715:
-
Attachment: Bucket Id range increase.pdf

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Bucket Id range increase.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24715) Increase bucketId range

2021-02-15 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24715:
-
Status: Patch Available  (was: Open)

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-02 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277208#comment-17277208
 ] 

Attila Magyar edited comment on HIVE-24584 at 2/2/21, 3:39 PM:
---

Hi Syed, thanks for trying it out.

I think it should have worked that way. Anyway I created a test that reproduces 
the issue, depending on what value you set for IS_METASTORE_REMOTE.
{code:java}
$ cd itests/hive-unit
$ mvn test -Dtest=TestMsck
$ grep "Failed to deserialize the expression" target/tmp/log/hive.log {code}
The test doesn't fail in any case (that's a testing issue) but you can see the 
Kryo related error in the log when you run it with IS_METASTORE_REMOTE=true.

[^msckrepro.patch]

 


was (Author: amagyar):
Hi Syed, thanks for trying it out.

I think it should have worked that way. Anyway I created a test that reproduces 
the issue, depending on what value you set for IS_METASTORE_REMOTE.
{code:java}
$ cd itests/hive-unit
$ mvn test -Dtest=TestMsck
$ grep "Failed to deserialize the expression" target/tmp/log/hive.log {code}
The test doesn't fail in any case but you can see (that's a testing issue) but 
you can see the Kry related error when you run it with IS_METASTORE_REMOTE=true.

[^msckrepro.patch]

 

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: msckrepro.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-02 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24584:
-
Attachment: msckrepro.patch

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: msckrepro.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-02 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277208#comment-17277208
 ] 

Attila Magyar commented on HIVE-24584:
--

Hi Syed, thanks for trying it out.

I think it should have worked that way. Anyway I created a test that reproduces 
the issue, depending on what value you set for IS_METASTORE_REMOTE.
{code:java}
$ cd itests/hive-unit
$ mvn test -Dtest=TestMsck
$ grep "Failed to deserialize the expression" target/tmp/log/hive.log {code}
The test doesn't fail in any case (that's a testing issue) but you can see the 
Kryo related error in the log when you run it with IS_METASTORE_REMOTE=true.

[^msckrepro.patch]

 

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Attachments: msckrepro.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar edited comment on HIVE-24715 at 2/1/21, 6:02 PM:
---

Currently the bucketId field is stored in 12 bits. When TEZ starts more than 
4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started then we would end up 
having hundreds of thousands of files, and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket becomes bucket_0 and it 
will look like it was created by max_statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

The change is backward compatible with the prior implementation, while upsizing 
the range wouldn't be.
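A small arithmetic sketch of the overflow scheme described above; the constants and method names are illustrative, not Hive's actual bucket-property encoding code.
{code:java}
/** Illustrative arithmetic only; not Hive's actual bucket property codec. */
class BucketOverflowSketch {
  static final int MAX_BUCKETS = 1 << 12;    // 12-bit bucket ID field => ids 0..4095

  /** Task 4096 folds back to bucket_0, task 4097 to bucket_1, and so on. */
  static int effectiveBucketId(int taskId) {
    return taskId % MAX_BUCKETS;
  }

  /** The overflow moves into the statement id, keeping delta file names unique. */
  static int effectiveStatementId(int taskId, int statementId, int maxStatementId) {
    int overflow = taskId / MAX_BUCKETS;     // 0 for the first 4096 tasks
    return overflow == 0 ? statementId : maxStatementId + overflow;
  }
}
{code}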

 


was (Author: amagyar):
Currently the bucketId field is stored in 12 bits. When TEZ starts more tasks 
than 4095 it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundred thousands of tasks are started than we would and up 
having hundred thousands of files and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let bucket id 
overflow into the statement id, so that the 4096th bucket will bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM:
---

Currently the bucketId field is stored in 12 bits. When TEZ starts more than 
4095 tasks, it overflows. See TEZ-4271 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started then we would end up 
having hundreds of thousands of files, and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket becomes bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 


was (Author: amagyar):
Currently the bucketId field is stored in 12 bits. When TEZ starts more tasks 
than 4095 it overflows.
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundred thousands of tasks are started than we would and up 
having hundred thousands of files and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let bucket id 
overflow into the statement id, so that the 4096th bucket will bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar edited comment on HIVE-24715 at 2/1/21, 4:04 PM:
---

Currently the bucketId field is stored in 12 bits. When TEZ starts more than 
4095 tasks, it overflows. See TEZ-4271 and TEZ-4130 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started then we would end up 
having hundreds of thousands of files, and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket becomes bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 


was (Author: amagyar):
Currently the bucketId field is stored in 12 bits. When TEZ starts more tasks 
than 4095 it overflows. See TEZ-4271 for more context.

 
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundred thousands of tasks are started than we would and up 
having hundred thousands of files and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let bucket id 
overflow into the statement id, so that the 4096th bucket will bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276424#comment-17276424
 ] 

Attila Magyar commented on HIVE-24715:
--

Currently the bucketId field is stored in 12 bits. When TEZ starts more than 
4095 tasks, it overflows.
{code:java}
* Represents format of "bucket" property in Hive 3.0.
* top 3 bits - version code.
* next 1 bit - reserved for future
* next 12 bits - the bucket ID
* next 4 bits reserved for future {code}
Simply increasing the range would have an undesired effect on compaction 
efficiency. If hundreds of thousands of tasks are started then we would end up 
having hundreds of thousands of files, and since compaction works across statement 
ids it wouldn't merge those.

Instead of increasing the range, the proposed solution is to let the bucket id 
overflow into the statement id, so that the 4096th bucket becomes bucket_0 and it 
will look like it was created by statement_id+1.

This way compaction will be able to merge the same buckets that belong to 
different statements.

 

 

> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24715) Increase bucketId range

2021-02-01 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24715:



> Increase bucketId range
> ---
>
> Key: HIVE-24715
> URL: https://issues.apache.org/jira/browse/HIVE-24715
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-02-01 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17276184#comment-17276184
 ] 

Attila Magyar commented on HIVE-24584:
--

Hi [~srahman], did you manage to reproduce it?

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24696) Drop procedure and drop package syntax for HPLSQL

2021-01-28 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar resolved HIVE-24696.
--
  Assignee: Attila Magyar
Resolution: Fixed

Fixed as part of HIVE-24346.

> Drop procedure and drop package syntax for HPLSQL
> -
>
> Key: HIVE-24696
> URL: https://issues.apache.org/jira/browse/HIVE-24696
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-28 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273439#comment-17273439
 ] 

Attila Magyar commented on HIVE-24584:
--

[~srahman]

If you have a Hive installation with a remote, non-embedded HMS, then:
1.) create external table t1 (c1 int, c2 int) partitioned by (c3 int) location 
'hdfs:///warehouse/tablespace/external/hive/t1';

2.)
insert into t1 partition(c3=1) values (1,1);
insert into t1 partition(c3=2) values (2,2);
insert into t1 partition(c3=3) values (3,3);

3.) hdfs dfs -rm -r hdfs:///warehouse/tablespace/external/hive/t1/c3=3

4.) msck repair table t1 sync partitions;

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-13 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263996#comment-17263996
 ] 

Attila Magyar commented on HIVE-24584:
--

Hi [~srahman],

Thanks for the input. My understanding is that PartitionExpressionForMetastore 
is the default value of "metastore.expression.proxy" (in 
HiveConf.java/MetaStoreConf.java).

Msck attempts to override this by creating a HiveMetaStoreClient with a 
modified config object. However, unless HS2 and HMS are running inside the same 
process (or Msck is called within HMS via the periodically running 
PartitionManagementTask), this doesn't work.

In the case of a remote HMS, Msck should have called msc.setMetaConf() or something 
that modifies the config via thrift.
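A hedged sketch of the alternative suggested in the last sentence; whether setMetaConf accepts this key, and the proxy class name used, are assumptions for illustration only, not an applied fix.
{code:java}
import org.apache.hadoop.hive.metastore.IMetaStoreClient;

/** Illustrative only; mirrors the suggestion in the comment above. */
class MsckExpressionProxySketch {
  static void configureRemoteProxy(IMetaStoreClient msc) throws Exception {
    // Hypothetical proxy class name; whether this key is accepted by setMetaConf
    // is exactly the open question raised in the comment.
    msc.setMetaConf("metastore.expression.proxy",
        "com.example.MyPartitionExpressionProxy");
  }
}
{code}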

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following exception is coming when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-01-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24625:
-
Description: 
MetastoreDefaultTransformer in HMS converts a managed non-transactional table 
to an external table. MoveTask still uses the managed path when loading the data, 
resulting in an always empty table.
{code:java}
create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from 
other;{code}
After the conversion the table location points to an external directory:
Location: hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1
MoveTask uses the managed location:
{code:java}
INFO : Moving data to directory 
hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from 
hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000
 {code}

> CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect 
> directory
> -
>
> Key: HIVE-24625
> URL: https://issues.apache.org/jira/browse/HIVE-24625
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> MetastoreDefaultTransformer in HMS converts a managed non-transactional table 
> to an external table. MoveTask still uses the managed path when loading the 
> data, resulting in an always empty table.
> {code:java}
> create table tbl1 TBLPROPERTIES ('transactional'='false') as select * from 
> other;{code}
> After the conversion the table location points to an external directory:
> Location: hdfs://c670-node2.coelab.cloudera.com:8020/warehouse/tablespace/external/hive/tbl1
> MoveTask uses the managed location:
> {code:java}
> INFO : Moving data to directory 
> hdfs://...:8020/warehouse/tablespace/managed/hive/tbl1 from 
> hdfs://...:8020/warehouse/tablespace/managed/hive/.hive-staging_hive_2021-01-05_16-10-39_973_41005081081760609-4/-ext-1000
>  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24625) CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect directory

2021-01-12 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24625:



> CTAS with TBLPROPERTIES ('transactional'='false') loads data into incorrect 
> directory
> -
>
> Key: HIVE-24625
> URL: https://issues.apache.org/jira/browse/HIVE-24625
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258905#comment-17258905
 ] 

Attila Magyar commented on HIVE-24584:
--

cc: [~srahman]

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following exception is thrown when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24584 started by Attila Magyar.

> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> The following exception is thrown when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24584) IndexOutOfBoundsException from Kryo when running msck repair

2021-01-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24584:



> IndexOutOfBoundsException from Kryo when running msck repair
> 
>
> Key: HIVE-24584
> URL: https://issues.apache.org/jira/browse/HIVE-24584
> Project: Hive
>  Issue Type: Bug
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> The following exception is thrown when running "msck repair table t1 sync 
> partitions".
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
> at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
> at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
> ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
> at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
>  [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]  {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2021-01-04 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258298#comment-17258298
 ] 

Attila Magyar commented on HIVE-23851:
--

Hey [~srahman], it looks like the same exception can be thrown from 
filterPartitionsByExpr() as well. This patch addresses only 
convertExprToFilter().
{code:java}
java.lang.IndexOutOfBoundsException: Index: 97, Size: 0
at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_232]
at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_232]
at 
org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:834)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:684) 
~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:814)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
 ~[hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:116)
 [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.filterPartitionsByExpr(PartitionExpressionForMetastore.java:88)
 [hive-exec-3.1.3000.7.2.7.0-144.jar:3.1.3000.7.2.7.0-SNAPSHOT] {code}
I see the original issue was fixed by skipping the deserialization and 
returning the exprBytes as string when the deserialization would fail. But 
filterPartitionsByExpr returns a boolean. Would returning false be an 
acceptable fix when deserialization fails in filterPartitionsByExpr?
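
Purely as an editorial illustration of that question (not a committed fix), a guarded variant 
could look roughly like the sketch below; the signature is abridged, LOG and deserializeExpr() 
are the members already present in PartitionExpressionForMetastore, and prunePartitions() is a 
hypothetical stand-in for the existing pruning logic:
{code:java}
// Editorial sketch only: "return false when the expression cannot be deserialized".
@Override
public boolean filterPartitionsByExpr(List<FieldSchema> partColumns, byte[] expr,
    String defaultPartitionName, List<String> partitionNames) throws MetaException {
  ExprNodeGenericFuncDesc exprTree;
  try {
    exprTree = deserializeExpr(expr);  // where the Kryo IndexOutOfBoundsException surfaces
  } catch (Exception e) {
    LOG.warn("Failed to deserialize partition expression, skipping filtering", e);
    return false;                      // the behaviour asked about above; semantics to be confirmed
  }
  return prunePartitions(partColumns, exprTree, defaultPartitionName, partitionNames);
}
{code}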

 

 

cc: [~kgyrtkirk]

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition paths
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartF

[jira] [Updated] (HIVE-24383) Add Table type to HPL/SQL

2020-11-25 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24383:
-
Parent: HIVE-24427
Issue Type: Sub-task  (was: Improvement)

> Add Table type to HPL/SQL
> -
>
> Key: HIVE-24383
> URL: https://issues.apache.org/jira/browse/HIVE-24383
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL

2020-11-25 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24315:
-
Parent: HIVE-24427
Issue Type: Sub-task  (was: Improvement)

> Improve validation and semantic analysis in HPL/SQL 
> 
>
> Key: HIVE-24315
> URL: https://issues.apache.org/jira/browse/HIVE-24315
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some known issues that need to be fixed. For example it seems that 
> the arity of a function is not checked when calling it, and the same is true for 
> parameter types. Calling an undefined function is evaluated to null and 
> sometimes it seems that incorrect syntax is silently ignored. 
> In cases like this a helpful error message would be expected, though we 
> should also consider how PL/SQL works and maintain compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24346) Store HPL/SQL packages into HMS

2020-11-25 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24346:
-
Parent: HIVE-24427
Issue Type: Sub-task  (was: New Feature)

> Store HPL/SQL packages into HMS
> ---
>
> Key: HIVE-24346
> URL: https://issues.apache.org/jira/browse/HIVE-24346
> Project: Hive
>  Issue Type: Sub-task
>  Components: hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-11-25 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24217:
-
Parent: HIVE-24427
Issue Type: Sub-task  (was: Bug)

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-11-25 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24230:
-
Parent: HIVE-24427
Issue Type: Sub-task  (was: Bug)

> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> HPL/SQL is a standalone command line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example 
> one might want to use a third party SQL tool to run selects on stored 
> procedure (or rather function in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer's internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.
>  
> To make it easier to implement, we keep things separated internally at 
> first, by introducing a Hive session-level JDBC parameter.
> {code:java}
> jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
>  
> The hplsqlMode indicates that we are in procedural SQL mode where the user 
> can create and call stored procedures. HPLSQL allows you to write any kind of 
> procedural statement at the top level. This patch doesn't limit this but it 
> might be better to eventually restrict what statements are allowed outside of 
> stored procedures.
>  
> Since HPLSQL and Hive are running in the same process there is no need to use 
> the JDBC driver between them. The patch adds an abstraction with 2 different 
> implementations, one for executing queries on JDBC (for keeping the existing 
> behaviour) and another one for directly calling Hive's compiler. In HPLSQL 
> mode the latter is used.
> Internally, a new operation (HplSqlOperation) and operation type 
> (PROCEDURAL_SQL) were added. The new operation works similarly to SQLOperation 
> but uses the hplsql interpreter to execute arbitrary scripts, and it might 
> spawn new SQLOperations.
> For example consider the following statement:
> {code:java}
> FOR i in 1..10 LOOP   
>   SELECT * FROM table 
> END LOOP;{code}
> We send this to beeline while we're in hplsql mode. Hive will create an hplsql 
> interpreter and store it in the session state. A new HplSqlOperation is 
> created to run the script on the interpreter.
> HPLSQL knows how to execute the for loop, but it will call Hive to run the 
> select expression. The HplSqlOperation is notified when the select reads a 
> row and accumulates the rows into a RowSet (memory consumption needs to be 
> considered here) which can be retrieved via thrift from the client side.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24427) HPL/SQL improvements

2020-11-25 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24427:



> HPL/SQL improvements
> 
>
> Key: HIVE-24427
> URL: https://issues.apache.org/jira/browse/HIVE-24427
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24383) Add Table type to HPL/SQL

2020-11-13 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24383:



> Add Table type to HPL/SQL
> -
>
> Key: HIVE-24383
> URL: https://issues.apache.org/jira/browse/HIVE-24383
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24346) Store HPL/SQL packages into HMS

2020-11-02 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24346:



> Store HPL/SQL packages into HMS
> ---
>
> Key: HIVE-24346
> URL: https://issues.apache.org/jira/browse/HIVE-24346
> Project: Hive
>  Issue Type: New Feature
>  Components: hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24338) HPL/SQL missing features

2020-10-30 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17223661#comment-17223661
 ] 

Attila Magyar commented on HIVE-24338:
--

The ForAll is mainly an optimization to avoid sending statements line by line 
to the DB. Named parameters are not that widely used as far as I know.

Goto and label are mostly used for error handling. Bulk collect can be useful 
for selecting into an array, but it requires a Table or Array type and type 
declarations first.

> HPL/SQL missing features
> 
>
> Key: HIVE-24338
> URL: https://issues.apache.org/jira/browse/HIVE-24338
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some features which are supported by Oracle's PL/SQL but not by 
> HPL/SQL. This Jira is about prioritizing them and investigating the 
> feasibility of the implementation.
>  * ForAll syntax like: ForAll j in i..j save exceptions
>  * Bulk collect: Fetch cursor Bulk Collect Into list Limit n;
>  * Type declaration: Type T_cab is TABLE of
>  * TABLE datatype
>  * GOTO and LABEL
>  * Global variables like $$PLSQL_UNIT and others
>  * Named parameters func(name1 => value1, name2 => value2);
>  * Built in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24338) HPL/SQL missing features

2020-10-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24338:



> HPL/SQL missing features
> 
>
> Key: HIVE-24338
> URL: https://issues.apache.org/jira/browse/HIVE-24338
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some features which are supported by Oracle's PL/SQL but not by 
> HPL/SQL. This Jira is about prioritizing them and investigating the 
> feasibility of the implementation.
>  * ForAll syntax like: ForAll j in i..j save exceptions
>  * Bulk collect: Fetch cursor Bulk Collect Into list Limit n;
>  * Type declaration: Type T_cab is TABLE of
>  * TABLE datatype
>  * GOTO and LABEL
>  * Global variables like $$PLSQL_UNIT and others
>  * Named parameters func(name1 => value1, name2 => value2);
>  * Built in functions: trunc, lpad, to_date, ltrim, rtrim, sysdate



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24230:
-
Description: 
HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example one 
might want to use a third party SQL tool to run selects on stored procedure (or 
rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.

 

To make it easier to implement, we keep things separated internally at 
first, by introducing a Hive session-level JDBC parameter.
{code:java}
jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
 

The hplsqlMode indicates that we are in procedural SQL mode where the user can 
create and call stored procedures. HPLSQL allows you to write any kind of 
procedural statement at the top level. This patch doesn't limit this but it 
might be better to eventually restrict what statements are allowed outside of 
stored procedures.
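
As an editorial aside (not part of the original description), a minimal client-side sketch of 
using this session parameter; the host, port, credentials, and the HPL/SQL statements are all 
assumptions for illustration:
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HplSqlModeSketch {
  public static void main(String[] args) throws Exception {
    // Mirrors the hplsqlMode parameter described above; host/port and user are assumed.
    String url = "jdbc:hive2://localhost:10000/default;hplsqlMode=true";
    try (Connection conn = DriverManager.getConnection(url, "hive", "");
         Statement stmt = conn.createStatement()) {
      // Procedural statements are sent as-is and handled by the HPL/SQL interpreter.
      stmt.execute("CREATE PROCEDURE greet(name STRING) BEGIN PRINT 'Hello ' || name; END");
      stmt.execute("CALL greet('world')");
    }
  }
}
{code}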

 

Since HPLSQL and Hive are running in the same process there is no need to use 
the JDBC driver between them. The patch adds an abstraction with 2 different 
implementations, one for executing queries on JDBC (for keeping the existing 
behaviour) and another one for directly calling Hive's compiler. In HPLSQL mode 
the latter is used.

Internally, a new operation (HplSqlOperation) and operation type 
(PROCEDURAL_SQL) were added. The new operation works similarly to SQLOperation 
but uses the hplsql interpreter to execute arbitrary scripts, and it might 
spawn new SQLOperations.

For example consider the following statement:
{code:java}
FOR i in 1..10 LOOP   
  SELECT * FROM table 
END LOOP;{code}
We send this to beeline while we're in hplsql mode. Hive will create an hplsql 
interpreter and store it in the session state. A new HplSqlOperation is created 
to run the script on the interpreter.

HPLSQL knows how to execute the for loop, but it will call Hive to run the select 
expression. The HplSqlOperation is notified when the select reads a row and 
accumulates the rows into a RowSet (memory consumption needs to be considered 
here) which can be retrieved via thrift from the client side.

 

  was:
HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example one 
might want to use a third party SQL tool to run selects on stored procedure (or 
rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.

 

To make it easier to implement, we keep things separated internally at 
first, by introducing a Hive session-level JDBC parameter.
{code:java}
jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
 

The hplsqlMode indicates that we are in procedural SQL mode where the user can 
create and call stored procedures. HPLSQL allows you to write any kind of 
procedural statement at the top level. This patch doesn't limit this but it 
might be better to eventually restrict what statements are allowed outside of 
stored procedures.

 

Since HPLSQL and Hive are running in the same process there is no need to use 
the JDBC driver between them. The patch adds an abstraction with 2 different 
implementations, one for executing queries on JDBC (for keeping the existing 
behaviour) and another one for directly calling Hive's compiler. In HPLSQL mode 
the latter is used.

In the inside a new operation (HplSqlOperation) and operation type 
(PROCEDURAL_SQL) was added which works similar to the SQLOpe

[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24230:
-
Description: 
HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example one 
might want to use a third party SQL tool to run selects on stored procedure (or 
rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.

 

To make it easier to implement, we keep things separated internally at 
first, by introducing a Hive session-level JDBC parameter.
{code:java}
jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
 

The hplsqlMode indicates that we are in procedural SQL mode where the user can 
create and call stored procedures. HPLSQL allows you to write any kind of 
procedural statement at the top level. This patch doesn't limit this but it 
might be better to eventually restrict what statements are allowed outside of 
stored procedures.

 

Since HPLSQL and Hive are running in the same process there is no need to use 
the JDBC driver between them. The patch adds an abstraction with 2 different 
implementations, one for executing queries on JDBC (for keeping the existing 
behaviour) and another one for directly calling Hive's compiler. In HPLSQL mode 
the latter is used.

Internally, a new operation (HplSqlOperation) and operation type 
(PROCEDURAL_SQL) were added. The new operation works similarly to SQLOperation 
but uses the hplsql interpreter to execute arbitrary scripts, and it might 
spawn new SQLOperations.

For example consider the following statement:
{code:java}
FOR i in 1..10 LOOP   
  SELECT * FROM table 
END LOOP;{code}
We send this to beeline while we're in hplsql mode. Hive will create an hplsql 
interpreter and store it in the session state. A new HplSqlOperation is created 
to run the script on the interpreter.

HPLSQL knows how to execute the for loop, but it will call Hive to run the select 
expression. The HplSqlOperation is notified when the select reads a row and 
accumulates the rows into a RowSet which can be retrieved via thrift from the 
client side.

 

  was:
HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example one 
might want to use a third party SQL tool to run selects on stored procedure (or 
rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.

 

To make it easier to implement, we keep things separated internally at 
first, by introducing a Hive session-level JDBC parameter.
{code:java}
jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
 

The hplsqlMode indicates that we are in procedural SQL mode where the user can 
create and call stored procedures. HPLSQL allows you to write any kind of 
procedural statement at the top level. This patch doesn't limit this but it 
might be better to eventually restrict what statements are allowed outside of 
stored procedures.

 

Since HPLSQL and Hive are running in the same process there is no need to use 
the JDBC driver between them. The patch adds an abstraction on with 2 different 
implementations, one for executing queries on JDBC (for keeping the existing 
behaviour) and another one for directly calling Hive's compiler. In HPLSQL mode 
the latter is used.

In the inside a new operation (HplSqlOperation) and operation type 
(PROCEDURAL_SQL) was added which works similar to the SQLOperation but it uses 
the hplsql interpreter to

[jira] [Updated] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24230:
-
Description: 
HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example one 
might want to use a third party SQL tool to run selects on stored procedure (or 
rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.

 

To make it easier to implement, we keep things separated internally at 
first, by introducing a Hive session-level JDBC parameter.
{code:java}
jdbc:hive2://localhost:1/default;hplsqlMode=true {code}
 

The hplsqlMode indicates that we are in procedural SQL mode where the user can 
create and call stored procedures. HPLSQL allows you to write any kind of 
procedural statement at the top level. This patch doesn't limit this but it 
might be better to eventually restrict what statements are allowed outside of 
stored procedures.

 

Since HPLSQL and Hive are running in the same process there is no need to use 
the JDBC driver between them. The patch adds an abstraction on with 2 different 
implementations, one for executing queries on JDBC (for keeping the existing 
behaviour) and another one for directly calling Hive's compiler. In HPLSQL mode 
the latter is used.

Internally, a new operation (HplSqlOperation) and operation type 
(PROCEDURAL_SQL) were added. The new operation works similarly to SQLOperation 
but uses the hplsql interpreter to execute arbitrary scripts, and it might 
spawn new SQLOperations.

For example consider the following statement:
{code:java}
FOR i in 1..10 LOOP   
  SELECT * FROM table 
END LOOP;{code}
We send this to beeline while we're in hplsql mode. Hive will create an hplsql 
interpreter and store it in the session state. A new HplSqlOperation is created 
to run the script on the interpreter.

HPLSQL knows how to execute the for loop, but it will call Hive to run the select 
expression. The HplSqlOperation is notified when the select reads a row and 
accumulates the rows into a RowSet which can be retrieved via thrift from the 
client side.

 

  was:
HPL/SQL is a standalone command line program that can store and load scripts 
from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
depends on Hive and not the other way around.

Changing the dependency order between HPL/SQL and HiveServer would open up some 
possibilities which are currently not feasible to implement. For example one 
might want to use a third party SQL tool to run selects on stored procedure (or 
rather function in this case) outputs.
{code:java}
SELECT * from myStoredProcedure(1, 2); {code}
HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
work with the current architecture.

Another important factor is performance. Declarative SQL commands are sent to 
Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
and use HiveServer's internal API for compilation and execution.

The third factor is that existing tools like Beeline or Hue cannot be used with 
HPL/SQL since it has its own, separate CLI.


> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> HPL/SQL is a standalone command line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example 
> one might want to use a third party SQL tool to run selects on stored 
> procedure (or rather function in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
> work with the current archit

[jira] [Assigned] (HIVE-24315) Improve validation and semantic analysis in HPL/SQL

2020-10-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24315:



> Improve validation and semantic analysis in HPL/SQL 
> 
>
> Key: HIVE-24315
> URL: https://issues.apache.org/jira/browse/HIVE-24315
> Project: Hive
>  Issue Type: Improvement
>  Components: hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> There are some known issues that need to be fixed. For example it seems that 
> the arity of a function is not checked when calling it, and the same is true for 
> parameter types. Calling an undefined function is evaluated to null and 
> sometimes it seems that incorrect syntax is silently ignored. 
> In cases like this a helpful error message would be expected, though we 
> should also consider how PL/SQL works and maintain compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-05 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17208111#comment-17208111
 ] 

Attila Magyar commented on HIVE-24230:
--

cc: [~kgyrtkirk]

> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> HPL/SQL is a standalone command line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example 
> one might want to use a third party SQL tool to run selects on stored 
> procedure (or rather function in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer's internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24230) Integrate HPL/SQL into HiveServer2

2020-10-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24230:



> Integrate HPL/SQL into HiveServer2
> --
>
> Key: HIVE-24230
> URL: https://issues.apache.org/jira/browse/HIVE-24230
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, hpl/sql
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> HPL/SQL is a standalone command line program that can store and load scripts 
> from text files, or from Hive Metastore (since HIVE-24217). Currently HPL/SQL 
> depends on Hive and not the other way around.
> Changing the dependency order between HPL/SQL and HiveServer would open up 
> some possibilities which are currently not feasible to implement. For example 
> one might want to use a third party SQL tool to run selects on stored 
> procedure (or rather function in this case) outputs.
> {code:java}
> SELECT * from myStoredProcedure(1, 2); {code}
> HPL/SQL doesn’t have a JDBC interface and it’s not a daemon so this would not 
> work with the current architecture.
> Another important factor is performance. Declarative SQL commands are sent to 
> Hive via JDBC by HPL/SQL. The integration would make it possible to drop JDBC 
> and use HiveServer's internal API for compilation and execution.
> The third factor is that existing tools like Beeline or Hue cannot be used 
> with HPL/SQL since it has its own, separate CLI.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-09-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24217:
-
Description: 
HPL/SQL procedures are currently stored in text files. The goal of this Jira is 
to implement a Metastore backend for storing and loading these procedures.

 

See the attached design for more information.

  was:HPL/SQL procedures are currently stored in text files. The goal of this 
Jira is to implement a Metastore backend for storing and loading these 
procedures.


> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-09-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24217:
-
Description: 
HPL/SQL procedures are currently stored in text files. The goal of this Jira is 
to implement a Metastore backend for storing and loading these procedures. This 
is an incremental step towards having fully capable stored procedures in Hive.

 

See the attached design for more information.

  was:
HPL/SQL procedures are currently stored in text files. The goal of this Jira is 
to implement a Metastore backend for storing and loading these procedures.

 

See the attached design for more information.


> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures. 
> This is an incremental step towards having fully capable stored procedures in 
> Hive.
>  
> See the attached design for more information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-09-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-24217:
-
Attachment: HPL_SQL storedproc HMS storage.pdf

> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Attachments: HPL_SQL storedproc HMS storage.pdf
>
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24217) HMS storage backend for HPL/SQL stored procedures

2020-09-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24217:



> HMS storage backend for HPL/SQL stored procedures
> -
>
> Key: HIVE-24217
> URL: https://issues.apache.org/jira/browse/HIVE-24217
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, hpl/sql, Metastore
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> HPL/SQL procedures are currently stored in text files. The goal of this Jira 
> is to implement a Metastore backend for storing and loading these procedures.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24149) HiveStreamingConnection doesn't close HMS connection

2020-09-11 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-24149:



> HiveStreamingConnection doesn't close HMS connection
> 
>
> Key: HIVE-24149
> URL: https://issues.apache.org/jira/browse/HIVE-24149
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> There are 3 HMS connections used by HiveStreamingConnection: one for TX, one for 
> heartbeat, and one for notifications. The close method only closes the first 2, 
> leaving the last one open, which eventually overloads HMS and makes it 
> unresponsive.
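
For context, an editorial sketch of the streaming lifecycle whose close() call is expected to 
release every HMS connection, including the one used for notifications; the database, table, 
and delimiter below are assumptions, not taken from the ticket:
{code:java}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.streaming.HiveStreamingConnection;
import org.apache.hive.streaming.StreamingConnection;
import org.apache.hive.streaming.StrictDelimitedInputWriter;

public class StreamingCloseSketch {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    StrictDelimitedInputWriter writer = StrictDelimitedInputWriter.newBuilder()
        .withFieldDelimiter(',')
        .build();
    StreamingConnection connection = HiveStreamingConnection.newBuilder()
        .withDatabase("default")
        .withTable("events")          // assumed transactional target table
        .withRecordWriter(writer)
        .withHiveConf(conf)
        .connect();
    try {
      connection.beginTransaction();
      connection.write("1,hello".getBytes());
      connection.commitTransaction();
    } finally {
      connection.close();             // should release the TX, heartbeat and notification HMS clients
    }
  }
}
{code}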



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23957) Limit followed by TopNKey improvement

2020-07-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23957:



> Limit followed by TopNKey improvement
> -
>
> Key: HIVE-23957
> URL: https://issues.apache.org/jira/browse/HIVE-23957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> The Limit + topnkey pushdown might result in a limit operator followed by a TNK 
> in the physical plan. This likely makes the TNK unnecessary in cases like 
> this. Need to investigate if/when we can remove the TNK.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ

2020-07-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23723:
-
Attachment: (was: HIVE-23723.1.patch)

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23937) Take null ordering into consideration when pushing TNK through inner joins

2020-07-27 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23937:



> Take null ordering into consideration when pushing TNK through inner joins
> --
>
> Key: HIVE-23937
> URL: https://issues.apache.org/jira/browse/HIVE-23937
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23817) Pushing TopN Key operator PKFK inner joins

2020-07-08 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23817:



> Pushing TopN Key operator PKFK inner joins
> --
>
> Key: HIVE-23817
> URL: https://issues.apache.org/jira/browse/HIVE-23817
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>
> If there is a primary key-foreign key relationship between the tables, we can 
> push the topnkey operator through the join.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-30 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23757:
-
Attachment: (was: HIVE-23757.1.patch)

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-25 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144739#comment-17144739
 ] 

Attila Magyar commented on HIVE-23757:
--

cc: [~kkasa], [~jcamacho]

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23757.1.patch
>
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-24 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23757:
-
Status: Patch Available  (was: Open)

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23757.1.patch
>
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-24 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23757:
-
Attachment: HIVE-23757.1.patch

> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23757.1.patch
>
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23757) Pushing TopN Key operator through MAPJOIN

2020-06-24 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23757:



> Pushing TopN Key operator through MAPJOIN
> -
>
> Key: HIVE-23757
> URL: https://issues.apache.org/jira/browse/HIVE-23757
> Project: Hive
>  Issue Type: Improvement
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23757.1.patch
>
>
> So far only MERGEJOIN + JOIN cases are handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-23 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142863#comment-17142863
 ] 

Attila Magyar edited comment on HIVE-23723 at 6/23/20, 5:14 PM:


[~jcamachorodriguez],

??Concerning your patch, it seems you are removing the original limit on top of 
the left outer join? Note that you cannot remove it : If you have 5 input rows 
on the left side, you know the LOJ will produce at least 5 rows, however you 
cannot guarantee the join will produce 5 rows at most.??

 

Got it, that should indeed be kept. However, the reason why additional reducers are 
introduced by the limittranspose implementation is not fully clear to me.

Do you think we should drop this patch, as it's already implemented by 
limittranspose, and focus on tweaking the existing implementation instead?

 

cc: [~ashutoshc]


was (Author: amagyar):
[~jcamachorodriguez],

??Concerning your patch, it seems you are removing the original limit on top of 
the left outer join? Note that you cannot remove it : If you have 5 input rows 
on the left side, you know the LOJ will produce at least 5 rows, however you 
cannot guarantee the join will produce 5 rows at most.??

 

Got it, that should indeed be kept. However, the reason why additional reducers are 
introduced by the limittranspose implementation is not fully clear to me.

Do you think we should drop this patch, as it's already implemented by 
limittranspose, and focus on tweaking the existing implementation instead?

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-23 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17142863#comment-17142863
 ] 

Attila Magyar commented on HIVE-23723:
--

[~jcamachorodriguez],

??Concerning your patch, it seems you are removing the original limit on top of 
the left outer join? Note that you cannot remove it : If you have 5 input rows 
on the left side, you know the LOJ will produce at least 5 rows, however you 
cannot guarantee the join will produce 5 rows at most.??

 

Got it, that should indeed be kept. However, the reason why additional reducers are 
introduced by the limittranspose implementation is not fully clear to me.

Do you think we should drop this patch, as it's already implemented by 
limittranspose, and focus on tweaking the existing implementation instead?
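
The row-count argument quoted above can be illustrated with a small standalone sketch 
(plain Java, illustration only, not Hive operator code; names are made up): the left 
input is cut to 5 rows, yet the left outer join still emits 7 rows because one key 
matches three right-side rows, which is why the Limit on top of the join has to stay.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration: pushing Limit 5 below a LEFT OUTER JOIN cuts the left
// input to 5 rows, but the join itself can still emit more than 5 rows, so the
// original Limit above the join must be kept.
public class LojLimitSketch {
  public static void main(String[] args) {
    List<String> leftKeys = Arrays.asList("a", "b", "c", "d", "e", "f", "g");

    // Right side: key "a" matches three rows.
    Map<String, List<String>> right = new HashMap<>();
    right.put("a", Arrays.asList("v1", "v2", "v3"));
    right.put("b", Collections.singletonList("v4"));

    int limit = 5;
    List<String> limitedLeft = leftKeys.subList(0, limit); // the "pushed" Limit

    List<String[]> joined = new ArrayList<>();
    for (String key : limitedLeft) {
      // LEFT OUTER JOIN: unmatched left keys still produce a row with a NULL right side.
      for (String value : right.getOrDefault(key, Collections.singletonList(null))) {
        joined.add(new String[] {key, value});
      }
    }

    System.out.println("rows fed into the join: " + limitedLeft.size());              // 5
    System.out.println("rows produced by the join: " + joined.size());                // 7
    System.out.println("rows after the kept Limit: " + Math.min(joined.size(), limit)); // 5
  }
}
{code}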

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-19 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140588#comment-17140588
 ] 

Attila Magyar commented on HIVE-23723:
--

[~jcamachorodriguez], thanks for letting me know, I hadn't realized that. Any 
idea why it is disabled by default?

Also, the plan looks different with limittranspose and I'm not sure why. There are 3 
Limit operators: the first one is the limit that was pushed through the LOJ, but there 
are 2 others in Reducer 2.

 
{code:java}
explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key 
= src2.key) LIMIT 5; {code}
{code:java}
PREHOOK: query: explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key 
= src2.key) LIMIT 5
PREHOOK: type: QUERY
PREHOOK: Input: default@src
 A masked pattern was here 
POSTHOOK: query: explain
SELECT src1.key, src2.value FROM src src1 LEFT OUTER JOIN src src2 ON (src1.key 
= src2.key) LIMIT 5
POSTHOOK: type: QUERY
POSTHOOK: Input: default@src
 A masked pattern was here 
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1STAGE PLANS:
  Stage: Stage-1
Tez
 A masked pattern was here 
  Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
Reducer 3 <- Map 4 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
 A masked pattern was here 
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: src1
  Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
  Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 500 Data size: 43500 Basic stats: 
COMPLETE Column stats: COMPLETE
Limit
  Number of rows: 5
  Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
null sort order: 
sort order: 
Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
TopN Hash Memory Usage: 0.3
value expressions: _col0 (type: string)
Execution mode: vectorized, llap
LLAP IO: no inputs
Map 4 
Map Operator Tree:
TableScan
  alias: src2
  filterExpr: key is not null (type: boolean)
  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
  Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
Select Operator
  expressions: key (type: string), value (type: string)
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
  Reduce Output Operator
key expressions: _col0 (type: string)
null sort order: z
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 500 Data size: 89000 Basic stats: 
COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: string)
Execution mode: vectorized, llap
LLAP IO: no inputs
Reducer 2 
Execution mode: vectorized, llap
Reduce Operator Tree:
  Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE 
Column stats: COMPLETE
Select Operator
  expressions: VALUE._col0 (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 5 Data size: 435 Basic stats: COMPLETE 
Column stats: COMPLETE
  Limit
Number of rows: 5
Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
Reduce Output Operator
  key expressions: _col0 (type: string)
  null sort order: z
  sort order: +
  Map-reduce partition columns: _col0 (type: string)
  Statistics: Num rows: 5 Data size: 435 Basic stats: 
COMPLETE Column stats: COMPLETE
Reducer 3 
Execution mode: llap
Reduce Operator Tree:
  Merge Join Operator
condition map:
 Left Outer Join 0 to 1
   

[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-18 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23723:
-
Status: Patch Available  (was: Open)

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-18 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23723:
-
Attachment: HIVE-23723.1.patch

> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23723.1.patch
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23723) Limit operator pushdown through LOJ

2020-06-18 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23723:



> Limit operator pushdown through LOJ
> ---
>
> Key: HIVE-23723
> URL: https://issues.apache.org/jira/browse/HIVE-23723
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
>
> Limit operator (without an order by) can be pushed through SELECTS and LEFT 
> OUTER JOINs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-22687) Query hangs indefinitely if LLAP daemon registers after the query is submitted

2020-06-09 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17129639#comment-17129639
 ] 

Attila Magyar commented on HIVE-22687:
--

I reproduced this issue by putting a sleep between the creation of the worker and 
slot nodes, and then submitting a query between those two events.

After I applied the patch I was no longer able to reproduce it, so this seems 
to be a viable fix to me.

cc: [~ashutoshc] [~prasanth_j]
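
A minimal, hypothetical simulation of the race described in the quoted issue below 
(plain Java collections instead of ZooKeeper, and not the actual patch): the daemon 
publishes its worker entry first and its slot entry a moment later, so checking for 
the slot exactly once rejects the daemon for good, while re-checking for a bounded 
time sees the slot once it shows up.
{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of the worker/slot registration race (no ZooKeeper):
// the daemon publishes "workers/worker-0" first and "slots/worker-0" ~200 ms later.
public class SlotRaceSketch {
  private static final Set<String> registry = ConcurrentHashMap.newKeySet();

  public static void main(String[] args) {
    new Thread(() -> {
      registry.add("workers/worker-0");   // worker node appears first
      sleep(200);
      registry.add("slots/worker-0");     // slot node appears later
    }).start();

    sleep(50); // the scheduler is notified of the worker node around here

    // Single check (the behaviour that leads to the hang): the slot is not there yet.
    System.out.println("single check, slot present: " + registry.contains("slots/worker-0"));

    // Bounded re-check: keep looking for a while before giving up on the daemon.
    long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
    while (!registry.contains("slots/worker-0") && System.nanoTime() < deadline) {
      sleep(50);
    }
    System.out.println("after re-check, slot present: " + registry.contains("slots/worker-0"));
  }

  private static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}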

> Query hangs indefinitely if LLAP daemon registers after the query is submitted
> --
>
> Key: HIVE-22687
> URL: https://issues.apache.org/jira/browse/HIVE-22687
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Affects Versions: 3.1.0
>Reporter: Himanshu Mishra
>Assignee: Himanshu Mishra
>Priority: Major
> Attachments: HIVE-22687.01.patch, HIVE-22687.02.patch
>
>
> If a query is submitted and no LLAP daemon is running, it waits for 1 minute 
> and times out with error {{SERVICE_UNAVAILABLE}}.
> While waiting, if a new LLAP Daemon starts, the timeout is cancelled, but the 
> tasks do not get scheduled either. As a result, the query hangs indefinitely.
> This is due to a race condition: the LLAP Daemon first registers the LLAP 
> instance at {{.../workers/worker-}}, and afterwards registers 
> {{.../workers/slot-}}. In the gap between the two, the Tez AM gets notified of 
> the worker zk node and, while processing it, checks whether the slot zk node is 
> present; if not, it rejects the LLAP Daemon. The error in the Tez AM is:
> {code:java}
> [INFO] [LlapScheduler] |impl.LlapZookeeperRegistryImpl|: Unknown slot for 
> 8ebfdc45-0382-4757-9416-52898885af90{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-08 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Status: Open  (was: Patch Available)

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-08 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Attachment: (was: HIVE-23277.2.patch)

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Attachment: HIVE-23277.2.patch

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot 
> 2020-04-23 at 11.27.42 AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Status: Patch Available  (was: Open)

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot 
> 2020-04-23 at 11.27.42 AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Attachment: (was: HIVE-23277.2.patch)

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot 
> 2020-04-23 at 11.27.42 AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Attachment: HIVE-23277.2.patch

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot 
> 2020-04-23 at 11.27.42 AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-05 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Status: Open  (was: Patch Available)

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, HIVE-23277.2.patch, Screenshot 
> 2020-04-23 at 11.27.42 AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-03 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124953#comment-17124953
 ] 

Attila Magyar edited comment on HIVE-23277 at 6/3/20, 1:31 PM:
---

Hey [~rajesh.balamohan], I made a patch for this, where the JSON serialization 
happens on the logWriter's thread. The event is only partially built up front, 
with a JSON object (not the serialized string), and the conversion happens right 
before the event is written out.

However, such events take up more space in memory than before, about twice as 
much. The queue has a default maximum capacity of 64, so this might not be a problem.
{code:java}
./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.838Z 
hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local 
hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 
class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log 
Writer 0"] XXX size with serialized JSON: 392288 

./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.833Z 
hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local 
hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 
class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log 
Writer 0"] XXX with JSON object: 779536{code}
 

How significant do you think the speed improvement is? Is it worth it? Based 
on my own measurements, the JSON serialization wasn't that slow with the queries 
I used (about 10-15 ms).
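
A minimal sketch of this layout, assuming hypothetical types rather than the real 
HiveProtoLoggingHook API: the caller enqueues the event with its fields still in 
structured form, and the single writer thread performs the JSON conversion just 
before writing, so the serialization cost moves off the query path at the price of 
keeping the larger structured form in the bounded queue.
{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch only (hypothetical names, not the HiveProtoLoggingHook API): the hook
// thread enqueues events in structured form, and the writer thread converts them
// to JSON right before writing them out.
public class DeferredJsonLoggerSketch {
  static final class Event {
    final Map<String, Object> fields;     // structured form kept in the queue
    Event(Map<String, Object> fields) { this.fields = fields; }
  }

  private final BlockingQueue<Event> queue = new ArrayBlockingQueue<>(64); // bounded, like the hook's queue

  void logEvent(Map<String, Object> fields) throws InterruptedException {
    queue.put(new Event(fields));         // cheap on the caller's thread: no JSON work here
  }

  void startWriter() {
    Thread writer = new Thread(() -> {
      try {
        while (true) {
          Event e = queue.take();
          String json = toJson(e.fields); // conversion happens on the writer's thread
          System.out.println(json);       // stand-in for the proto/file sink
        }
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }, "proto-log-writer");
    writer.setDaemon(true);
    writer.start();
  }

  private static String toJson(Map<String, Object> fields) {
    StringBuilder sb = new StringBuilder("{");
    for (Map.Entry<String, Object> e : fields.entrySet()) {
      if (sb.length() > 1) {
        sb.append(',');
      }
      sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
    }
    return sb.append('}').toString();
  }

  public static void main(String[] args) throws InterruptedException {
    DeferredJsonLoggerSketch logger = new DeferredJsonLoggerSketch();
    logger.startWriter();
    logger.logEvent(Collections.singletonMap("queryId", "q1"));
    Thread.sleep(100); // let the daemon writer drain before the JVM exits
  }
}
{code}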


was (Author: amagyar):
Hey [~rajesh.balamohan], I made a patch for this, where the JSON serialization 
happens on the logWriter's thread. The event is only partially built up front, 
with a JSON object (not the serialized string), and the conversion happens right 
before the event is written out.

However, such events take up more space in memory than before, about twice 
as much. The queue has a default maximum capacity of 64, so this might not be a 
problem.
{code:java}
./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.838Z 
hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local 
hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 
class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log 
Writer 0"] XXX size with serialized JSON: 392288 

./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.833Z 
hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local 
hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 
class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log 
Writer 0"] XXX with JSON object: 779536{code}

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-03 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124953#comment-17124953
 ] 

Attila Magyar commented on HIVE-23277:
--

Hey [~rajesh.balamohan], I made a patch for this, where the JSON serialization 
happens on the logWriter's thread. The event is only partially built up front, 
with a JSON object (not the serialized string), and the conversion happens right 
before the event is written out.

However, such events take up more space in memory than before, about twice 
as much. The queue has a default maximum capacity of 64, so this might not be a 
problem.
{code:java}
./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.838Z 
hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local 
hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 
class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log 
Writer 0"] XXX size with serialized JSON: 392288 

./app=hiveserver2/2020-06-03-13-05_0.log.gz:<14>1 2020-06-03T13:07:39.833Z 
hiveserver2-0.hiveserver2-service.compute-1591188147-6npj.svc.cluster.local 
hiveserver2 1 7e79dde9-4ac7-4df6-932f-1be75ec58e73 [mdc@18060 
class="hooks.HiveProtoLoggingHook" level="INFO" thread="Hive Hook Proto Log 
Writer 0"] XXX with JSON object: 779536{code}

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-03 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Status: Patch Available  (was: Open)

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-03 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23277:
-
Attachment: HIVE-23277.1.patch

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: HIVE-23277.1.patch, Screenshot 2020-04-23 at 11.27.42 
> AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23277) HiveProtoLogger should carry out JSON conversion in its own thread

2020-06-02 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar reassigned HIVE-23277:


Assignee: Attila Magyar

> HiveProtoLogger should carry out JSON conversion in its own thread
> --
>
> Key: HIVE-23277
> URL: https://issues.apache.org/jira/browse/HIVE-23277
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Attila Magyar
>Priority: Minor
> Attachments: Screenshot 2020-04-23 at 11.27.42 AM.png
>
>
> !Screenshot 2020-04-23 at 11.27.42 AM.png|width=623,height=423!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-16220) Memory leak when creating a table using location and NameNode in HA

2020-06-02 Thread Attila Magyar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-16220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123732#comment-17123732
 ] 

Attila Magyar commented on HIVE-16220:
--

Do you need to run multiple CREATE TABLE statements continuously to reproduce 
this, or is a single one enough?

> Memory leak when creating a table using location and NameNode in HA
> ---
>
> Key: HIVE-16220
> URL: https://issues.apache.org/jira/browse/HIVE-16220
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.2.1, 3.0.0
> Environment: HDP-2.4.0.0
> HDP-3.1.0.0
>Reporter: Angel Alvarez Pascua
>Priority: Major
>
> The following simple DDL
> CREATE TABLE `test`(`field` varchar(1)) LOCATION 
> 'hdfs://benderHA/apps/hive/warehouse/test'
> ends up generating a huge memory leak in the HiveServer2 service.
> After two weeks without a restart, the service stops suddenly because of 
> OutOfMemory errors.
> This only happens in an environment in which the NameNode is in HA; 
> otherwise, nothing (oddly) happens. If the LOCATION clause is not 
> present, everything is also fine.
> It seems multiple instances of the Hadoop Configuration are created when we're 
> in an HA environment:
> 
> 2.618 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 350.263.816 (81,66%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""
> 
> 5.216 instances of "org.apache.hadoop.conf.Configuration", loaded by 
> "sun.misc.Launcher$AppClassLoader @ 0x4d260de88" 
> occupy 699.901.416 (87,32%) bytes. These instances are referenced from one 
> instance of "java.util.HashMap$Node[]", 
> loaded by ""



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure

2020-05-29 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23580:
-
Status: Patch Available  (was: Open)

> deleteOnExit set is not cleaned up, causing memory pressure
> ---
>
> Key: HIVE-23580
> URL: https://issues.apache.org/jira/browse/HIVE-23580
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23580.1.patch
>
>
> removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear
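
A minimal sketch of the pattern behind this (plain-Java stand-ins, not the actual 
Hive Context or Hadoop FileSystem code): every scratch dir registered for 
delete-on-exit has to be deregistered when it is removed early, otherwise the set 
keeps every path for the lifetime of the process.
{code:java}
import java.util.HashSet;
import java.util.Set;

// Sketch of the leak pattern (plain-Java stand-ins, hypothetical names): scratch
// dirs registered for delete-on-exit must be deregistered when removed early,
// otherwise the delete-on-exit set only grows.
public class ScratchDirSketch {
  private final Set<String> deleteOnExit = new HashSet<>();

  void createScratchDir(String path) {
    deleteOnExit.add(path);        // registered so a crash still cleans it up
  }

  // Buggy variant: the directory is removed, but its path stays in the set.
  void removeScratchDirLeaky(String path) {
    deleteDir(path);
  }

  // Fixed variant: pair the removal with the cancellation.
  void removeScratchDir(String path) {
    deleteDir(path);
    deleteOnExit.remove(path);     // analogous to FileSystem.cancelDeleteOnExit(path)
  }

  private void deleteDir(String path) {
    // stand-in for the actual filesystem delete
  }

  int pendingDeleteOnExit() {
    return deleteOnExit.size();
  }
}
{code}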



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23580) deleteOnExit set is not cleaned up, causing memory pressure

2020-05-29 Thread Attila Magyar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Magyar updated HIVE-23580:
-
Attachment: (was: HIVE-23580.1.patch)

> deleteOnExit set is not cleaned up, causing memory pressure
> ---
>
> Key: HIVE-23580
> URL: https://issues.apache.org/jira/browse/HIVE-23580
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: Attila Magyar
>Assignee: Attila Magyar
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-23580.1.patch
>
>
> removeScratchDir doesn't always call cancelDeleteOnExit() on context::clear



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

