[jira] [Updated] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-28 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-27498:
-
Status: Patch Available  (was: In Progress)

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> A simple select returns results as expected when the table has the configs
> {noformat}
> 'skip.header.line.count'='1',
> 'textinputformat.record.delimiter'='|'{noformat}
> but select count(*), or any query that launches a Tez job, treats the whole
> text as a single line.
> *Test case*
> data.csv
> {noformat}
> CodeName|A |B 
> |C  {noformat}
> DDL
> {noformat}
> create external table test(code string,name string)
> ROW FORMAT SERDE
>'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
>  WITH SERDEPROPERTIES (
>'field.delim'='\t')
>  STORED AS INPUTFORMAT
>'org.apache.hadoop.mapred.TextInputFormat'
>  OUTPUTFORMAT
>'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
>location '${system:test.tmp.dir}/test'
>  TBLPROPERTIES (
>'skip.header.line.count'='1',
>'textinputformat.record.delimiter'='|');{noformat}
> Query result
> select code,name from test;
> {noformat}
> A 
> B 
> 
> C {noformat}
> *Problem:* The query _+select count(*) from test+_ returns 1 instead of 3.
> It used to work in older Hive versions.
> The behaviour changed after the introduction of 
> https://issues.apache.org/jira/browse/HIVE-21924
> That feature splits text files while reading even when the table is 
> configured to skip headers, thereby increasing the number of mappers and 
> improving query throughput.
> The actual problem lies in how the new feature reads a file: it does not 
> honour the 'textinputformat.record.delimiter' property and instead scans 
> for newline characters. Since the input file does not have a newline for 
> every record, the whole file is read as a single line and the count comes 
> back as 1.
> Ref: 
> [https://github.com/apache/hive/blob/24a82a65f96b65eeebe4e23b2fec425037a70216/ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java#L548]
>  
>  *Workaround*
> If the headers are removed from the data (along with the skip-header config 
> in the table properties), or the files are compressed, the issue does not 
> occur.
>  
>  
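A minimal sketch of the fix direction described above, assuming only public Hadoop APIs (the wrapper class name below is hypothetical and this is not the actual HIVE-27498 patch): a skipping/split-aware reader can pick up the custom delimiter the same way org.apache.hadoop.mapred.TextInputFormat does, instead of assuming '\n'.

{noformat}
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.LineRecordReader;

// Hypothetical helper, not Hive code: mirrors TextInputFormat's delimiter
// lookup so records are split on the configured delimiter rather than '\n'.
public class DelimiterAwareReaderSketch {
  public static LineRecordReader open(Configuration conf, FileSplit split)
      throws IOException {
    String delimiter = conf.get("textinputformat.record.delimiter");
    byte[] recordDelimiterBytes =
        (delimiter == null) ? null : delimiter.getBytes(StandardCharsets.UTF_8);
    // When recordDelimiterBytes is null the reader falls back to newlines,
    // which is why a '|'-delimited file is currently seen as one record.
    return new LineRecordReader(conf, split, recordDelimiterBytes);
  }
}
{noformat}

Applying the same lookup in the header-skipping/offset computation path referenced above would presumably keep skip.header.line.count and the custom delimiter consistent.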



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-28 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-27498 started by Mayank Kunwar.

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-23 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reopened HIVE-27498:
--

The issue is occurring again, so reopening the ticket.

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-23 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reassigned HIVE-27498:


Assignee: Mayank Kunwar

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Assignee: Mayank Kunwar
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-09 Thread Mayank Kunwar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844902#comment-17844902
 ] 

Mayank Kunwar edited comment on HIVE-27498 at 5/9/24 8:38 AM:
--

Hi [~tarak271] 

I tried running the query using the Hive *master* branch, and it works fine, 
returning the result as 3 for the +select count(*) from test+ query.


was (Author: JIRAUSER291741):
Hi [~tarak271] 

I tried running the query, but it works fine by returning the result as 3 for 
the +select count(*) from test+ query.

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-09 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-27498:
-
Fix Version/s: (was: 4.0.0)

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-09 Thread Mayank Kunwar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844902#comment-17844902
 ] 

Mayank Kunwar edited comment on HIVE-27498 at 5/9/24 7:54 AM:
--

Hi [~tarak271] 

I tried running the query, but it works fine by returning the result as 3 for 
the +select count(*) from test+ query.


was (Author: JIRAUSER291741):
Hi [~tarak271] 

I tried running the query, but it works fine by return the result as 3 for the 
+select count(*) from test+ query.

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
> Fix For: 4.0.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-09 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar resolved HIVE-27498.
--
Fix Version/s: 4.0.0
   Resolution: Not A Problem

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
> Fix For: 4.0.0
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HIVE-27498) Support custom delimiter in SkippingTextInputFormat

2024-05-09 Thread Mayank Kunwar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844902#comment-17844902
 ] 

Mayank Kunwar commented on HIVE-27498:
--

Hi [~tarak271] 

I tried running the query, but it works fine by return the result as 3 for the 
+select count(*) from test+ query.

> Support custom delimiter in SkippingTextInputFormat
> ---
>
> Key: HIVE-27498
> URL: https://issues.apache.org/jira/browse/HIVE-27498
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Taraka Rama Rao Lethavadla
>Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27892) Hive "insert overwrite table" for multiple partition table issue

2023-11-27 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-27892:
-
Status: Patch Available  (was: Open)

> Hive "insert overwrite table" for multiple partition table issue
> 
>
> Key: HIVE-27892
> URL: https://issues.apache.org/jira/browse/HIVE-27892
> Project: Hive
>  Issue Type: Bug
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> Authorization is not working for Hive "insert overwrite table" on tables 
> with multiple partition columns.
> Steps to reproduce the issue:
> 1) CREATE EXTERNAL TABLE Part (eid int, name int)
> PARTITIONED BY (position int, dept int);
> 2) SET hive.exec.dynamic.partition.mode=nonstrict;
> 3) INSERT INTO TABLE PART PARTITION (position,DEPT)
> SELECT 1,1,1,1;
> 4) select * from part;
> Create a test user test123 and grant test123 only Select permission on db 
> default, table Part, and all columns (*).
> 1) insert overwrite table part partition(position=2,DEPT=2) select 2,2;
> This fails as expected.
> 2) insert overwrite table part partition(position,DEPT) select 2,2,2,2;
> This fails as expected.
> 3) insert overwrite table part partition(position=2,DEPT) select 2,2,2;
> But this succeeds with no audit in Ranger, which means no authorization 
> check happened when this query was executed.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27892) Hive "insert overwrite table" for multiple partition table issue

2023-11-19 Thread Mayank Kunwar (Jira)
Mayank Kunwar created HIVE-27892:


 Summary: Hive "insert overwrite table" for multiple partition 
table issue
 Key: HIVE-27892
 URL: https://issues.apache.org/jira/browse/HIVE-27892
 Project: Hive
  Issue Type: Bug
Reporter: Mayank Kunwar
Assignee: Mayank Kunwar


Authorization is not working for Hive "insert overwrite table" on tables with 
multiple partition columns.

Steps to reproduce the issue:

1) CREATE EXTERNAL TABLE Part (eid int, name int)
PARTITIONED BY (position int, dept int);

2) SET hive.exec.dynamic.partition.mode=nonstrict;

3) INSERT INTO TABLE PART PARTITION (position,DEPT)
SELECT 1,1,1,1;

4) select * from part;

Create a test user test123 and grant test123 only Select permission on db 
default, table Part, and all columns (*).

1) insert overwrite table part partition(position=2,DEPT=2) select 2,2;
This fails as expected.

2) insert overwrite table part partition(position,DEPT) select 2,2,2,2;
This fails as expected.

3) insert overwrite table part partition(position=2,DEPT) select 2,2,2;
But this succeeds with no audit in Ranger, which means no authorization check 
happened when this query was executed.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-27702) Remove PowerMock from beeline and upgrade mockito to 4.11

2023-09-21 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-27702:
-
Status: Patch Available  (was: Open)

> Remove PowerMock from beeline and upgrade mockito to 4.11
> -
>
> Key: HIVE-27702
> URL: https://issues.apache.org/jira/browse/HIVE-27702
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: newbie, pull-request-available, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27702) Remove PowerMock from beeline and upgrade mockito to 4.11

2023-09-19 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reassigned HIVE-27702:


Assignee: Mayank Kunwar

> Remove PowerMock from beeline and upgrade mockito to 4.11
> -
>
> Key: HIVE-27702
> URL: https://issues.apache.org/jira/browse/HIVE-27702
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Zsolt Miskolczi
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: newbie, starter
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Work started] (HIVE-17350) metrics errors when retrying HS2 startup

2023-09-06 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17350 started by Mayank Kunwar.

> metrics errors when retrying HS2 startup
> 
>
> Key: HIVE-17350
> URL: https://issues.apache.org/jira/browse/HIVE-17350
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> Looks like there are some sort of retries that happen when HS2 init fails. 
> When HS2 startup fails for an unrelated reason and is retried, the metrics 
> source initialization fails on subsequent attempts. 
> {noformat}
> 2017-08-15T23:31:47,650 WARN  [main]: impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(152)) - hiveserver2 metrics system already 
> initialized!
> 2017-08-15T23:31:47,650 ERROR [main]: metastore.HiveMetaStore 
> (HiveMetaStore.java:init(438)) - error in Metrics init: 
> java.lang.reflect.InvocationTargetException null
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.common.metrics.common.MetricsFactory.init(MetricsFactory.java:42)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:435)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:140)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1653)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3612)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3664)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3644)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
>   at 
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
>   at org.apache.hive.service.cli.CLIService.init(CLIService.java:113)
>   at 
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>   at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:139)
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:595)
>   at 
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:97)
>   at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:843)
>   at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:712)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.Del

[jira] [Updated] (HIVE-17350) metrics errors when retrying HS2 startup

2023-09-06 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-17350:
-
Status: Patch Available  (was: In Progress)

> metrics errors when retrying HS2 startup
> 
>
> Key: HIVE-17350
> URL: https://issues.apache.org/jira/browse/HIVE-17350
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> Looks like there are some sort of retries that happen when HS2 init fails. 
> When HS2 startup fails for an unrelated reason and is retried, the metrics 
> source initialization fails on subsequent attempts. 
> {noformat}
> 2017-08-15T23:31:47,650 WARN  [main]: impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(152)) - hiveserver2 metrics system already 
> initialized!
> 2017-08-15T23:31:47,650 ERROR [main]: metastore.HiveMetaStore 
> (HiveMetaStore.java:init(438)) - error in Metrics init: 
> java.lang.reflect.InvocationTargetException null
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.common.metrics.common.MetricsFactory.init(MetricsFactory.java:42)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:435)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:140)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1653)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3612)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3664)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3644)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
>   at 
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
>   at org.apache.hive.service.cli.CLIService.init(CLIService.java:113)
>   at 
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>   at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:139)
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:595)
>   at 
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:97)
>   at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:843)
>   at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:712)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 

[jira] [Commented] (HIVE-27657) Change hive.fetch.task.conversion.threshold default value

2023-09-05 Thread Mayank Kunwar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-27657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761980#comment-17761980
 ] 

Mayank Kunwar commented on HIVE-27657:
--

Thanks [~lvegh] for the review.

> Change hive.fetch.task.conversion.threshold default value
> -
>
> Key: HIVE-27657
> URL: https://issues.apache.org/jira/browse/HIVE-27657
> Project: Hive
>  Issue Type: Bug
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> With the introduction of [fetch task 
> caching|https://github.com/apache/hive/blob/d0a06239b09396d1b7a6414d85011f9a20f8486a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3532-L3535],
>  HS2 may quickly run out of memory when multiple parallel queries use fetch 
> tasks.
> We should reduce the default value of hive.fetch.task.conversion.threshold 
> from the current 1073741824 (1 GB) to 209715200 (200 MB).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (HIVE-27657) Change hive.fetch.task.conversion.threshold default value

2023-09-05 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar resolved HIVE-27657.
--
Resolution: Fixed

> Change hive.fetch.task.conversion.threshold default value
> -
>
> Key: HIVE-27657
> URL: https://issues.apache.org/jira/browse/HIVE-27657
> Project: Hive
>  Issue Type: Bug
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>  Labels: pull-request-available
>
> With the introduction of [fetch task 
> caching|https://github.com/apache/hive/blob/d0a06239b09396d1b7a6414d85011f9a20f8486a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3532-L3535],
>  HS2 may quickly run out of memory when multiple parallel queries use fetch 
> tasks.
> We should reduce the default value of hive.fetch.task.conversion.threshold 
> from the current 1073741824 (1 GB) to 209715200 (200 MB).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-27657) Change hive.fetch.task.conversion.threshold default value

2023-08-30 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reassigned HIVE-27657:


Assignee: Mayank Kunwar

> Change hive.fetch.task.conversion.threshold default value
> -
>
> Key: HIVE-27657
> URL: https://issues.apache.org/jira/browse/HIVE-27657
> Project: Hive
>  Issue Type: Bug
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>
> With the introduction of [fetch task 
> caching|https://github.com/apache/hive/blob/d0a06239b09396d1b7a6414d85011f9a20f8486a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3532-L3535],
>  HS2 may quickly run out of memory when multiple parallel queries use fetch 
> tasks.
> We should reduce the default value of hive.fetch.task.conversion.threshold 
> from the current 1073741824 (1 GB) to 209715200 (200 MB).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27657) Change hive.fetch.task.conversion.threshold default value

2023-08-30 Thread Mayank Kunwar (Jira)
Mayank Kunwar created HIVE-27657:


 Summary: Change hive.fetch.task.conversion.threshold default value
 Key: HIVE-27657
 URL: https://issues.apache.org/jira/browse/HIVE-27657
 Project: Hive
  Issue Type: Bug
Reporter: Mayank Kunwar


With the introduction of [fetch task 
caching|https://github.com/apache/hive/blob/d0a06239b09396d1b7a6414d85011f9a20f8486a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3532-L3535],
 HS2 may quickly run out of memory when multiple parallel queries use fetch 
tasks.

We should reduce the default value of hive.fetch.task.conversion.threshold 
from the current 1073741824 (1 GB) to 209715200 (200 MB).
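As an illustrative sketch only (assuming the plain Hadoop Configuration API; this is not Hive's actual fetch-task optimizer code, and the class name is hypothetical), the threshold simply gates whether a query's scanned input is small enough to be served by a client-side fetch task:

{noformat}
import org.apache.hadoop.conf.Configuration;

// Hypothetical helper showing what hive.fetch.task.conversion.threshold gates:
// only queries whose scanned input stays at or below the threshold are
// converted to a (cached) client-side fetch task instead of a Tez job.
public class FetchTaskThresholdSketch {
  public static boolean convertToFetchTask(Configuration conf, long totalInputBytes) {
    long threshold = conf.getLong(
        "hive.fetch.task.conversion.threshold", 209715200L); // proposed 200 MB default
    return totalInputBytes <= threshold;
  }
}
{noformat}

A lower default should roughly bound how much data each cached fetch task can pin in HS2 memory at once.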



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-17350) metrics errors when retrying HS2 startup

2023-08-25 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reassigned HIVE-17350:


Assignee: Mayank Kunwar

> metrics errors when retrying HS2 startup
> 
>
> Key: HIVE-17350
> URL: https://issues.apache.org/jira/browse/HIVE-17350
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Mayank Kunwar
>Priority: Major
>
> Looks like there are some sort of retries that happen when HS2 init fails. 
> When HS2 startup fails for an unrelated reason and is retried, the metrics 
> source initialization fails on subsequent attempts. 
> {noformat}
> 2017-08-15T23:31:47,650 WARN  [main]: impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:init(152)) - hiveserver2 metrics system already 
> initialized!
> 2017-08-15T23:31:47,650 ERROR [main]: metastore.HiveMetaStore 
> (HiveMetaStore.java:init(438)) - error in Metrics init: 
> java.lang.reflect.InvocationTargetException null
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.common.metrics.common.MetricsFactory.init(MetricsFactory.java:42)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:435)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:79)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:92)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:6892)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:140)
>   at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1653)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:83)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:133)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3612)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3664)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3644)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
>   at 
> org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:128)
>   at org.apache.hive.service.cli.CLIService.init(CLIService.java:113)
>   at 
> org.apache.hive.service.CompositeService.init(CompositeService.java:59)
>   at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:139)
>   at 
> org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:595)
>   at 
> org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:97)
>   at 
> org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:843)
>   at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:712)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke

[jira] [Assigned] (HIVE-27509) query-executors fail to start: java.lang.NoSuchFieldError: LLAP_LRFU_HOTBUFFERS_PERCENTAGE

2023-07-17 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-27509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reassigned HIVE-27509:


Assignee: Mayank Kunwar

> query-executors fail to start: java.lang.NoSuchFieldError: 
> LLAP_LRFU_HOTBUFFERS_PERCENTAGE
> --
>
> Key: HIVE-27509
> URL: https://issues.apache.org/jira/browse/HIVE-27509
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>
>  
> query-executor <11>1 2023-06-27T07:50:29.667Z query-executor-0-0 
> query-executor 1 de9c8ab8-d1f5-4ac3-bc12-28737d5746fd [mdc@18060 
> class="impl.LlapDaemon" level="ERROR" thread="main"] Failed to start LLAP 
> Daemon with exception java.lang.RuntimeException: Failed to create 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl at 
> org.apache.hadoop.hive.llap.io.api.LlapProxy.createInstance(LlapProxy.java:61)
>  at 
> org.apache.hadoop.hive.llap.io.api.LlapProxy.initializeLlapIo(LlapProxy.java:50)
>  at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:465)
>  at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) 
> at 
> org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:605) 
> Caused by: java.lang.reflect.InvocationTargetException at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>  Method) at 
> java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) 
> at 
> org.apache.hadoop.hive.llap.io.api.LlapProxy.createInstance(LlapProxy.java:59)
>  ... 4 more Caused by: java.lang.NoSuchFieldError: 
> LLAP_LRFU_HOTBUFFERS_PERCENTAGE at 
> org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.<init>(LowLevelLrfuCachePolicy.java:103)
>  at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.<init>(LlapIoImpl.java:181)
>  ... 9 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (HIVE-27509) query-executors fail to start: java.lang.NoSuchFieldError: LLAP_LRFU_HOTBUFFERS_PERCENTAGE

2023-07-17 Thread Mayank Kunwar (Jira)
Mayank Kunwar created HIVE-27509:


 Summary: query-executors fail to start: 
java.lang.NoSuchFieldError: LLAP_LRFU_HOTBUFFERS_PERCENTAGE
 Key: HIVE-27509
 URL: https://issues.apache.org/jira/browse/HIVE-27509
 Project: Hive
  Issue Type: Bug
  Components: llap
Reporter: Mayank Kunwar


 

query-executor <11>1 2023-06-27T07:50:29.667Z query-executor-0-0 query-executor 
1 de9c8ab8-d1f5-4ac3-bc12-28737d5746fd [mdc@18060 class="impl.LlapDaemon" 
level="ERROR" thread="main"] Failed to start LLAP Daemon with exception 
java.lang.RuntimeException: Failed to create 
org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl at 
org.apache.hadoop.hive.llap.io.api.LlapProxy.createInstance(LlapProxy.java:61) 
at 
org.apache.hadoop.hive.llap.io.api.LlapProxy.initializeLlapIo(LlapProxy.java:50)
 at 
org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:465)
 at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) at 
org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:605) 
Caused by: java.lang.reflect.InvocationTargetException at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native
 Method) at 
java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) 
at 
org.apache.hadoop.hive.llap.io.api.LlapProxy.createInstance(LlapProxy.java:59) 
... 4 more Caused by: java.lang.NoSuchFieldError: 
LLAP_LRFU_HOTBUFFERS_PERCENTAGE at 
org.apache.hadoop.hive.llap.cache.LowLevelLrfuCachePolicy.<init>(LowLevelLrfuCachePolicy.java:103)
 at 
org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.<init>(LlapIoImpl.java:181) 
... 9 more



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26523) Hive job stuck for a long time

2022-09-08 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-26523:
-
Description: 
The default value of "hive.server2.tez.initialize.default.sessions" is true, 
due to which the query was stuck waiting to choose a session from the default 
queue pool, because the default queue pool size is set to 1.

[https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3963-L3971]

2022-07-10 16:34:23,831 INFO 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
[HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
defaultQueuePool
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Session is already open

 

A possible workaround is to increase the value of 
"hive.server2.tez.sessions.per.default.queue" above its default of 1.

  was:
The default value of "hive.server2.tez.initialize.default.sessions" is true, 
due to which query was stuck on waiting to choose a session from default queue 
pool as the default queue pool size is set as 1.

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3963-L3971

2022-07-10 16:34:23,831 INFO 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
[HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
defaultQueuePool
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Session is already open

 

A possible work around is to increase the value of 
"hive.server2.tez.sessions.per.default.queue"


> Hive job stuck for a long time
> --
>
> Key: HIVE-26523
> URL: https://issues.apache.org/jira/browse/HIVE-26523
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-1
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>
> The default value of "hive.server2.tez.initialize.default.sessions" is true, 
> due to which the query was stuck waiting to choose a session from the 
> default queue pool, because the default queue pool size is set to 1.
> [https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3963-L3971]
> 2022-07-10 16:34:23,831 INFO 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
> [HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
> defaultQueuePool
> 2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
> [HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
> queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
> 2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
> [HiveServer2-Background-Pool: Thread-184167]: Session is already open
>  
> A possible workaround is to increase the value of 
> "hive.server2.tez.sessions.per.default.queue" above its default of 1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-26523) Hive job stuck for a long time

2022-09-08 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar updated HIVE-26523:
-
Description: 
The default value of "hive.server2.tez.initialize.default.sessions" is true, 
due to which query was stuck on waiting to choose a session from default queue 
pool as the default queue pool size is set as 1.

https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3963-L3971

2022-07-10 16:34:23,831 INFO 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
[HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
defaultQueuePool
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Session is already open

 

A possible work around is to increase the value of 
"hive.server2.tez.sessions.per.default.queue"

  was:
The default value of "hive.server2.tez.initialize.default.sessions" is true, 
due to which query was stuck on waiting to choose a session from default queue 
pool as the default queue pool size is set as 1 .

2022-07-10 16:34:23,831 INFO 
org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
[HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
defaultQueuePool
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
[HiveServer2-Background-Pool: Thread-184167]: Session is already open

 

A possible work around is to increase the value of 
"hive.server2.tez.sessions.per.default.queue"


> Hive job stuck for a long time
> --
>
> Key: HIVE-26523
> URL: https://issues.apache.org/jira/browse/HIVE-26523
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-1
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>
> The default value of "hive.server2.tez.initialize.default.sessions" is true, 
> due to which query was stuck on waiting to choose a session from default 
> queue pool as the default queue pool size is set as 1.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L3963-L3971
> 2022-07-10 16:34:23,831 INFO 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
> [HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
> defaultQueuePool
> 2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
> [HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
> queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
> 2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
> [HiveServer2-Background-Pool: Thread-184167]: Session is already open
>  
> A possible work around is to increase the value of 
> "hive.server2.tez.sessions.per.default.queue"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HIVE-26523) Hive job stuck for a long time

2022-09-08 Thread Mayank Kunwar (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-26523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayank Kunwar reassigned HIVE-26523:



> Hive job stuck for a long time
> --
>
> Key: HIVE-26523
> URL: https://issues.apache.org/jira/browse/HIVE-26523
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 4.0.0-alpha-1
>Reporter: Mayank Kunwar
>Assignee: Mayank Kunwar
>Priority: Major
>
> The default value of "hive.server2.tez.initialize.default.sessions" is true, 
> due to which query was stuck on waiting to choose a session from default 
> queue pool as the default queue pool size is set as 1 .
> 2022-07-10 16:34:23,831 INFO 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager: 
> [HiveServer2-Background-Pool: Thread-184167]: Choosing a session from the 
> defaultQueuePool
> 2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
> [HiveServer2-Background-Pool: Thread-184167]: Subscribed to counters: [] for 
> queryId: hive_20220710163423_c3f3deed-7a41-4865-9ce6-756fc7e6fbb8
> 2022-07-10 18:15:48,295 INFO org.apache.hadoop.hive.ql.exec.tez.TezTask: 
> [HiveServer2-Background-Pool: Thread-184167]: Session is already open
>  
> A possible work around is to increase the value of 
> "hive.server2.tez.sessions.per.default.queue"



--
This message was sent by Atlassian Jira
(v8.20.10#820010)