[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564447#comment-17564447 ]

Ayush Saxena commented on HIVE-26376:
-------------------------------------

Hmm, this needs more exploration. I tried to find a related Jira in Hadoop and found a couple of them; HDFS-3545 looks close to the Hive case as well, so it may be related. I am also not very sure about the auth setup.

One more question: is the "hive" user we obtain via UserGroupInformation.getCurrentUser() the user the HMS service started with, or the one from the client, given that we are not using impersonation? If it is not the end client's user, then the Subject shouldn't change, right?

And one more doubt as well. This is what we saw in Hive replication: the FileSystem was cached, and we closed the FileSystem after launching a DistCp job for the data copy. Since both threads used the same cached FileSystem, when one thread closed it, the other thread started getting "FileSystem closed" exceptions in the cleanup task after MR jobs, under race conditions. This is something we should take care of here too, so we don't land in the same situation: the same cached FileSystem shouldn't be used in more than one place, otherwise closing it in one place will break the other one as well.

> Hive Metastore connection leak (OOM Error)
> --
>
>                 Key: HIVE-26376
>                 URL: https://issues.apache.org/jira/browse/HIVE-26376
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 3.1.2
>         Environment: !Screenshot 2022-07-07 at 11.52.33 AM.png!
>            Reporter: Ranith Sardar
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2022-07-07 at 11.52.33 AM.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hive version: 3.1.2
> The Hive Metastore heap size is 14 GB. A memory leak develops after 4-5 days, and the metastore then throws an OOM error.
> If we disable the configuration, the memory leak disappears.
> In a 3.5 GB heap dump, a large number of FileSystem objects (> 9k instances) are retained, occupying most of the heap space. A snapshot from Eclipse MAT is attached.
> Below is part of the stack trace for the OOM error:
> {code:java}
>     at org.apache.hadoop.hive.common.FileUtils.getFileStatusOrNull(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FileStatus; (FileUtils.java:801)
>     at org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.checkPermissions(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/fs/Path;Ljava/util/EnumSet;)V (StorageBasedAuthorizationProvider.java:371)
>     at org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V (StorageBasedAuthorizationProvider.java:346)
>     at org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/hive/metastore/api/Database;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V (StorageBasedAuthorizationProvider.java:154)
>     at org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.authorizeReadDatabase(Lorg/apache/hadoop/hive/metastore/events/PreReadDatabaseEvent;)V (AuthorizationPreEventListener.java:208)
>     at org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.onEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V (AuthorizationPreEventListener.java:153)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.firePreEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V (HiveMetaStore.java:3221)
>     at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/api/Database; (HiveMetaStore.java:1352)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
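The caching behavior discussed in the comment above can be sketched without any Hadoop dependency. Hadoop's FileSystem cache is keyed by (scheme, authority, UserGroupInformation), and UGI equality delegates to the identity of the wrapped javax.security.auth.Subject; the class and method names below (FsCacheSketch, Ugi, demo) are illustrative stand-ins, not Hadoop's real classes, but the keying semantics they model is why a fresh Subject for the same "hive" principal yields a new, never-evicted cache entry:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of Hadoop's FileSystem cache: the key combines
// (scheme, authority, UGI), and UGI equals/hashCode use the *identity*
// of the wrapped Subject, so two logins for the same principal create
// two distinct cache entries that are retained until explicitly closed.
public class FsCacheSketch {

    static final class Subject { }  // stand-in: identity equality only

    // Stand-in for UserGroupInformation.
    static final class Ugi {
        final String principal;
        final Subject subject;
        Ugi(String principal, Subject subject) {
            this.principal = principal;
            this.subject = subject;
        }
        @Override public boolean equals(Object o) {
            return o instanceof Ugi && ((Ugi) o).subject == this.subject;
        }
        @Override public int hashCode() {
            return System.identityHashCode(subject);
        }
    }

    static final class Key {
        final String scheme;
        final String authority;
        final Ugi ugi;
        Key(String scheme, String authority, Ugi ugi) {
            this.scheme = scheme;
            this.authority = authority;
            this.ugi = ugi;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return k.scheme.equals(scheme)
                && k.authority.equals(authority)
                && k.ugi.equals(ugi);
        }
        @Override public int hashCode() {
            return (scheme.hashCode() * 31 + authority.hashCode()) * 31
                + ugi.hashCode();
        }
    }

    final Map<Key, Object> cache = new HashMap<>();

    Object get(String scheme, String authority, Ugi ugi) {
        // Reuse the entry on a key match, otherwise create and retain
        // a new "FileSystem" instance, as FileSystem.get does.
        return cache.computeIfAbsent(new Key(scheme, authority, ugi),
            k -> new Object());
    }

    // Three lookups by the same "hive" principal across two distinct
    // Subjects (e.g. two Kerberos logins) leave TWO cache entries.
    static int demo() {
        FsCacheSketch fsCache = new FsCacheSketch();
        Ugi login1 = new Ugi("hive", new Subject());
        Ugi login2 = new Ugi("hive", new Subject());
        fsCache.get("hdfs", "nn:8020", login1);
        fsCache.get("hdfs", "nn:8020", login1);  // cache hit, no growth
        fsCache.get("hdfs", "nn:8020", login2);  // same user, new entry
        return fsCache.cache.size();
    }

    public static void main(String[] args) {
        int entries = demo();
        if (entries != 2) throw new AssertionError("expected 2, got " + entries);
        System.out.println("cache entries for user 'hive': " + entries);
    }
}
```

Under these semantics the 9k+ retained FileSystem instances from the heap dump would simply be 9k+ distinct UGI/Subject pairs for the same principal, each pinning one cache entry.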
[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564366#comment-17564366 ]

Stamatis Zampetakis commented on HIVE-26376:
--------------------------------------------

[~ayushtkn] I am not that familiar with the user-authentication logic, but even if we are not doing impersonation and we are connecting with the current user (UserGroupInformation.getCurrentUser()), I think we may still have multiple Java objects representing the same user. If, for instance, we have Kerberos authentication, then each time we get a new {{Subject}} for the "hive" user we will have a new {{UserGroupInformation}} instance. I am not 100% sure about this, so please correct me if I am wrong.
[jira] [Resolved] (HIVE-26371) Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
[ https://issues.apache.org/jira/browse/HIVE-26371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krisztian Kasa resolved HIVE-26371.
-----------------------------------
    Resolution: Fixed

Pushed to master. Thanks [~zabetak] for the review.

> Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
> --
>
>                 Key: HIVE-26371
>                 URL: https://issues.apache.org/jira/browse/HIVE-26371
>             Project: Hive
>          Issue Type: Bug
>          Components: CBO, Logical Optimizer
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Prior to HIVE-23089, CHECK and NOT NULL constraint violations could be detected faster when merging.
> {code:java}
> CREATE TABLE t_target(
>   name string CHECK (length(name)<=20),
>   age int,
>   gpa double CHECK (gpa BETWEEN 0.0 AND 4.0))
> stored as orc TBLPROPERTIES ('transactional'='true');
>
> CREATE TABLE t_source(
>   name string,
>   age int,
>   gpa double);
>
> insert into t_source(name, age, gpa) values ('student1', 16, null);
> insert into t_target(name, age, gpa) values ('student1', 16, 2.0);
>
> merge into t_target using t_source source on source.age=t_target.age when matched then update set gpa=6;
> {code}
> Currently CBO cannot handle constraint checks when merging, so the filter operator with the {{enforce_constraint}} call is added to the Hive operator plan after CBO has succeeded, and the {{ConstantPropagate}} optimization is called only from TezCompiler with {{ConstantPropagateOption.SHORTCUT}}. With this option {{ConstantPropagate}} does not evaluate deterministic functions.
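The distinction the description draws can be sketched with a toy constant folder: a full mode evaluates deterministic functions over literal arguments, while a SHORTCUT-style mode leaves such calls in the plan, so a constraint like CHECK (length(name) <= 20) stays unevaluated even when its inputs are constants. This is an illustrative sketch under stated assumptions, not Hive's actual ConstantPropagate code; all names here (ConstantFoldSketch, Lit, Call) are invented for the example:

```java
import java.util.function.Function;

// Toy constant folder contrasting a "full" mode, which evaluates
// deterministic functions over literal arguments, with a SHORTCUT-style
// mode that does not. Illustrative only; not Hive's ConstantPropagate.
public class ConstantFoldSketch {

    enum Mode { FULL, SHORTCUT }

    interface Expr { }

    // A literal value, e.g. 'student1'.
    static final class Lit implements Expr {
        final Object value;
        Lit(Object value) { this.value = value; }
    }

    // A deterministic function call over a sub-expression, e.g. the
    // predicate behind CHECK (length(name) <= 20).
    static final class Call implements Expr {
        final Function<Object, Object> fn;
        final Expr arg;
        Call(Function<Object, Object> fn, Expr arg) {
            this.fn = fn;
            this.arg = arg;
        }
    }

    static Expr fold(Expr e, Mode mode) {
        if (e instanceof Call) {
            Call c = (Call) e;
            Expr arg = fold(c.arg, mode);
            // FULL mode evaluates deterministic calls with literal
            // arguments; SHORTCUT keeps the call in the plan, so an
            // enforce_constraint-style filter survives compilation
            // even when all of its inputs are constants.
            if (mode == Mode.FULL && arg instanceof Lit) {
                return new Lit(c.fn.apply(((Lit) arg).value));
            }
            return new Call(c.fn, arg);
        }
        return e;  // literals are already folded
    }

    public static void main(String[] args) {
        // Models CHECK (length(name) <= 20) with name = 'student1'.
        Expr constraint = new Call(
            v -> ((String) v).length() <= 20, new Lit("student1"));
        if (!(fold(constraint, Mode.FULL) instanceof Lit))
            throw new AssertionError("FULL mode should fold to a literal");
        if (!(fold(constraint, Mode.SHORTCUT) instanceof Call))
            throw new AssertionError("SHORTCUT mode should keep the call");
        System.out.println("ok");
    }
}
```

In this model, folding the constraint at compile time is what lets a guaranteed violation (such as the merge setting gpa=6 against CHECK (gpa BETWEEN 0.0 AND 4.0)) be reported before the query runs.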
[jira] [Work logged] (HIVE-26371) Constant propagation does not evaluate constraint expressions at merge when CBO is enabled
[ https://issues.apache.org/jira/browse/HIVE-26371?focusedWorklogId=788938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788938 ]

ASF GitHub Bot logged work on HIVE-26371:
-----------------------------------------
    Author: ASF GitHub Bot
    Created on: 08/Jul/22 11:08
    Start Date: 08/Jul/22 11:08
    Worklog Time Spent: 10m

Work Description: kasakrisz merged PR #3415:
URL: https://github.com/apache/hive/pull/3415

Issue Time Tracking
-------------------
    Worklog Id: (was: 788938)
    Time Spent: 20m (was: 10m)
[jira] [Resolved] (HIVE-26373) ClassCastException when reading timestamps from HBase table with Avro data
[ https://issues.apache.org/jira/browse/HIVE-26373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis resolved HIVE-26373.
----------------------------------------
    Fix Version/s: 4.0.0-alpha-2
       Resolution: Fixed

Fixed in [https://github.com/apache/hive/commit/97d7630bca10e96229519ab397f5cf122e5622e3]. Thanks for the PR [~soumyakanti.das]!

> ClassCastException when reading timestamps from HBase table with Avro data
> --
>
>                 Key: HIVE-26373
>                 URL: https://issues.apache.org/jira/browse/HIVE-26373
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Soumyakanti Das
>            Assignee: Soumyakanti Das
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-2
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Consider an HBase table (e.g., HiveAvroTable) that has a column with Avro data in which timestamps are nested under complex/struct types.
> {code:sql}
> CREATE EXTERNAL TABLE hbase_avro_table(
>   `key` string COMMENT '',
>   `data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.hbase.HBaseSerDe'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
>   'serialization.format'='1',
>   'hbase.columns.mapping' = ':key,data:frV4',
>   'data.frV4.serialization.type'='avro',
>   'data.frV4.avro.schema.url'='path/to/avro/schema/for/column/filename.avsc'
> )
> TBLPROPERTIES (
>   'hbase.table.name' = 'HiveAvroTable',
>   'hbase.struct.autogenerate'='true');
> {code}
> Any attempt to read the timestamp value from the nested struct leads to a {{ClassCastException}}.
> {code:sql}
> select data_frV4.dischargedate.value from hbase_avro_table;
> {code}
> Below you can find the stack trace for the previous query:
> {noformat}
> 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
>     at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
>     at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
>     at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
>     at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
>     at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
>     at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
>     at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
>     ... 11 more
> {noformat}
> The problem starts in the {{toLazyObject}} method of {*}AvroLazyObjectInspector.java{*}, when [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347] condition returns false for {*}Timestamp{*}, preventing the conversion of *Timestamp* to *LazyTimestamp* [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132]. The solution is to return {{true}} for Timestamps in the {{isPrimitive}} method.
[jira] [Commented] (HIVE-26373) ClassCastException when reading timestamps from HBase table with Avro data
[ https://issues.apache.org/jira/browse/HIVE-26373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564233#comment-17564233 ]

Stamatis Zampetakis commented on HIVE-26373:
--------------------------------------------

I updated the summary and description to better reflect the problem.
[jira] [Updated] (HIVE-26373) ClassCastException when reading timestamps from HBase table with Avro data
[ https://issues.apache.org/jira/browse/HIVE-26373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stamatis Zampetakis updated HIVE-26373:
---------------------------------------
    Description:
Consider an HBase table (e.g., HiveAvroTable) that has a column with Avro data in which timestamps are nested under complex/struct types.
{code:sql}
CREATE EXTERNAL TABLE hbase_avro_table(
  `key` string COMMENT '',
  `data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'serialization.format'='1',
  'hbase.columns.mapping' = ':key,data:frV4',
  'data.frV4.serialization.type'='avro',
  'data.frV4.avro.schema.url'='path/to/avro/schema/for/column/filename.avsc'
)
TBLPROPERTIES (
  'hbase.table.name' = 'HiveAvroTable',
  'hbase.struct.autogenerate'='true');
{code}
Any attempt to read the timestamp value from the nested struct leads to a {{ClassCastException}}.
{code:sql}
select data_frV4.dischargedate.value from hbase_avro_table;
{code}
Below you can find the stack trace for the previous query:
{noformat}
2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.common.type.Timestamp cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyPrimitive
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40)
    at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29)
    at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
    at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
    at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 11 more
{noformat}
The problem starts in the {{toLazyObject}} method of {*}AvroLazyObjectInspector.java{*}, when [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347] condition returns false for {*}Timestamp{*}, preventing the conversion of *Timestamp* to *LazyTimestamp* [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132]. The solution is to return {{true}} for Timestamps in the {{isPrimitive}} method.

  was:
For Avro data where the schema has nested struct with a Timestamp datatype, we get the following ClassCastException:
{code:java}
2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
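The isPrimitive/toLazyObject failure mode described above can be reproduced in miniature without Hive: a downstream inspector unconditionally casts to a lazy wrapper, so any raw value that slips past the primitive check blows up with a ClassCastException. All classes below (LazyWrapSketch and its nested stand-ins) are illustrative toys, not Hive's real LazyPrimitive/AvroLazyObjectInspector code:

```java
// Toy model of the lazy-wrapping bug: the serializer path expects every
// primitive value to arrive wrapped in a Lazy holder, but the
// inspector's isPrimitive() check misses one type (Timestamp), so the
// raw value leaks through and the later cast fails.
public class LazyWrapSketch {

    static final class Timestamp { }          // stand-in raw value type

    static final class LazyPrimitive {        // stand-in lazy wrapper
        final Object raw;
        LazyPrimitive(Object raw) { this.raw = raw; }
    }

    // Buggy check: does not recognize Timestamp as primitive. The
    // HIVE-26373 fix amounts to making this return true for Timestamp.
    static boolean isPrimitiveBuggy(Class<?> c) {
        return c == String.class || c == Integer.class;
    }

    static boolean isPrimitiveFixed(Class<?> c) {
        return isPrimitiveBuggy(c) || c == Timestamp.class;
    }

    // Mirrors the toLazyObject idea: only wrap values the check accepts.
    static Object toLazyObject(Object value, boolean fixed) {
        boolean primitive = fixed ? isPrimitiveFixed(value.getClass())
                                  : isPrimitiveBuggy(value.getClass());
        return primitive ? new LazyPrimitive(value) : value;
    }

    // Mirrors the downstream inspector: an unconditional cast, which is
    // where "Timestamp cannot be cast to LazyPrimitive" surfaces.
    static Object getPrimitiveWritableObject(Object lazy) {
        return ((LazyPrimitive) lazy).raw;
    }

    public static void main(String[] args) {
        // Buggy path: raw Timestamp reaches the cast -> ClassCastException.
        Object buggy = toLazyObject(new Timestamp(), false);
        boolean threw = false;
        try {
            getPrimitiveWritableObject(buggy);
        } catch (ClassCastException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected ClassCastException");

        // Fixed path: Timestamp is wrapped, so the cast succeeds.
        Object fixed = toLazyObject(new Timestamp(), true);
        getPrimitiveWritableObject(fixed);
        System.out.println("ok");
    }
}
```

This also shows why the reported stack trace ends in the serializer rather than in the Avro reader: the bad (unwrapped) value is produced early but only fails where the cast happens.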
[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564232#comment-17564232 ]

Ayush Saxena commented on HIVE-26376:
-------------------------------------

[~zabetak] This FileSystem isn't per user, is it? Then this will always be 1, irrespective of the number of calls, right?
{code:java}
final FileSystem fs = path.getFileSystem(conf);
{code}
When it is obtained per user, it is closed in the finally block of this method:
{code:java}
public static void checkFileAccessWithImpersonation(final FileSystem fs,
    final FileStatus stat, final FsAction action, final String user)
    throws Exception
{code}
[jira] [Updated] (HIVE-26373) ClassCastException when reading timestamps from HBase table with Avro data
[ https://issues.apache.org/jira/browse/HIVE-26373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis updated HIVE-26373: --- Summary: ClassCastException when reading timestamps from HBase table with Avro data (was: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp) > ClassCastException when reading timestamps from HBase table with Avro data > -- > > Key: HIVE-26373 > URL: https://issues.apache.org/jira/browse/HIVE-26373 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Soumyakanti Das >Assignee: Soumyakanti Das >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > For Avro data where the schema has nested struct with a Timestamp datatype, > we get the following ClassCastException: > {code:java} > 2022-07-05T08:40:51,572 ERROR [LocalJobRunner Map Task Executor #0] > mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime > Error while processing row > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:148) > at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapRunner.run(ExecMapRunner.java:37) > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:271) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.lang.ClassCastException: > 
org.apache.hadoop.hive.common.type.Timestamp cannot be cast to > org.apache.hadoop.hive.serde2.lazy.LazyPrimitive > at > org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.AbstractPrimitiveLazyObjectInspector.getPrimitiveWritableObject(AbstractPrimitiveLazyObjectInspector.java:40) > at > org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyTimestampObjectInspector.getPrimitiveWritableObject(LazyTimestampObjectInspector.java:29) > at > org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308) > at > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292) > at > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247) > at > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231) > at > org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55) > at > org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:1059) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937) > at > org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:937) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:128) > at > org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152) > at > org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552) > ... 
11 more {code} > The problem starts in the {{toLazyObject}} method of > {*}AvroLazyObjectInspector.java{*}, when > [this|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroLazyObjectInspector.java#L347] > condition returns false for {*}Timestamp{*}, preventing the conversion of > *Timestamp* to *LazyTimestamp* > [here|https://github.com/apache/hive/blob/53009126f6fe7ccf24cf052fd6c156542f38b19d/serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyFactory.java#L132]. > The solution is to return {{true}} for Timestamps in the {{isPrimitive}} > method. -- This message was sent by Atlassian Jira (v8.20.10#820010)
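The root cause and fix described above lend themselves to a small sketch. The following is an illustrative stand-in, not the actual Hive patch: java.sql.Timestamp substitutes for org.apache.hadoop.hive.common.type.Timestamp, and the membership set is hypothetical; the real check lives in AvroLazyObjectInspector#isPrimitive.

```java
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class Main {
    // Hypothetical stand-in for AvroLazyObjectInspector#isPrimitive. The bug:
    // Timestamp was not treated as primitive here, so toLazyObject skipped the
    // Timestamp -> LazyTimestamp conversion and the raw Timestamp later failed
    // the cast to LazyPrimitive inside LazySimpleSerDe.
    static final Set<Class<?>> PRIMITIVE_CLASSES = new HashSet<>(Arrays.asList(
            String.class, Integer.class, Long.class, Double.class, Boolean.class,
            Timestamp.class  // the fix: also report Timestamp as primitive
    ));

    public static boolean isPrimitive(Class<?> clazz) {
        return PRIMITIVE_CLASSES.contains(clazz);
    }

    public static void main(String[] args) {
        // Before the fix this returned false and the value bypassed lazy wrapping.
        System.out.println(isPrimitive(Timestamp.class)); // prints true
    }
}
```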
[jira] [Work logged] (HIVE-26373) ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
[ https://issues.apache.org/jira/browse/HIVE-26373?focusedWorklogId=788932=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788932 ] ASF GitHub Bot logged work on HIVE-26373: - Author: ASF GitHub Bot Created on: 08/Jul/22 10:40 Start Date: 08/Jul/22 10:40 Worklog Time Spent: 10m Work Description: zabetak closed pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp URL: https://github.com/apache/hive/pull/3418 Issue Time Tracking --- Worklog Id: (was: 788932) Time Spent: 1h 10m (was: 1h) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (HIVE-24083) hcatalog error in Hadoop 3.3.0: authentication type needed
[ https://issues.apache.org/jira/browse/HIVE-24083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564223#comment-17564223 ] Anmol Sundaram edited comment on HIVE-24083 at 7/8/22 10:35 AM: I was able to get it run by the following (temporary?) fix : {code:java} diff --git a/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java b/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java index d183b2e61b..5e5c4132f4 100644 — a/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java +++ b/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java @@ -38,6 +38,8 @@ import org.apache.hadoop.hive.shims.Utils; import org.apache.hadoop.security.UserGroupInformation; import org.apache.hadoop.security.authentication.client.PseudoAuthenticator; +import org.apache.hadoop.security.authentication.server.AuthenticationFilter; +import org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler; import org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler; import org.apache.hadoop.security.SecurityUtil; import org.apache.hadoop.util.GenericOptionsParser; @@ -269,6 +271,11 @@ public FilterHolder makeAuthFilter() throws IOException { authFilter.setInitParameter("dfs.web.authentication.kerberos.keytab", conf.kerberosKeytab()); } + + authFilter.setInitParameter(AuthenticationFilter.AUTH_TYPE, UserGroupInformation.isSecurityEnabled() ? + KerberosAuthenticationHandler.TYPE : + PseudoAuthenticationHandler.TYPE); + return authFilter; }{code} was (Author: JIRAUSER288438): I was able to get it run by the following (temporary?) 
fix : diff --git a/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java b/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java index d183b2e61b..5e5c4132f4 100644 --- a/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java +++ b/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java @@ -38,6 +38,8 @@ import org.apache.hadoop.hive.shims.Utils; import org.apache.hadoop.security.UserGroupInformation; import org.apache.hadoop.security.authentication.client.PseudoAuthenticator; +import org.apache.hadoop.security.authentication.server.AuthenticationFilter; +import org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler; import org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler; import org.apache.hadoop.security.SecurityUtil; import org.apache.hadoop.util.GenericOptionsParser; @@ -269,6 +271,11 @@ public FilterHolder makeAuthFilter() throws IOException \{ authFilter.setInitParameter("dfs.web.authentication.kerberos.keytab", conf.kerberosKeytab()); } + +authFilter.setInitParameter(AuthenticationFilter.AUTH_TYPE, UserGroupInformation.isSecurityEnabled() ? +KerberosAuthenticationHandler.TYPE : +PseudoAuthenticationHandler.TYPE); + return authFilter; } > hcatalog error in Hadoop 3.3.0: authentication type needed > -- > > Key: HIVE-24083 > URL: https://issues.apache.org/jira/browse/HIVE-24083 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 3.1.2 >Reporter: Javier J. 
Salmeron Garcia >Priority: Minor > > Using Hive 3.1.2, webhcat fails to start in Hadoop 3.3.0 with the following > error: > ``` > javax.servlet.ServletException: Authentication type must be specified: > simple|kerberos > ``` > I tried in Hadoop 3.2.1 with the exact settings and it starts without issues: > > ``` > webhcat: /tmp/hadoop-3.2.1//bin/hadoop jar > /opt/bitnami/hadoop/hive/hcatalog/sbin/../share/webhcat/svr/lib/hive-webhcat-3.1.2.jar > org.apache.hive.hcatalog.templeton.Main > webhcat: starting ... started. > webhcat: done > ``` > > I can provide more logs if needed. Detected authentication settings: > > ``` > hadoop.http.authentication.simple.anonymous.allowed=true > hadoop.http.authentication.type=simple > hadoop.security.authentication=simple > ipc.client.fallback-to-simple-auth-allowed=false > yarn.timeline-service.http-authentication.simple.anonymous.allowed=true > yarn.timeline-service.http-authentication.type=simple > ``` > -- This message was sent by Atlassian Jira (v8.20.10#820010)
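The failure mode and the patch above can be illustrated with a minimal, self-contained model. This is not the real Hadoop AuthenticationFilter; initFilter and its parameter handling are simplified stand-ins that mimic the "Authentication type must be specified" check reported in this issue.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Simplified stand-in for Hadoop's AuthenticationFilter init check: the
    // filter rejects initialization when its "type" init parameter is missing,
    // which is the ServletException WebHCat hits on Hadoop 3.3.0.
    public static void initFilter(Map<String, String> initParams) {
        String type = initParams.get("type");
        if (type == null || (!type.equals("simple") && !type.equals("kerberos"))) {
            throw new IllegalStateException(
                    "Authentication type must be specified: simple|kerberos");
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        boolean securityEnabled = false; // stand-in for UserGroupInformation.isSecurityEnabled()
        // The patch: always set the auth type before the filter is initialized.
        params.put("type", securityEnabled ? "kerberos" : "simple");
        initFilter(params); // succeeds once the parameter is present
        System.out.println("filter initialized with type=" + params.get("type"));
    }
}
```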
[jira] [Commented] (HIVE-24083) hcatalog error in Hadoop 3.3.0: authentication type needed
[ https://issues.apache.org/jira/browse/HIVE-24083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564223#comment-17564223 ] Anmol Sundaram commented on HIVE-24083: --- I was able to get it run by the following (temporary?) fix : diff --git a/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java b/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java index d183b2e61b..5e5c4132f4 100644 --- a/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java +++ b/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/Main.java @@ -38,6 +38,8 @@ import org.apache.hadoop.hive.shims.Utils; import org.apache.hadoop.security.UserGroupInformation; import org.apache.hadoop.security.authentication.client.PseudoAuthenticator; +import org.apache.hadoop.security.authentication.server.AuthenticationFilter; +import org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler; import org.apache.hadoop.security.authentication.server.PseudoAuthenticationHandler; import org.apache.hadoop.security.SecurityUtil; import org.apache.hadoop.util.GenericOptionsParser; @@ -269,6 +271,11 @@ public FilterHolder makeAuthFilter() throws IOException { authFilter.setInitParameter("dfs.web.authentication.kerberos.keytab", conf.kerberosKeytab()); } + +authFilter.setInitParameter(AuthenticationFilter.AUTH_TYPE, UserGroupInformation.isSecurityEnabled() ? +KerberosAuthenticationHandler.TYPE : +PseudoAuthenticationHandler.TYPE); + return authFilter; } -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564220#comment-17564220 ] Stamatis Zampetakis commented on HIVE-26376: [~RANith] Thanks for reporting this. I believe the problem stems from the fact that there are various places where we are not closing the FileSystem objects, so they keep accumulating in the CACHE. In this case the class to blame seems to be the {{StorageBasedAuthorizationProvider}}. I created a PR with a proposed fix; could you please test the patch and let us know if it solves your problem? > Hive Metastore connection leak (OOM Error) > -- > > Key: HIVE-26376 > URL: https://issues.apache.org/jira/browse/HIVE-26376 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.1.2 > Environment: !Screenshot 2022-07-07 at 11.52.33 AM.png! >Reporter: Ranith Sardar >Assignee: Stamatis Zampetakis >Priority: Major > Labels: pull-request-available > Attachments: Screenshot 2022-07-07 at 11.52.33 AM.png > > Time Spent: 10m > Remaining Estimate: 0h > > Hive version: 3.1.2 > The Hive metastore heap size is 14GB; a memory leak develops after 4-5 days and > the metastore throws an OOM error. > If we disable the configuration, the memory leak disappears. > In a 3.5GB heap dump, a large number of FileSystem objects (>9k instances) are > being retained, occupying most of the heap space. > Added a snapshot from the Eclipse MAT.
> Below is part of the stack trace for the OOM error: > {code:java} > at > org.apache.hadoop.hive.common.FileUtils.getFileStatusOrNull(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/FileStatus; > (FileUtils.java:801) > at > org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.checkPermissions(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/fs/Path;Ljava/util/EnumSet;)V > (StorageBasedAuthorizationProvider.java:371) > at > org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V > (StorageBasedAuthorizationProvider.java:346) > at > org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider.authorize(Lorg/apache/hadoop/hive/metastore/api/Database;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;[Lorg/apache/hadoop/hive/ql/security/authorization/Privilege;)V > (StorageBasedAuthorizationProvider.java:154) > at > org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.authorizeReadDatabase(Lorg/apache/hadoop/hive/metastore/events/PreReadDatabaseEvent;)V > (AuthorizationPreEventListener.java:208) > at > org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener.onEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V > (AuthorizationPreEventListener.java:153) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.firePreEvent(Lorg/apache/hadoop/hive/metastore/events/PreEventContext;)V > (HiveMetaStore.java:3221) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(Ljava/lang/String;)Lorg/apache/hadoop/hive/metastore/api/Database; > (HiveMetaStore.java:1352){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
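The accumulation pattern behind this leak can be modeled in a few lines. The following is a toy analogue of Hadoop's FileSystem CACHE, not Hadoop code: the key string is invented for illustration, but it mirrors the real cache key, which combines the URI scheme/authority with the UserGroupInformation, so lookups made under ever-new identities insert entries that are never evicted unless someone closes them.

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Toy analogue of Hadoop's FileSystem CACHE. Entries are only removed when
    // the FileSystem is closed; if authorization checks obtain FileSystem
    // objects under distinct UGIs/Subjects and nobody closes them, the cache
    // grows without bound -- matching the >9k retained instances in the heap dump.
    static final Map<String, Object> CACHE = new HashMap<>();

    public static Object get(String user) {
        // analogous to FileSystem.get(conf) under a per-request identity
        return CACHE.computeIfAbsent("hdfs://namenode:8020@" + user, k -> new Object());
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10_000; i++) {
            get("request-user-" + i); // each distinct identity adds a cache entry
        }
        System.out.println(CACHE.size()); // prints 10000
        // The proposed fix closes the FileSystem reference after use in
        // StorageBasedAuthorizationProvider, which evicts it from the cache.
    }
}
```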
[jira] [Work logged] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?focusedWorklogId=788922=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788922 ] ASF GitHub Bot logged work on HIVE-26376: - Author: ASF GitHub Bot Created on: 08/Jul/22 10:26 Start Date: 08/Jul/22 10:26 Worklog Time Spent: 10m Work Description: zabetak opened a new pull request, #3424: URL: https://github.com/apache/hive/pull/3424 ### What changes were proposed in this pull request? Close filesystem references in StorageBasedAuthorizationProvider after use. ### Why are the changes needed? To prevent leaking filesystem objects and hitting OOM errors. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Manual tests. Issue Time Tracking --- Worklog Id: (was: 788922) Remaining Estimate: 0h Time Spent: 10m -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HIVE-26376: -- Labels: pull-request-available (was: ) -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stamatis Zampetakis reassigned HIVE-26376: -- Assignee: Stamatis Zampetakis -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-26376) Hive Metastore connection leak (OOM Error)
[ https://issues.apache.org/jira/browse/HIVE-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564210#comment-17564210 ] Ranith Sardar commented on HIVE-26376: -- Yes, [~asolimando] [~ayushtkn], we disabled it via the fs.hdfs.impl.disable.cache property at the HDFS level. -- This message was sent by Atlassian Jira (v8.20.10#820010)
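For reference, the workaround mentioned in the comment above corresponds to Hadoop's standard cache-bypass switch, set in the cluster configuration (shown here in the same key=value style as the settings listed earlier in this thread). Bypassing the cache trades the leak for extra FileSystem instantiations, so the close-after-use fix in the PR remains the proper solution.

```properties
# Hadoop configuration (e.g. core-site.xml): bypass the FileSystem cache for hdfs:// URIs.
# Each FileSystem.get() call then returns a fresh instance instead of a cached one.
fs.hdfs.impl.disable.cache=true
```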
[jira] [Commented] (HIVE-26337) Duplicate and single event log under NOTIFICATION_LOG for drop-partition unlike add-partition event which overloads the metadata table
[ https://issues.apache.org/jira/browse/HIVE-26337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564178#comment-17564178 ] Ayush Saxena commented on HIVE-26337: - Hmm, that isn't good; if it is true, this would hurt hive-replication performance as well. Do you plan to raise a PR with the fix? Let me know if you face any issues there, I can try to help... > Duplicate and single event log under NOTIFICATION_LOG for drop-partition > unlike add-partition event which overloads the metadata table > -- > > Key: HIVE-26337 > URL: https://issues.apache.org/jira/browse/HIVE-26337 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.1.0 > Environment: HDInsight 4.1.8 > Hive 3.1.2 >Reporter: Sindhu Subhas >Priority: Major > > Multiple events are generated for each drop partition under NOTIFICATION_LOG > whereas a single event is generated for add_partition during msck sync > partitions as part of partition management discovery. > Say Hive is to add 5 new partitions and remove 100 partitions in SQL > Server: one event is generated for the single add_partition entry whereas 100 > drop_partition events are generated under NOTIFICATION_LOG. This results in > overloading of the table and also of the indexes associated with the table. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564175#comment-17564175 ] Ayush Saxena commented on HIVE-24484: - Moving to 3.3.1 and clubbing it with the Tez upgrade to 0.10.2, since both need to go together for a green run. Changed the title to reflect that. We have the tests sorted now in the PR; once we get an official Tez release, most probably within 2 weeks, we will go ahead and commit the PR. > Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 > -- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 13.05h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HIVE-24484) Upgrade Hadoop to 3.3.1 And Tez to 0.10.2
[ https://issues.apache.org/jira/browse/HIVE-24484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena updated HIVE-24484: Summary: Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 (was: Upgrade Hadoop to 3.3.1) > Upgrade Hadoop to 3.3.1 And Tez to 0.10.2 > -- > > Key: HIVE-24484 > URL: https://issues.apache.org/jira/browse/HIVE-24484 > Project: Hive > Issue Type: Improvement >Reporter: David Mollitor >Assignee: Ayush Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 13.05h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (HIVE-26378) Improve error message for masking over complex data types
[ https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HIVE-26378.
---------------------------------
    Fix Version/s: 4.0.0-alpha-2
       Resolution: Fixed

> Improve error message for masking over complex data types
> ---------------------------------------------------------
>
>                 Key: HIVE-26378
>                 URL: https://issues.apache.org/jira/browse/HIVE-26378
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, Security
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-alpha-2
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current error when applying column masking over (unsupported) complex data types could be improved and be more explicit.
> Currently, the thrown error is as follows:
> {noformat}
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.SemanticException:org.apache.hadoop.hive.ql.parse.ParseException: line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type specification
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10370)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10486)
> 	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
> 	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:465)
> 	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
> 	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
> 	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1218)
> 	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> 	... 15 more
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.ParseException:line 1:57 cannot recognize input near 'map' '<' 'string' in primitive type specification
> 	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:214)
> 	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
> 	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
> 	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteASTWithMaskAndFilter(SemanticAnalyzer.java:10368)
> {noformat}
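The ParseException above arises because the masked column's complex type string (e.g. `map<string,string>`) is handed to a parser that only accepts primitive type specifications. A minimal, hypothetical sketch (NOT Hive's actual patch; all names below are illustrative) of pre-checking the type's outer constructor so the user gets a targeted message instead of a parser error:

```java
// Hypothetical sketch, not the real Hive fix: detect complex type names
// before the type string reaches a primitive-type parser.
import java.util.Locale;
import java.util.Set;

public class MaskTypeCheck {
    // Complex type constructors that a primitive-type parser cannot handle.
    private static final Set<String> COMPLEX_TYPES =
            Set.of("map", "array", "struct", "uniontype");

    // Returns a friendly error message for complex types, or null when the
    // type looks primitive and is safe to hand to the parser.
    public static String checkMaskable(String colName, String typeName) {
        String outer = typeName.toLowerCase(Locale.ROOT).split("<", 2)[0].trim();
        if (COMPLEX_TYPES.contains(outer)) {
            return "Column masking is not supported for column '" + colName
                    + "' of complex type " + typeName;
        }
        return null;
    }
}
```

With such a guard in place, a masking policy over a `map<string,string>` column would surface a targeted error rather than the generic `cannot recognize input near 'map' '<' 'string'` parser failure.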
[jira] [Commented] (HIVE-26378) Improve error message for masking over complex data types
[ https://issues.apache.org/jira/browse/HIVE-26378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564166#comment-17564166 ]

Ayush Saxena commented on HIVE-26378:
-------------------------------------

Committed to master. Thanx [~asolimando] for the contribution!!!

> Improve error message for masking over complex data types
> ---------------------------------------------------------
>
>                 Key: HIVE-26378
>                 URL: https://issues.apache.org/jira/browse/HIVE-26378
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, Security
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Work logged] (HIVE-26378) Improve error message for masking over complex data types
[ https://issues.apache.org/jira/browse/HIVE-26378?focusedWorklogId=788890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-788890 ]

ASF GitHub Bot logged work on HIVE-26378:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 08/Jul/22 08:52
            Start Date: 08/Jul/22 08:52
    Worklog Time Spent: 10m
      Work Description: ayushtkn merged PR #3421:
                        URL: https://github.com/apache/hive/pull/3421

Issue Time Tracking
-------------------

    Worklog Id: (was: 788890)
    Time Spent: 20m  (was: 10m)

> Improve error message for masking over complex data types
> ---------------------------------------------------------
>
>                 Key: HIVE-26378
>                 URL: https://issues.apache.org/jira/browse/HIVE-26378
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2, Security
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
[jira] [Assigned] (HIVE-26380) Fix NPE when reading a struct field with null value from iceberg table
[ https://issues.apache.org/jira/browse/HIVE-26380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Pintér reassigned HIVE-26380:
------------------------------------

> Fix NPE when reading a struct field with null value from iceberg table
> ----------------------------------------------------------------------
>
>                 Key: HIVE-26380
>                 URL: https://issues.apache.org/jira/browse/HIVE-26380
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Pintér
>            Assignee: László Pintér
>            Priority: Major
>
> When reading a map that contains a struct of null, an NPE is raised:
> {code:java}
> Caused by: java.lang.NullPointerException
> 	at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector.getStructFieldData(IcebergRecordObjectInspector.java:75)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator._evaluate(ExprNodeFieldEvaluator.java:94)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:88)
> 	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFStruct.evaluate(GenericUDFStruct.java:70)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator._evaluate(ExprNodeGenericFuncEvaluator.java:197)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeFieldEvaluator._evaluate(ExprNodeFieldEvaluator.java:79)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:80)
> 	at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:68)
> 	at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
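The NPE above comes from dereferencing a null struct inside `getStructFieldData`. A minimal sketch of the usual guard for this class of bug, with a plain `Map` standing in for the Iceberg record and a `String` for the struct field (these are NOT the real Iceberg/Hive APIs):

```java
// Illustrative sketch only: an object inspector's field accessor must return
// null for a null struct instead of dereferencing it.
import java.util.Map;

public class NullSafeStructInspector {
    // Returns the field's value, or null when the struct itself is null.
    public static Object getStructFieldData(Map<String, Object> struct, String field) {
        if (struct == null) {
            return null; // null struct => null field value, no NPE
        }
        return struct.get(field);
    }
}
```

Under this contract, `SELECT`ing a field out of a null struct propagates NULL through the expression evaluators instead of crashing the operator pipeline.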
[jira] [Comment Edited] (HIVE-24066) Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
[ https://issues.apache.org/jira/browse/HIVE-24066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17563720#comment-17563720 ]

Seonguk Kim edited comment on HIVE-24066 at 7/8/22 6:25 AM:
------------------------------------------------------------

Null check support for `context.os` would be useful (a null check for a struct column that does not exist in the file).

was (Author: JIRAUSER292443):
It would be useful if null check for context.os works. (null check for struct column that not exists in file)

> Hive query on parquet data should identify if column is not present in file schema and show NULL value instead of Exception
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-24066
>                 URL: https://issues.apache.org/jira/browse/HIVE-24066
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.5, 3.1.2
>            Reporter: Jainik Vora
>            Priority: Major
>         Attachments: day_01.snappy.parquet
>
> I created a hive table containing columns with struct data type:
> {code:java}
> CREATE EXTERNAL TABLE test_dwh.sample_parquet_table (
>   `context` struct<
>     `app`: struct<
>       `build`: string,
>       `name`: string,
>       `namespace`: string,
>       `version`: string
>     >,
>     `device`: struct<
>       `adtrackingenabled`: boolean,
>       `advertisingid`: string,
>       `id`: string,
>       `manufacturer`: string,
>       `model`: string,
>       `type`: string
>     >,
>     `locale`: string,
>     `library`: struct<
>       `name`: string,
>       `version`: string
>     >,
>     `os`: struct<
>       `name`: string,
>       `version`: string
>     >,
>     `screen`: struct<
>       `height`: bigint,
>       `width`: bigint
>     >,
>     `network`: struct<
>       `carrier`: string,
>       `cellular`: boolean,
>       `wifi`: boolean
>     >,
>     `timezone`: string,
>     `userAgent`: string
>   >
> ) PARTITIONED BY (day string)
> STORED as PARQUET
> LOCATION 's3://xyz/events'{code}
>
> All columns are nullable, hence the parquet files read by the table don't always contain all columns. If any file in a partition doesn't have the "context.os" struct and "context.os.name" is queried, Hive throws an exception as below. Same for "context.screen" as well.
>
> {code:java}
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with exception java.io.IOException:java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
> 2020-10-23T00:44:10,496 ERROR [db58bfe6-d0ca-4233-845a-8a10916c3ff1 main([])]: CliDriver (SessionState.java:printError(1126)) - Failed with exception java.io.IOException:java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
> java.io.IOException: java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
> 	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
> 	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
> 	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:147)
> 	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2208)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
> 	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
> 	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> 	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:787)
> 	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
> 	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
> Caused by: java.lang.RuntimeException: Primitive type osshould not doesn't match typeos[name]
> 	at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:330)
> 	at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.projectLeafTypes(DataWritableReadSupport.java:322)
> 	at
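The trace above fails inside schema projection (`DataWritableReadSupport.projectLeafTypes`). The behaviour the reporter asks for, projecting a requested column as NULL when the file's schema lacks it instead of throwing, can be sketched as follows; plain collections stand in for the Parquet/Hive schema objects, and all names are illustrative, not Hive's actual implementation:

```java
// Illustrative sketch: lenient projection that tolerates columns missing
// from a particular file's schema by emitting null for them.
import java.util.List;
import java.util.Map;

public class LenientProjection {
    // Builds a row for the requested columns: the file's value where the file
    // schema contains the column, null where it does not (instead of throwing).
    public static Object[] project(List<String> requested, Map<String, Object> fileRow) {
        Object[] row = new Object[requested.size()];
        for (int i = 0; i < requested.size(); i++) {
            row[i] = fileRow.getOrDefault(requested.get(i), null);
        }
        return row;
    }
}
```

With this behaviour, a partition file lacking the `context.os` struct would yield NULL for `context.os.name` rather than failing the whole fetch.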