JvmPauseMonitor
Hi,

Hive has 2 JvmPauseMonitor classes:
https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/JvmPauseMonitor.java

Both are near-copies of Hadoop's JvmPauseMonitor:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Is there a reason not to use the one from Hadoop?

Thanks,
Eugene
[jira] [Created] (HIVE-21266) Issue with single delta file
Eugene Koifman created HIVE-21266:

Summary: Issue with single delta file
Key: HIVE-21266
URL: https://issues.apache.org/jira/browse/HIVE-21266
Project: Hive
Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Vaibhav Gumashta

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L353-L357]
{noformat}
if ((deltaCount + (dir.getBaseDirectory() == null ? 0 : 1)) + origCount <= 1) {
  LOG.debug("Not compacting {}; current base is {} and there are {} deltas and {} originals",
      sd.getLocation(), dir.getBaseDirectory(), deltaCount, origCount);
  return;
}
{noformat}
This check is problematic. Suppose you have a single delta file from streaming ingest, {{delta_11_20}}, where {{txnid:13}} was aborted. The code above will not rewrite the delta (which would drop anything that belongs to the aborted txn) but will still transition the compaction to "ready_for_cleaning", which drops the metadata about the aborted txn. The aborted data will then come back as committed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
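A minimal sketch of a skip check that would avoid this, assuming the compactor can learn whether the lone delta holds rows from aborted txns. The class, the method and the `hasAbortedTxnsInDelta` flag are hypothetical; this is not the CompactorMR code or its eventual fix.

```java
/** Hypothetical sketch: decide whether compaction of a partition can be skipped. */
final class CompactionSkipCheck {

    /**
     * @param deltaCount            number of delta directories
     * @param hasBase               whether a base directory exists
     * @param origCount             number of original (pre-Acid) files
     * @param hasAbortedTxnsInDelta hypothetical flag: some delta holds rows written by an
     *                              aborted txn (e.g. txnid:13 inside delta_11_20)
     * @return true if it is safe to skip rewriting the files
     */
    static boolean canSkip(int deltaCount, boolean hasBase, int origCount,
                           boolean hasAbortedTxnsInDelta) {
        int dirs = deltaCount + (hasBase ? 1 : 0) + origCount;
        // The original check stops at "at most one directory, nothing to merge".
        // A single delta with aborted data must still be rewritten so the aborted rows
        // are dropped before the cleaner removes the aborted txn's metadata.
        return dirs <= 1 && !hasAbortedTxnsInDelta;
    }

    public static void main(String[] args) {
        System.out.println(canSkip(1, false, 0, true));  // false: must compact
        System.out.println(canSkip(1, false, 0, false)); // true: nothing to do
    }
}
```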
[jira] [Created] (HIVE-21226) Exclude read-only transactions from ValidTxnList
Eugene Koifman created HIVE-21226:

Summary: Exclude read-only transactions from ValidTxnList
Key: HIVE-21226
URL: https://issues.apache.org/jira/browse/HIVE-21226
Project: Hive
Issue Type: Improvement
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman

Once HIVE-21114 is done, we should make sure that ValidTxnList doesn't contain any read-only txns in its exceptions list, since by definition there is no data tagged with such a txnid.
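A minimal sketch of the idea, assuming the set of read-only txn ids is known when a reader's snapshot is built. The names are hypothetical and this is not the ValidTxnList API: open or aborted txns at or below the high watermark become exceptions, except read-only ones, which tagged no rows.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of building a snapshot's exception list. */
final class TxnSnapshotSketch {

    /** Txn ids a reader must treat as not committed, skipping read-only txns. */
    static SortedSet<Long> exceptions(Set<Long> openOrAbortedTxns, Set<Long> readOnlyTxns,
                                      long highWatermark) {
        SortedSet<Long> result = new TreeSet<>();
        for (long txnId : openOrAbortedTxns) {
            // No row carries a read-only txnid, so listing it only bloats the snapshot.
            if (txnId <= highWatermark && !readOnlyTxns.contains(txnId)) {
                result.add(txnId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(exceptions(Set.of(5L, 7L, 9L), Set.of(7L), 10L)); // [5, 9]
    }
}
```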
[jira] [Created] (HIVE-21177) Optimize AcidUtils.getLogicalLength()
Eugene Koifman created HIVE-21177:

Summary: Optimize AcidUtils.getLogicalLength()
Key: HIVE-21177
URL: https://issues.apache.org/jira/browse/HIVE-21177
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

{{AcidUtils.getLogicalLength()}} tries to look for the side file ({{OrcAcidUtils.getSideFile()}}) on the file system even when the file couldn't possibly be there, e.g. when the path is delta_x_x or base_x. It can only be there in delta_x_y where x != y.
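A minimal sketch of the short-circuit, assuming directory names of the form base_x, delta_x_y or delta_x_y_stmtId. The helper is hypothetical; it only decides whether a side-file lookup is worth a file-system call and does not reproduce AcidUtils' own directory parser.

```java
/** Hypothetical sketch: can a side file exist for files under this directory? */
final class SideFileCheck {

    static boolean mayHaveSideFile(String dirName) {
        // Per HIVE-21177 only delta_x_y with x != y can have one.
        // base_x and delete_delta_... names fail this prefix test.
        if (!dirName.startsWith("delta_")) {
            return false;
        }
        String[] parts = dirName.substring("delta_".length()).split("_");
        if (parts.length < 2) {
            return false; // not a well-formed delta_x_y name
        }
        long min = Long.parseLong(parts[0]); // lower end of the id range
        long max = Long.parseLong(parts[1]); // upper end of the id range
        return min != max;
    }

    public static void main(String[] args) {
        System.out.println(mayHaveSideFile("delta_0000011_0000020"));      // true
        System.out.println(mayHaveSideFile("delta_0000005_0000005_0000")); // false
        System.out.println(mayHaveSideFile("base_0000020"));               // false
    }
}
```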
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212399
---

itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 299 (patched)
<https://reviews.apache.org/r/69367/#comment298161>

testMoreBucketsThanReducers/testMoreBucketsThanReducers2 in TestTxnCommands force a specific number of reducers.

itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 185 (patched)
<https://reviews.apache.org/r/69367/#comment298162>

nit: since this is filtering for 'base', it's not checking whether it contains 'only' base...

itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 195 (patched)
<https://reviews.apache.org/r/69367/#comment298163>

I still don't understand what this comment is conveying. This is just a normal read, so I would assume TezSplitGrouper is not running in compactor mode.

- Eugene Koifman

On Jan. 28, 2019, 11:49 a.m., Vaibhav Gumashta wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
>
> (Updated Jan. 28, 2019, 11:49 a.m.)
>
> Review request for hive and Eugene Koifman.
>
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
>
> Repository: hive-git
>
> Description
> ---
>
> https://jira.apache.org/jira/browse/HIVE-20699
>
> Diffs
> -
>
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION
>   ql/src/test/results/clientpositive/show_functions.q.out c9716e904c
>
> Diff: https://reviews.apache.org/r/69367/diff/9/
>
> Testing
> ---
>
> Thanks,
>
> Vaibhav Gumashta
[jira] [Created] (HIVE-21172) DEFAULT keyword handling in MERGE UPDATE clause issues
Eugene Koifman created HIVE-21172:

Summary: DEFAULT keyword handling in MERGE UPDATE clause issues
Key: HIVE-21172
URL: https://issues.apache.org/jira/browse/HIVE-21172
Project: Hive
Issue Type: Sub-task
Components: SQL, Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman

Once HIVE-21159 lands, enable {{HiveConf.MERGE_SPLIT_UPDATE}} and run these tests:

TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_stats]
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert_into_default_keyword.q

Merge is rewritten as a multi-insert. When the Update clause has DEFAULT, it's not properly replaced with a value in the multi-insert; it's treated as a literal:
{noformat}
INSERT INTO `default`.`acidTable`    -- update clause(insert part)
 SELECT `t`.`key`, `DEFAULT`, `t`.`value`
   WHERE `t`.`key` = `s`.`key` AND `s`.`key` > 3 AND NOT(`s`.`key` < 3)
{noformat}
See {{LOG.info("Going to reparse <" + originalQuery + "> as \n<" + rewrittenQueryStr.toString() + ">");}} in hive.log.

{{MergeSemanticAnalyzer.replaceDefaultKeywordForMerge()}} is only called in {{handleInsert()}}, not in {{handleUpdate()}}. Why does the issue only show up with {{MERGE_SPLIT_UPDATE}}?
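A minimal sketch of what the update leg needs: each SET expression that is the DEFAULT keyword is resolved to the column's default value before the multi-insert text is generated. Falling back to NULL when a column has no default is an assumption here, and the method and the map of defaults are hypothetical; this is not MergeSemanticAnalyzer.replaceDefaultKeywordForMerge().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: build the update leg's select list with DEFAULT resolved. */
final class DefaultKeywordSketch {

    /**
     * @param targetCols     target table columns, in order
     * @param setClause      column -> expression from the UPDATE SET list
     * @param columnDefaults column -> default value expression (assumed known from metadata)
     * @param targetAlias    alias of the target table in the rewritten query
     */
    static List<String> updateSelectList(List<String> targetCols, Map<String, String> setClause,
                                         Map<String, String> columnDefaults, String targetAlias) {
        List<String> select = new ArrayList<>();
        for (String col : targetCols) {
            String expr = setClause.get(col);
            if (expr == null) {
                // Column not assigned: carry over its current value.
                select.add("`" + targetAlias + "`.`" + col + "`");
            } else if ("DEFAULT".equalsIgnoreCase(expr.trim())) {
                // Resolve the keyword instead of copying it into the rewritten query text.
                select.add(columnDefaults.getOrDefault(col, "NULL"));
            } else {
                select.add(expr);
            }
        }
        return select;
    }

    public static void main(String[] args) {
        List<String> select = updateSelectList(
                List.of("key", "c", "value"), Map.of("c", "DEFAULT"), Map.of("c", "1"), "t");
        System.out.println(String.join(", ", select)); // `t`.`key`, 1, `t`.`value`
    }
}
```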
[jira] [Created] (HIVE-21161) Remove checks that disallow updating bucketing and partitioning columns
Eugene Koifman created HIVE-21161:

Summary: Remove checks that disallow updating bucketing and partitioning columns
Key: HIVE-21161
URL: https://issues.apache.org/jira/browse/HIVE-21161
Project: Hive
Issue Type: Sub-task
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman

Once both Update and Merge do the Update split early, we can remove the checks (in SemanticAnalyzer?) that prevent updating partition/bucketing columns.
[jira] [Created] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early
Eugene Koifman created HIVE-21160:

Summary: Rewrite Update statement as Multi-insert and do Update split early
Key: HIVE-21160
URL: https://issues.apache.org/jira/browse/HIVE-21160
Project: Hive
Issue Type: Sub-task
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
[jira] [Created] (HIVE-21159) Modify Merge statement logic to perform Update split early
Eugene Koifman created HIVE-21159:

Summary: Modify Merge statement logic to perform Update split early
Key: HIVE-21159
URL: https://issues.apache.org/jira/browse/HIVE-21159
Project: Hive
Issue Type: Sub-task
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
[jira] [Created] (HIVE-21158) Perform update split early
Eugene Koifman created HIVE-21158:

Summary: Perform update split early
Key: HIVE-21158
URL: https://issues.apache.org/jira/browse/HIVE-21158
Project: Hive
Issue Type: Improvement
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman

Currently Acid 2.0 does U=D+I in the OrcRecordUpdater. This means that all Updates (wide rows) are shuffled AND sorted.

We could modify the multi-insert statement that results from a Merge statement so that, instead of having one of the legs represent the Update, we create 2 legs: 1 representing the Delete of the original row and 1 representing the Insert of the new version. Delete events are very small, so sorting them is cheap. The Inserts are written to disk in sorted order by virtue of how ROW__IDs are generated.

Exactly the same idea applies to a regular Update statement.

Note that the U=D+I in OrcRecordUpdater needs to be kept to keep the [Streaming Mutate API|https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API] working on 2.0.

*This requires that TxnHandler flags 2 Deletes as a conflict - it doesn't currently*

Incidentally, 2.0 + early split allows updating all columns, including bucketing and partition columns.

What is lock acquisition based on? Need to make sure that conflict detection (write set tracking) still works.

So we want to transform
{noformat}
update T set B = 7 where A=1
{noformat}
into
{noformat}
from T
insert into T select ROW__ID where a = 1 SORT BY ROW__ID
insert into T select a, 7 where a = 1
{noformat}
or, even better, into
{noformat}
from T where a = 1
insert into T select ROW__ID SORT BY ROW__ID
insert into T select a, 7
{noformat}
but the latter won't parse currently. This is very similar to how the MERGE stmt is handled.

Need some thought on how WriteSet tracking works. If we don't allow updating partition columns, then even with dynamic partitions {{TxnHandler.addDynamicPartitions()}} should see 1 entry (of Update type) for each partition, since both the insert and the delete land in the same partition. If partition columns can be updated, then we may insert a Delete event into P1 and the corresponding Insert event into P2, so {{addDynamicPartitions()}} should see both partitions. I guess both need to be recorded in Write_Set but with different types: the delete as 'update' and the insert as 'insert', so that the insert can conflict with some IOW on the 'new' partition.
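A minimal sketch of the early-split idea in plain Java, assuming a simplified ROW__ID of (writeId, bucketProperty, rowId). Each update becomes a small delete event keyed by the old ROW__ID plus an insert of the new row version, and only the delete events are sorted. All types are hypothetical and do not correspond to OrcRecordUpdater's classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of splitting updates into a delete leg and an insert leg. */
final class UpdateSplitSketch {

    /** Simplified stand-in for ROW__ID. */
    static final class RowId {
        final long writeId;
        final int bucketProperty;
        final long rowId;

        RowId(long writeId, int bucketProperty, long rowId) {
            this.writeId = writeId;
            this.bucketProperty = bucketProperty;
            this.rowId = rowId;
        }

        @Override
        public String toString() {
            return "(" + writeId + "," + bucketProperty + "," + rowId + ")";
        }
    }

    /** One updated row: the ROW__ID of the old version and the new (possibly wide) row. */
    static final class Update {
        final RowId oldRowId;
        final String newRow;

        Update(RowId oldRowId, String newRow) {
            this.oldRowId = oldRowId;
            this.newRow = newRow;
        }
    }

    static final Comparator<RowId> ROW_ID_ORDER = Comparator
            .comparingLong((RowId r) -> r.writeId)
            .thenComparingInt(r -> r.bucketProperty)
            .thenComparingLong(r -> r.rowId);

    /** Delete leg: only ROW__IDs are sorted, which is cheap compared with sorting wide rows. */
    static List<RowId> deleteLeg(List<Update> updates) {
        List<RowId> deletes = new ArrayList<>();
        for (Update u : updates) {
            deletes.add(u.oldRowId);
        }
        deletes.sort(ROW_ID_ORDER);
        return deletes;
    }

    /** Insert leg: passed through unsorted; the writer assigns new ROW__IDs in order. */
    static List<String> insertLeg(List<Update> updates) {
        List<String> inserts = new ArrayList<>();
        for (Update u : updates) {
            inserts.add(u.newRow);
        }
        return inserts;
    }

    public static void main(String[] args) {
        List<Update> updates = List.of(
                new Update(new RowId(3, 0, 7), "a=1,b=7"),
                new Update(new RowId(1, 0, 2), "a=1,b=7"));
        System.out.println(deleteLeg(updates)); // [(1,0,2), (3,0,7)]
        System.out.println(insertLeg(updates)); // [a=1,b=7, a=1,b=7]
    }
}
```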
[jira] [Created] (HIVE-21154) Investigate using object IDs in Acid HMS schema instead of names
Eugene Koifman created HIVE-21154:

Summary: Investigate using object IDs in Acid HMS schema instead of names
Key: HIVE-21154
URL: https://issues.apache.org/jira/browse/HIVE-21154
Project: Hive
Issue Type: New Feature
Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman

Currently all Acid-related tables in the HMS DB (HIVE_LOCKS, TXN_COMPONENTS, etc.) use db_name/table_name/partition_name to identify the metastore object that is being tracked (these are potentially long strings, especially the partition name). It would improve perf to use an object ID such as TBLS.TBL_ID, which is exposed in Thrift since HIVE-20556.

It would also make handling of object rename operations a no-op (currently handled in {{TxnHandler.onRename()}} from {{AcidEventListener extends MetaStoreEventListener}}).

This would require significant HMS schema changes and surfacing the ID of Database/Partition objects. Need to think about how this affects replication.
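A toy illustration of why keying Acid metadata by object ID makes renames a no-op, assuming an in-memory stand-in for TBLS and HIVE_LOCKS. Nothing here reflects the actual HMS schema or TxnHandler code; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: lock rows keyed by TBL_ID instead of db/table name strings. */
final class IdKeyedLocksSketch {
    /** Stand-in for TBLS: "db.table" -> TBL_ID. */
    private final Map<String, Long> tblIdByName = new HashMap<>();
    /** Stand-in for HIVE_LOCKS, keyed by TBL_ID. */
    private final Map<Long, String> lockByTblId = new HashMap<>();

    void createTable(String name, long tblId) {
        tblIdByName.put(name, tblId);
    }

    void lock(String tableName, String lockType) {
        lockByTblId.put(tblIdByName.get(tableName), lockType);
    }

    String lockFor(String tableName) {
        Long id = tblIdByName.get(tableName);
        return id == null ? null : lockByTblId.get(id);
    }

    /** Rename touches only the catalog entry; lock rows need no onRename()-style rewrite. */
    void renameTable(String oldName, String newName) {
        tblIdByName.put(newName, tblIdByName.remove(oldName));
    }

    public static void main(String[] args) {
        IdKeyedLocksSketch hms = new IdKeyedLocksSketch();
        hms.createTable("db.t", 42L);
        hms.lock("db.t", "SHARED_WRITE");
        hms.renameTable("db.t", "db.t2");
        System.out.println(hms.lockFor("db.t2")); // SHARED_WRITE
    }
}
```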
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
(patched)
<https://reviews.apache.org/r/69367/#comment297891>

What throws the IAE? Above I see
if (!reader.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) {
Shouldn't it bail out there if there is no index?

ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 638 (patched)
<https://reviews.apache.org/r/69367/#comment297892>

Is there a follow-up Jira for this?

ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
Lines 2201 (patched)
<https://reviews.apache.org/r/69367/#comment297893>

It would be helpful to add the COMPACTOR_CRUD_QUERY_BASED property name to the error msg.

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 248 (patched)
<https://reviews.apache.org/r/69367/#comment297919>

What does this do for an MM table?

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 371 (patched)
<https://reviews.apache.org/r/69367/#comment297922>

Should 'conf' be cloned? Will this affect 'conf' for something else?

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 537 (patched)
<https://reviews.apache.org/r/69367/#comment297923>

Why does it need "0+ ..."?

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
Lines 101 (patched)
<https://reviews.apache.org/r/69367/#comment297895>

It would be useful to include the table/partition name in the msg.

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 17 (patched)
<https://reviews.apache.org/r/69367/#comment297912>

ROW__ID.bucket_column - do you mean bucketId?

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 57 (patched)
<https://reviews.apache.org/r/69367/#comment297915>

This doesn't compare statementId anywhere, but it should. I think the easiest is to compare bucketProperty, or you could extract statementId from it and compare it explicitly.

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 60 (patched)
<https://reviews.apache.org/r/69367/#comment297917>

I don't think equals makes sense.

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)
<https://reviews.apache.org/r/69367/#comment297918>

It may be useful to include both ROW__IDs in the message.

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 74 (patched)
<https://reviews.apache.org/r/69367/#comment297916>

nit: make the class and fields final to make sure compareTo is inlined?

ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 91 (patched)
<https://reviews.apache.org/r/69367/#comment297914>

When is it OK for 2 consecutive ROW__IDs to be equal?

- Eugene Koifman

On Jan. 21, 2019, 11:04 p.m., Vaibhav Gumashta wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
>
> (Updated Jan. 21, 2019, 11:04 p.m.)
>
> Review request for hive and Eugene Koifman.
>
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
>
> Repository: hive-git
>
> Description
> ---
>
> https://jira.apache.org/jira/browse/HIVE-20699
>
> Diffs
> -
>
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION
>   ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f
>
> Diff: https://reviews.apache.org/r/69367/diff/7/
>
> Testing
> ---
>
> Thanks,
>
> Vaibhav Gumashta
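A minimal sketch of the ordering check these comments describe, assuming ROW__ID is (writeId, bucketProperty, rowId) and that bucketProperty encodes both the bucket id and the statement id, so comparing it as a plain int also orders by statementId within a bucket. The class is hypothetical, not GenericUDFValidateAcidSortOrder itself. It takes the position that two consecutive equal ROW__IDs are an error, and it reports both values on failure.

```java
/** Hypothetical sketch: checks that ROW__IDs arrive in strictly increasing order. */
final class AcidSortOrderValidator {
    private boolean hasPrevious = false;
    private long prevWriteId;
    private int prevBucketProperty;
    private long prevRowId;

    void validate(long writeId, int bucketProperty, long rowId) {
        if (hasPrevious) {
            int cmp = Long.compare(writeId, prevWriteId);
            if (cmp == 0) {
                // Whole bucketProperty, so statementId is compared too (assumed encoding).
                cmp = Integer.compare(bucketProperty, prevBucketProperty);
            }
            if (cmp == 0) {
                cmp = Long.compare(rowId, prevRowId);
            }
            if (cmp <= 0) {
                // Equal or decreasing: include both ROW__IDs in the message.
                throw new IllegalStateException("Wrong sort order: previous ROW__ID "
                        + format(prevWriteId, prevBucketProperty, prevRowId)
                        + " is not less than current ROW__ID "
                        + format(writeId, bucketProperty, rowId));
            }
        }
        hasPrevious = true;
        prevWriteId = writeId;
        prevBucketProperty = bucketProperty;
        prevRowId = rowId;
    }

    private static String format(long writeId, int bucketProperty, long rowId) {
        return "{writeId:" + writeId + ", bucketProperty:" + bucketProperty + ", rowId:" + rowId + "}";
    }

    public static void main(String[] args) {
        AcidSortOrderValidator v = new AcidSortOrderValidator();
        v.validate(1, 0, 0);
        v.validate(1, 0, 1);
        try {
            v.validate(1, 0, 1); // duplicate ROW__ID
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```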
[jira] [Created] (HIVE-21146) Enforce TransactionBatch size=1 for blob stores
Eugene Koifman created HIVE-21146:

Summary: Enforce TransactionBatch size=1 for blob stores
Key: HIVE-21146
URL: https://issues.apache.org/jira/browse/HIVE-21146
Project: Hive
Issue Type: Bug
Components: Streaming, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman

The Streaming Ingest API supports the concept of a {{TransactionBatch}}, where N transactions can be opened at once and the data in all of them is written to the same delta_x_y directory, while each transaction in the batch can be committed/aborted independently.

The implementation relies on {{FSDataOutputStream.hflush()}} (called from {{OrcRecordUpdater}}), which is available on HDFS but is often implemented as a no-op in blob-store-backed {{FileSystem}} objects.

Need to add a check to the {{HiveStreamingConnection()}} constructor to raise an error if {{builder.transactionBatchSize > 1}} and the target table/partitions are backed by something that doesn't support {{hflush()}}.
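A minimal sketch of the proposed constructor check. `FlushCapability` is a hypothetical stand-in for however the connection learns whether the target file system really implements hflush(); in recent Hadoop versions, asking the output stream's StreamCapabilities about "hflush" may be one way to do that, but this sketch does not depend on Hadoop.

```java
/** Hypothetical probe: does the target file system really implement hflush()? */
interface FlushCapability {
    boolean supportsHflush();
}

/** Hypothetical sketch of the constructor-time check proposed in HIVE-21146. */
final class StreamingConnectionSketch {
    final int transactionBatchSize;

    StreamingConnectionSketch(int transactionBatchSize, FlushCapability target) {
        // All txns in a batch write to the same delta_x_y, and the batch implementation
        // relies on hflush(), which blob stores often implement as a no-op.
        if (transactionBatchSize > 1 && !target.supportsHflush()) {
            throw new IllegalArgumentException("transactionBatchSize=" + transactionBatchSize
                    + " requires a file system that supports hflush(); use 1 for this table");
        }
        this.transactionBatchSize = transactionBatchSize;
    }

    public static void main(String[] args) {
        FlushCapability hdfsLike = () -> true;
        FlushCapability blobStoreLike = () -> false;
        System.out.println(new StreamingConnectionSketch(10, hdfsLike).transactionBatchSize);     // 10
        System.out.println(new StreamingConnectionSketch(1, blobStoreLike).transactionBatchSize); // 1
        try {
            new StreamingConnectionSketch(10, blobStoreLike);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```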
Re: Review Request 69704: HIVE-21052
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69704/#review212120
---

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
Line 129 (original), 129 (patched)
<https://reviews.apache.org/r/69704/#comment297738>

This doesn't check 'p' type compactions, so you could enqueue multiple ones for the same table, but see my Jira comments.

standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
Lines 53 (patched)
<https://reviews.apache.org/r/69704/#comment297739>

Why is this needed? When does the writeId list ever get passed over the wire?

- Eugene Koifman

On Jan. 16, 2019, 10:08 a.m., Jaume Marhuenda wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69704/
> ---
>
> (Updated Jan. 16, 2019, 10:08 a.m.)
>
> Review request for hive.
>
> Repository: hive-git
>
> Description
> ---
>
> Make sure transaction get cleaned if they are aborted before addPartitions is
> called
>
> Diffs
> -
>
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java dc7b2877bf
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 5dbf634825
>   ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java 3482cfce36
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 06b0209aa0
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 5e085f84af
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java b6f70ebe63
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java c569b242ae
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddDynamicPartitions.java 9c33229270
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AlterPartitionsRequest.java f7d9ed2e2e
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java f4e3d6bd71
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java 2b394449a3
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java 4aee45ce5f
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionType.java 7450b27cf3
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CreationMetadata.java 9595a5dc10
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FindSchemasByColsResp.java 42073db544
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java dd6658d636
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse.java 68146e4561
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java ee535a0c80
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java 71e92b6c03
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java 0ea6ef5fb3
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java 759b495bf6
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsFilterSpec.java b5a2b68efd
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsProjectionSpec.java e6c9c06beb
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsRequest.java 7ec107ea6c
>   standalone-metastore/metastore-common/src/gen/thrift
Re: Review Request 69704: HIVE-21052
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69704/#review212117
---

ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 2539 (patched)
<https://reviews.apache.org/r/69704/#comment297735>

JavaDoc

ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
Lines 97 (patched)
<https://reviews.apache.org/r/69704/#comment297736>

JavaDoc

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
Lines 83 (patched)
<https://reviews.apache.org/r/69704/#comment297728>

Since you only have a single HMS connection (I assume this is what this lock is protecting), wouldn't it be better to get the table/partition path before parallelizing the work that can actually be parallelized? As written, you fork threads and then synchronize them immediately.

ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
Lines 140 (patched)
<https://reviews.apache.org/r/69704/#comment297730>

I'm not sure this achieves what the comment says. For normal clean (as we had before) you may have > 1 compaction_queue entry in ready-for-cleaning. You should not have > 1 entry in Working state for the same partition, but you may have > 1 entry in ready-for-cleaning since you have more Workers than Cleaners. It's perhaps made even worse by the new "table level" clean. I think you are right to worry about this, though. I'll make a more detailed comment on the Jira.

shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
Lines 797 (patched)
<https://reviews.apache.org/r/69704/#comment297734>

Why is this needed? It should have some JavaDoc.

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 91 (original), 91 (patched)
<https://reviews.apache.org/r/69704/#comment297733>

I don't think this is right. You are now counting aborted txns by type, so you need > maxAborted aborted Inserts, or > maxAborted aborted Updates, etc. to trigger compaction, rather than > maxAborted of (aborted Inserts + Updates + Deletes).

standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 1228 (original), 1231 (patched)
<https://reviews.apache.org/r/69704/#comment297737>

Exclude 'p' type here.

- Eugene Koifman
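A small illustration of the difference the CompactionTxnHandler comment points out, assuming aborted txn components are tallied per operation type. The method names and string keys are hypothetical. With a threshold of 10 and 6 + 3 + 2 aborted components, no single type crosses the threshold, but the total does.

```java
import java.util.Map;

/** Hypothetical sketch contrasting per-type and total aborted-txn thresholds. */
final class AbortedThresholdSketch {

    /** Behavior the review describes: some single type must exceed maxAborted. */
    static boolean triggersPerType(Map<String, Integer> abortedByType, int maxAborted) {
        for (int count : abortedByType.values()) {
            if (count > maxAborted) {
                return true;
            }
        }
        return false;
    }

    /** Reviewer's expectation: the sum over all types must exceed maxAborted. */
    static boolean triggersOnTotal(Map<String, Integer> abortedByType, int maxAborted) {
        int total = 0;
        for (int count : abortedByType.values()) {
            total += count;
        }
        return total > maxAborted;
    }

    public static void main(String[] args) {
        Map<String, Integer> aborted = Map.of("insert", 6, "update", 3, "delete", 2);
        System.out.println(triggersPerType(aborted, 10)); // false
        System.out.println(triggersOnTotal(aborted, 10)); // true: 6 + 3 + 2 = 11 > 10
    }
}
```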
[jira] [Created] (HIVE-21114) Create read-only transactions
Eugene Koifman created HIVE-21114:

Summary: Create read-only transactions
Key: HIVE-21114
URL: https://issues.apache.org/jira/browse/HIVE-21114
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman

With HIVE-21036 we have a way to indicate that a txn is read-only. We should (at least in auto-commit mode) determine whether the single stmt is a read and mark the txn accordingly. Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any checks in write_set etc.

HiveOperation only has QUERY, which includes both Insert and Select, so this requires figuring out how to determine whether a query is a SELECT. By the time {{Driver.openTransaction()}} is called, we have already parsed the query, so there should be a way to know if the statement only reads.

For multi-stmt txns (once these are supported) we should allow the user to indicate that a txn is read-only and then not allow any statements that can make modifications in this txn. This should be a different jira.

cc [~ikryvenko]
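A minimal sketch of both halves of the proposal, assuming the driver can see the statement's write targets once the query is parsed and that the commit path may skip write-set conflict detection for read-only txns. Every name here is hypothetical; it is not the Driver or TxnHandler API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of read-only txn detection and a cheaper commit path. */
final class ReadOnlyTxnSketch {
    final List<Long> committed = new ArrayList<>();
    int writeSetChecks = 0; // how often the expensive path ran

    /** Assumption: after parsing, a statement with no write targets only reads. */
    static boolean isReadOnly(List<String> writeTargets) {
        return writeTargets.isEmpty();
    }

    void commitTxn(long txnId, boolean readOnly) {
        if (!readOnly) {
            // Stand-in for conflict detection against write_set and recording this txn's writes.
            writeSetChecks++;
        }
        // A read-only txn wrote nothing, so it can be marked committed directly.
        committed.add(txnId);
    }

    public static void main(String[] args) {
        ReadOnlyTxnSketch txns = new ReadOnlyTxnSketch();
        txns.commitTxn(1L, isReadOnly(List.of()));            // SELECT
        txns.commitTxn(2L, isReadOnly(List.of("default.t"))); // INSERT INTO t
        System.out.println(txns.committed + " " + txns.writeSetChecks); // [1, 2] 1
    }
}
```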
[jira] [Created] (HIVE-21106) Potential NEP in VectorizedOrcAcidRowBatchReader.ColumnizedDeleteEventRegistry
Eugene Koifman created HIVE-21106:

Summary: Potential NEP in VectorizedOrcAcidRowBatchReader.ColumnizedDeleteEventRegistry
Key: HIVE-21106
URL: https://issues.apache.org/jira/browse/HIVE-21106
Project: Hive
Issue Type: Bug
Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman

{{VectorizedOrcAcidRowBatchReader.ColumnizedDeleteEventRegistry()}}
{noformat}
AcidStats acidStats = OrcAcidUtils.parseAcidStats(deleteDeltaReader);
if (acidStats.deletes == 0) {
  continue; // just a safe check to ensure that we are not reading empty delete files.
}
{noformat}
If the {{delete_delta../bucket_x}} file is empty, it may not have a {{hive.acid.index}}, and {{OrcAcidUtils.parseAcidStats()}} will return null, which causes an NPE.
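A minimal sketch of a null-safe version of the check. `AcidStats` here is a simplified local stand-in with only the `deletes` counter, and the `parseStats` function is a hypothetical substitute for OrcAcidUtils.parseAcidStats() that, as in the scenario above, returns null when the metadata is absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a null-safe filter over delete-delta bucket files. */
final class DeleteDeltaFilterSketch {

    /** Simplified stand-in for ORC's AcidStats; only the delete counter matters here. */
    static final class AcidStats {
        final long deletes;

        AcidStats(long deletes) {
            this.deletes = deletes;
        }
    }

    /** Returns the files worth reading; parseStats may return null when metadata is missing. */
    static List<String> filesToRead(List<String> files, Function<String, AcidStats> parseStats) {
        List<String> toRead = new ArrayList<>();
        for (String file : files) {
            AcidStats acidStats = parseStats.apply(file);
            // An empty delete_delta bucket file may carry no Acid metadata at all,
            // so check for null before dereferencing.
            if (acidStats == null || acidStats.deletes == 0) {
                continue;
            }
            toRead.add(file);
        }
        return toRead;
    }

    public static void main(String[] args) {
        Map<String, AcidStats> stats = Map.of(
                "bucket_00000", new AcidStats(3),
                "bucket_00001", new AcidStats(0));
        // bucket_00002 has no entry, so stats.get() returns null, like the empty-file case.
        List<String> files = List.of("bucket_00000", "bucket_00001", "bucket_00002");
        System.out.println(filesToRead(files, stats::get)); // [bucket_00000]
    }
}
```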
Re: Review Request 69462: HIVE-20936
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69462/#review211519
---

Ship it!

Ship It!

- Eugene Koifman

On Dec. 21, 2018, 4:30 p.m., Jaume Marhuenda wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69462/
> ---
>
> (Updated Dec. 21, 2018, 4:30 p.m.)
>
> Review request for hive.
>
> Repository: hive-git
>
> Description
> ---
>
> Allow the Worker thread in the metastore to run outside of it
>
> Diffs
> -
>
>   hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java b290a40734
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java d3800cdf2a
>   jdbc/src/java/org/apache/hive/jdbc/Utils.java 852942e6a2
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 42ce1746fd
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java f5b901d6e8
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java cdcc0e9548
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java PRE-CREATION
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 49662cd68b
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java a3034fb195
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 287aeaecb0
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AlterPartitionsRequest.java d85dda5acd
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java 3eb55b1b59
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java 17f8b7730a
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java PRE-CREATION
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FindSchemasByColsResp.java f2f8fb475e
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java f7e188dfda
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse.java bd38bbe45d
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java fb591dcec5
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java e8dfba523d
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java 3d32f372d6
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java 2b176efee4
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsFilterSpec.java c0fe726f8a
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsProjectionSpec.java db91e0bf89
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsRequest.java d26cde23fc
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsResponse.java 3db9095b5c
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetTablesRequest.java c3f71fe13e
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetTablesResult.java 5716922bd3
>   standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/InsertEventRequestData.java 3ef24310b2
Re: Review Request 69462: HIVE-20936
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69462/#review211508 --- it looks like it has merge conflicts standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java Line 34 (original), 42 (patched) <https://reviews.apache.org/r/69462/#comment296731> Why does this have more state fields than CompactionInfoStruct? Perhaps it can be done in HIVE-21056. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 586 (patched) <https://reviews.apache.org/r/69462/#comment296732> "rj.getID().toString()" shouldn't be inside the quotes. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java Lines 216 (patched) <https://reviews.apache.org/r/69462/#comment296733> this doesn't seem to be used anywhere - Eugene Koifman On Dec. 20, 2018, 5:02 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69462/ > --- > > (Updated Dec. 20, 2018, 5:02 p.m.) > > > Review request for hive. 
> > > Repository: hive-git > > > Description > --- > > Allow the Worker thread in the metastore to run outside of it > > > Diffs > - > > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > b290a40734 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 5af047f465 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 42ce1746fd > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java > f5b901d6e8 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > cdcc0e9548 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 49662cd68b > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java a3034fb195 > ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java > 287aeaecb0 > service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AlterPartitionsRequest.java > d85dda5acd > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java > 3eb55b1b59 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java > 17f8b7730a > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FindSchemasByColsResp.java > f2f8fb475e > > 
standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java > f7e188dfda > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse.java > bd38bbe45d > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java > fb591dcec5 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java > e8dfba523d > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java > 3d32f372d6 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java > 2b176efee4 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsFilterSpec.java > c0fe726f8a > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsProjectionSpec.java > db91e0bf89 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Ge
[jira] [Created] (HIVE-21058) Make Compactor run in a transaction (Umbrella)
Eugene Koifman created HIVE-21058: - Summary: Make Compactor run in a transaction (Umbrella) Key: HIVE-21058 URL: https://issues.apache.org/jira/browse/HIVE-21058 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 4.0.0 Ensure that files produced by the compactor have their visibility controlled via Hive transaction commit like any other write to an ACID table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 69462: HIVE-20936
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69462/#review211381 --- ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java Line 184 (original), 184 (patched) <https://reviews.apache.org/r/69462/#comment296389> Just realized this needs a new metastore connection. Thrift connections are not thread-safe: when you multiplex calls on a single connection, response messages sometimes get lost or matched to the wrong request. If you look at how heartbeating is done in DbTxnManager, it does something similar, except that it relies on the ThreadLocal in Hive.get(conf).getMSC(). ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java Line 187 (original), 187 (patched) <https://reviews.apache.org/r/69462/#comment296388> Why is this added here? The CompactionHeartbeater should do this. - Eugene Koifman On Dec. 17, 2018, 9:59 a.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69462/ > --- > > (Updated Dec. 17, 2018, 9:59 a.m.) > > > Review request for hive. 
> > > Repository: hive-git > > > Description > --- > > Allow the Worker thread in the metastore to run outside of it > > > Diffs > - > > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > b290a40734 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 5af047f465 > jdbc/src/java/org/apache/hive/jdbc/Utils.java 852942e6a2 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 42ce1746fd > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java > f5b901d6e8 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > cdcc0e9548 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 21043415d3 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 546ff955b7 > ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java > 52453a2ec4 > service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/OptionalCompactionInfoStruct.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java > b6a0893524 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php > 3170798663 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 39f8b1f05a > > 
standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote > d57de353c6 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py > a896849989 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 4ef4aadfee > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > 97dc0696b7 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/thrift_hive_metastore.rb > a5f976bc5c > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > 9eb1193a27 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > fa19440ba2 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > e25a8cf9a1 > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > cb899d791f > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 598847df03 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreThread.java > 6ef2e3560d > > standalone-metastor
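The per-thread-connection pattern the reviewer describes can be sketched roughly as follows. `FakeMetaStoreClient` and `PerThreadClients` are hypothetical illustration classes, not Hive's `IMetaStoreClient` or `Hive.get(conf)`; the sketch only shows how `ThreadLocal.withInitial` hands each thread (for example, a Worker thread and its heartbeat thread) its own connection object, so calls are never multiplexed over a shared Thrift transport.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Thrift metastore client; not a Hive class.
class FakeMetaStoreClient {
    private static final AtomicInteger CREATED = new AtomicInteger();
    // Unique id per "connection" so tests can tell instances apart.
    final int id = CREATED.incrementAndGet();
    // Records which thread opened this "connection".
    final String ownerThread = Thread.currentThread().getName();
}

// Gives each thread its own client, so a heartbeat thread never sends
// requests over the Worker thread's connection (and vice versa).
final class PerThreadClients {
    private static final ThreadLocal<FakeMetaStoreClient> CLIENT =
        ThreadLocal.withInitial(FakeMetaStoreClient::new);

    private PerThreadClients() {}

    /** Returns this thread's client, creating it on first use. */
    static FakeMetaStoreClient get() {
        return CLIENT.get();
    }
}
```

A real implementation would also need to close each thread's client when the thread finishes; the sketch leaves that out.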
Re: Review Request 69462: HIVE-20936
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69462/#review211379 --- standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java Lines 1019 (patched) <https://reviews.apache.org/r/69462/#comment296383> you can have Conf validate the values for you, for example MATERIALIZATIONS_INVALIDATION_CACHE_IMPL standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift Lines 1079 (patched) <https://reviews.apache.org/r/69462/#comment296384> nit: it seems like returning a (possibly empty) list of CompactionInfoStruct is simpler/easier to understand. standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java Lines 172 (patched) <https://reviews.apache.org/r/69462/#comment296385> could you add a comment at the top next to the member variables to indicate that these methods should be modified to stay in sync - Eugene Koifman On Dec. 17, 2018, 9:59 a.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69462/ > --- > > (Updated Dec. 17, 2018, 9:59 a.m.) > > > Review request for hive. 
> > > Repository: hive-git > > > Description > --- > > Allow the Worker thread in the metastore to run outside of it > > > Diffs > - > > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > b290a40734 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 5af047f465 > jdbc/src/java/org/apache/hive/jdbc/Utils.java 852942e6a2 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 42ce1746fd > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java > f5b901d6e8 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > cdcc0e9548 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 21043415d3 > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 546ff955b7 > ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java > 52453a2ec4 > service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/OptionalCompactionInfoStruct.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java > b6a0893524 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php > 3170798663 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 39f8b1f05a > > 
standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote > d57de353c6 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py > a896849989 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 4ef4aadfee > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > 97dc0696b7 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/thrift_hive_metastore.rb > a5f976bc5c > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > 9eb1193a27 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > fa19440ba2 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java > e25a8cf9a1 > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > cb899d791f > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 598847df03 > > standalone-metastore/metastore-server/sr
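The reviewer's suggestion to let the configuration layer validate values can be illustrated with a small sketch. `ValidatedConfVar` is a hypothetical class and does not reproduce MetastoreConf's real API (the review cites MATERIALIZATIONS_INVALIDATION_CACHE_IMPL as an existing example of a validated setting); the property name `hive.metastore.runworker.in` and its values `metastore`/`hs2` are borrowed from a naming idea floated in the Dec. 11 review and are illustrative only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical config entry that carries its own allowed values, so a bad
// setting is rejected when it is read instead of deep inside the compactor.
final class ValidatedConfVar {
    final String name;
    final String defaultValue;
    final Set<String> allowed; // expected to be given in lower case

    ValidatedConfVar(String name, String defaultValue, String... allowed) {
        this.name = name;
        this.defaultValue = defaultValue;
        this.allowed = Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(allowed)));
        if (!this.allowed.contains(defaultValue)) {
            throw new IllegalArgumentException("Default '" + defaultValue + "' is not allowed for " + name);
        }
    }

    /** Normalizes the value (null means "use the default") and rejects anything not allowed. */
    String validate(String value) {
        String v = (value == null) ? defaultValue : value.trim().toLowerCase(Locale.ROOT);
        if (!allowed.contains(v)) {
            throw new IllegalArgumentException(
                "Invalid value '" + value + "' for " + name + "; expected one of " + allowed);
        }
        return v;
    }
}
```

For example, `new ValidatedConfVar("hive.metastore.runworker.in", "metastore", "metastore", "hs2").validate("HS2")` returns `"hs2"`, while `"yarn"` is rejected with a message listing the legal values.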
Re: Review Request 69462: HIVE-20936
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69462/#review211256 --- ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java Lines 60 (patched) <https://reviews.apache.org/r/69462/#comment296188> @Override ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java Line 80 (original), 79 (patched) <https://reviews.apache.org/r/69462/#comment296189> @Override ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java Lines 35 (patched) <https://reviews.apache.org/r/69462/#comment296199> Perhaps add that this can run inside HMS as well. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java Line 97 (original), 94 (patched) <https://reviews.apache.org/r/69462/#comment296190> this is also initialized in init(); should it throw here instead? It seems that the contract is that init() must be called first. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java Lines 241 (patched) <https://reviews.apache.org/r/69462/#comment296193> What is the purpose of this? Why doesn't the existing catch(Throwable) with the same log msg work? service/src/java/org/apache/hive/service/server/HiveServer2.java Lines 1016 (patched) <https://reviews.apache.org/r/69462/#comment296200> Are there any tests that actually enable HIVE_MAPREDUCE_AVAILABLE? service/src/java/org/apache/hive/service/server/HiveServer2.java Lines 1019 (patched) <https://reviews.apache.org/r/69462/#comment296198> Why do you need reflection for this? Why not just do Worker w = new Worker(); etc.?
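The "init() must be called first" contract the reviewer mentions can be enforced by failing fast instead of initializing the same field in two places. `InitFirstWorker` below is a hypothetical class, not Hive's `Worker`; it only sketches that design choice.

```java
// Hypothetical compactor-style thread: state is set only in init(), and
// any use before init() throws instead of silently creating defaults.
final class InitFirstWorker {
    private String workerName; // assigned exclusively by init()

    void init(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must be non-empty");
        }
        this.workerName = name;
    }

    String runOnce() {
        // Fail fast: a missing init() is a programming error, not a runtime condition.
        if (workerName == null) {
            throw new IllegalStateException("init() must be called before runOnce()");
        }
        return workerName + " ran";
    }
}
```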
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java Lines 1020 (patched) <https://reviews.apache.org/r/69462/#comment296194> Perhaps this should be called hive.metastore.runworker.remotely (you can run a 'remote' worker even with MR), or "hive.metastore.runworker.in" with supported values "metastore" and "hs2"; this is probably more extensible. standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift Lines 2448 (patched) <https://reviews.apache.org/r/69462/#comment296187> nit: could you move these to around line 2312, where the rest of the TxnStore methods are? standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java Lines 1476 (patched) <https://reviews.apache.org/r/69462/#comment296191> I would put both of these in CompactionInfo. If someone adds fields to CompactionInfo, they are unlikely to ever find these methods and so some info will be lost in the marshalling back and forth. Alternatively, could CompactionInfo be a subclass of CompactionInfoStruct? - Eugene Koifman On Dec. 11, 2018, 3:45 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69462/ > --- > > (Updated Dec. 11, 2018, 3:45 p.m.) > > > Review request for hive. 
> > > Repository: hive-git > > > Description > --- > > Allow the Worker thread in the metastore to run outside of it > > > Diffs > - > > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > b290a40734 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > beb36d7674 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > c6cb7c5254 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java > f5b901d6e8 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java > cdcc0e9548 > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java > PRE-CREATION > > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java > PRE-CREATION > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 4a1cac123c > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java dc39f5ef61 > ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java > 52453a2ec4 > service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-ja
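Keeping both marshalling directions next to the fields, as the reviewer suggests, might look like the sketch below. `CompactionInfoLite` and `CompactionInfoStructLite` are hypothetical, heavily simplified stand-ins for `CompactionInfo` and the Thrift-generated `CompactionInfoStruct`; the real classes have more fields and different accessors.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for the Thrift-generated struct.
class CompactionInfoStructLite {
    long id;
    String dbname;
    String tablename;
    String partitionname; // null for unpartitioned tables
    String runAs;
}

// Hypothetical server-side class with both conversions colocated.
class CompactionInfoLite {
    // NOTE: when adding a field here, update toStruct(), fromStruct(),
    // equals() and hashCode() below so nothing is lost in marshalling.
    long id;
    String dbname;
    String tableName;
    String partName;
    String runAs;

    CompactionInfoStructLite toStruct() {
        CompactionInfoStructLite s = new CompactionInfoStructLite();
        s.id = id;
        s.dbname = dbname;
        s.tablename = tableName;
        s.partitionname = partName;
        s.runAs = runAs;
        return s;
    }

    static CompactionInfoLite fromStruct(CompactionInfoStructLite s) {
        if (s == null) {
            return null;
        }
        CompactionInfoLite ci = new CompactionInfoLite();
        ci.id = s.id;
        ci.dbname = s.dbname;
        ci.tableName = s.tablename;
        ci.partName = s.partitionname;
        ci.runAs = s.runAs;
        return ci;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CompactionInfoLite)) return false;
        CompactionInfoLite c = (CompactionInfoLite) o;
        return id == c.id && Objects.equals(dbname, c.dbname)
            && Objects.equals(tableName, c.tableName)
            && Objects.equals(partName, c.partName) && Objects.equals(runAs, c.runAs);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, dbname, tableName, partName, runAs);
    }
}
```

A round-trip check such as `fromStruct(ci.toStruct()).equals(ci)` in a unit test is one cheap way to catch a field that was added but not marshalled.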
[jira] [Created] (HIVE-21036) extend OpenTxnRequest with transaction type
Eugene Koifman created HIVE-21036: - Summary: extend OpenTxnRequest with transaction type Key: HIVE-21036 URL: https://issues.apache.org/jira/browse/HIVE-21036 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman There is a {{TXN_TYPE}} field in the {{TXNS}} table, and {{TxnHandler.TxnType}} defines its legal values. It would be useful to make TxnType a {{Thrift}} object, add a new {{COMPACTION}} type, and allow setting it in {{OpenTxnRequest}}. Since HIVE-20823, the compactor starts a txn and should set this. Down the road we may want to set READ_ONLY, based either on parsing of the query or on user input, which can make {{TxnHandler.commitTxn}} faster. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
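A rough sketch of the proposal, assuming a hypothetical `TxnTypeSketch` enum and `OpenTxnRequestSketch` class rather than the real Thrift-generated types. The numeric codes are made up, and the idea that a READ_ONLY commit can skip write-set conflict checking is an assumption about where the `commitTxn` speed-up would come from, not something the ticket spells out.

```java
// Hypothetical enum; COMPACTION and READ_ONLY follow the ticket text,
// the numeric codes are illustrative assumptions.
enum TxnTypeSketch {
    DEFAULT(0), READ_ONLY(1), COMPACTION(2);

    final int code; // value that would be stored in TXNS.TXN_TYPE

    TxnTypeSketch(int code) {
        this.code = code;
    }

    static TxnTypeSketch fromCode(int code) {
        for (TxnTypeSketch t : values()) {
            if (t.code == code) return t;
        }
        throw new IllegalArgumentException("Unknown TXN_TYPE " + code);
    }
}

// Hypothetical request: the type is optional, so clients that never set it
// keep today's behavior (DEFAULT).
final class OpenTxnRequestSketch {
    private final String user;
    private TxnTypeSketch txnType; // null means "not set"

    OpenTxnRequestSketch(String user) {
        this.user = user;
    }

    OpenTxnRequestSketch setTxnType(TxnTypeSketch t) {
        this.txnType = t;
        return this;
    }

    TxnTypeSketch getTxnType() {
        return txnType == null ? TxnTypeSketch.DEFAULT : txnType;
    }

    String getUser() {
        return user;
    }
}

final class CommitSketch {
    private CommitSketch() {}

    // Assumption: a READ_ONLY txn wrote nothing, so its commit could skip
    // write-set conflict detection entirely.
    static boolean needsWriteSetCheck(TxnTypeSketch type) {
        return type != TxnTypeSketch.READ_ONLY;
    }
}
```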
[jira] [Created] (HIVE-21025) LLAP IO fails on read if partition column is included in the table and the query has a predicate on the partition column
Eugene Koifman created HIVE-21025: - Summary: LLAP IO fails on read if partition column is included in the table and the query has a predicate on the partition column Key: HIVE-21025 URL: https://issues.apache.org/jira/browse/HIVE-21025 Project: Hive Issue Type: Bug Components: llap Affects Versions: 2.3.4 Reporter: Eugene Koifman Hive doesn't officially support the case when a partitioning column is also included in the data itself, though it works in some cases. Hive would never write a data file with a partition column in it, but this can happen for external tables where data is added by the end user. Consider improving validation (at least for schema-aware files) on read to produce a better error than {{ArrayIndexOutOfBoundsException}}: {code:java} Caused by: java.lang.ArrayIndexOutOfBoundsException ], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1539023000868_24675_3_01_07_3:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) 
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189) ... 15 more Caused by: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116) at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151) at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) ... 
17 more Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:355) at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:310) at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:250) at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:67) at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350) ... 23 more Caused by: java.lang.ArrayIndexOutOfBoundsException {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
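The kind of read-time validation the ticket asks for might look roughly like this. `PartitionColumnCheck` is a hypothetical helper, not an existing Hive or LLAP API, and it assumes the file's column names are available (i.e. a schema-aware file with real names rather than positional ones such as `_col0`). It compares names case-insensitively, since Hive column names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical read-time check: fail with a descriptive message when a data
// file's own schema contains a partition column, instead of letting the
// reader fail later with an ArrayIndexOutOfBoundsException.
final class PartitionColumnCheck {
    private PartitionColumnCheck() {}

    static void validate(String filePath, List<String> fileColumns, List<String> partitionColumns) {
        List<String> clashes = new ArrayList<>();
        for (String p : partitionColumns) {
            for (String f : fileColumns) {
                // Compare case-insensitively, as Hive column names are.
                if (f.toLowerCase(Locale.ROOT).equals(p.toLowerCase(Locale.ROOT))) {
                    clashes.add(p);
                    break;
                }
            }
        }
        if (!clashes.isEmpty()) {
            throw new IllegalStateException("File " + filePath
                + " contains partition column(s) " + clashes
                + "; data files must not include partition columns");
        }
    }
}
```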
[jira] [Created] (HIVE-21020) log which table/partition is being processed by a txn in Worker
Eugene Koifman created HIVE-21020: - Summary: log which table/partition is being processed by a txn in Worker Key: HIVE-21020 URL: https://issues.apache.org/jira/browse/HIVE-21020 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Make sure we have info in the log that ties cat.table.part with txnid of the compactor -- This message was sent by Atlassian JIRA (v7.6.3#76005)
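One way to tie the compactor's txnid to the object it is compacting is a single helper that builds the log line. `CompactionLogFormat` is hypothetical, and the message wording is an assumption; the sketch includes the database name as well, since a fully qualified Hive table is `cat.db.table`.

```java
// Hypothetical helper producing one log line that links the compactor txnid
// to the fully qualified cat.db.table[/partition] being compacted.
final class CompactionLogFormat {
    private CompactionLogFormat() {}

    static String describe(long txnId, String catalog, String db, String table, String partition) {
        StringBuilder sb = new StringBuilder();
        sb.append("txnid:").append(txnId).append(" is compacting ")
          .append(catalog).append('.').append(db).append('.').append(table);
        if (partition != null) { // unpartitioned tables have no partition name
            sb.append('/').append(partition);
        }
        return sb.toString();
    }
}
```

For instance, `describe(42, "hive", "default", "acid_tbl", "p=1")` yields `txnid:42 is compacting hive.default.acid_tbl/p=1`, which can be grepped by either the txnid or the table name.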
[jira] [Created] (HIVE-20960) remove CompactorMR.createCompactorMarker()
Eugene Koifman created HIVE-20960: - Summary: remove CompactorMR.createCompactorMarker() Key: HIVE-20960 URL: https://issues.apache.org/jira/browse/HIVE-20960 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Now that we have HIVE-20941, we know if a dir is produced by compactor from the name and {{CompactorMR.createCompactorMarker()}} can be removed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
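Recognizing compactor output purely from a directory name could be sketched as below. This assumes the `_cZ` suffix (Z being the compactor's txnid) that the HIVE-20943 and HIVE-20948 descriptions use, as in `base_x_cZ` and `delta_x_y_cZ`; the suffix that actually ships may differ, and `CompactorDirName` is a hypothetical helper, not an `AcidUtils` method.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical: detect compactor-produced dirs by name instead of a marker
// file, assuming the "_cZ" suffix convention (Z = compactor txnid).
final class CompactorDirName {
    private static final Pattern COMPACTED =
        Pattern.compile("^(?:base_\\d+|(?:delete_)?delta_\\d+_\\d+)_c(\\d+)$");

    private CompactorDirName() {}

    /** Returns Z if dirName ends in "_cZ", otherwise empty. */
    static OptionalLong compactorTxnId(String dirName) {
        Matcher m = COMPACTED.matcher(dirName);
        return m.matches() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }
}
```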
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/#review210740 --- itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java Line 165 (original), 165 (patched) <https://reviews.apache.org/r/69367/#comment295490> ? itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java Line 170 (original), 170 (patched) <https://reviews.apache.org/r/69367/#comment295491> why are all these tests made non-tests? Or does this do something else? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 533 (patched) <https://reviews.apache.org/r/69367/#comment295492> were you going to do "0+validate_acid_sort_order(...)" instead? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 54 (patched) <https://reviews.apache.org/r/69367/#comment295494> I'm guessing if compareTo returns 0 that's bad - we should have unique row ids ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 61 (patched) <https://reviews.apache.org/r/69367/#comment295493> should this return 0? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java Lines 80 (patched) <https://reviews.apache.org/r/69367/#comment295489> I think the comparison should include 'bucketProperty', since we sort on 'bucketProperty', not just bucketId. In particular, if you have > 1 statement per txn, we expect that rows from the 2nd stmt follow those from the 1st. - Eugene Koifman On Nov. 19, 2018, 3:49 a.m., Vaibhav Gumashta wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/69367/ > ----------- > > (Updated Nov. 19, 2018, 3:49 a.m.) > > > Review request for hive and Eugene Koifman. 
> > > Bugs: HIVE-20699 > https://issues.apache.org/jira/browse/HIVE-20699 > > > Repository: hive-git > > > Description > --- > > https://jira.apache.org/jira/browse/HIVE-20699 > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java > 40dd992455 > pom.xml 26b662e4c3 > ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java > 8cabf960db > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 > ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java > 6e7c78bd17 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java > 92c74e1d06 > > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java > PRE-CREATION > > > Diff: https://reviews.apache.org/r/69367/diff/2/ > > > Testing > --- > > > Thanks, > > Vaibhav Gumashta > >
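The two review points on the sort-order check (compare on `bucketProperty`, not just bucketId, and treat `compareTo() == 0` as a duplicate row id) can be sketched with a hypothetical `RowKey`. It is not Hive's `RecordIdentifier` or the `GenericUDFValidateAcidSortOrder` implementation; `bucketProperty` is treated as an opaque int that, per the review, also encodes the statement id, so rows from the 2nd statement sort after those from the 1st.

```java
import java.util.List;

// Hypothetical ACID row key ordered by (writeId, bucketProperty, rowId).
final class RowKey implements Comparable<RowKey> {
    final long writeId;
    final int bucketProperty; // opaque; also encodes the statement id
    final long rowId;

    RowKey(long writeId, int bucketProperty, long rowId) {
        this.writeId = writeId;
        this.bucketProperty = bucketProperty;
        this.rowId = rowId;
    }

    @Override
    public int compareTo(RowKey o) {
        int c = Long.compare(writeId, o.writeId);
        if (c != 0) return c;
        c = Integer.compare(bucketProperty, o.bucketProperty);
        if (c != 0) return c;
        return Long.compare(rowId, o.rowId);
    }
}

final class SortOrderCheck {
    private SortOrderCheck() {}

    // Keys must be strictly increasing: 0 means a duplicate row id,
    // a positive result means the input is out of order.
    static void validate(List<RowKey> keys) {
        for (int i = 1; i < keys.size(); i++) {
            int c = keys.get(i - 1).compareTo(keys.get(i));
            if (c == 0) throw new IllegalStateException("Duplicate row id at position " + i);
            if (c > 0) throw new IllegalStateException("Out of order at position " + i);
        }
    }
}
```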
[jira] [Created] (HIVE-20948) Eliminate file rename in compactor
Eugene Koifman created HIVE-20948: - Summary: Eliminate file rename in compactor Key: HIVE-20948 URL: https://issues.apache.org/jira/browse/HIVE-20948 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Once HIVE-20823 is committed, we should investigate if it's possible to have compactor write directly to base_x_cZ or delta_x_y_cZ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20943) Handle Compactor transaction abort properly
Eugene Koifman created HIVE-20943: - Summary: Handle Compactor transaction abort properly Key: HIVE-20943 URL: https://issues.apache.org/jira/browse/HIVE-20943 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman The transaction in which the Worker runs may fail after base_x_cZ (delta_x_y_cZ) is created but before its files are fully written. We need to make sure to write an entry corresponding to Z to TXN_COMPONENTS, so that the "_cZ" directories are not read by anyone and are cleaned up by the Cleaner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
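The reader-side effect the ticket wants can be sketched as a filter that hides `_cZ` directories whose compactor txn Z is known to be aborted and hands them to the Cleaner. `CompactedDirFilter` and the aborted-txnid set are hypothetical; in practice that information would come from the metastore (the TXN_COMPONENTS entry the ticket describes), and a complete reader would also have to hide directories whose compactor txn is still open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical split of a partition's directories into ones readers may use
// and ones left behind by an aborted compactor txn (for the Cleaner).
final class CompactedDirFilter {
    private static final Pattern SUFFIX = Pattern.compile("_c(\\d+)$");

    static final class Result {
        final List<String> readable = new ArrayList<>();
        final List<String> obsolete = new ArrayList<>(); // to be deleted by the Cleaner
    }

    private CompactedDirFilter() {}

    static Result split(List<String> dirNames, Set<Long> abortedTxnIds) {
        Result r = new Result();
        for (String d : dirNames) {
            Matcher m = SUFFIX.matcher(d);
            // Only "_cZ" dirs whose Z aborted are hidden; other dirs pass through.
            if (m.find() && abortedTxnIds.contains(Long.parseLong(m.group(1)))) {
                r.obsolete.add(d);
            } else {
                r.readable.add(d);
            }
        }
        return r;
    }
}
```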
[jira] [Created] (HIVE-20942) Worker should heartbeat its own txn
Eugene Koifman created HIVE-20942: - Summary: Worker should heartbeat its own txn Key: HIVE-20942 URL: https://issues.apache.org/jira/browse/HIVE-20942 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Since HIVE-20823, {{Worker.java}} starts a txn. It should either add a heartbeat thread for it or use HiveTxnManager to start the txn, which will set up heartbeating automatically. In the latter case, make sure it's properly cancelled on failures. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
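The first option in the ticket (a dedicated heartbeat thread that is reliably cancelled on failure) could be sketched with a `ScheduledExecutorService`. `HeartbeatingRunner` is hypothetical, and the `LongConsumer` stands in for whatever metastore heartbeat call the Worker would actually make; the key point is the `finally` block, which stops heartbeating whether the work succeeds or throws.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

// Hypothetical: run the compaction work while a background thread heartbeats
// its txn, and always stop heartbeating when the work ends.
final class HeartbeatingRunner {
    private HeartbeatingRunner() {}

    static <T> T runWithHeartbeat(long txnId, long intervalMs, LongConsumer heartbeat,
                                  Callable<T> work) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "txn-heartbeat-" + txnId);
            t.setDaemon(true); // never keeps the JVM alive on its own
            return t;
        });
        try {
            // Note: if heartbeat.accept throws, the executor silently stops
            // rescheduling it, so real code should catch and log inside it.
            scheduler.scheduleAtFixedRate(() -> heartbeat.accept(txnId), 0, intervalMs, TimeUnit.MILLISECONDS);
            return work.call();
        } finally {
            // Cancel on success and on failure, so a failed Worker does not
            // keep its txn alive indefinitely.
            scheduler.shutdownNow();
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        }
    }
}
```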
[jira] [Created] (HIVE-20941) Compactor produces a delete_delta_x_y even if there are no input delete events
Eugene Koifman created HIVE-20941: - Summary: Compactor produces a delete_delta_x_y even if there are no input delete events Key: HIVE-20941 URL: https://issues.apache.org/jira/browse/HIVE-20941 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 69367: Query based compactor for full CRUD Acid tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/69367/#review210589 --- common/src/java/org/apache/hadoop/hive/conf/HiveConf.java Lines 2685 (patched) <https://reviews.apache.org/r/69367/#comment295290> "And minor compaction will be disabled." - should make sure the Initiator doesn't start minor compactions and that Alter Table commands requesting Minor are no-ops or throw, so that these don't get into the compactor queue. We should also perhaps think about how the Initiator triggers Major compactions - are the current config params adequate? We should do at least the 2nd part in a follow-up jira, maybe both. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Line 180 (original), 183 (patched) <https://reviews.apache.org/r/69367/#comment295291> I guess all this should be a no-op for the compactor, since it only looks at 1 partition at a time, and for acid the serde and IF/OF don't change. ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 197 (patched) <https://reviews.apache.org/r/69367/#comment295292> bucketSplitMultiMap? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 206 (patched) <https://reviews.apache.org/r/69367/#comment295293> the error should include the table name if easily available here, or if not, maybe a file path from any of the splits... ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 214 (patched) <https://reviews.apache.org/r/69367/#comment295294> should we assert that schemaSplitMultiMap has size=1, since that is what we expect for the compactor? ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java Lines 276 (patched) <https://reviews.apache.org/r/69367/#comment295295> Add a comment that this is truly a bucketId (rather than a bucket property - BucketCodec.java since 3.0) that is derived from the file name. WriteId is also from the containing file name, and for files that have a min/max writeId, it's the starting one. 
Now that I look at the code in TransactionMetadata.findWriteIDForSynthetcRowIDs(), the assert there will throw. It should be removed, since here we have to handle files that come from compacted dirs, so min <> max for all deltas. Maybe these comments should be on OrcSplit, where the getter methods are defined. ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java Lines 68 (patched) <https://reviews.apache.org/r/69367/#comment295296> Mark these transient for clarity, since we don't serialize them. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 245 (patched) <https://reviews.apache.org/r/69367/#comment295297> Ideally this should be prevented before it gets into the compaction_queue. Throwing here will cause failed compactions to accumulate in SHOW COMPACTIONS and prevent auto-scheduling of new ones. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 399 (patched) <https://reviews.apache.org/r/69367/#comment295298> Should this be in a finally{}? SessionState is threadLocal, so it may get reused... or do we shut down the session each time? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 481 (patched) <https://reviews.apache.org/r/69367/#comment295299> The current write id should always be the same as the original. Only delete events can make these differ, but major compaction absorbs delete events. ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 503 (patched) <https://reviews.apache.org/r/69367/#comment295300> What's the value of specifying a location for the tmp table? I'm surprised it's even legal. Could this potentially be a security hole? ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 510 (patched) <https://reviews.apache.org/r/69367/#comment295302> Why overwrite?
ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 513 (patched) <https://reviews.apache.org/r/69367/#comment295301> Why do you need partition key/values in the query? We are always reading a single partition. This is achieved by getAcidState(), which takes the partition dir as input (i.e. all the files it returns are within a given partition). ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 542 (patched) <https://reviews.apache.org/r/69367/#comment295303> Need to think about this. Maybe it's ok... ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java Lines 565 (patched) <https://reviews.apache.org/r/69367/#comment295304> There should be something in AcidUtils to parse the original bucket file name. - Eugene Koifman On Nov. 15, 2018, 4:
[jira] [Created] (HIVE-20901) running compactor when there is nothing to do produces duplicate data
Eugene Koifman created HIVE-20901: - Summary: running compactor when there is nothing to do produces duplicate data Key: HIVE-20901 URL: https://issues.apache.org/jira/browse/HIVE-20901 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Suppose we run minor compaction 2 times via Alter Table. The 2nd compaction request should have nothing to do, but I don't think there is a check for that. It's visible in the context of HIVE-20823, where each compactor run produces a delta with a new visibility suffix, so we end up with something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0
{noformat}
i.e. 2 deltas with the same write ID range. This is bad. It probably happens today as well, but the new run produces a delta with the same name and clobbers the previous one, which may interfere with writers. Need to investigate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
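One way to make the symptom above detectable is to look for deltas of the same kind that cover the same write ID range. The following is a minimal sketch, assuming the `[delete_]delta_<min>_<max>[_suffix]` naming shown in the listing. The class and method names are hypothetical and are not part of `AcidUtils`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds delta dirs that cover the same write ID range. */
final class DuplicateDeltaCheck {
  // Assumed naming: [delete_]delta_<minWriteId>_<maxWriteId>[_<suffix>]
  private static final Pattern DELTA =
      Pattern.compile("^(delete_)?delta_(\\d+)_(\\d+)(?:_.*)?$");

  /** Returns "kind:min-max" keys that occur in more than one directory name. */
  static List<String> duplicateRanges(List<String> dirNames) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String name : dirNames) {
      Matcher m = DELTA.matcher(name);
      if (!m.matches()) {
        continue; // base_N and other entries are out of scope for this check
      }
      String kind = m.group(1) == null ? "delta" : "delete_delta";
      long min = Long.parseLong(m.group(2));
      long max = Long.parseLong(m.group(3));
      counts.merge(kind + ":" + min + "-" + max, 1, Integer::sum);
    }
    List<String> dups = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (e.getValue() > 1) {
        dups.add(e.getKey());
      }
    }
    return dups;
  }

  public static void main(String[] args) {
    // Directory names copied from the listing above.
    List<String> dirs = List.of(
        "delete_delta_001_002_v019", "delete_delta_001_002_v021",
        "delta_001_001_", "delta_001_002_v019",
        "delta_001_002_v021", "delta_002_002_");
    System.out.println(duplicateRanges(dirs)); // [delete_delta:1-2, delta:1-2]
  }
}
```

A real "nothing to do" check would run before the job is launched, on the directories returned by `getAcidState()`. This sketch only shows how a duplicate range can be recognized from the names.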
[jira] [Created] (HIVE-20885) ql.txn.compactor.TestCompactor runs most tests 2 times
Eugene Koifman created HIVE-20885: - Summary: ql.txn.compactor.TestCompactor runs most tests 2 times Key: HIVE-20885 URL: https://issues.apache.org/jira/browse/HIVE-20885 Project: Hive Issue Type: Improvement Components: Streaming, Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman HIVE-19211 added {{@RunWith(Parameterized.class)}} so that it runs once with {{newStreamingAPI=true}} and once with {{newStreamingAPI=false}}, but only about 5 tests out of 23 make use of this variable. All other tests are executed 2 times for no reason. cc [~prasanth_j] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20874) Add ability to run high priority compaction
Eugene Koifman created HIVE-20874: - Summary: Add ability to run high priority compaction Key: HIVE-20874 URL: https://issues.apache.org/jira/browse/HIVE-20874 Project: Hive Issue Type: New Feature Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Currently, all compaction requests (via the Alter Table command or auto-initiated by {{Initiator.java}}) land in a queue (the {{COMPACTION_QUEUE}} metastore DB table) and are executed in order. If the queue is long and some table/partition needs to be compacted urgently, there is no way to send it to the beginning of the queue. Need a way to address this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
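The ordering this issue asks for can be sketched with an in-memory model. Urgent requests go first, and requests of equal priority stay FIFO. This is only an illustration: the real queue is the `COMPACTION_QUEUE` table, and the `urgent` flag and class names below are hypothetical. In the metastore, the analogous change would presumably be a priority column that the Worker sorts on when it picks the next request.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical in-memory model of a compaction queue with an urgent lane. */
final class PriorityCompactionQueue {
  static final class Request {
    final long id;         // increasing, like an auto-generated queue id
    final String target;   // e.g. "db.table" or "db.table/part=1"
    final boolean urgent;  // hypothetical high-priority flag

    Request(long id, String target, boolean urgent) {
      this.id = id;
      this.target = target;
      this.urgent = urgent;
    }
  }

  // Urgent requests first (false sorts before true, so compare on !urgent),
  // then FIFO by id within the same priority.
  private final PriorityQueue<Request> queue = new PriorityQueue<>(
      Comparator.comparing((Request r) -> !r.urgent).thenComparingLong(r -> r.id));
  private long nextId = 1;

  void enqueue(String target, boolean urgent) {
    queue.add(new Request(nextId++, target, urgent));
  }

  /** Empties the queue, returning targets in the order a Worker would pick them up. */
  List<String> drainOrder() {
    List<String> order = new ArrayList<>();
    while (!queue.isEmpty()) {
      order.add(queue.poll().target);
    }
    return order;
  }

  public static void main(String[] args) {
    PriorityCompactionQueue q = new PriorityCompactionQueue();
    q.enqueue("db.t1", false);
    q.enqueue("db.t2", false);
    q.enqueue("db.hot", true);
    System.out.println(q.drainOrder()); // [db.hot, db.t1, db.t2]
  }
}
```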
[jira] [Created] (HIVE-20862) QueryId no longer shows up in the logs
Eugene Koifman created HIVE-20862: - Summary: QueryId no longer shows up in the logs Key: HIVE-20862 URL: https://issues.apache.org/jira/browse/HIVE-20862 Project: Hive Issue Type: Bug Components: Logging Affects Versions: 4.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20863) remove dead code
Eugene Koifman created HIVE-20863: - Summary: remove dead code Key: HIVE-20863 URL: https://issues.apache.org/jira/browse/HIVE-20863 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20859) clean up invocation of Worker/Cleaner/Initiator in test code
Eugene Koifman created HIVE-20859: - Summary: clean up invocation of Worker/Cleaner/Initiator in test code Key: HIVE-20859 URL: https://issues.apache.org/jira/browse/HIVE-20859 Project: Hive Issue Type: Improvement Components: Test, Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman There are many places, like {{CompactorTest}}, that use
{code:java|title=CompactorTest.java}
AtomicBoolean stop = new AtomicBoolean(true);
Worker t = new Worker();
t.setThreadId((int) t.getId());
t.setConf(hiveConf);
AtomicBoolean looped = new AtomicBoolean();
t.init(stop, looped);
t.run();
{code}
These should instead standardize on {{TestTxnCommands2.runWorker()}}. The same applies to {{Cleaner}} and {{Initiator}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20856) ValidReaderWriteIdList() is not valid in most places
Eugene Koifman created HIVE-20856: - Summary: ValidReaderWriteIdList() is not valid in most places Key: HIVE-20856 URL: https://issues.apache.org/jira/browse/HIVE-20856 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Most of the time it's something like this:
{code:java}
String txnString = conf.get(ValidWriteIdList.VALID_WRITEIDS_KEY);
this.validWriteIdList = (txnString == null)
    ? new ValidReaderWriteIdList()
    : new ValidReaderWriteIdList(txnString);
{code}
but ValidReaderWriteIdList() (the no-arg c'tor) creates a write ID list that considers every base/delta valid. This is unlikely to be correct for a general read of acid data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20823) Make Compactor run in a transaction
Eugene Koifman created HIVE-20823: - Summary: Make Compactor run in a transaction Key: HIVE-20823 URL: https://issues.apache.org/jira/browse/HIVE-20823 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Have the compactor open a transaction and run the job in that transaction.
# make compactor-produced base/delta include this txn id in the folder name, e.g. base_7_c17 where 17 is the txnid.
# add {{CQ_TXN_ID bigint}} to COMPACTION_QUEUE and COMPLETED_COMPACTIONS to record this txn id
# make sure {{AcidUtils.getAcidState()}} pays attention to this transaction on read and ignores this dir if this txn id is not committed in the current snapshot
## this means that not only validWriteIdList but also ValidTxnIdList should be passed along in config (if it isn't yet)
# once this is done, {{CompactorMR.createCompactorMarker()}} can be eliminated and {{AcidUtils.isValidBase}} modified accordingly
# modify Cleaner so that it doesn't clean old files until the new file is visible to all readers
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
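The read-side rule in the third step amounts to a visibility test on the compactor's txn id. The following is a minimal sketch under two assumptions. First, it uses the `_c<txnid>` suffix from the `base_7_c17` example above. Second, it uses a simplified snapshot made of a high watermark plus a set of open/aborted txn ids, loosely modeled on how `ValidTxnList` treats exceptions. All names are hypothetical.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical visibility check for compactor output dirs named like base_7_c17. */
final class CompactorDirVisibility {
  // Group 1 is the txn id of the compaction that wrote the dir.
  private static final Pattern COMPACTOR_TXN = Pattern.compile("^.*_c(\\d+)$");

  /** Simplified snapshot: committed iff <= highWatermark and not open/aborted. */
  static boolean isCommitted(long txnId, long highWatermark, Set<Long> openOrAborted) {
    return txnId <= highWatermark && !openOrAborted.contains(txnId);
  }

  /**
   * Dirs without a _cN suffix pass this particular check; the usual
   * write ID checks in getAcidState() would still apply to them.
   */
  static boolean isVisible(String dirName, long highWatermark, Set<Long> openOrAborted) {
    Matcher m = COMPACTOR_TXN.matcher(dirName);
    if (!m.matches()) {
      return true;
    }
    return isCommitted(Long.parseLong(m.group(1)), highWatermark, openOrAborted);
  }

  public static void main(String[] args) {
    System.out.println(isVisible("base_7_c17", 20, Set.of(17L))); // false: txn 17 still open
    System.out.println(isVisible("base_7_c17", 20, Set.of()));    // true
    System.out.println(isVisible("base_7_c17", 10, Set.of()));    // false: above the high watermark
  }
}
```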
[jira] [Created] (HIVE-20769) TxnHandler.checkLock() will re-acquire the same lock
Eugene Koifman created HIVE-20769: - Summary: TxnHandler.checkLock() will re-acquire the same lock Key: HIVE-20769 URL: https://issues.apache.org/jira/browse/HIVE-20769 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 3.1.0 Reporter: Eugene Koifman As currently implemented, this will acquire the same type of lock on the same resource again if it is requested by another stmt in the same txn. Need to fix that. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
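The fix can be illustrated with a per-transaction record of granted (resource, lock type) pairs: a repeated request becomes a no-op. This is a hypothetical in-memory sketch, not `TxnHandler` code, which works against the metastore lock tables. It also ignores lock compatibility, for example whether an EXCLUSIVE lock already held should make a later SHARED_READ request redundant. `LockType` here is a local stand-in.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: skip re-acquiring a lock the same txn already holds. */
final class TxnLockDedup {
  // Local stand-in for the metastore's lock types.
  enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

  // txnId -> "resource|type" keys already granted to that txn
  private final Map<Long, Set<String>> held = new HashMap<>();

  /**
   * Returns true if a new lock entry is needed, false if this txn already
   * holds the same lock type on the same resource (the request is a no-op).
   */
  boolean acquire(long txnId, String resource, LockType type) {
    String key = resource + "|" + type;
    return held.computeIfAbsent(txnId, t -> new HashSet<>()).add(key);
  }

  public static void main(String[] args) {
    TxnLockDedup locks = new TxnLockDedup();
    System.out.println(locks.acquire(5, "db.t", LockType.SHARED_WRITE)); // true
    System.out.println(locks.acquire(5, "db.t", LockType.SHARED_WRITE)); // false: same lock again
    System.out.println(locks.acquire(5, "db.t", LockType.SHARED_READ));  // true: different type
  }
}
```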
Re: Review Request 68805: HIVE-20538
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68805/#review209558 --- Ship it! Ship It! - Eugene Koifman On Sept. 21, 2018, 3:51 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68805/ > --- > > (Updated Sept. 21, 2018, 3:51 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-20538: Allow to store a key value together with a transaction. > > > Diffs > - > > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnKeyValue.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnRequest.java > db47f9db8b > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 936f7c5a40 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 958f13c18e > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > a3dddf54e4 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > d226db50a5 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > 54e7eda0da > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > ad83162ec3 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 1df1ebce49 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java > 080cc5284b > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java > ce590d0f55 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java > db4dd9ec42 > > > Diff: 
https://reviews.apache.org/r/68805/diff/3/ > > > Testing > --- > > > Thanks, > > Jaume Marhuenda > >
[jira] [Created] (HIVE-20738) Enable Delete Event filtering in VectorizedOrcAcidRowBatchReader
Eugene Koifman created HIVE-20738: - Summary: Enable Delete Event filtering in VectorizedOrcAcidRowBatchReader Key: HIVE-20738 URL: https://issues.apache.org/jira/browse/HIVE-20738 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Currently DeleteEventRegistry loads all delete events which can take time and use a lot of memory. Should minimize the number of deletes loaded based on the insert events included in the Split. This is an umbrella jira for several tasks that make up the work. See individual tasks for details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20730) Do delete event filtering even if hive.acid.index is not there
Eugene Koifman created HIVE-20730: - Summary: Do delete event filtering even if hive.acid.index is not there Key: HIVE-20730 URL: https://issues.apache.org/jira/browse/HIVE-20730 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Since HIVE-16812, {{VectorizedOrcAcidRowBatchReader}} filters delete events based on the min/max ROW__ID in the split, which relies on {{hive.acid.index}} being in the ORC footer. There is no way to generate {{hive.acid.index}} from a plain query as in HIVE-20699, so we need to make sure that we generate a SARG into delete_delta/bucket_x based on stripe stats even if the index is missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
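The filtering idea can be sketched independently of ORC. Given the smallest and largest ROW__ID of the insert events in a split, only delete events that fall in that range can apply to it. The sketch assumes that ROW__IDs order lexicographically by (writeId, bucket property, rowId). It uses a hypothetical `RowId` class rather than Hive's `RecordIdentifier`. Where min/max come from (the index or stripe stats) is left to the caller.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: keep only delete events inside a split's [min, max] ROW__ID range. */
final class DeleteEventFilter {
  /** Simplified ROW__ID: (original writeId, bucket property, rowId). */
  static final class RowId {
    final long writeId;
    final int bucketProperty;
    final long rowId;

    RowId(long writeId, int bucketProperty, long rowId) {
      this.writeId = writeId;
      this.bucketProperty = bucketProperty;
      this.rowId = rowId;
    }

    @Override
    public String toString() {
      return "{" + writeId + "," + bucketProperty + "," + rowId + "}";
    }
  }

  // Lexicographic order on (writeId, bucketProperty, rowId), assumed to match ROW__ID order.
  static final Comparator<RowId> ORDER = Comparator
      .comparingLong((RowId r) -> r.writeId)
      .thenComparingInt(r -> r.bucketProperty)
      .thenComparingLong(r -> r.rowId);

  /** min/max: smallest and largest ROW__ID of the insert events in the split (inclusive). */
  static List<RowId> filter(List<RowId> deleteEvents, RowId min, RowId max) {
    List<RowId> kept = new ArrayList<>();
    for (RowId d : deleteEvents) {
      if (ORDER.compare(d, min) >= 0 && ORDER.compare(d, max) <= 0) {
        kept.add(d);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    int b0 = 536870912; // example bucket property value
    List<RowId> deletes = List.of(new RowId(1, b0, 0), new RowId(2, b0, 3), new RowId(5, b0, 1));
    System.out.println(filter(deletes, new RowId(2, b0, 0), new RowId(4, b0, 99))); // [{2,536870912,3}]
  }
}
```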
[jira] [Created] (HIVE-20723) Allow per table specification of compaction yarn queue
Eugene Koifman created HIVE-20723: - Summary: Allow per table specification of compaction yarn queue Key: HIVE-20723 URL: https://issues.apache.org/jira/browse/HIVE-20723 Project: Hive Issue Type: New Feature Components: Transactions Affects Versions: 2.0.0 Reporter: Eugene Koifman Currently, compactions of full CRUD transactional tables are Map-Reduce jobs submitted to a YARN queue defined by the hive.compactor.job.queue property. It would be useful to be able to override this on a per-table basis by putting it into table properties, so that compactions for different tables can use different queues. There is already the ability to override other compaction-related configs via table props, though this will need additional handling. [https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-TableProperties] See {{CompactorMR.COMPACTOR_PREFIX}} and {{Initiator.COMPACTORTHRESHOLD_PREFIX}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
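The lookup order this issue describes can be sketched as follows: a table property, if present, wins over the global `hive.compactor.job.queue` setting. The table-property key below is hypothetical. It only imitates the prefixed per-table compactor overrides mentioned above, and the real key would be chosen when the feature is implemented. Plain `Map`s stand in for the table parameters and the configuration.

```java
import java.util.Map;

/** Hypothetical sketch of a per-table override for the compaction YARN queue. */
final class CompactionQueueResolver {
  // Global setting named in the issue.
  static final String GLOBAL_KEY = "hive.compactor.job.queue";
  // Hypothetical table property; the prefix imitates the existing per-table
  // compactor overrides, but this exact key is an assumption.
  static final String TABLE_KEY = "compactor.hive.compactor.job.queue";

  /**
   * Returns the queue to submit the compaction job to, or null to leave the
   * job's queue unset (i.e. use the cluster default).
   */
  static String resolveQueue(Map<String, String> tableProps, Map<String, String> conf) {
    String perTable = tableProps.get(TABLE_KEY);
    if (perTable != null && !perTable.isEmpty()) {
      return perTable;
    }
    String global = conf.get(GLOBAL_KEY);
    return (global == null || global.isEmpty()) ? null : global;
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(GLOBAL_KEY, "compaction");
    System.out.println(resolveQueue(Map.of(TABLE_KEY, "etl"), conf)); // etl
    System.out.println(resolveQueue(Map.of(), conf));                 // compaction
  }
}
```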
Re: Review Request 68805: HIVE-20538
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68805/#review209387 --- ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsWithSplitUpdateAndVectorization.java Line 25 (original), 30 (patched) <https://reviews.apache.org/r/68805/#comment293781> What is this change for? TestTxnCommands is a subclass of TxnCommandsBaseForTests. I think this means none of the TestTxnCommands tests run in vectorized mode any more. More generally, what is the point of the other changes in this class? - Eugene Koifman On Sept. 21, 2018, 3:51 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68805/ > --- > > (Updated Sept. 21, 2018, 3:51 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-20538: Allow to store a key value together with a transaction. > > > Diffs > - > > > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsWithSplitUpdateAndVectorization.java > a013230025 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnKeyValue.java > PRE-CREATION > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnRequest.java > db47f9db8b > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 936f7c5a40 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 958f13c18e > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > a3dddf54e4 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > d226db50a5 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > 54e7eda0da > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > ad83162ec3 > > 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > 1df1ebce49 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java > 080cc5284b > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java > ce590d0f55 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java > db4dd9ec42 > > > Diff: https://reviews.apache.org/r/68805/diff/2/ > > > Testing > --- > > > Thanks, > > Jaume Marhuenda > >
[jira] [Created] (HIVE-20699) Query based compactor for full CRUD Acid tables
Eugene Koifman created HIVE-20699: - Summary: Query based compactor for full CRUD Acid tables Key: HIVE-20699 URL: https://issues.apache.org/jira/browse/HIVE-20699 Project: Hive Issue Type: New Feature Components: Transactions Affects Versions: 3.1.0 Reporter: Eugene Koifman Assignee: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 68834: HIVE-20556
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68834/#review209243 --- Ship it! Ship It! - Eugene Koifman On Sept. 24, 2018, 8:04 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68834/ > --- > > (Updated Sept. 24, 2018, 8:04 p.m.) > > > Review request for hive, Daniel Dai and Eugene Koifman. > > > Repository: hive-git > > > Description > --- > > Expose an API to retrieve the TBL_ID from TBLS in the metastore tables > > > Diffs > - > > data/files/exported_table/_metadata 81fbf63a54 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestAuthorizationPreEventListener.java > 05c00094d6 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestMetastoreAuthorizationProvider.java > 767321332c > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java f72e08c14f > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java ca4d36f30d > > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java > ff411f62d5 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java > 78ac909f72 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 22deffe1d3 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 38fac465d7 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > 0192c6da31 > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 85a5c601e0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > ba82a9327c > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java > 64945060f7 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTable.java > deeb97133d > 
standalone-metastore/metastore-server/src/main/resources/package.jdo > 2a5f016b1f > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java > 4937d9d861 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java > df83171648 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesCreateDropAlterTruncate.java > bf302ed491 > > > Diff: https://reviews.apache.org/r/68834/diff/2/ > > > Testing > --- > > > Thanks, > > Jaume Marhuenda > >
Re: Review Request 68834: HIVE-20556
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68834/#review209103 --- standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java Lines 1808 (patched) <https://reviews.apache.org/r/68834/#comment293363> It would help debugging if the msg included cat.db.table + the tblid that was set. standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTable.java Lines 296 (patched) <https://reviews.apache.org/r/68834/#comment293364> I don't see anyone calling this - is this needed? standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java Lines 1715 (patched) <https://reviews.apache.org/r/68834/#comment293365> It would make sense to check in the catch that you are getting the expected error msg. - Eugene Koifman On Sept. 24, 2018, 8:04 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68834/ > --- > > (Updated Sept. 24, 2018, 8:04 p.m.) > > > Review request for hive, Daniel Dai and Eugene Koifman. 
> > > Repository: hive-git > > > Description > --- > > Expose an API to retrieve the TBL_ID from TBLS in the metastore tables > > > Diffs > - > > data/files/exported_table/_metadata 81fbf63a54 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestAuthorizationPreEventListener.java > 05c00094d6 > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestMetastoreAuthorizationProvider.java > 767321332c > ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java f72e08c14f > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java ca4d36f30d > > ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java > ff411f62d5 > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java > 78ac909f72 > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 22deffe1d3 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 38fac465d7 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > 0192c6da31 > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 85a5c601e0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > ba82a9327c > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java > d27224b235 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTable.java > deeb97133d > standalone-metastore/metastore-server/src/main/resources/package.jdo > 2a5f016b1f > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java > 4937d9d861 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java > df83171648 > > 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesCreateDropAlterTruncate.java > bf302ed491 > > > Diff: https://reviews.apache.org/r/68834/diff/1/ > > > Testing > --- > > > Thanks, > > Jaume Marhuenda > >
[jira] [Created] (HIVE-20655) Optimize arrayCopy in LlapRecordReader
Eugene Koifman created HIVE-20655: - Summary: Optimize arrayCopy in LlapRecordReader Key: HIVE-20655 URL: https://issues.apache.org/jira/browse/HIVE-20655 Project: Hive Issue Type: Improvement Components: llap, Transactions Affects Versions: 4.0.0 Environment: followup to HIVE-19985 See Gopal's comment on 8/3/2018 https://issues.apache.org/jira/browse/HIVE-19985?focusedCommentId=16568707&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16568707 Reporter: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20654) remove masking of "Masked writeid"
Eugene Koifman created HIVE-20654: - Summary: remove masking of "Masked writeid" Key: HIVE-20654 URL: https://issues.apache.org/jira/browse/HIVE-20654 Project: Hive Issue Type: Improvement Components: Test, Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman {{QOutProcessor}} has
{noformat}
ppm.add(new PatternReplacementPair(Pattern.compile("\\{\"writeid\":[1-9][0-9]*,\"bucketid\":"), "{\"writeid\":### Masked writeid ###,\"bucketid\":"));
{noformat}
which causes something like
{noformat}
{"writeid":### Masked writeid ###,"bucketid":536870912,"rowid":0} 2
{"writeid":### Masked writeid ###,"bucketid":536870912,"rowid":1} 3
{noformat}
in the {{*.q.out}} file (for example, for {{acid_meta_columns_decode.q}}). This was needed when {{ROW__ID}} contained the global transaction ID, which changed depending on which tests were run previously. Since 3.0, {{ROW__ID}} uses a per-table writeId, which is stable and safe to put in the .out file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
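To see exactly what the masking hides, the pattern and replacement text from the excerpt can be run on their own with `Matcher.replaceAll`. This is a standalone demo, not `QOutProcessor` itself, and the class name is made up. Because of `[1-9][0-9]*`, a writeid of 0 is left unmasked.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the masking pattern quoted above to a single q.out line. */
final class WriteIdMaskDemo {
  // Same pattern and replacement text as in the QOutProcessor excerpt.
  static final Pattern WRITEID = Pattern.compile("\\{\"writeid\":[1-9][0-9]*,\"bucketid\":");
  static final String MASKED = "{\"writeid\":### Masked writeid ###,\"bucketid\":";

  static String mask(String line) {
    // quoteReplacement keeps any '$' or '\' in the replacement literal.
    return WRITEID.matcher(line).replaceAll(Matcher.quoteReplacement(MASKED));
  }

  public static void main(String[] args) {
    System.out.println(mask("{\"writeid\":1,\"bucketid\":536870912,\"rowid\":0}"));
    // {"writeid":### Masked writeid ###,"bucketid":536870912,"rowid":0}
  }
}
```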
[jira] [Created] (HIVE-20640) Upgrade Hive to use ORC 1.5.3
Eugene Koifman created HIVE-20640: - Summary: Upgrade Hive to use ORC 1.5.3 Key: HIVE-20640 URL: https://issues.apache.org/jira/browse/HIVE-20640 Project: Hive Issue Type: Improvement Affects Versions: 4.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20635) VectorizedOrcAcidRowBatchReader doesn't filter delete events for original files
Eugene Koifman created HIVE-20635: - Summary: VectorizedOrcAcidRowBatchReader doesn't filter delete events for original files Key: HIVE-20635 URL: https://issues.apache.org/jira/browse/HIVE-20635 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman This is a followup to HIVE-16812, which adds support for delete event filtering for splits from native acid files. The same needs to be added for splits where {{OrcSplit.isOriginal()}} is true. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 68805: HIVE-20538
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68805/#review208955 --- standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java Lines 2929 (patched) <https://reviews.apache.org/r/68805/#comment293223> This comment seems confusing to me. Maybe give a Kafka offset as a concrete example, or point to some wiki where this API is documented. For example, in "...for example to know if a transaction has already been committed", which transaction is this talking about? standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java Lines 1095 (patched) <https://reviews.apache.org/r/68805/#comment293218> I think a MetaException would be better (or IllegalState/Argument). SQLException is generally produced by the DB and has a sqlstate/sqlcode that various handlers try to examine. standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java Lines 1105 (patched) <https://reviews.apache.org/r/68805/#comment293219> MetaException. Also, it should at least include info to help identify what exactly failed, i.e. txnid, tableid, param/value. Without it, it's impossible to correlate this error with a batch id, etc. I'd also add a LOG.warn() so that it's visible in the log file. It seems you have a requirement that the parameter exist. Perhaps, as part of the error code path, you can do another query to see if it does exist - I bet that would be a common error. standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java Lines 2289 (patched) <https://reviews.apache.org/r/68805/#comment293220> Why not make (tableid, key, value) its own object? Then this object in CommitTxnRequest can be optional, but all 3 fields in it can be mandatory. As it is, you are checking if they are set here and in TxnHandler.commit... 
standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java Lines 101 (patched) <https://reviews.apache.org/r/68805/#comment293221> Nit: what is the advantage of using direct JDBC calls to modify the metastore DBMS? Why not run "create table ...", "Alter table..." through the Driver and "describe table" to see the value? standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java Lines 135 (patched) <https://reviews.apache.org/r/68805/#comment293222> Should probably check that you got the right exception, not just "any exception", i.e. check the message. - Eugene Koifman On Sept. 21, 2018, 3:51 p.m., Jaume Marhuenda wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68805/ > --- > > (Updated Sept. 21, 2018, 3:51 p.m.) > > > Review request for hive. > > > Repository: hive-git > > > Description > --- > > HIVE-20538: Allow to store a key value together with a transaction. 
> > > Diffs > - > > > standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnRequest.java > db47f9db8b > > standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php > 22deffe1d3 > > standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py > 38fac465d7 > > standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb > 0192c6da31 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java > df6d56b679 > > standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java > 54e7eda0da > standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift > 85a5c601e0 > > standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java > d76049eda1 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java > ce590d0f55 > > standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java > db4dd9ec42 > > > Diff: https://reviews.apache.org/r/68805/diff/1/ > > > Testing > --- > > > Thanks, > > Jaume Marhuenda > >
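The review above suggests grouping (tableId, key, value) into its own object, so that the request field can be optional while its contents are mandatory. In the actual change this would be a Thrift struct; later revisions of this review list a `CommitTxnKeyValue.java` in their diffs. The plain-Java class below is only a hypothetical illustration of the "optional container, mandatory fields" idea, and its names are not Hive's.

```java
import java.util.Objects;

/** Hypothetical value object grouping the (tableId, key, value) triple. */
final class TxnKeyValue {
  private final long tableId; // a primitive cannot be left unset
  private final String key;
  private final String value;

  TxnKeyValue(long tableId, String key, String value) {
    this.tableId = tableId;
    // All fields are mandatory; a missing one fails here, not deep in the commit path.
    this.key = Objects.requireNonNull(key, "key");
    this.value = Objects.requireNonNull(value, "value");
  }

  long getTableId() {
    return tableId;
  }

  String getKey() {
    return key;
  }

  String getValue() {
    return value;
  }

  /** Carries every identifying field, so an error message built from it can be correlated. */
  @Override
  public String toString() {
    return "TxnKeyValue{tableId=" + tableId + ", key=" + key + ", value=" + value + "}";
  }
}
```

A commit request would then carry an optional `TxnKeyValue` (null when unused). The server side only needs to check whether the whole object is present, instead of checking each field in several places.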
[jira] [Created] (HIVE-20604) Minor compaction disables ORC column stats
Eugene Koifman created HIVE-20604: - Summary: Minor compaction disables ORC column stats Key: HIVE-20604 URL: https://issues.apache.org/jira/browse/HIVE-20604 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 4.0.0
{noformat}
@Override
public org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter
    getRawRecordWriter(Path path, Options options) throws IOException {
  final Path filename = AcidUtils.createFilename(path, options);
  final OrcFile.WriterOptions opts =
      OrcFile.writerOptions(options.getTableProperties(), options.getConfiguration());
  if (!options.isWritingBase()) {
    opts.bufferSize(OrcRecordUpdater.DELTA_BUFFER_SIZE)
        .stripeSize(OrcRecordUpdater.DELTA_STRIPE_SIZE)
        .blockPadding(false)
        .compress(CompressionKind.NONE)
        .rowIndexStride(0)
        ;
  }
{noformat}
{{rowIndexStride(0)}} makes {{StripeStatistics.getColumnStatistics()}} return objects, but with meaningless values, like min/max for {{IntegerColumnStatistics}} set to MIN_LONG/MAX_LONG. This interferes with the ability to infer the min ROW__ID for a split, and it also creates inefficient files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20581) Eliminate rename() from full CRUD transactional tables
Eugene Koifman created HIVE-20581: - Summary: Eliminate rename() from full CRUD transactional tables Key: HIVE-20581 URL: https://issues.apache.org/jira/browse/HIVE-20581 Project: Hive Issue Type: Improvement Components: Transactions Reporter: Eugene Koifman The {{MoveTask}} in a query writing to full CRUD transactional table still performs a {{FileSystem.rename()}}. Full CRUD should follow the insert-only transactional table implementation and write directly to delta_x_x in the partition dir. If the txn fails, this delta will be marked aborted and will not be read. There are several places that rely on this rename. For example, support for {{Insert ... select ... Union All ... Select }} which creates multiple dirs, 1 for each leg of the union. Others? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20580) OrcInputFormat.isOriginal() should not rely on hive.acid.key.index
Eugene Koifman created HIVE-20580: - Summary: OrcInputFormat.isOriginal() should not rely on hive.acid.key.index Key: HIVE-20580 URL: https://issues.apache.org/jira/browse/HIVE-20580 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.1.0 Reporter: Eugene Koifman {{org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isOriginal()}} checks for the presence of {{hive.acid.key.index}} in the footer. This is only created when the file is written by {{OrcRecordUpdater}}. It should instead check for the presence of Acid metadata columns so that a file can be produced by something other than {{OrcRecordUpdater}}. Also, {{hive.acid.key.index}} counts the number of events of each type, which is not really useful for Acid V2 (as of Hive 3) since each file only has 1 type of event. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
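A rough sketch of a schema-based check, assuming the acid event layout is a top-level struct with the fields operation, originalTransaction, bucket, rowId, currentTransaction and row (these field names are an assumption about the Acid file layout, not verified against a specific release). It works on plain field names so it stays independent of the ORC library; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical schema-based replacement for a footer-key check. */
final class AcidSchemaCheck {
    // Assumed top-level field names of an acid event file, in order.
    static final List<String> ACID_FIELDS = Arrays.asList(
            "operation", "originalTransaction", "bucket", "rowId", "currentTransaction", "row");

    private AcidSchemaCheck() {}

    /** A file is "original" (non-acid) unless its top-level fields match the acid layout. */
    static boolean isOriginal(List<String> topLevelFieldNames) {
        return !ACID_FIELDS.equals(topLevelFieldNames);
    }
}
```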
[jira] [Created] (HIVE-20579) VectorizedOrcAcidRowBatchReader.checkBucketId() should run for unbucketed tables
Eugene Koifman created HIVE-20579: - Summary: VectorizedOrcAcidRowBatchReader.checkBucketId() should run for unbucketed tables Key: HIVE-20579 URL: https://issues.apache.org/jira/browse/HIVE-20579 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.1.0 Reporter: Eugene Koifman VectorizedOrcAcidRowBatchReader.checkBucketId() currently bails out for unbucketed tables. Since HIVE-19890, BucketCodec.decodeWriterId(ROW__ID.bucketid) should match the writer ID in the file name (e.g. bucket_1) for all rows, so it should still perform the check. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
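For illustration only, a standalone version of the check. It assumes the V1 bucket encoding keeps the writer id in bits 16..27 of ROW__ID.bucketid and that data files are named bucket_NNNNN; both the bit layout and the helper names are assumptions, not BucketCodec's API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical bucket-id check that runs whether or not the table is bucketed. */
final class BucketIdCheck {
    private static final Pattern BUCKET_FILE = Pattern.compile("^bucket_(\\d+)");

    private BucketIdCheck() {}

    /** Assumed layout: writer id in bits 16..27 of the encoded bucket property. */
    static int decodeWriterId(int bucketProperty) {
        return (bucketProperty >>> 16) & 0xFFF;
    }

    /** Assumed layout: version 1 in the top 3 bits, writer id in bits 16..27, statement id in bits 0..11. */
    static int encode(int writerId, int statementId) {
        return (1 << 29) | ((writerId & 0xFFF) << 16) | (statementId & 0xFFF);
    }

    static int writerIdFromFileName(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("Not a bucket file: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }

    /** Throws if a row's decoded writer id disagrees with the file it was read from. */
    static void checkBucketId(int bucketProperty, String fileName) {
        int expected = writerIdFromFileName(fileName);
        int actual = decodeWriterId(bucketProperty);
        if (expected != actual) {
            throw new IllegalStateException("ROW__ID.bucketid decodes to writer " + actual
                    + " but the row was read from " + fileName);
        }
    }
}
```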
[jira] [Created] (HIVE-20553) more acid stats tests
Eugene Koifman created HIVE-20553: - Summary: more acid stats tests Key: HIVE-20553 URL: https://issues.apache.org/jira/browse/HIVE-20553 Project: Hive Issue Type: Improvement Components: Statistics, Transactions Affects Versions: 4.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-20553.01.patch -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20460) AcidUtils.Directory.getAbortedDirectories() may be missed for full CRUD tables
Eugene Koifman created HIVE-20460: - Summary: AcidUtils.Directory.getAbortedDirectories() may be missed for full CRUD tables Key: HIVE-20460 URL: https://issues.apache.org/jira/browse/HIVE-20460 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman {{Directory.getAbortedDirectories()}} lists deltas where all txns in the range are aborted. These are then purged by {{Worker}} ({{CompactorMR}}), but only for insert-only tables. Full CRUD tables currently rely on {{FileSystem.rename()}} in {{MoveTask}}, so no reader (or {{Cleaner}}) should ever see a delta where all data is aborted. Once rename() is eliminated for full CRUD transactional tables (just as for insert-only ones), Cleaner (or Worker) should take care of these. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
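A minimal model of what "all txns in the range are aborted" means for a delta covering write ids [min, max]. The aborted-id set stands in for what the valid write-id list would report, and the class is hypothetical.

```java
import java.util.Set;

/** Hypothetical check for a delta dir covering write ids [minWriteId, maxWriteId]. */
final class AbortedDeltaCheck {
    private AbortedDeltaCheck() {}

    /** True only if every write id in the range is aborted, so the whole dir could be purged. */
    static boolean isFullyAborted(long minWriteId, long maxWriteId, Set<Long> abortedWriteIds) {
        if (minWriteId > maxWriteId) {
            throw new IllegalArgumentException("empty write id range");
        }
        for (long id = minWriteId; id <= maxWriteId; id++) {
            if (!abortedWriteIds.contains(id)) {
                return false; // at least one write id may hold committed data
            }
        }
        return true;
    }
}
```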
[jira] [Created] (HIVE-20459) add ThriftHiveMetastore.get_open_txns(long txnid)
Eugene Koifman created HIVE-20459: - Summary: add ThriftHiveMetastore.get_open_txns(long txnid) Key: HIVE-20459 URL: https://issues.apache.org/jira/browse/HIVE-20459 Project: Hive Issue Type: Improvement Components: Metastore, Transactions Reporter: Eugene Koifman We currently have {{ThriftHiveMetastore.get_open_txns()}}, which maps to {{TxnHandler.getOpenTxns()}}. The usual usage is {{TxnUtils.createValidReadTxnList(GetOpenTxnsResponse txns, long currentTxn)}}, where the complete list of transactions is obtained from the Metastore and then anything above currentTxn is thrown away. It would be useful to add {{ThriftHiveMetastore.get_open_txns(long txnid)}} and {{TxnHandler.getOpenTxns(long)}} so as not to retrieve things that will be thrown away, especially when there are a lot of running transactions. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
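To make the cost concrete, a sketch of the client-side filtering described above (class and method names are hypothetical). Every id above currentTxn is transferred from the Metastore only to be discarded here; the proposed get_open_txns(long txnid) would move this bound to the server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of filtering the full open-txn list on the client. */
final class OpenTxnFilter {
    private OpenTxnFilter() {}

    /** Keeps ids at or below currentTxn; everything above it was fetched for nothing. */
    static List<Long> keepAtOrBelow(List<Long> openTxnIds, long currentTxn) {
        List<Long> kept = new ArrayList<>();
        for (long id : openTxnIds) {
            if (id <= currentTxn) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```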
[jira] [Created] (HIVE-20458) hive-schema-3.1.0.postgres.sql - some tables are not quoted
Eugene Koifman created HIVE-20458: - Summary: hive-schema-3.1.0.postgres.sql - some tables are not quoted Key: HIVE-20458 URL: https://issues.apache.org/jira/browse/HIVE-20458 Project: Hive Issue Type: Improvement Components: Standalone Metastore, Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman A number of tables related to transactional metadata are not quoted in this script: COMPACTION_QUEUE, HIVE_LOCKS, etc. This causes Postgres to create the tables in lower case. The table creation scripts should follow the same convention as for other tables. hive-schema...mysql.sql also doesn't quote Create Table for the acid metadata tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20454) CLONE - extend inheritPerms to ACID in Hive 1.X
Eugene Koifman created HIVE-20454: - Summary: CLONE - extend inheritPerms to ACID in Hive 1.X Key: HIVE-20454 URL: https://issues.apache.org/jira/browse/HIVE-20454 Project: Hive Issue Type: Bug Reporter: Eugene Koifman Assignee: Sergey Shelukhin Fix For: 2.4.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20436) Lock Manager scalability - linear
Eugene Koifman created HIVE-20436: - Summary: Lock Manager scalability - linear Key: HIVE-20436 URL: https://issues.apache.org/jira/browse/HIVE-20436 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Hive TransactionManager currently has a mix of lock-based and optimistic concurrency management techniques (which at times overlap). For Dynamic Partition inserts that represent an update/merge, it acquires locks on each existing partition, which can flood the metastore DB. Need to clean up the logical model and the implementation. This will be an umbrella Jira for this work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20435) Failed Dynamic Partition Insert into insert only table may lose transaction metadata
Eugene Koifman created HIVE-20435: - Summary: Failed Dynamic Partition Insert into insert only table may lose transaction metadata Key: HIVE-20435 URL: https://issues.apache.org/jira/browse/HIVE-20435 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman {{TxnHandler.enqueueLockWithRetry()}} has an optimization where it doesn't write to {{TXN_COMPONENTS}} if the write is a dynamic partition insert, because it expects to write to this table from {{addDynamicPartitions()}}. For insert-only transactional tables, we create the target dir and start writing to it before {{addDynamicPartitions()}} is called. So if a txn is aborted, we may have a delta dir in the partition but no corresponding entry in {{TXN_COMPONENTS}}. This means {{TxnStore.cleanEmptyAbortedTxns()}} may clean up the {{TXNS}} entry for the aborted transaction before the Compactor removes this delta dir, at which point it looks like committed data. Full CRUD tables are currently immune to this since they rely on the "move" operation in MoveTask, but longer term they should follow the same model as insert-only tables. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
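A toy in-memory model of the race. It is hypothetical and heavily simplified: it conflates txn ids and write ids, and treats anything not known to be aborted as committed. A set and a map stand in for the TXNS and TXN_COMPONENTS tables.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the aborted-txn cleanup race (not Hive's schema). */
final class AbortedTxnRaceModel {
    final Set<Long> abortedTxns = new HashSet<>();                 // stands in for aborted TXNS rows
    final Map<Long, Set<String>> txnComponents = new HashMap<>();  // stands in for TXN_COMPONENTS

    void abort(long txnId) {
        abortedTxns.add(txnId);
    }

    void recordComponent(long txnId, String table) {
        txnComponents.computeIfAbsent(txnId, k -> new HashSet<>()).add(table);
    }

    /** Mimics the idea of cleanEmptyAbortedTxns(): forget aborted txns that recorded no components. */
    void cleanEmptyAbortedTxns() {
        abortedTxns.removeIf(t -> !txnComponents.containsKey(t));
    }

    /** Simplified reader view: data is treated as committed unless its txn is known to be aborted. */
    boolean looksCommitted(long txnId) {
        return !abortedTxns.contains(txnId);
    }
}
```

In this model, a dynamic partition insert that wrote a delta dir but aborted before addDynamicPartitions() has no component row, so the cleanup forgets it and its delta then looks committed.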
[jira] [Created] (HIVE-20410) aborted Insert Overwrite on transactional table causes "Not enough history available for..." error
Eugene Koifman created HIVE-20410: - Summary: aborted Insert Overwrite on transactional table causes "Not enough history available for..." error Key: HIVE-20410 URL: https://issues.apache.org/jira/browse/HIVE-20410 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Suppose insert overwrite T values(1) is aborted. This creates a base_x directory (currently for insert-only transactional tables, and for full CRUD once 'rename' in the MoveTask is eliminated), but a subsequent read fails with a "Not enough history available for..." error. The problem is that the logic that produces this exception finds this base_x but treats it as if it was produced by the compactor, in which case the error would have been appropriate. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
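One way to picture the intended behavior, as a sketch rather than the committed fix: when choosing a base, skip any base_x whose write id is aborted instead of treating it as compactor output. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical base selection that ignores bases written by aborted txns. */
final class BaseSelection {
    private BaseSelection() {}

    /** Returns the highest non-aborted base write id, or -1 if there is none. */
    static long bestBase(List<Long> baseWriteIds, Set<Long> abortedWriteIds) {
        long best = -1;
        for (long w : baseWriteIds) {
            // An aborted base_x (e.g. from a failed Insert Overwrite) is simply not a candidate.
            if (!abortedWriteIds.contains(w) && w > best) {
                best = w;
            }
        }
        return best;
    }
}
```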
[jira] [Created] (HIVE-20392) make compaction atomic on S3
Eugene Koifman created HIVE-20392: - Summary: make compaction atomic on S3 Key: HIVE-20392 URL: https://issues.apache.org/jira/browse/HIVE-20392 Project: Hive Issue Type: Bug Components: Transactions Reporter: Eugene Koifman Assignee: Eugene Koifman -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20369) TestPreUpgradeTool not run by ptest
Eugene Koifman created HIVE-20369: - Summary: TestPreUpgradeTool not run by ptest Key: HIVE-20369 URL: https://issues.apache.org/jira/browse/HIVE-20369 Project: Hive Issue Type: Bug Components: Transactions Reporter: Eugene Koifman Assignee: Eugene Koifman TestPreUpgradeTool is not showing up in ptest runs, probably because the upgrade-acid module is disconnected from the root pom. How does standalone-metastore work? It's also disconnected. Also, the hive-upgrade jar is not showing up in the tar produced by mvn package. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 68281: HIVE-20354
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/#review207051 --- ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java Lines 972 (patched) <https://reviews.apache.org/r/68281/#comment290236> What if some table is named "select_table"? - Eugene Koifman On Aug. 9, 2018, 12:19 p.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68281/ > --- > > (Updated Aug. 9, 2018, 12:19 p.m.) > > > Review request for hive, Eugene Koifman and Jason Dere. > > > Bugs: HIVE-20354 > https://issues.apache.org/jira/browse/HIVE-20354 > > > Repository: hive-git > > > Description > --- > > Semijoin hints dont work with merge statements. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java > 463880587e > > ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java > 8df290435d > ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 > ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 > > > Diff: https://reviews.apache.org/r/68281/diff/2/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
Re: Review Request 68281: HIVE-20354
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/68281/#review207043 --- ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java Lines 1000 (patched) <https://reviews.apache.org/r/68281/#comment290214> Modifying the parse tree directly is not a good idea: it messes up internal ANTLR structures and may cause issues downstream. You should inject the hint into 'rewrittenQueryStr' so that a complete new statement is parsed; that is the model for all other parts of Merge reparsing. ql/src/test/queries/clientpositive/semijoin_hint.q Lines 116 (patched) <https://reviews.apache.org/r/68281/#comment290215> It may be useful to have one statement with the hint and another without it, to see the difference in the plan clearly. - Eugene Koifman On Aug. 9, 2018, 10:44 a.m., Deepak Jaiswal wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/68281/ > --- > > (Updated Aug. 9, 2018, 10:44 a.m.) > > > Review request for hive, Eugene Koifman and Jason Dere. > > > Bugs: HIVE-20354 > https://issues.apache.org/jira/browse/HIVE-20354 > > > Repository: hive-git > > > Description > --- > > Semijoin hints dont work with merge statements. > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 > ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java > 463880587e > > ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java > 8df290435d > ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 > ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 > > > Diff: https://reviews.apache.org/r/68281/diff/1/ > > > Testing > --- > > > Thanks, > > Deepak Jaiswal > >
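A hypothetical illustration of the review suggestion: append the hint text to the query string while it is being built, so the whole rewritten statement is parsed afresh instead of the ANTLR tree being edited. The query shape below is simplified and is not the actual Merge rewrite; the hint text is passed through unchanged and its syntax is not assumed.

```java
/** Hypothetical, simplified rewrite builder; not Hive's actual Merge rewrite. */
final class MergeRewriteSketch {
    private MergeRewriteSketch() {}

    static String rewrite(String target, String source, String onClause, String hintText) {
        StringBuilder rewrittenQueryStr = new StringBuilder();
        rewrittenQueryStr.append("FROM ").append(target)
                .append(" RIGHT OUTER JOIN ").append(source)
                .append(" ON ").append(onClause)
                .append(" INSERT INTO ").append(target)
                .append(" SELECT ");
        // The hint goes in as text, so the parser sees it as part of a brand-new statement.
        if (hintText != null && !hintText.isEmpty()) {
            rewrittenQueryStr.append("/*+ ").append(hintText).append(" */ ");
        }
        rewrittenQueryStr.append(source).append(".*");
        return rewrittenQueryStr.toString();
    }
}
```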
[jira] [Created] (HIVE-20327) Compactor should gracefully handle 0 length files and invalid orc files
Eugene Koifman created HIVE-20327: - Summary: Compactor should gracefully handle 0 length files and invalid orc files Key: HIVE-20327 URL: https://issues.apache.org/jira/browse/HIVE-20327 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 2.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Older versions of the Streaming API did not handle interrupts well and could leave 0-length ORC files behind, which cannot be read. These should just be skipped. Other cases of files where an ORC Reader cannot be created: 1. a regular write (1-txn delta) where the client died and didn't properly close the file; this delta should be aborted and never read. 2. a streaming ingest write (delta_x_y, x < y). There should always be a side file if the file was not closed properly (though it may still indicate that the length is 0). If we check these cases and still can't create a reader, we should not silently skip the file, since the system thinks it contains at least some committed data but the file is corrupted (and the side file doesn't point at a valid footer). We should never be in this situation, so we should throw so that the end user can attempt manual intervention (where the only option may be deleting the file). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
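The cases above can be summarized as a decision table. This is a sketch with hypothetical names; in particular, treating a side file that reports length 0 as "skip" is an interpretation of the note above, not established behavior.

```java
/** Hypothetical policy for a file that the ORC reader could not open. */
final class UnreadableFilePolicy {
    enum Action { SKIP, IGNORE_ABORTED, USE_SIDE_FILE, FAIL }

    private UnreadableFilePolicy() {}

    /**
     * @param fileLength            length of the data file in bytes
     * @param singleTxnDelta        true for delta_x_x (regular write), false for delta_x_y, x < y (streaming)
     * @param txnAborted            whether the txn that wrote a delta_x_x is aborted
     * @param sideFileLength        length recorded in the side file, or -1 if there is no side file
     * @param footerValidAtSideLen  whether a reader can be opened at the side-file length
     */
    static Action decide(long fileLength, boolean singleTxnDelta, boolean txnAborted,
                         long sideFileLength, boolean footerValidAtSideLen) {
        if (fileLength == 0) {
            return Action.SKIP;                       // leftover from older, interrupted streaming clients
        }
        if (singleTxnDelta) {
            // An aborted write is never read; an unreadable committed one is corruption.
            return txnAborted ? Action.IGNORE_ABORTED : Action.FAIL;
        }
        if (sideFileLength == 0) {
            return Action.SKIP;                       // interpretation: side file says nothing was committed
        }
        if (sideFileLength > 0 && footerValidAtSideLen) {
            return Action.USE_SIDE_FILE;              // read only up to the recorded footer
        }
        return Action.FAIL;                           // surface it so a user can intervene manually
    }
}
```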
[jira] [Created] (HIVE-20324) change hive.compactor.max.num.delta default to 50
Eugene Koifman created HIVE-20324: - Summary: change hive.compactor.max.num.delta default to 50 Key: HIVE-20324 URL: https://issues.apache.org/jira/browse/HIVE-20324 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 2.0.0 Reporter: Eugene Koifman The current default is 500, which is way too high. OOM is likely at 50 or so. Need to update the default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20313) consider making ROW__ID a 1st class object
Eugene Koifman created HIVE-20313: - Summary: consider making ROW__ID a 1st class object Key: HIVE-20313 URL: https://issues.apache.org/jira/browse/HIVE-20313 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 0.11.0 Reporter: Eugene Koifman ROW__ID, which is a struct that represents a unique row ID within a partition of a full CRUD transactional table, is currently modeled as a {{VirtualColumn}}. The Acid metadata columns from which ROW__ID is built are actually stored in the data file. There is no end to the special handling of acid metadata columns in the code needed to make this work. Perhaps a better approach is to add a struct column to an acid table at creation time and make it a 1st class citizen visible in the metastore. 'select count(*) ' would need special handling to remove it. There may need to be a way to make these columns read-only. For data added via Load Data, Add Partition, etc. (i.e. original files in a CRUD table), the acid reader would have to fill in the values as it does today. This would make schema evolution, PPD, and projection pruning work seamlessly. This should also make adding formats other than ORC to full CRUD tables easy. This will likely be painful but should be investigated. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
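A sketch of ROW__ID as a plain value type, assuming it is ordered by write id, then bucket property, then row id (the ordering and field names are assumptions for illustration). It only shows the shape of such a struct; it says nothing about how the metastore would store or expose it.

```java
import java.util.Objects;

/** Hypothetical first-class ROW__ID value; field names and ordering are assumptions. */
final class RowIdStruct implements Comparable<RowIdStruct> {
    final long writeId;
    final int bucketProperty;
    final long rowId;

    RowIdStruct(long writeId, int bucketProperty, long rowId) {
        this.writeId = writeId;
        this.bucketProperty = bucketProperty;
        this.rowId = rowId;
    }

    /** Orders by writeId, then bucketProperty, then rowId. */
    @Override
    public int compareTo(RowIdStruct o) {
        int c = Long.compare(writeId, o.writeId);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(bucketProperty, o.bucketProperty);
        return c != 0 ? c : Long.compare(rowId, o.rowId);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof RowIdStruct)) {
            return false;
        }
        RowIdStruct r = (RowIdStruct) other;
        return writeId == r.writeId && bucketProperty == r.bucketProperty && rowId == r.rowId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(writeId, bucketProperty, rowId);
    }
}
```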
[jira] [Created] (HIVE-20305) LlapRecordReader uses OrcInputFormat.getRootColumn(false)
Eugene Koifman created HIVE-20305: - Summary: LlapRecordReader uses OrcInputFormat.getRootColumn(false) Key: HIVE-20305 URL: https://issues.apache.org/jira/browse/HIVE-20305 Project: Hive Issue Type: Bug Components: llap, Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman LlapRecordReader uses OrcInputFormat.getRootColumn(false), so it seems to assume that if {{AcidUtils.isFullAcidScan(jobConf)}} then the underlying file has acid meta columns in it. That is not true for data added via Load Data or Add Partition, or by converting a flat table to full CRUD acid via Alter Table (by setting the transactional=true tbl property). cc [~teddy.choi] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20234) Add an option to disable stats computation from Compactor
Eugene Koifman created HIVE-20234: - Summary: Add an option to disable stats computation from Compactor Key: HIVE-20234 URL: https://issues.apache.org/jira/browse/HIVE-20234 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Currently {{Worker.StatsUpdater}} will run {{analyze table ... compute statistics for columns ...}} at the end of each Major compaction to update stats on columns that already have stats. It would be useful to add a config option that allows better control over this. It could have 3 values: don't update col stats, update existing col stats, update all col stats. Should this have the ability to update table level stats? Is that needed given HIVE-19532? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
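A sketch of how the three-valued option might look. The enum, its values and its default are hypothetical, and no actual Hive config property is implied; EXISTING corresponds to the current behavior described above.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical 3-valued setting for column stats after a major compaction. */
enum CompactorStatsMode {
    NONE,      // don't update col stats
    EXISTING,  // update existing col stats (the behavior described above)
    ALL;       // update all col stats

    /** Parses a config string; unset means EXISTING, so default behavior would not change. */
    static CompactorStatsMode fromConfig(String value) {
        if (value == null || value.trim().isEmpty()) {
            return EXISTING;
        }
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Columns that would be passed to "analyze table ... compute statistics for columns". */
    List<String> columnsToAnalyze(List<String> allColumns, List<String> columnsWithStats) {
        switch (this) {
            case NONE:
                return Collections.emptyList();
            case EXISTING:
                return columnsWithStats;
            default:
                return allColumns;
        }
    }
}
```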
[jira] [Created] (HIVE-20218) make sure Statement.executeUpdate() returns number of rows affected
Eugene Koifman created HIVE-20218: - Summary: make sure Statement.executeUpdate() returns number of rows affected Key: HIVE-20218 URL: https://issues.apache.org/jira/browse/HIVE-20218 Project: Hive Issue Type: Improvement Components: JDBC, Transactions Affects Versions: 1.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman HiveStatement and HivePreparedStatement currently return 0 in all cases -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 67969: HIVE-20115 Acid tables should not use footer scan for analyze
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67969/#review206245 --- ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java Line 87 (original), 88 (patched) <https://reviews.apache.org/r/67969/#comment289150> This is not introduced in this patch, but maybe this "if" should be a method since the same condition is checked in 3 places - to keep it in sync. ql/src/test/queries/clientpositive/acid_no_buckets.q Lines 37 (patched) <https://reviews.apache.org/r/67969/#comment289149> I don't understand this comment. There was update/insert done (line 25) since last analyze at line 22-23. Shouldn't analyze at 34-35 change stats? Or are they auto updated after each statement? - Eugene Koifman On July 18, 2018, 4:19 p.m., Sergey Shelukhin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67969/ > --- > > (Updated July 18, 2018, 4:19 p.m.) > > > Review request for hive and Eugene Koifman. > > > Repository: hive-git > > > Description > --- > > see jira > > > Diffs > - > > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java > 64f9c70f05 > ql/src/java/org/apache/hadoop/hive/ql/parse/ProcessAnalyzeTable.java > 03cceace40 > ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 49709e596e > > ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java > 28d4de7f7b > ql/src/test/queries/clientpositive/acid_no_buckets.q bcf9e0634b > ql/src/test/results/clientpositive/llap/acid_no_buckets.q.out 36a6a5d5d1 > > > Diff: https://reviews.apache.org/r/67969/diff/1/ > > > Testing > --- > > > Thanks, > > Sergey Shelukhin > >
[jira] [Created] (HIVE-20137) Truncate for Transactional tables should use base_x
Eugene Koifman created HIVE-20137: - Summary: Truncate for Transactional tables should use base_x Key: HIVE-20137 URL: https://issues.apache.org/jira/browse/HIVE-20137 Project: Hive Issue Type: Improvement Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman This is a follow-up to HIVE-19387. Once we have a lock that blocks writers but not readers (HIVE-19369), it would make sense to make truncate create a new base_x, where x is a writeId in the current txn, the same as Insert Overwrite does. This would mean it can work w/o interfering with existing writers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-20119) permissions on files in transactional tables
Eugene Koifman created HIVE-20119: - Summary: permissions on files in transactional tables Key: HIVE-20119 URL: https://issues.apache.org/jira/browse/HIVE-20119 Project: Hive Issue Type: Bug Components: Transactions Reporter: Eugene Koifman Assignee: Eugene Koifman What should these be? With doAs, they end up being owned by the user, and then, depending on the umask, the Cleaner may not be able to delete them, so the compaction is marked as failed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Review Request 67712: HIVE-19820 add ACID stats support to background stats updater
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/67712/#review205443 --- itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java Line 424 (original), 424 (patched) <https://reviews.apache.org/r/67712/#comment288363> arg4? arg5? is this decompiled code? ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java Lines 291 (patched) <https://reviews.apache.org/r/67712/#comment288365> there are several read ops in this txn - what semantics is the txn trying to achieve here? ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java Line 296 (original), 324 (patched) <https://reviews.apache.org/r/67712/#comment288366> 0 is not a valid transaction id ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java Line 412 (original), 440 (patched) <https://reviews.apache.org/r/67712/#comment288367> 0 is not a valid txn id - Eugene Koifman On June 22, 2018, 5:29 p.m., Sergey Shelukhin wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/67712/ > --- > > (Updated June 22, 2018, 5:29 p.m.) > > > Review request for hive, Eugene Koifman and Seong (Steve) Yeom.
> > > Repository: hive-git > > > Description > --- > > see jira > > > Diffs > - > > > itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java > 580bae9c3f1307325842a08275e085a8e31f9351 > ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java > ddca70497a3f51c3ec9ea532fac2a42aa36149b3 > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java > dd0929f2b9748d83d55ccc271cec6aa07933bde1 > ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUpdaterThread.java > 14f86eabbcf4bfc38c92294cd5d71d4905eb5c30 > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java > 4296084381df1e109248820b96739a4eb5ee0490 > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java > 51e081b22fa27b013715bb6eddf7fbbcf6bbd061 > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java > 9266879ad0134dbf87598af6f9305b73cc8c40ba > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java > 8cc9d2c586a411712d01d599ff2986f6ad5e0cfd > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java > e4894fa12bfee78f51f3796e0ccaaf51c7ac4136 > > standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java > 001c3edcff5a4d0ea67b73e83075b1f867342654 > > standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java > d6a882e8e98f92eefbdb7900bdf43e3274a21c5d > > standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java > c9a6a471cb7fc28845efb6d774601dba0cef2a85 > > > Diff: https://reviews.apache.org/r/67712/diff/1/ > > > Testing > --- > > > Thanks, > > Sergey Shelukhin > >
[jira] [Created] (HIVE-19965) Make HiveEndPoint use IMetaStoreClient.add_partition
Eugene Koifman created HIVE-19965: - Summary: Make HiveEndPoint use IMetaStoreClient.add_partition Key: HIVE-19965 URL: https://issues.apache.org/jira/browse/HIVE-19965 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman It currently uses "alter table add partition if exists...", which since HIVE-18814 requires an X lock on the table, which blocks other streaming writers from making progress. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19962) remove hive.txn.operational.properties
Eugene Koifman created HIVE-19962: - Summary: remove hive.txn.operational.properties Key: HIVE-19962 URL: https://issues.apache.org/jira/browse/HIVE-19962 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman hive.txn.operational.properties should be removed and references to it cleaned up: now that Acid V2 is in, it is no longer needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19961) Add partition if exists on transactional CRUD table acquires X lock
Eugene Koifman created HIVE-19961: - Summary: Add partition if exists on transactional CRUD table acquires X lock Key: HIVE-19961 URL: https://issues.apache.org/jira/browse/HIVE-19961 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman This is necessary for correctness since each add partition consists of 2 parts: (1) add the Partition metadata object to the metastore; (2) create a delta dir and copy data there. This means it's neither Atomic nor Isolated. Isolation is fixed by using an X lock (which is currently on the table. todo: see if it can be made on the partition being created - this may block table level locks...). Atomicity would have to be addressed by adding a write ID to Partition so that it's not visible until the Hive transaction has committed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19917) Import of full CRUD transactional table fails if table is not in default database
Eugene Koifman created HIVE-19917: - Summary: Import of full CRUD transactional table fails if table is not in default database Key: HIVE-19917 URL: https://issues.apache.org/jira/browse/HIVE-19917 Project: Hive Issue Type: Bug Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman The actual issue is fixed by HIVE-19861. This is a follow-up to add a test case. Issue: {noformat} org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Can not create a Path from a null string at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:940) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:945) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.exec.DDLTask.createTableLike(DDLTask.java:5099) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:433) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeAcidExport(UpdateDeleteSemanticAnalyzer.java:195) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:106) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:658) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1813) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1760) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1755)
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:194) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:257) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.operation.Operation.run(Operation.java:243) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562) ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_112] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_112] at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112] Caused by: java.lang.IllegalArgumentException: Can not create a Path from a null string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164) ~[hadoop-common-3.0.0.3.0.0.0-1485.jar
[jira] [Created] (HIVE-19908) Block Insert Overwrite with Union All on full CRUD ACID tables using HIVE_UNION_SUBDIR_
Eugene Koifman created HIVE-19908: - Summary: Block Insert Overwrite with Union All on full CRUD ACID tables using HIVE_UNION_SUBDIR_ Key: HIVE-19908 URL: https://issues.apache.org/jira/browse/HIVE-19908 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman This currently results in data loss. Will block and suggest using truncate + insert. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19800) Handle rename files post HIVE-19751
Eugene Koifman created HIVE-19800: - Summary: Handle rename files post HIVE-19751 Key: HIVE-19800 URL: https://issues.apache.org/jira/browse/HIVE-19800 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman This is a follow-up to HIVE-19751 and includes HIVE-19751, since it hasn't landed yet. It also includes the file rename logic and HIVE-19750, since that hasn't landed yet either. cc [~jdere] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (HIVE-19751) create submodule of hive-upgrade-acid for preUpgrade and postUpgrade
Eugene Koifman created HIVE-19751: - Summary: create submodule of hive-upgrade-acid for preUpgrade and postUpgrade Key: HIVE-19751 URL: https://issues.apache.org/jira/browse/HIVE-19751 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Basically, we need to produce 2 separate jars: one for the pre-upgrade step, which can be compiled/unit tested with 2.x jars, and another for the post-upgrade step, which can be compiled/tested with 3.x jars.
[jira] [Created] (HIVE-19750) Initialize NEXT_WRITE_ID.NWI_NEXT on converting an existing table to full acid
Eugene Koifman created HIVE-19750: - Summary: Initialize NEXT_WRITE_ID.NWI_NEXT on converting an existing table to full acid Key: HIVE-19750 URL: https://issues.apache.org/jira/browse/HIVE-19750 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 3.1.0 Need to set this to a reasonably high value for the table. This will reserve a range of write IDs that will be treated by the system as committed. This is needed so that we can assign unique ROW__IDs to each row in files that already exist in the table. For example, if the value is initialized to the number of files currently in the table, we can think of each file as written by a separate transaction and are thus free to assign the bucketProperty (BucketCodec) of ROW__ID in whichever way is convenient. It's guaranteed that all rows get unique ROW__IDs this way.
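A minimal sketch of why reserving one write ID per existing file is enough, assuming ROW__ID is modelled as a simplified (writeId, bucketProperty, rowId) triple and that the reserved range is 1..N for N files. `RowId` and `WriteIdReservation` are hypothetical names, and the plain `int` bucket value stands in for the real BucketCodec encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Simplified stand-in for ROW__ID; not Hive's own RecordIdentifier class. */
final class RowId {
    final long writeId;
    final int bucketProperty;
    final long rowId;

    RowId(long writeId, int bucketProperty, long rowId) {
        this.writeId = writeId;
        this.bucketProperty = bucketProperty;
        this.rowId = rowId;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RowId)) {
            return false;
        }
        RowId other = (RowId) o;
        return writeId == other.writeId
            && bucketProperty == other.bucketProperty
            && rowId == other.rowId;
    }

    @Override
    public int hashCode() {
        return Objects.hash(writeId, bucketProperty, rowId);
    }
}

/** Hypothetical helper: treat each pre-existing file as written by its own committed write ID. */
final class WriteIdReservation {
    /**
     * File i (0-based) gets reserved write ID i + 1; every file uses bucket
     * property 0 and numbers its rows from 0. The IDs are unique because no
     * two files share a write ID.
     */
    static List<RowId> assignRowIds(long[] rowsPerFile) {
        List<RowId> ids = new ArrayList<>();
        for (int file = 0; file < rowsPerFile.length; file++) {
            long writeId = file + 1L; // one reserved, already-committed write ID per file
            for (long row = 0; row < rowsPerFile[file]; row++) {
                ids.add(new RowId(writeId, 0, row));
            }
        }
        return ids;
    }
}
```

For files of 3, 2 and 4 rows this yields 9 ROW__IDs, all distinct, even though every file reuses bucket 0 and row numbers starting at 0.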
[jira] [Created] (HIVE-19749) Acid V1 to V2 upgrade
Eugene Koifman created HIVE-19749: - Summary: Acid V1 to V2 upgrade Key: HIVE-19749 URL: https://issues.apache.org/jira/browse/HIVE-19749 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman umbrella jira
[jira] [Created] (HIVE-19735) Transactional table: rename partition
Eugene Koifman created HIVE-19735: - Summary: Transactional table: rename partition Key: HIVE-19735 URL: https://issues.apache.org/jira/browse/HIVE-19735 Project: Hive Issue Type: Bug Reporter: Eugene Koifman Assignee: Eugene Koifman Hive supports renaming a partition [https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenamePartition]. Is this addressed by HIVE-18748?
[jira] [Created] (HIVE-19714) TransactionalValidationListener.conformToAcid() only checks table level StorageDescriptor
Eugene Koifman created HIVE-19714: - Summary: TransactionalValidationListener.conformToAcid() only checks table level StorageDescriptor Key: HIVE-19714 URL: https://issues.apache.org/jira/browse/HIVE-19714 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 2.2.0 Reporter: Eugene Koifman A table may actually have a different SD for each partition, so a proper check for a full CRUD table would check all of them.
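A minimal sketch of the check this issue asks for, assuming a simplified model in which each storage descriptor is reduced to its input format and a table carries its own descriptor plus one per partition. `Sd`, `TableInfo` and `AcidConformance` are hypothetical stand-ins, not the metastore's Thrift classes, and requiring the ORC input format is a simplification of the real acceptance criteria; the point is only that every partition descriptor is checked, not just the table-level one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, reduced storage descriptor: only the input format matters here. */
final class Sd {
    final String inputFormat;

    Sd(String inputFormat) {
        this.inputFormat = inputFormat;
    }
}

/** Hypothetical table view: table-level descriptor plus one descriptor per partition. */
final class TableInfo {
    final Sd tableSd;
    final List<Sd> partitionSds = new ArrayList<>();

    TableInfo(Sd tableSd) {
        this.tableSd = tableSd;
    }
}

final class AcidConformance {
    static final String ORC_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";

    /** Returns true only if the table-level SD and every partition SD use ORC. */
    static boolean conformsToAcid(TableInfo t) {
        if (!ORC_INPUT_FORMAT.equals(t.tableSd.inputFormat)) {
            return false;
        }
        for (Sd sd : t.partitionSds) {
            if (!ORC_INPUT_FORMAT.equals(sd.inputFormat)) {
                return false; // a single non-ORC partition disqualifies the table
            }
        }
        return true;
    }
}
```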
[jira] [Created] (HIVE-19606) Straggler thread in HS2 for rename directory operation stuck in loop causing performance issue and cluster slowdown
Eugene Koifman created HIVE-19606: - Summary: Straggler thread in HS2 for rename directory operation stuck in loop causing performance issue and cluster slowdown Key: HIVE-19606 URL: https://issues.apache.org/jira/browse/HIVE-19606 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 1.0.0 Reporter: Eugene Koifman
[jira] [Created] (HIVE-19599) Release Notes : Highlighting backwards incompatible changes
Eugene Koifman created HIVE-19599: - Summary: Release Notes : Highlighting backwards incompatible changes Key: HIVE-19599 URL: https://issues.apache.org/jira/browse/HIVE-19599 Project: Hive Issue Type: Bug Components: Documentation Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Vineet Garg We need to highlight backwards incompatible changes. A list of Jira titles won't be sufficient. For example, tables with Acid V1 (pre-3.0) data have to be major compacted before upgrade and may not process any update/delete/merge until after upgrade. Not doing so may result in data corruption/loss.
[jira] [Created] (HIVE-19598) Acid V1 to V2 upgrade
Eugene Koifman created HIVE-19598: - Summary: Acid V1 to V2 upgrade Key: HIVE-19598 URL: https://issues.apache.org/jira/browse/HIVE-19598 Project: Hive Issue Type: Bug Components: Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman Assignee: Eugene Koifman The on-disk layout for full acid (transactional) tables has changed in 3.0. Any transactional table that has any update/delete events in any deltas that have not been Major compacted must go through a Major compaction before upgrading to 3.0. No more update/delete/merge should be run after/during major compaction. Not doing so will result in data corruption/loss. Need to create a utility tool to help with this process. HIVE-19233 started this, but it needs more work.
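As a possible starting point for such a tool, here is a minimal sketch that flags, for one table or partition directory listing, the delta directories not yet folded into a base, assuming the usual `base_N` and `delta_min_max[_stmtId]` directory naming. It is deliberately conservative: telling whether a delta actually contains update/delete events would require reading the files, so every uncompacted delta is reported. `PreUpgradeCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-upgrade helper working on the directory names of one table/partition. */
final class PreUpgradeCheck {
    /**
     * Returns the delta directories whose max transaction ID is above the
     * highest base, i.e. deltas that no major compaction has absorbed yet.
     */
    static List<String> uncompactedDeltas(List<String> dirNames) {
        long highestBase = -1L; // -1 means "no base directory present"
        for (String name : dirNames) {
            if (name.startsWith("base_")) {
                // base_<N>: everything up to N has been folded into this base
                highestBase = Math.max(highestBase, Long.parseLong(name.split("_")[1]));
            }
        }
        List<String> flagged = new ArrayList<>();
        for (String name : dirNames) {
            if (name.startsWith("delta_")) {
                // delta_<min>_<max>[_<stmtId>]: compare the max ID against the base
                long max = Long.parseLong(name.split("_")[2]);
                if (max > highestBase) {
                    flagged.add(name);
                }
            }
        }
        return flagged;
    }
}
```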
[jira] [Created] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()
Eugene Koifman created HIVE-19569: - Summary: alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable() Key: HIVE-19569 URL: https://issues.apache.org/jira/browse/HIVE-19569 Project: Hive Issue Type: Bug Components: Metastore, Standalone Metastore, Transactions Affects Versions: 3.0.0 Reporter: Eugene Koifman When renaming a table within the same DB, this operation causes {{MetaStoreEventListener.onAlterTable()}} to fire, but when changing the DB name for a table it causes {{MetaStoreEventListener.onDropTable()}} + {{MetaStoreEventListener.onCreateTable()}}. The files from the original table are moved to the new table location. This creates confusing semantics, since any logic in {{onDropTable()}} doesn't know about the larger context, i.e. that there will be a matching {{onCreateTable()}}. In particular, this causes a problem for Acid tables, since files moved from the old table use WriteIDs that are not meaningful in the context of the new table. The current implementation is due to replication. This should ideally be changed to raise a "not supported" error for tables that are marked for replication. cc [~sankarh]
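To make the described semantics concrete, here is a toy model of the event sequence, using hypothetical names rather than the metastore's code: a rename within one database yields a single alter event, while a rename that changes the database yields a drop followed by a create. A listener that reacts to the drop alone has no way to know that a matching create is about to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy listener that only records which callbacks fired, in order. */
final class RecordingListener {
    final List<String> events = new ArrayList<>();

    void onAlterTable(String oldName, String newName) {
        events.add("alter " + oldName + " -> " + newName);
    }

    void onDropTable(String name) {
        events.add("drop " + name);
    }

    void onCreateTable(String name) {
        events.add("create " + name);
    }
}

/** Toy rename dispatcher mirroring the behaviour described in this issue. */
final class RenameDispatcher {
    static void rename(String oldDb, String oldTable, String newDb, String newTable,
                       RecordingListener listener) {
        String oldName = oldDb + "." + oldTable;
        String newName = newDb + "." + newTable;
        if (oldDb.equals(newDb)) {
            listener.onAlterTable(oldName, newName);
        } else {
            // cross-database rename: surfaced as two unrelated events
            listener.onDropTable(oldName);
            listener.onCreateTable(newName);
        }
    }
}
```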
[jira] [Created] (HIVE-19387) CLONE - Truncate table for Acid tables
Eugene Koifman created HIVE-19387: - Summary: CLONE - Truncate table for Acid tables Key: HIVE-19387 URL: https://issues.apache.org/jira/browse/HIVE-19387 Project: Hive Issue Type: New Feature Components: Transactions Reporter: Eugene Koifman Assignee: Eugene Koifman How should this work? Should it work like Insert Overwrite T select * from T where 1=2? That would create a new empty base_x/ and thus operate without violating Snapshot Isolation semantics. This makes sense for a specific partition or an unpartitioned table. What about "Truncate T" where T is partitioned? Is the expectation to wipe out all partition info or to make each partition empty?
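One way to see why an empty base_x/ preserves Snapshot Isolation: a reader only considers bases whose ID is covered by its snapshot, so a query that started before the truncate keeps reading the old base while newer queries see the empty one. The sketch below assumes a simplified snapshot consisting of a single high-water mark (real readers also track open and aborted IDs and apply deltas above the chosen base); `BaseSelector` is a hypothetical name.

```java
import java.util.List;

final class BaseSelector {
    /**
     * Picks the base directory a reader should use: the one with the largest
     * ID that is still <= the reader's snapshot high-water mark.
     * Returns null if no base is visible to this snapshot.
     */
    static String visibleBase(List<String> dirNames, long snapshotHighWatermark) {
        String best = null;
        long bestId = -1L;
        for (String name : dirNames) {
            if (!name.startsWith("base_")) {
                continue;
            }
            long id = Long.parseLong(name.split("_")[1]);
            if (id <= snapshotHighWatermark && id > bestId) {
                bestId = id;
                best = name;
            }
        }
        return best;
    }
}
```

With `base_0000005` and an empty `base_0000009` written by the truncate, a reader whose snapshot stops at 8 still selects `base_0000005`, while a reader at 9 or later selects the empty base.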
Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66645/#review202231 --- Ship it! Ship It! - Eugene Koifman On May 1, 2018, 2:53 p.m., Prasanth_J wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/66645/ > --- > > (Updated May 1, 2018, 2:53 p.m.) > > > Review request for hive, Ashutosh Chauhan and Eugene Koifman. > > > Bugs: HIVE-19211 > https://issues.apache.org/jira/browse/HIVE-19211 > > > Repository: hive-git > > > Description > --- > > HIVE-19211: New streaming ingest API and support for dynamic partitioning > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5a13726 > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > 90dbdac > itests/hive-unit/pom.xml 3ae7f2f > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 8ee033d > metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java > PRE-CREATION > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java > a66c135 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 09f8802 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 76569d5 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java f6608eb > serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java > 8c159e9 > streaming/pom.xml b58ec01 > streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java > 25998ae > streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb > streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java > 898b3f9 > streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 > 
streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java > 23e17e7 > streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 > streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 > streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d > streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java > 762f5f8 > streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java > 5f9aca6 > streaming/src/java/org/apache/hive/streaming/PartitionHandler.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/PartitionInfo.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/QueryFailedException.java > ccd3ae0 > streaming/src/java/org/apache/hive/streaming/RecordWriter.java dc6d70e > streaming/src/java/org/apache/hive/streaming/SerializationError.java > a57ba00 > streaming/src/java/org/apache/hive/streaming/StreamingConnection.java > 2f760ea > streaming/src/java/org/apache/hive/streaming/StreamingException.java > a7f84c1 > streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java > 0dfbfa7 > > streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 0077913 > streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java c0b7324 > streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 2b05771 > > streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java > a8c8cd4 > streaming/src/java/org/apache/hive/streaming/TransactionError.java a331b20 > streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java > f0843a1 > 
streaming/src/test/org/apache/hive/streaming/TestStreaming.java 0ec3048 > > streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java > PRE-CREATION > > > Diff: https://reviews.apache.org/r/66645/diff/12/ > > > Testing > --- > > > Thanks, > > Prasanth_J > >
Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/66645/#review202201 --- streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java Line 490 (original), 414 (patched) <https://reviews.apache.org/r/66645/#comment283952> should these be final? streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java Lines 442 (patched) <https://reviews.apache.org/r/66645/#comment283953> What does this achieve? You said that this API doesn't support concurrency and begin/commit/close/etc. have to be sequential. That means that minTxnId of the batch can only change linearly. So what does AtomicReference buy you over 'volatile', for example? commitImpl is not atomic: it calls msClient.commitTxn() and then adjusts minTxn. But a heartbeat between these 2 will end up heartbeating a committed txn... It seems that maxTxn never changes, and the only thing that needs to be updated in the HeartbeatRunnable is the minTxn, which needs to be volatile. Is there something else that this is trying to solve? streaming/src/test/org/apache/hive/streaming/TestStreaming.java Lines 589 (patched) <https://reviews.apache.org/r/66645/#comment283950> follow-up jira? hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java Line 442 (original), 442 (patched) <https://reviews.apache.org/r/66645/#comment283943> is there a follow-up jira for this? itests/hive-unit/pom.xml Lines 79 (patched) <https://reviews.apache.org/r/66645/#comment283944> what change requires this? ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java Lines 471 (patched) <https://reviews.apache.org/r/66645/#comment283945> If you have only 1 txn in a batch, why call flush at all? (This flush() is called when commit() is called.) Won't closing the file do the right thing? - Eugene Koifman On April 30, 2018, 4:10 p.m., Prasanth_J wrote: > > --- > This is an automatically generated e-mail.
To reply, visit: > https://reviews.apache.org/r/66645/ > --- > > (Updated April 30, 2018, 4:10 p.m.) > > > Review request for hive, Ashutosh Chauhan and Eugene Koifman. > > > Bugs: HIVE-19211 > https://issues.apache.org/jira/browse/HIVE-19211 > > > Repository: hive-git > > > Description > --- > > HIVE-19211: New streaming ingest API and support for dynamic partitioning > > > Diffs > - > > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6e35653 > > hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java > 90dbdac > itests/hive-unit/pom.xml 3ae7f2f > > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java > 8ee033d > metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java > PRE-CREATION > metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java > a66c135 > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 09f8802 > ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 76569d5 > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 4661881 > serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION > > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java > 8c159e9 > streaming/pom.xml b58ec01 > streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java > 25998ae > streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb > streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java > 898b3f9 > streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 > streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 > streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java > 23e17e7 > 
streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 > streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 > streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d > streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java > PRE-CREATION > streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java > 762f5f8 &
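The reviewer's suggestion for HiveStreamingConnection (one writer thread updating a `volatile` minTxnId that the heartbeat thread only reads, with maxTxnId fixed) can be sketched as follows. `TxnBatchState` is a hypothetical name and the metastore commit is replaced by in-memory bookkeeping; the sketch keeps the commit-then-advance ordering the review describes, so a heartbeat running between the two steps still includes the just-committed transaction.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a transaction batch shared by one writer thread and
 * one heartbeat thread. Metastore calls are replaced by in-memory bookkeeping.
 */
final class TxnBatchState {
    private final long maxTxnId;                            // never changes for the batch
    private volatile long minTxnId;                         // written only by the writer thread
    private final List<Long> committed = new ArrayList<>(); // touched only by the writer thread

    TxnBatchState(long minTxnId, long maxTxnId) {
        this.minTxnId = minTxnId;
        this.maxTxnId = maxTxnId;
    }

    /** Writer thread: commit the current txn, then advance minTxnId (two separate steps). */
    void commitCurrent() {
        long current = minTxnId;
        committed.add(current); // stands in for the metastore commit call
        // A heartbeat that reads minTxnId before the next line runs still
        // includes the just-committed txn, as the review points out.
        minTxnId = current + 1;
    }

    /** Heartbeat thread: one volatile read gives the range of txns to heartbeat. */
    long[] heartbeatRange() {
        long min = minTxnId;
        return new long[] {min, maxTxnId};
    }

    /** Writer thread only: txns committed so far. */
    List<Long> committedTxns() {
        return committed;
    }
}
```

Because only the writer thread ever assigns `minTxnId`, plain volatile reads and writes give the heartbeat thread a visible, up-to-date value; an `AtomicReference` would add compare-and-set operations that this single-writer pattern does not use.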
[jira] [Created] (HIVE-19377) TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) (batchId=286)
Eugene Koifman created HIVE-19377: - Summary: TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) (batchId=286) Key: HIVE-19377 URL: https://issues.apache.org/jira/browse/HIVE-19377 Project: Hive Issue Type: Sub-task Reporter: Eugene Koifman {{TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) (batchId=286)}} appears routinely in runs. {{mvn test -Dtest=TestExIm}} fails locally with {noformat} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on project hive-exec: There are test failures. [ERROR] [ERROR] Please refer to /Users/ekoifman/IdeaProjects/hive/ql/target/surefire-reports for the individual test results. [ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, [date].dumpstream and [date]-jvmRun[N].dumpstream. [ERROR] ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? [ERROR] Command was /bin/sh -c cd /Users/ekoifman/IdeaProjects/hive/ql && /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java -Xmx2048m -jar /Users/ekoifman/IdeaProjects/hive/ql/target/surefire/surefirebooter4071469472847953044.jar /Users/ekoifman/IdeaProjects/hive/ql/target/surefire 2018-05-01T10-56-27_610-jvmRun1 surefire8633599521000249236tmp surefire_02958205604336140780tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] Crashed tests: [ERROR] org.apache.hadoop.hive.ql.TestTxnExIm [ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: ExecutionException The forked VM terminated without properly saying goodbye. VM crash or System.exit called? 
[ERROR] Command was /bin/sh -c cd /Users/ekoifman/IdeaProjects/hive/ql && /Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java -Xmx2048m -jar /Users/ekoifman/IdeaProjects/hive/ql/target/surefire/surefirebooter4071469472847953044.jar /Users/ekoifman/IdeaProjects/hive/ql/target/surefire 2018-05-01T10-56-27_610-jvmRun1 surefire8633599521000249236tmp surefire_02958205604336140780tmp [ERROR] Error occurred in starting fork, check output in log [ERROR] Process Exit Code: 1 [ERROR] Crashed tests: [ERROR] org.apache.hadoop.hive.ql.TestTxnExIm [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:494) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:441) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:293) [ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:978) [ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:854) [ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154) [ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81) [ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) [ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309) [ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194) [ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107) [ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955) [ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290) [ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [ERROR] at java.lang.reflect.Method.invoke(Method.java:498) [E