[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501516#comment-14501516 ]

Alan Gates commented on HIVE-10228:
-----------------------------------

+1

Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
--------------------------------------------------------------------------------------

Key: HIVE-10228
URL: https://issues.apache.org/jira/browse/HIVE-10228
Project: Hive
Issue Type: Sub-task
Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Attachments: HIVE-10228.2.patch, HIVE-10228.3.patch, HIVE-10228.4.patch, HIVE-10228.5.patch, HIVE-10228.patch

We need to update a couple of Hive commands to support replication semantics. To wit, we need the following:

EXPORT ... [FOR [METADATA] REPLICATION(“comment”)]

Export will now support an extra optional clause to tell it that this export is being prepared for the purpose of replication. There is also an additional optional clause here that allows the export to be a metadata-only export, to handle cases such as capturing the diff for alter statements. Also, if the export is done for replication, a missing table, or a table that is a view, an offline table, or a non-native table, is not considered an error and instead results in a successful no-op.

IMPORT ... (as normal) – but handles new semantics

No syntax changes for import, but import will have to change to handle all the possible permutations of export dumps. Also, import must update the object only if the update being imported is not older than the state of the object. Also, import currently does not work with a dbname.tablename kind of specification; this should be fixed.

DROP TABLE ... FOR REPLICATION('eventid')

Drop Table now has an additional clause to specify that this drop table is being done for replication purposes, and that the drop should not actually drop the table if the table is newer than the specified event id.

ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')

Similarly, Drop Partition has a change equivalent to the one for Drop Table.

=

In addition, we introduce a new property repl.last.id, which, when tagged on to table properties or partition properties on a replication destination, holds the effective state identifier of the object.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
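The import rule in the description ("update the object only if the update being imported is not older than the state of the object") together with the repl.last.id property can be illustrated with a small Python sketch. This is not Hive code. It assumes that state identifiers can be compared as integers and that an object without repl.last.id has no replication state yet; neither point is stated in the description. The function names and the dict standing in for table or partition properties are hypothetical.

```python
REPL_LAST_ID = "repl.last.id"  # property name taken from the issue description


def should_apply_import(object_props, incoming_state_id):
    """Return True if an imported update should be applied to the object.

    Assumptions (not from the issue): state ids compare as integers, and an
    object with no repl.last.id is treated as having no replication state.
    """
    current = object_props.get(REPL_LAST_ID)
    if current is None:
        return True
    # Apply only if the incoming update is not older than the current state.
    return int(incoming_state_id) >= int(current)


def apply_import(object_props, incoming_state_id):
    """Apply the update (modelled here as bumping repl.last.id) or do nothing."""
    if not should_apply_import(object_props, incoming_state_id):
        return False  # stale update: treated as a successful no-op
    object_props[REPL_LAST_ID] = str(incoming_state_id)
    return True
```

With an empty property map, importing state 42 is applied and sets repl.last.id to "42"; a later attempt to import state 41 is skipped and leaves the property unchanged.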
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14501585#comment-14501585 ]

Lefty Leverenz commented on HIVE-10228:
---------------------------------------

Doc note: HIVE-10264 will document everything related to replication, including the configuration parameter added here (*hive.exim.strict.repl.tables*).

Fix For: 1.2.0
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499370#comment-14499370 ]

Alan Gates commented on HIVE-10228:
-----------------------------------

Ok, so once the comments are added on why DROP TABLE FOR REPLICATION is different from IF EXISTS, and the new JIRAs referenced in review board are filed, I'm +1 on this patch.
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500436#comment-14500436 ]

Sushanth Sowmyan commented on HIVE-10228:
-----------------------------------------

Thanks Alan, I have created HIVE-10381 for that other issue, and added comments in code to expand on what DROP TABLE FOR REPLICATION is doing.
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500822#comment-14500822 ]

Sushanth Sowmyan commented on HIVE-10228:
-----------------------------------------

The reported failed tests are not related to this patch.
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14500790#comment-14500790 ]

Hive QA commented on HIVE-10228:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12726232/HIVE-10228.5.patch

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 8727 tests executed

*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_decimal_precision2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3480/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3480/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3480/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12726232 - PreCommit-HIVE-TRUNK-Build
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497895#comment-14497895 ]

Alan Gates commented on HIVE-10228:
-----------------------------------

Wow, when I saw it was a 150K patch I was hoping it was mostly generated code. No such luck.

Code level comments on review board, higher level below:

This stuff needs some major doc work as you're introducing a new concept of a table being replicated or generated from replication. Is there a doc JIRA for the replication work yet? If so we should link it to this JIRA.

Parser changes: I don't understand why DROP TABLE needs the replication clause. As far as I can tell from the changes in DDLSemanticAnalyzer this is semantically equivalent to IF EXISTS. Why not use that?

Adding METADATA and REPLICATION as keywords is not backwards compatible. We either need to explicitly note that in this JIRA or add them to the list of reserved keywords allowed as identifiers in IdentifiersParser.g. I suspect the latter is a better choice.
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498877#comment-14498877 ]

Sushanth Sowmyan commented on HIVE-10228:
-----------------------------------------

Sorry, yeah, this is a big patch. :) It's really a cumulative patch of a bunch of work, but a lot of that was overwriting itself so much that splitting it out into a bunch of patches would have been difficult. Forking hive to do dev of this on a separate branch and merging in one go might have been easier.

I'd created https://issues.apache.org/jira/browse/HIVE-10264 as a doc jira, and I've attached a presentation-like document there outlining various points of why we're doing a bunch of what we're doing, but that still needs some wiki-fication that I am working on. I've also attached the replay-protocol document on that jira after updating it slightly with your question on DROP TABLE here.

I'll reply to code-level comments on review board, and reply to your higher-level comments here.

DROP TABLE : This is not quite a DROP TABLE IF EXISTS, it's a DROP TABLE IF OLDER THAN(x). There are a couple of cases this can happen in:

a) To make it more resilient in cases of parallelization of events (e.g., a worker that times out and does not respond back, but might still be running, albeit slowly, in the background), one of the goals of all Commands generated by Replication is that they should be idempotent, and reprocessing of events older than the state of an object should not cause any error. For example, one drone that's processing events (41,42,43) might perform 41 and then not respond back for a significant amount of time, causing Falcon to queue another HiveDR job that starts performing (41,42,43), and 43 might return successfully before the other job performs 42, which would then fail. So, one of the early design goals was that all commands should be resilient to repeats. This is a way of achieving that goal.

b) In the case of a CREATE1-DROP1-CREATE2-REPL(CREATE1)-REPL(DROP1)-REPL(CREATE2), since the REPL(CREATE1) occurs after CREATE2, it picks up a newer state of the table, and the destination is at a newer state than the table which was dropped. Thus, by making the DROP ignore the destination table if it's already newer than the event that spawned the DROP, we can optimize away a bit of re-importing that REPL(CREATE2) would have needed to do. In the future, we'll add in event-nullification, and can do it at a higher level if we batch events, but this helps out even when processing at an individual level.

c) In addition to being a DROP-IF-OLDER, it also acts like a recursive DROP-TABLE-IF-OLDER: in cases where it doesn't result in the dropping of the table, it will still result in dropping older partitions in a newer table. For example, if a T(state=50) has partitions P1(state=45) and P2(state=53), then DROP_TABLE_IF_OLDER_THAN(47) will drop P1 but not P2. This is because a Drop-table event does not result in a series of DropPtn events that are associated with the appropriate table. So, given that our replication works on a per-object basis, if DropTable should not drop the destination table because the destination table is newer than the origin table at the time of the drop, it might still contain older partitions which should be nuked. (This mode is tested in one of the tests in TestCommands in HIVE-10227 if you want to have a look at an example of what's expected.)

--

Regarding the keyword addition, thanks for the feedback, it was not my intent to make them reserved keywords. I talked to [~pxiong] and [~ashutoshc] about it, and the latter is the way that makes sense. As long as I add them to the nonReserved entry in IdentifiersParser.g, it should be good. So, I'll add that in and have another update here.
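The DROP TABLE IF OLDER THAN(x) behaviour described in points (a) through (c) can be modelled in a few lines of Python. This is only a sketch and not the Hive implementation. It assumes integer-comparable state ids, treats a missing table as a successful no-op so that replays are harmless, and drops objects whose state is equal to the event id (the comment only says newer objects are kept). The `Table` class and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Table:
    state: int  # stands in for repl.last.id on the table
    partitions: Dict[str, int] = field(default_factory=dict)  # partition name -> state


def drop_table_for_replication(tables, name, event_id):
    """Model DROP TABLE ... FOR REPLICATION('event_id') on the destination."""
    table = tables.get(name)
    if table is None:
        # Assumption: replaying a drop on a missing table is not an error.
        return "noop"
    if table.state <= event_id:
        # The table is not newer than the drop event, so it really is dropped.
        del tables[name]
        return "dropped"
    # The table is newer than the event: keep it, but still remove
    # partitions that are not newer than the event (the recursive case).
    table.partitions = {p: s for p, s in table.partitions.items() if s > event_id}
    return "kept"
```

Using the numbers from point (c): with T(state=50) holding P1(state=45) and P2(state=53), a drop for event 47 keeps T, removes P1, and leaves P2. Running the same drop a second time changes nothing, which is the idempotence asked for in point (a).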
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496376#comment-14496376 ]

Sushanth Sowmyan commented on HIVE-10228:
-----------------------------------------

RB link : https://reviews.apache.org/r/33224/
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495671#comment-14495671 ]

Vaibhav Gumashta commented on HIVE-10228:
-----------------------------------------

[~sushanth] If possible an RB entry will help a lot. Thanks.
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14492982#comment-14492982 ]

Hive QA commented on HIVE-10228:

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12724991/HIVE-10228.2.patch

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3408/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3408/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3408/

Messages:
{noformat}
This message was trimmed, see log for full details
Reverted 'service/src/gen/thrift/gen-py/hive_service/constants.py'
Reverted 'service/src/gen/thrift/gen-py/hive_service/ThriftHive-remote'
Reverted 'service/src/gen/thrift/gen-py/TCLIService/ttypes.py'
Reverted 'service/src/gen/thrift/gen-py/TCLIService/TCLIService.py'
Reverted 'service/src/gen/thrift/gen-py/TCLIService/constants.py'
Reverted 'service/src/gen/thrift/gen-py/TCLIService/TCLIService-remote'
Reverted 'service/src/gen/thrift/gen-cpp/hive_service_types.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService_types.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService.h'
Reverted 'service/src/gen/thrift/gen-cpp/ThriftHive.h'
Reverted 'service/src/gen/thrift/gen-cpp/hive_service_types.h'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService_types.h'
Reverted 'service/src/gen/thrift/gen-cpp/hive_service_constants.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService_constants.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/ThriftHive.cpp'
Reverted 'service/src/gen/thrift/gen-cpp/hive_service_constants.h'
Reverted 'service/src/gen/thrift/gen-cpp/TCLIService_constants.h'
Reverted 'service/src/gen/thrift/gen-rb/hive_service_types.rb'
Reverted 'service/src/gen/thrift/gen-rb/t_c_l_i_service_constants.rb'
Reverted 'service/src/gen/thrift/gen-rb/hive_service_constants.rb'
Reverted 'service/src/gen/thrift/gen-rb/t_c_l_i_service.rb'
Reverted 'service/src/gen/thrift/gen-rb/thrift_hive.rb'
Reverted 'service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/HiveClusterStatus.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/HiveServerException.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/JobTrackerState.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TCancelOperationReq.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStatusCode.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeQualifierValue.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetFunctionsReq.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeDesc.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TCloseSessionReq.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TRowSet.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TStringColumn.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTableTypesReq.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TCLIServiceConstants.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetCatalogsResp.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetColumnsReq.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TI16Value.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TByteValue.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TMapTypeEntry.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetFunctionsResp.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TBinaryColumn.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TTypeEntry.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchOrientation.java'
Reverted 'service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TGetTableTypesResp.java'
Reverted
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493208#comment-14493208 ]

Hive QA commented on HIVE-10228:

{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12725034/HIVE-10228.3.patch

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 8677 tests executed

*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more - did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce a TEST-*.xml file
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3411/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3411/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3411/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12725034 - PreCommit-HIVE-TRUNK-Build

Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
--

Key: HIVE-10228
URL: https://issues.apache.org/jira/browse/HIVE-10228
Project: Hive
Issue Type: Sub-task
Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Attachments: HIVE-10228.2.patch, HIVE-10228.3.patch, HIVE-10228.patch

We need to update a couple of hive commands to support replication semantics. To wit, we need the following:

EXPORT ... [FOR [METADATA] REPLICATION(“comment”)]
Export will now support an extra optional clause to tell it that this export is being prepared for the purpose of replication. There is also an additional optional clause here that allows the export to be a metadata-only export, to handle cases of capturing the diff for alter statements, for example. Also, if done for replication, the non-presence of a table, or a table being a view/offline table/non-native table, is not considered an error and instead results in a successful no-op.

IMPORT ... (as normal) – but handles new semantics
No syntax changes for import, but import will have to change to be able to handle all the permutations of export dumps possible. Also, import will have to ensure that it updates the object only if the update being imported is not older than the state of the object.

DROP TABLE ... FOR REPLICATION('eventid')
Drop Table now has an additional clause to specify that this drop table is being done for replication purposes, and that the drop should not actually drop the table if the table is newer than the specified event id.

ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')
Similarly, Drop Partition also has an equivalent change to Drop Table.

=

In addition, we introduce a new property
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493210#comment-14493210 ]

Sushanth Sowmyan commented on HIVE-10228:
-

Note: Visiting the test report page http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3411/testReport/ shows 0 failures. The issues reported above come from TestMinimrCliDriver not producing TEST-*.xml files, which is unrelated to this patch.

Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
--

Key: HIVE-10228
URL: https://issues.apache.org/jira/browse/HIVE-10228
Project: Hive
Issue Type: Sub-task
Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Attachments: HIVE-10228.2.patch, HIVE-10228.3.patch, HIVE-10228.patch

We need to update a couple of hive commands to support replication semantics. To wit, we need the following:

EXPORT ... [FOR [METADATA] REPLICATION(“comment”)]
Export will now support an extra optional clause to tell it that this export is being prepared for the purpose of replication. There is also an additional optional clause here that allows the export to be a metadata-only export, to handle cases of capturing the diff for alter statements, for example. Also, if done for replication, the non-presence of a table, or a table being a view/offline table/non-native table, is not considered an error and instead results in a successful no-op.

IMPORT ... (as normal) – but handles new semantics
No syntax changes for import, but import will have to change to be able to handle all the permutations of export dumps possible. Also, import will have to ensure that it updates the object only if the update being imported is not older than the state of the object.

DROP TABLE ... FOR REPLICATION('eventid')
Drop Table now has an additional clause to specify that this drop table is being done for replication purposes, and that the drop should not actually drop the table if the table is newer than the specified event id.

ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')
Similarly, Drop Partition also has an equivalent change to Drop Table.

=

In addition, we introduce a new property repl.last.id, which, when tagged on to table properties or partition properties on a replication-destination, holds the effective state identifier of the object.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
[ https://issues.apache.org/jira/browse/HIVE-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14490439#comment-14490439 ]

Sushanth Sowmyan commented on HIVE-10228:
-

Jumbo patch. [~alangates], could you please take a look?

Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics
--

Key: HIVE-10228
URL: https://issues.apache.org/jira/browse/HIVE-10228
Project: Hive
Issue Type: Sub-task
Components: Import/Export
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Attachments: HIVE-10228.patch

We need to update a couple of hive commands to support replication semantics. To wit, we need the following:

EXPORT ... [FOR [METADATA] REPLICATION(“comment”)]
Export will now support an extra optional clause to tell it that this export is being prepared for the purpose of replication. There is also an additional optional clause here that allows the export to be a metadata-only export, to handle cases of capturing the diff for alter statements, for example. Also, if done for replication, the non-presence of a table, or a table being a view/offline table/non-native table, is not considered an error and instead results in a successful no-op.

IMPORT ... (as normal) – but handles new semantics
No syntax changes for import, but import will have to change to be able to handle all the permutations of export dumps possible. Also, import will have to ensure that it updates the object only if the update being imported is not older than the state of the object.

DROP TABLE ... FOR REPLICATION('eventid')
Drop Table now has an additional clause to specify that this drop table is being done for replication purposes, and that the drop should not actually drop the table if the table is newer than the specified event id.

ALTER TABLE ... DROP PARTITION (...) FOR REPLICATION('eventid')
Similarly, Drop Partition also has an equivalent change to Drop Table.

=

In addition, we introduce a new property repl.last.id, which, when tagged on to table properties or partition properties on a replication-destination, holds the effective state identifier of the object.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
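
Editorial illustration (not part of the patch): the ordering rules in the issue description can be sketched as a pair of comparisons against repl.last.id. This is only a sketch under stated assumptions. The class and method names below are hypothetical and do not come from the Hive code base; state and event identifiers are assumed to be comparable as numeric longs; and an object with no repl.last.id is assumed to be older than any event. The issue text confirms none of these three points.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "not older than" checks described in HIVE-10228.
// None of these names are taken from the actual patch.
public class ReplicationGuardSketch {
    // Property name from the issue description.
    static final String REPL_LAST_ID = "repl.last.id";

    // Assumption: ids are numeric; a missing property means "never replicated" (-1).
    static long currentStateId(Map<String, String> props) {
        String value = props.get(REPL_LAST_ID);
        return value == null ? -1L : Long.parseLong(value);
    }

    // IMPORT: apply only if the incoming update is not older than the object's state.
    static boolean shouldApplyImport(Map<String, String> props, long incomingId) {
        return incomingId >= currentStateId(props);
    }

    // DROP ... FOR REPLICATION('eventid'): skip the drop if the object is newer
    // than the event id that requested it.
    static boolean shouldDrop(Map<String, String> props, long dropEventId) {
        return currentStateId(props) <= dropEventId;
    }

    public static void main(String[] args) {
        Map<String, String> tableProps = new HashMap<>();
        tableProps.put(REPL_LAST_ID, "10");

        System.out.println("import id 9 applies:  " + shouldApplyImport(tableProps, 9));
        System.out.println("import id 10 applies: " + shouldApplyImport(tableProps, 10));
        System.out.println("drop at event 9:      " + shouldDrop(tableProps, 9));
        System.out.println("drop at event 12:     " + shouldDrop(tableProps, 12));
    }
}
```

Under these assumptions a skipped import or drop is simply a no-op, which matches the description's intent that replaying an older event against a newer object should not be treated as an error.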