[jira] [Commented] (HIVE-17657) export/import for MM tables is broken

Sankar Hariappan (JIRA) Tue, 01 May 2018 23:26:23 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-17657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460575#comment-16460575
 ]


Sankar Hariappan commented on HIVE-17657:
-----------------------------------------

{quote}

[~sankarh] delta directory of the import after the import will be listed as a 
single delta... 
the directories inside the delta directory (export_delta..) are not meaningful 
for ACID, they are just contents of that single delta. It's the same as e.g. 
instead of having two files in the delta directory, "foo_000000" and 
"bar_000000", having the same files in a directory structure, e.g. 
"foo/000000", "bar/000000".

Not sure I understand the 2nd question. Change compared to what? Afair files 
are imported directly so there's no rename.

{quote}

[~sershe],

Did you mean that the directory structure would be 
*table/partition_data_location -> delta_dir -> export_delta -> 000000_0*? If 
yes, then there are no issues for replication. I had assumed that 
"mm_table_import" is also a directory under which delta_dir is created.

About the 2nd question, what I meant was: once the delta directory is 
moved/created in the target table/partition_data_location, will the 
directory/file names or the directory structure undergo any change (other than 
through the compaction/cleaner flow)?

Replication-related events are captured in MoveTask, which lists the new files 
and, in this case, all files in all sub-dirs. So, if the path changes for some 
reason after these files are listed, incremental REPL LOAD won't work because 
the source file is no longer accessible. However, if the cleaner deletes these 
directories, the files will be archived in the CM root, and hence REPL LOAD 
will be able to read them from there.
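
To make the concern concrete, here is a minimal sketch in plain Python (not 
Hive code) of what "list all files in all sub-dirs, then the path changes" 
looks like. The directory names (delta_0000001_0000001, export_delta_a, 
export_delta_b) and the helper functions are made up for illustration; only 
the nesting delta_dir -> export_delta -> 000000_0 comes from the discussion 
above.

```python
import os
import tempfile


def list_files_recursively(root):
    """Return the sorted relative paths of every file under root, at any depth."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return sorted(found)


def build_demo_layout(base):
    """Create a hypothetical delta dir holding two export sub-dirs, one file each."""
    delta = os.path.join(base, "delta_0000001_0000001")  # illustrative name
    for sub in ("export_delta_a", "export_delta_b"):     # illustrative names
        os.makedirs(os.path.join(delta, sub))
        with open(os.path.join(delta, sub, "000000_0"), "w"):
            pass
    return delta


def demo():
    """Record absolute file paths, rename the delta dir, then re-check the paths."""
    with tempfile.TemporaryDirectory() as base:
        delta = build_demo_layout(base)
        recorded = [os.path.join(delta, p) for p in list_files_recursively(delta)]
        readable_before = all(os.path.exists(p) for p in recorded)
        # Simulate any later change to the directory name.
        os.rename(delta, os.path.join(base, "delta_0000002_0000002"))
        readable_after = any(os.path.exists(p) for p in recorded)
        return readable_before, readable_after
```

Calling demo() shows the recorded paths resolving before the rename and none 
of them resolving afterwards, which is the situation an incremental load 
would hit if the listed paths were the only place to find the data.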

 

> export/import for MM tables is broken
> -------------------------------------
>
>                 Key: HIVE-17657
>                 URL: https://issues.apache.org/jira/browse/HIVE-17657
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Sergey Shelukhin
>            Priority: Major
>              Labels: mm-gap-2
>         Attachments: HIVE-17657.01.patch, HIVE-17657.02.patch, 
> HIVE-17657.03.patch, HIVE-17657.04.patch, HIVE-17657.05.patch, 
> HIVE-17657.06.patch, HIVE-17657.07.patch, HIVE-17657.patch
>
>
> there is mm_exim.q but it's not clear from the tests what file structure it 
> creates 
> On import the txnids in the directory names would have to be remapped if 
> importing to a different cluster.  Perhaps export can be smart and export 
> highest base_x and accretive deltas (minus aborted ones).  Then import can 
> ...?  It would have to remap txn ids from the archive to new txn ids.  This 
> would then mean that import is made up of several transactions rather than 1 
> atomic op.  (all locks must belong to a transaction)
> One possibility is to open a new txn for each dir in the archive (where 
> start/end txn of file name is the same) and commit all of them at once (need 
> new TMgr API for that).  This assumes using a shared lock (if any!) and thus 
> allows other inserts (not related to import) to occur.
> What if you have delta_6_9, such as a result of concatenate?  If we stipulate 
> that this must mean that there is no delta_6_6 or any other "obsolete" delta 
> in the archive we can map it to a new single txn delta_x_x.
> Add read_only mode for tables (useful in general, may be needed for upgrade 
> etc) and use that to make the above atomic.
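
For reference, the remapping idea in the description can be sketched roughly 
as follows. This is plain Python, not Hive code: the function name 
remap_deltas, the first_new_txn_id parameter and the unpadded delta_x_y names 
are assumptions for illustration. It applies the stipulation from the 
description that a multi-txn delta such as delta_6_9 must not coexist with an 
"obsolete" delta like delta_6_6, and maps every archived delta to a new 
single-txn delta_x_x.

```python
import re

# Matches unpadded names as written in the description, e.g. "delta_6_9".
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)$")


def remap_deltas(dir_names, first_new_txn_id):
    """Map archived delta dir names to new single-txn names delta_x_x.

    Deltas are processed in order of their starting txn id, and each one is
    given the next new txn id. Overlapping ranges are rejected, because the
    archive is assumed to contain no obsolete deltas.
    """
    parsed = []
    for name in dir_names:
        match = DELTA_RE.match(name)
        if match is None:
            raise ValueError(f"not a delta directory name: {name!r}")
        start, end = int(match.group(1)), int(match.group(2))
        if start > end:
            raise ValueError(f"invalid txn range in {name!r}")
        parsed.append((start, end, name))
    parsed.sort()

    mapping = {}
    next_txn_id = first_new_txn_id
    previous_end = None
    for start, end, name in parsed:
        if previous_end is not None and start <= previous_end:
            raise ValueError(f"{name!r} overlaps an earlier delta in the archive")
        mapping[name] = f"delta_{next_txn_id}_{next_txn_id}"
        next_txn_id += 1
        previous_end = end
    return mapping
```

With first_new_txn_id=100, the archive ["delta_5_5", "delta_6_9", 
"delta_10_10"] maps to delta_100_100, delta_101_101 and delta_102_102, while 
an archive holding both delta_6_9 and delta_6_6 is rejected. How the new txn 
ids would actually be opened and committed together is the open question from 
the description and is not modelled here.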



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
