[
https://issues.apache.org/jira/browse/HIVE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125502#comment-13125502
]
John Sichi commented on HIVE-1496:
----------------------------------
Discussion from IRC:
{noformat}
ssalbiz: jsichi: I was looking at ashutosh's patch for 1496, and I was
wondering if the problem with it is the lack of atomicity? It seems to chain the
map-red tasks to populate the index with the DDLTask correctly if I'm reading
the patch/testcase right
[4:37pm] jsichi: ssalbiz: you're right; I misread the patch--didn't notice the
addIdxTasks part. ashutosh, sorry about that.
[4:40pm] jsichi: but I don't think we should be calling db.createIndex directly
from DDLSemanticAnalyzer...should still be chaining in the DDLWork for that
[4:41pm] ssalbiz: right, I agree
[4:43pm] ashutosh: jsichi: I remember you commenting on the jira that
atomicity will be an issue, but it's ok to solve it separately in a followup
[4:44pm] jsichi: ashutosh: agreed. But we should still be following the usual
pattern for executing the metastore update from within a task (rather than
analyzer)
[4:45pm] jsichi: Another followup is to support a mode whereby updates to a
table trigger a rebuild of the corresponding index partitions.
[4:47pm] jsichi: I guess the reason you had to do it early (in the analyzer) is
that the build-task generation requires the metastore to already be populated.
[4:47pm] ashutosh: yeah.. correct thats the reason
[4:47pm] jsichi: hmmm
[4:48pm] ashutosh: build-task assumes all the data to be populated
[4:55pm] jsichi: I guess the only way to resolve that would be to factor out
the code that knows how to make up an Index object from a CreateIndexDesc, and
then create a temp during analysis (then discard it), then later create the
real one when the task executes.
[4:56pm] ashutosh: yeah.. i think temp object approach may work
[4:57pm] ashutosh: but probably will churn around lot of code
[4:58pm] jsichi: having EXPLAIN able to show what's gonna be done without
actually doing it seems like a valuable guarantee to preserve
[4:58pm] jsichi: (see e.g. HIVE-2478)
[5:01pm] ashutosh: actually temp obj approach may not work because createIndex
task connects to metastore to get the metadata … so it must exist in metastore
[5:02pm] jsichi: that's only for verifying that the index name does not
conflict, right?
[5:05pm] jsichi: woohoo, finally gonna get a clean trunk build on Jenkins since
I'm about to commit HIVE-2493!
[5:05pm] ashutosh: awesome
[5:05pm] jsichi: already got a clean run on 0.8
[5:07pm] ashutosh: for me HBase tests always fail
[5:07pm] ashutosh: with exception NoRegionServerFound exception
[5:07pm] ashutosh: whats the magic there ?
[5:08pm] ssalbiz: looking at the TableBasedIndexHandler code, it does a bunch
of checks to ensure that the partition specs of the table and index are in sync
in the metastore. I think it would be possible to write a helper method in
TableBasedIndexHandler that can be used to generate Index Map-Red Tasks without
relying on the metastore at all if we can assume that the Index ms partition
spec and the base table partition spec are going to be consistent when the da
[5:09pm] ssalbiz: seems like less code churn than trying to feed the current
method mock metastore/Index objects to make those checks pass
[5:10pm] jsichi: ashutosh: hmm, dunno...I'll bet there's a real exception
buried somewhere deep in the logs...
[5:10pm] jsichi: ssalbiz: I don't think we want to change the index handler
interface though
[5:15pm] ssalbiz: hmmm, ok, in that case I guess we will have to feed temp
objects to the existing generateIndexBuildTaskList method
[5:18pm] ashutosh: one possibility to avoid explain problem is to not execute
ms operation in semantic analyzer if its an explain query
[5:19pm] jsichi: There's already precedent for the temp objects....e.g.
Hive.createIndex already calls indexHandler.analyzeIndexDefinition with
indexDesc and tt params which haven't actually been written to the metastore
yet.
[5:20pm] jsichi: ashutosh: that wouldn't work, would it, since the explain is
supposed to show the index build tasks too
[5:22pm] ashutosh: hmm.. right.. it wont show it.. but will atleast prevent
execution of unwanted ms operation in case of explain ...
[5:22pm] ashutosh: i can take a look at temp object approach
[5:23pm] jsichi: ssalbiz is actually going to work on indexing again as part of
a school project, so if you're OK with it, we can assign back to him.
[5:25pm] ashutosh: ya.. thats fine..
[5:25pm] ashutosh: he can take it up
[5:25pm] jsichi: OK cool. I'll copy-paste this conversation into JIRA in case
we forget anything later.
{noformat}
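For reference, the temp-object approach discussed above could be sketched roughly as follows. This is only an illustration under stated assumptions: IndexDesc, IndexMeta, FakeMetastore, Task, toIndexMeta and the index table naming are all hypothetical stand-ins invented for the sketch, not Hive classes or methods. The point is that analysis builds a throwaway in-memory object to plan the build task, and the only metastore write happens when the DDL task executes, so a plan can be shown (as EXPLAIN would) without side effects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only; none of these are Hive classes.
class TempIndexSketch {

    /** What the user asked for in CREATE INDEX (stand-in for a create-index descriptor). */
    static final class IndexDesc {
        final String indexName;
        final String baseTable;
        IndexDesc(String indexName, String baseTable) {
            this.indexName = indexName;
            this.baseTable = baseTable;
        }
    }

    /** In-memory index metadata (stand-in for the object stored in the metastore). */
    static final class IndexMeta {
        final String name;
        final String baseTable;
        final String indexTable;
        IndexMeta(String name, String baseTable, String indexTable) {
            this.name = name;
            this.baseTable = baseTable;
            this.indexTable = indexTable;
        }
    }

    /** Factored-out helper: a pure function from descriptor to metadata object. */
    static IndexMeta toIndexMeta(IndexDesc desc) {
        // The naming scheme here is made up for the sketch.
        return new IndexMeta(desc.indexName, desc.baseTable,
                desc.baseTable + "_" + desc.indexName + "_idx");
    }

    /** Fake metastore that counts writes so side effects are observable. */
    static final class FakeMetastore {
        final Map<String, IndexMeta> indexes = new HashMap<>();
        int writes = 0;
        void createIndex(IndexMeta meta) {
            if (indexes.containsKey(meta.name)) {
                throw new IllegalStateException("index already exists: " + meta.name);
            }
            indexes.put(meta.name, meta);
            writes++;
        }
    }

    interface Task {
        String describe();
        void execute(FakeMetastore ms);
    }

    /** Analysis phase: plans from a temp object and never touches the metastore. */
    static List<Task> analyze(final IndexDesc desc) {
        final IndexMeta temp = toIndexMeta(desc); // used for planning only, never persisted
        List<Task> plan = new ArrayList<>();
        plan.add(new Task() {
            public String describe() { return "DDL: create index " + desc.indexName; }
            public void execute(FakeMetastore ms) {
                // The real object is created only when the task runs.
                ms.createIndex(toIndexMeta(desc));
            }
        });
        plan.add(new Task() {
            public String describe() {
                return "MR: populate " + temp.indexTable + " from " + temp.baseTable;
            }
            public void execute(FakeMetastore ms) {
                // The build task depends on the DDL task having run first.
                if (!ms.indexes.containsKey(desc.indexName)) {
                    throw new IllegalStateException("index missing: " + desc.indexName);
                }
            }
        });
        return plan;
    }

    public static void main(String[] args) {
        FakeMetastore ms = new FakeMetastore();
        List<Task> plan = analyze(new IndexDesc("idx1", "src"));
        for (Task t : plan) {
            System.out.println(t.describe()); // showing the plan performs no writes
        }
        System.out.println("writes after analysis: " + ms.writes);
        for (Task t : plan) {
            t.execute(ms);
        }
        System.out.println("writes after execution: " + ms.writes);
    }
}
```

The sketch leaves out atomicity and the index handler's partition-spec checks, both of which were raised in the conversation as separate concerns.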
> enhance CREATE INDEX to support immediate index build
> -----------------------------------------------------
>
> Key: HIVE-1496
> URL: https://issues.apache.org/jira/browse/HIVE-1496
> Project: Hive
> Issue Type: Improvement
> Components: Indexing
> Affects Versions: 0.7.0, 0.8.0
> Reporter: John Sichi
> Assignee: Syed S. Albiz
> Attachments: hive-1496.patch
>
>
> Currently we only support WITH DEFERRED REBUILD.