[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128407#comment-16128407 ] Rui Li commented on HIVE-17321: --- [~kellyzly], w/o the patch, analyze table w/o noscan/partialscan will launch a job containing only a TableScan (TS) operator. Therefore there won't be a FileSink (FS) operator to update the stats. > HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan > is not specified > - > > Key: HIVE-17321 > URL: https://issues.apache.org/jira/browse/HIVE-17321 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17321.1.patch > > > Need to implement HIVE-9560 for Spark. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
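For readers following along, the three variants of the analyze command take different paths here. Below is a minimal sketch of how one might exercise them over Hive JDBC; the connection URL and table name are illustrative assumptions, not taken from this issue, and the Hive JDBC driver is assumed to be on the classpath:

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AnalyzeVariants {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint and ORC table name.
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
         Statement stmt = conn.createStatement()) {
      // Full scan: launches the stats-gathering job discussed above.
      stmt.execute("ANALYZE TABLE orc_tbl COMPUTE STATISTICS");
      // noscan: only file count and physical size, no job over the data.
      stmt.execute("ANALYZE TABLE orc_tbl COMPUTE STATISTICS NOSCAN");
      // partialscan: row count / raw data size taken from the ORC footers.
      stmt.execute("ANALYZE TABLE orc_tbl COMPUTE STATISTICS PARTIALSCAN");
    }
  }
}
{code}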
[jira] [Assigned] (HIVE-13532) Mapjoin should set realuser's username
[ https://issues.apache.org/jira/browse/HIVE-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] feiwei reassigned HIVE-13532: - Assignee: feiwei > Mapjoin should set realuser's username > -- > > Key: HIVE-13532 > URL: https://issues.apache.org/jira/browse/HIVE-13532 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: HADOOP_PROXY_USER is set. >Reporter: Zhiwen Sun >Assignee: feiwei > > Mapjoin should set HADOOP_USER_NAME to the real user's username. > Currently, Hive sets the HADOOP_USER_NAME env for the mapjoin local process according to: > {quote} >String endUserName = Utils.getUGI().getShortUserName(); > {quote} > Suppose HADOOP_PROXY_USER=abc is set in the shell. > The map join local job will then have the following env: > {quote} > HADOOP_USER_NAME=abc > HADOOP_PROXY_USER=abc > {quote} > This will cause an exception such as: > {quote} > java.io.IOException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): > User: abc is not allowed to impersonate > {quote} > I think we should set HADOOP_USER_NAME to the real user: > {quote} >String endUserName = Utils.getUGI().getRealUser().getShortUserName(); > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-13532) Mapjoin should set realuser's username
[ https://issues.apache.org/jira/browse/HIVE-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128399#comment-16128399 ] feiwei commented on HIVE-13532: --- You can do this in MapredLocalTask.java:
{code}
UserGroupInformation ug = Utils.getUGI().getRealUser();
String endUserName = "";
if (ug == null) {
  endUserName = Utils.getUGI().getShortUserName();
} else {
  endUserName = ug.getShortUserName();
}
{code}
or
{code}
String endUserName = "";
UserGroupInformation ug1 = Utils.getUGI();
if (ug1.getAuthenticationMethod().equals(AuthenticationMethod.PROXY)) {
  endUserName = ug1.getRealUser().getShortUserName();
} else {
  endUserName = ug1.getShortUserName();
}
{code}
because when getAuthenticationMethod() returns something other than PROXY, getRealUser() will return null. > Mapjoin should set realuser's username > -- > > Key: HIVE-13532 > URL: https://issues.apache.org/jira/browse/HIVE-13532 > Project: Hive > Issue Type: Bug >Affects Versions: 1.1.0 > Environment: HADOOP_PROXY_USER is set. >Reporter: Zhiwen Sun > > Mapjoin should set HADOOP_USER_NAME to the real user's username. > Currently, Hive sets the HADOOP_USER_NAME env for the mapjoin local process according to: > {quote} >String endUserName = Utils.getUGI().getShortUserName(); > {quote} > Suppose HADOOP_PROXY_USER=abc is set in the shell. > The map join local job will then have the following env: > {quote} > HADOOP_USER_NAME=abc > HADOOP_PROXY_USER=abc > {quote} > This will cause an exception such as: > {quote} > java.io.IOException: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): > User: abc is not allowed to impersonate > {quote} > I think we should set HADOOP_USER_NAME to the real user: > {quote} >String endUserName = Utils.getUGI().getRealUser().getShortUserName(); > {quote} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
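Folding the two suggestions together, here is a minimal, self-contained sketch of the proxy-aware lookup using Hadoop's UserGroupInformation API directly; UserGroupInformation.getCurrentUser() stands in for Hive's Utils.getUGI() to keep the example standalone:

{code}
import java.io.IOException;

import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod;

public class EndUserNameResolver {
  // When the current UGI was created via HADOOP_PROXY_USER, return the real
  // (login) user's short name; otherwise getRealUser() is null and the
  // current short name is already the right one to export.
  public static String resolveEndUserName() throws IOException {
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    if (ugi.getAuthenticationMethod() == AuthenticationMethod.PROXY) {
      return ugi.getRealUser().getShortUserName();
    }
    return ugi.getShortUserName();
  }
}
{code}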
[jira] [Commented] (HIVE-17330) refactor TezSessionPoolManager to separate its multiple functions
[ https://issues.apache.org/jira/browse/HIVE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128396#comment-16128396 ] Hive QA commented on HIVE-17330: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882065/HIVE-17330.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 10378 tests executed *Failed tests:* {noformat}
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=163)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver (batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver (batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.ql.TestAcidOnTez.testMapJoinOnTez (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTez.testMergeJoinOnTez (batchId=215)
org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMapJoinOnTez (batchId=219)
org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMergeJoinOnTez (batchId=219)
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testGetNonDefaultSession (batchId=277)
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testReturn (batchId=277)
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionPoolGetInOrder (batchId=277)
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionPoolThreads (batchId=277)
org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionReopen (batchId=277)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=222)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressWithHiveServer2ProgressBarDisabled (batchId=222)
org.apache.hive.hcatalo
[jira] [Comment Edited] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128384#comment-16128384 ] anishek edited comment on HIVE-16886 at 8/16/17 6:21 AM: - Yeah, we can do that, though we have to explicitly parse and typecast the data-store identity in the metastore code. Additionally, a SQL query in DataNucleus has to be used instead of an object query for {code}public NotificationEventResponse getNextNotification(NotificationEventRequest rqst){code} in the object store. I have the code in place which addresses the current issue by using {{NL_ID}} as the event id and removing the use of * {{MNotificationNextId}} * {{EVENT_ID}} from {{MNotificationLog}}, such that without modifying the metastore db schema, we just populate a default value of "0" for this column in the db. The problem, though, is how we manage deployments that use repl v1 and depend on {{EVENT_ID}}, which with the new release would suddenly move to {{NL_ID}}: * one way is to map both {{NL_ID}} and {{EVENT_ID}} in {{MNotificationLog}}, and the external tool, based on the value {{EVENT_ID=0}}, switches to using ids from {{NL_ID}} * the other way is to completely redo the whole replication deployment with repl v2 rather than repl v1. was (Author: anishek): Yeah, we can do that, though we have to explicitly parse and typecast the data-store identity in the metastore code. Additionally, a SQL query from the datastore has to be used for {code}public NotificationEventResponse getNextNotification(NotificationEventRequest rqst){code} in the object store. I have the code in place which addresses the current issue by using {{NL_ID}} as the event id and removing the use of * {{MNotificationNextId}} * {{EVENT_ID}} from {{MNotificationLog}}, such that without modifying the metastore db schema, we just populate a default value of "0" for this column in the db. The problem, though, is how we manage deployments that use repl v1 and depend on {{EVENT_ID}}, which with the new release would suddenly move to {{NL_ID}}: * one way is to map both {{NL_ID}} and {{EVENT_ID}} in {{MNotificationLog}}, and the external tool, based on the value {{EVENT_ID=0}}, switches to using ids from {{NL_ID}} * the other way is to completely redo the whole replication deployment with repl v2 rather than repl v1. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS,
>       MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>     tasks[i] = new FutureTask(new Callable() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent =
>             new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(),
>                 "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i])
[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently
[ https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128384#comment-16128384 ] anishek commented on HIVE-16886: Yeah, we can do that, though we have to explicitly parse and typecast the data-store identity in the metastore code. Additionally, a SQL query from the datastore has to be used for {code}public NotificationEventResponse getNextNotification(NotificationEventRequest rqst){code} in the object store. I have the code in place which addresses the current issue by using {{NL_ID}} as the event id and removing the use of * {{MNotificationNextId}} * {{EVENT_ID}} from {{MNotificationLog}}, such that without modifying the metastore db schema, we just populate a default value of "0" for this column in the db. The problem, though, is how we manage deployments that use repl v1 and depend on {{EVENT_ID}}, which with the new release would suddenly move to {{NL_ID}}: * one way is to map both {{NL_ID}} and {{EVENT_ID}} in {{MNotificationLog}}, and the external tool, based on the value {{EVENT_ID=0}}, switches to using ids from {{NL_ID}} * the other way is to completely redo the whole replication deployment with repl v2 rather than repl v1. > HMS log notifications may have duplicated event IDs if multiple HMS are > running concurrently > > > Key: HIVE-16886 > URL: https://issues.apache.org/jira/browse/HIVE-16886 > Project: Hive > Issue Type: Bug > Components: Hive, Metastore >Reporter: Sergio Peña >Assignee: anishek > > When running multiple Hive Metastore servers and DB notifications are > enabled, I could see that notifications can be persisted with a duplicated > event ID. > This does not happen when running multiple threads in a single HMS node due > to the locking acquired on the DbNotificationsLog class, but multiple HMS > could cause conflicts. > The issue is in the ObjectStore#addNotificationEvent() method. The event ID > fetched from the datastore is used for the new notification, incremented in > the server itself, then persisted or updated back to the datastore. If 2 > servers read the same ID, then these 2 servers write a new notification with > the same ID. > The event ID is neither unique nor a primary key. 
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
> public void testConcurrentAddNotifications() throws ExecutionException, InterruptedException {
>   final int NUM_THREADS = 2;
>   CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
>   CountDownLatch countOut = new CountDownLatch(1);
>   HiveConf conf = new HiveConf();
>   conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS,
>       MockPartitionExpressionProxy.class.getName());
>   ExecutorService executorService = Executors.newFixedThreadPool(NUM_THREADS);
>   FutureTask tasks[] = new FutureTask[NUM_THREADS];
>   for (int i = 0; i < NUM_THREADS; i++) {
>     final int n = i;
>     tasks[i] = new FutureTask(new Callable() {
>       @Override
>       public Void call() throws Exception {
>         ObjectStore store = new ObjectStore();
>         store.setConf(conf);
>         NotificationEvent dbEvent =
>             new NotificationEvent(0, 0, EventMessage.EventType.CREATE_DATABASE.toString(),
>                 "CREATE DATABASE DB" + n);
>         System.out.println("ADDING NOTIFICATION");
>         countIn.countDown();
>         countOut.await();
>         store.addNotificationEvent(dbEvent);
>         System.out.println("FINISH NOTIFICATION");
>         return null;
>       }
>     });
>     executorService.execute(tasks[i]);
>   }
>   countIn.await();
>   countOut.countDown();
>   for (int i = 0; i < NUM_THREADS; ++i) {
>     tasks[i].get();
>   }
>   NotificationEventResponse eventResponse =
>       objectStore.getNextNotification(new NotificationEventRequest());
>   Assert.assertEquals(2, eventResponse.getEventsSize());
>   Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
>   // This fails because the next notification has an event ID = 1
>   Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
> }
> {noformat}
> The last assertion fails because the second event has ID 1 instead of the expected 2. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
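The failure mode quoted above is a classic lost update: each server does an unsynchronized read-increment-write of the shared ID. The same pattern can be reproduced with two plain threads; a self-contained sketch follows (all names here are illustrative, not metastore code):

{code}
import java.util.concurrent.CountDownLatch;

public class DuplicateEventIdDemo {
  static long nextEventId = 1; // stands in for the MNotificationNextId row

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch start = new CountDownLatch(1);
    Runnable hms = () -> {
      try { start.await(); } catch (InterruptedException ignored) { }
      long id = nextEventId; // 1. fetch the current ID from the "datastore"
      nextEventId = id + 1;  // 2. write the incremented ID back
      System.out.println(Thread.currentThread().getName() + " used event ID " + id);
    };
    Thread a = new Thread(hms, "hms-1");
    Thread b = new Thread(hms, "hms-2");
    a.start(); b.start();
    start.countDown(); // release both "servers" at once
    a.join(); b.join(); // both will frequently report event ID 1
  }
}
{code}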
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128375#comment-16128375 ] liyunzhang_intel commented on HIVE-17321: - [~lirui]: Understood, but I am curious why the raw data size of the ORC table is zero. When executing "INSERT OVERWRITE TABLE xxx SELECT * xxx", Hive with ORC will update statistics from the ORC footer in [FileSinkOperator#closeOp|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L1081] > HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan > is not specified > - > > Key: HIVE-17321 > URL: https://issues.apache.org/jira/browse/HIVE-17321 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17321.1.patch > > > Need to implement HIVE-9560 for Spark. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
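Those footer statistics can be read back directly; below is a small sketch using the standalone ORC reader API (the file path argument is an assumption for illustration, not code from the patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

public class OrcFooterStats {
  public static void main(String[] args) throws Exception {
    // args[0]: path to one ORC data file of the table in question.
    Reader reader = OrcFile.createReader(new Path(args[0]),
        OrcFile.readerOptions(new Configuration()));
    // Both values come straight from the ORC footer; no data scan is needed.
    System.out.println("rows        = " + reader.getNumberOfRows());
    System.out.println("rawDataSize = " + reader.getRawDataSize());
  }
}
{code}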
[jira] [Comment Edited] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128187#comment-16128187 ] liyunzhang_intel edited comment on HIVE-17321 at 8/16/17 5:51 AM: -- [~lirui]: for ORC, we need not compute the raw data size using noscan/partialscan, because the raw data size statistic is written to the metastore when the data load finishes. For more detail about how raw data statistics are collected, see HIVE-17108. was (Author: kellyzly): [~lirui]: for ORC, we need not compute the raw data size using noscan/partialscan, because the raw data size statistic is written to the metastore when the data load finishes. For more detail about how raw data statistics are collected, see HIVE-17018. > HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan > is not specified > - > > Key: HIVE-17321 > URL: https://issues.apache.org/jira/browse/HIVE-17321 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17321.1.patch > > > Need to implement HIVE-9560 for Spark. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17205) add functional support
[ https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128339#comment-16128339 ] Hive QA commented on HIVE-17205: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882063/HIVE-17205.09.patch {color:green}SUCCESS:{color} +1 due to 5 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10979 tests executed *Failed tests:* {noformat}
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=281)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[delete_non_acid_table] (batchId=90)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[update_non_acid_table] (batchId=90)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6413/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6413/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6413/ Messages: {noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12882063 - PreCommit-HIVE-Build > add functional support > -- > > Key: HIVE-17205 > URL: https://issues.apache.org/jira/browse/HIVE-17205 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17205.01.patch, HIVE-17205.02.patch, > HIVE-17205.03.patch, HIVE-17205.09.patch > > > make sure unbucketed tables can be marked transactional=true > make insert/update/delete/compaction work -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128332#comment-16128332 ] Rui Li commented on HIVE-17321: --- [~kellyzly], the problem is that if you run analyze table w/o noscan/partialscan, the raw data size will be set to 0. HIVE-9560 solved the issue, but only for MR and Tez. So Spark and MR will have different query plans for the analyze command. > HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan > is not specified > - > > Key: HIVE-17321 > URL: https://issues.apache.org/jira/browse/HIVE-17321 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17321.1.patch > > > Need to implement HIVE-9560 for Spark. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128310#comment-16128310 ] Rui Li commented on HIVE-17292: --- I'm not sure if it's worth the effort to update the golden files. It seems the only benefit is to have the test results consistent with our configuration. There may be more benefit for the mini-yarn test, because currently we only have 1 executor while we intend to have 2 for the tests. Does it make sense to update the yarn test and leave the local-cluster test as is? [~xuefuz] what do you think? > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, > HIVE-17292.3.patch, HIVE-17292.5.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers only > request 512MB. We should change the FairScheduler configuration to use only > the requested 512MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
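For reference, the proposed change amounts to lowering the scheduler's memory granularity in the mini cluster configuration. A sketch of the relevant settings, under the assumption that the FairScheduler's increment-allocation property is what rounds the 512MB request up to 1GB:

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class MiniClusterSchedulerConf {
  public static YarnConfiguration build() {
    YarnConfiguration conf = new YarnConfiguration();
    // Let the RM hand out containers as small as the 512MB the tests request.
    conf.setInt(YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB, 512);
    // FairScheduler rounds requests up in this increment (1024MB by default),
    // so it must also drop to 512MB for a 512MB container to be granted.
    conf.setInt("yarn.scheduler.increment-allocation-mb", 512);
    return conf;
  }
}
{code}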
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Status: Patch Available (was: Open) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. 
> {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
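As a rough illustration of the structured entries described above, a per-table dump log line might look like the following sketch, assuming SLF4J; the logger name, message prefix, and field layout here are invented for illustration and are not necessarily what the attached patches emit:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReplStateLogger {
  private static final Logger LOG = LoggerFactory.getLogger("ReplState");

  // Emits one structured line per dumped table, carrying the fields listed
  // in the description: name, type, end time, and sequence-no/total progress.
  public static void tableDumped(String name, String type, long seqNo, long estimatedTotal) {
    LOG.info("REPL::TABLE_DUMP: name={} type={} dumpEndTime={} progress={}/{}",
        name, type, System.currentTimeMillis(), seqNo, estimatedTotal);
  }
}
{code}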
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Attachment: HIVE-17100.03.patch > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. 
> {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Attachment: (was: HIVE-17100.03.patch) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. 
> * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Status: Open (was: Patch Available) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. 
> {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128283#comment-16128283 ] Hive QA commented on HIVE-8472: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882051/HIVE-8472.2-branch-2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10597 tests executed *Failed tests:* {noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125)
org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228)
org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217)
org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176)
{noformat}
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6412/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6412/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6412/ Messages: {noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}
This message is automatically generated. ATTACHMENT ID: 12882051 - PreCommit-HIVE-Build > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.
[ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-16990: - Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to master. Thanks for the patch [~sankarh], and for the review [~anishek]! > REPL LOAD should update last repl ID only after successful copy of data files. > -- > > Key: HIVE-16990 > URL: https://issues.apache.org/jira/browse/HIVE-16990 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, > HIVE-16990.03.patch, HIVE-16990.04.patch, HIVE-16990.05.patch > > > REPL LOAD operations that include both metadata and data changes should > follow the rule below. > 1. Copy the metadata, excluding the last repl ID. > 2. Copy the data files. > 3. If steps 1 and 2 are successful, then update the last repl ID of the object. > This rule allows the failed events to be re-applied by REPL LOAD and > ensures no data loss due to failures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
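The rule is essentially an ordering constraint. A minimal sketch of that ordering follows, where every name is hypothetical rather than Hive's actual API:

{code}
// Illustrative only; these method names are invented, not Hive's API.
interface ReplLoadTask {
  void copyMetadataWithoutReplId(String dumpDir) throws Exception; // step 1
  void copyDataFiles(String dumpDir) throws Exception;             // step 2
  void updateLastReplId(String replId) throws Exception;           // step 3

  default void applyEvent(String dumpDir, String replId) throws Exception {
    copyMetadataWithoutReplId(dumpDir); // last repl ID deliberately untouched
    copyDataFiles(dumpDir);             // a failure up to here leaves the old
                                        // repl ID, so a retried REPL LOAD will
                                        // re-apply this event, not skip it
    updateLastReplId(replId);           // advance only after 1 and 2 succeed
  }
}
{code}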
[jira] [Commented] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.
[ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128279#comment-16128279 ] Thejas M Nair commented on HIVE-16990: -- +1 > REPL LOAD should update last repl ID only after successful copy of data files. > -- > > Key: HIVE-16990 > URL: https://issues.apache.org/jira/browse/HIVE-16990 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, > HIVE-16990.03.patch, HIVE-16990.04.patch, HIVE-16990.05.patch > > > REPL LOAD operations that include both metadata and data changes should > follow the rule below. > 1. Copy the metadata, excluding the last repl ID. > 2. Copy the data files. > 3. If steps 1 and 2 are successful, then update the last repl ID of the object. > This rule allows the failed events to be re-applied by REPL LOAD and > ensures no data loss due to failures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Status: Patch Available (was: Open) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch > > > It is necessary to log the progress the replication tasks in a structured > manner as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number > of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total > number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if > any table/function is dropped when dump in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and > views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of > functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the > load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of database dump, will add one log with below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event dump end time > * Event dump progress. Format is Event sequence no/ (Estimated) Total number > of events.{color} > * After completion of all event dumps, will add a log as follows. 
> {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can be terribly inaccurate with actual > number as we don’t have the number of events upfront until we read from > metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT etc) > * Event load end time > * Event load progress. Format is Event sequence no/ Total number of > events.{color} > * After completion of all event loads, will add a log as follows to > consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Attachment: HIVE-17100.03.patch Added 03.patch after rebasing with master. > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch, > HIVE-17100.03.patch > > > It is necessary to log the progress of the replication tasks in a structured manner, as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with the below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if any table/function is dropped while the dump is in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with the below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of functions.{color} > * After completion of all loads, will add a log as follows to consolidate the load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of incremental dump, will add one log with the below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT, etc.) > * Event dump end time > * Event dump progress. Format is Event sequence no/(Estimated) Total number of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can differ greatly from the actual number, as we don’t know the number of events upfront until we read the metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with the below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT, etc.) > * Event load end time > * Event load progress. Format is Event sequence no/Total number of events.{color} > * After completion of all event loads, will add a log as follows to consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
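To make the layout above concrete, here is a minimal sketch of what one such entry could look like in the HS2 operation log, rendered as a single greppable line. The class and method names here are illustrative assumptions, not the API of the attached patch; only the field set mirrors the spec above.

{code:java}
// Illustrative sketch only: renders the bootstrap-dump "start" entry from the
// spec above as one structured line suitable for the HS2 operation log.
public class ReplLogSketch {
  static String bootstrapDumpStart(String dbName, long estNumTables,
                                   long estNumFunctions, long dumpStartTime) {
    return String.format(
        "REPL::START: {\"dbName\":\"%s\",\"dumpType\":\"BOOTSTRAP\","
            + "\"estimatedNumTables\":%d,\"estimatedNumFunctions\":%d,"
            + "\"dumpStartTime\":%d}",
        dbName, estNumTables, estNumFunctions, dumpStartTime);
  }

  public static void main(String[] args) {
    System.out.println(
        bootstrapDumpStart("sales", 42, 3, System.currentTimeMillis() / 1000));
  }
}
{code}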
[jira] [Updated] (HIVE-17100) Improve HS2 operation logs for REPL commands.
[ https://issues.apache.org/jira/browse/HIVE-17100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-17100: Status: Open (was: Patch Available) > Improve HS2 operation logs for REPL commands. > - > > Key: HIVE-17100 > URL: https://issues.apache.org/jira/browse/HIVE-17100 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-17100.01.patch, HIVE-17100.02.patch > > > It is necessary to log the progress of the replication tasks in a structured manner, as follows. > *+Bootstrap Dump:+* > * At the start of bootstrap dump, will add one log with the below details. > {color:#59afe1}* Database Name > * Dump Type (BOOTSTRAP) > * (Estimated) Total number of tables/views to dump > * (Estimated) Total number of functions to dump. > * Dump Start Time{color} > * After each table dump, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table dump end time > * Table dump progress. Format is Table sequence no/(Estimated) Total number of tables and views.{color} > * After each function dump, will add a log as follows > {color:#59afe1}* Function Name > * Function dump end time > * Function dump progress. Format is Function sequence no/(Estimated) Total number of functions.{color} > * After completion of all dumps, will add a log as follows to consolidate the dump. > {color:#59afe1}* Database Name. > * Dump Type (BOOTSTRAP). > * Dump End Time. > * (Actual) Total number of tables/views dumped. > * (Actual) Total number of functions dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The actual and estimated number of tables/functions may not match if any table/function is dropped while the dump is in progress. > *+Bootstrap Load:+* > * At the start of bootstrap load, will add one log with the below details. > {color:#59afe1}* Database Name > * Dump directory > * Load Type (BOOTSTRAP) > * Total number of tables/views to load > * Total number of functions to load. > * Load Start Time{color} > * After each table load, will add a log as follows > {color:#59afe1}* Table/View Name > * Type (TABLE/VIEW/MATERIALIZED_VIEW) > * Table load completion time > * Table load progress. Format is Table sequence no/Total number of tables and views.{color} > * After each function load, will add a log as follows > {color:#59afe1}* Function Name > * Function load completion time > * Function load progress. Format is Function sequence no/Total number of functions.{color} > * After completion of all loads, will add a log as follows to consolidate the load. > {color:#59afe1}* Database Name. > * Load Type (BOOTSTRAP). > * Load End Time. > * Total number of tables/views loaded. > * Total number of functions loaded. > * Last Repl ID of the loaded database.{color} > *+Incremental Dump:+* > * At the start of incremental dump, will add one log with the below details. > {color:#59afe1}* Database Name > * Dump Type (INCREMENTAL) > * (Estimated) Total number of events to dump. > * Dump Start Time{color} > * After each event dump, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT, etc.) > * Event dump end time > * Event dump progress. Format is Event sequence no/(Estimated) Total number of events.{color} > * After completion of all event dumps, will add a log as follows. > {color:#59afe1}* Database Name. > * Dump Type (INCREMENTAL). > * Dump End Time. > * (Actual) Total number of events dumped. > * Dump Directory. > * Last Repl ID of the dump.{color} > *Note:* The estimated number of events can differ greatly from the actual number, as we don’t know the number of events upfront until we read the metastore NotificationEvents table. > *+Incremental Load:+* > * At the start of incremental load, will add one log with the below details. > {color:#59afe1}* Target Database Name > * Dump directory > * Load Type (INCREMENTAL) > * Total number of events to load > * Load Start Time{color} > * After each event load, will add a log as follows > {color:#59afe1}* Event ID > * Event Type (CREATE_TABLE, DROP_TABLE, ALTER_TABLE, INSERT, etc.) > * Event load end time > * Event load progress. Format is Event sequence no/Total number of events.{color} > * After completion of all event loads, will add a log as follows to consolidate the load. > {color:#59afe1}* Target Database Name. > * Load Type (INCREMENTAL). > * Load End Time. > * Total number of events loaded. > * Last Repl ID of the loaded database.{color} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.
[ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16990: Status: Patch Available (was: Open) > REPL LOAD should update last repl ID only after successful copy of data files. > -- > > Key: HIVE-16990 > URL: https://issues.apache.org/jira/browse/HIVE-16990 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, > HIVE-16990.03.patch, HIVE-16990.04.patch, HIVE-16990.05.patch > > > REPL LOAD operations that include both metadata and data changes should follow the below rule. > 1. Copy the metadata, excluding the last repl ID. > 2. Copy the data files. > 3. If Steps 1 and 2 are successful, then update the last repl ID of the object. > This rule allows failed events to be re-applied by REPL LOAD and ensures no data loss due to failures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
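A minimal sketch of the control flow implied by the three-step rule above. The method names are hypothetical stand-ins for the real load tasks; the point is only that step 3 is unreachable unless steps 1 and 2 complete.

{code:java}
// Sketch with hypothetical method names: the last repl ID moves forward only
// after both copies succeed, so a failed event stays re-appliable by REPL LOAD.
public class ReplLoadOrderSketch {
  void copyMetadata(String obj)     { /* step 1: metadata, excluding last repl ID */ }
  void copyDataFiles(String obj)    { /* step 2: data files */ }
  void updateLastReplId(String obj) { /* step 3: bump the object's last repl ID */ }

  void load(String obj) {
    copyMetadata(obj);      // any exception here or below aborts the event...
    copyDataFiles(obj);
    updateLastReplId(obj);  // ...so this runs only after 1 and 2 succeed
  }
}
{code}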
[jira] [Commented] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128256#comment-16128256 ] Hive QA commented on HIVE-17181: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882046/HIVE-17181.1-branch-2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10584 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=102) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6411/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6411/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6411/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12882046 - PreCommit-HIVE-Build > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1-branch-2.patch, HIVE-17181.1.patch, > HIVE-17181.2.patch, HIVE-17181.3.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
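Until such an API exists, the usual workaround is to append the partition-key columns to the record schema by hand before calling {{setSchema()}}. A sketch, assuming the job's {{OutputJobInfo}} is retrievable from the configuration; exact accessor signatures vary across HCatalog versions:

{code:java}
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

public class CompleteSchemaSketch {
  // Sketch: set record schema plus partition keys, not the record schema alone.
  static void setCompleteSchema(Job job) throws Exception {
    HCatSchema schema = HCatOutputFormat.getTableSchema(job.getConfiguration());
    // Assumption: OutputJobInfo is readable from the conf after setOutput().
    OutputJobInfo jobInfo = HCatOutputFormat.getJobInfo(job.getConfiguration());
    for (HCatFieldSchema partKey :
             jobInfo.getTableInfo().getPartitionColumns().getFields()) {
      schema.append(partKey);  // dynamic-partition columns must come last
    }
    HCatOutputFormat.setSchema(job, schema);
  }
}
{code}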
[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.
[ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16990: Attachment: HIVE-16990.05.patch Added 05.patch after rebasing with master. > REPL LOAD should update last repl ID only after successful copy of data files. > -- > > Key: HIVE-16990 > URL: https://issues.apache.org/jira/browse/HIVE-16990 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, > HIVE-16990.03.patch, HIVE-16990.04.patch, HIVE-16990.05.patch > > > REPL LOAD operations that include both metadata and data changes should follow the below rule. > 1. Copy the metadata, excluding the last repl ID. > 2. Copy the data files. > 3. If Steps 1 and 2 are successful, then update the last repl ID of the object. > This rule allows failed events to be re-applied by REPL LOAD and ensures no data loss due to failures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.
[ https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16990: Status: Open (was: Patch Available) > REPL LOAD should update last repl ID only after successful copy of data files. > -- > > Key: HIVE-16990 > URL: https://issues.apache.org/jira/browse/HIVE-16990 > Project: Hive > Issue Type: Sub-task > Components: Hive, repl >Affects Versions: 2.1.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, replication > Fix For: 3.0.0 > > Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, > HIVE-16990.03.patch, HIVE-16990.04.patch > > > REPL LOAD operations that include both metadata and data changes should follow the below rule. > 1. Copy the metadata, excluding the last repl ID. > 2. Copy the data files. > 3. If Steps 1 and 2 are successful, then update the last repl ID of the object. > This rule allows failed events to be re-applied by REPL LOAD and ensures no data loss due to failures. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128240#comment-16128240 ] Sankar Hariappan commented on HIVE-17289: - Thanks [~daijy] for the review/commit! > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory, and IMPORT uses distcp to copy larger files or a large number of files from the dump directory to the table staging directory. But this copy fails, as distcp is always done with the doAs user specified in hive.distcp.privileged.doAs, which is "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to "hive", as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
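For reference, the behavior above hinges on a single configuration property. A minimal illustration of overriding the default; the property name is the one cited in the issue, while the surrounding code is only a sketch:

{code:java}
import org.apache.hadoop.hive.conf.HiveConf;

public class DistcpDoAsSketch {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Privileged distcp (limited to REPL DUMP/LOAD after this fix) should run
    // as the Hive service user; the "hdfs" super-user is never allowed.
    conf.set("hive.distcp.privileged.doAs", "hive");
    System.out.println(conf.get("hive.distcp.privileged.doAs"));
  }
}
{code}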
[jira] [Assigned] (HIVE-17330) refactor TezSessionPoolManager to separate its multiple functions
[ https://issues.apache.org/jira/browse/HIVE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17330: --- Assignee: Sergey Shelukhin > refactor TezSessionPoolManager to separate its multiple functions > - > > Key: HIVE-17330 > URL: https://issues.apache.org/jira/browse/HIVE-17330 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17330.patch > > > TezSessionPoolManager would retain things specific to current Hive session > management. > The session pool itself, as well as expiration tracking, the pool session > implementation, and some config validation can be separated out and made > independent from the manager. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17330) refactor TezSessionPoolManager to separate its multiple functions
[ https://issues.apache.org/jira/browse/HIVE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17330: Status: Patch Available (was: Open) > refactor TezSessionPoolManager to separate its multiple functions > - > > Key: HIVE-17330 > URL: https://issues.apache.org/jira/browse/HIVE-17330 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17330.patch > > > TezSessionPoolManager would retain things specific to current Hive session > management. > The session pool itself, as well as expiration tracking, the pool session > implementation, and some config validation can be separated out and made > independent from the manager. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17330) refactor TezSessionPoolManager to separate its multiple functions
[ https://issues.apache.org/jira/browse/HIVE-17330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17330: Attachment: HIVE-17330.patch This mostly moves code (see JIRA description). One open question remaining is whether openSessions and closeAll... should also be moved into the pool from the manager. It looks like the existing code only adds pool sessions to openSessions, and not custom user sessions. That might be a bug introduced with one of the previous changes, as the intent (e.g. closeIfNotDefault) seems to be for openSessions to contain both pool and non-pool sessions. If the latter is the case, I'll also fix it here; will dig into the history tomorrow. cc [~sseth] > refactor TezSessionPoolManager to separate its multiple functions > - > > Key: HIVE-17330 > URL: https://issues.apache.org/jira/browse/HIVE-17330 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > Attachments: HIVE-17330.patch > > > TezSessionPoolManager would retain things specific to current Hive session > management. > The session pool itself, as well as expiration tracking, the pool session > implementation, and some config validation can be separated out and made > independent from the manager. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128213#comment-16128213 ] Sergey Shelukhin commented on HIVE-17327: - {noformat} 2017-08-15T17:54:46,690 ERROR [8eb6300a-10f4-43ca-830b-7f533b8008a8 main] exec.Task: Failed to execute tez graph. java.lang.NullPointerException at org.apache.hadoop.hive.conf.HiveConf.getVarWithoutType(HiveConf.java:4042) ~[hive-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:356) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolManager.getSession(TezSessionPoolManager.java:559) ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:150) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] {noformat} Looks like the config is null. Might be test-specific. Will look tomorrow to see if something needs to be done other than a null check. The rest of the patch is still ready for review :) > LLAP IO: restrict native file ID usage to default FS to avoid hypothetical > collisions when HDFS federation is used > -- > > Key: HIVE-17327 > URL: https://issues.apache.org/jira/browse/HIVE-17327 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-17327.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
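If a null check does turn out to be the right remedy, its minimal shape would be something like the sketch below. Where the guard belongs (in {{getSession()}} or in the test setup) is exactly the open question above, so this is an illustration, not the patch:

{code:java}
// Hypothetical guard: fail fast with a useful message instead of an NPE
// deep inside HiveConf.getVarWithoutType().
static String safeGetVar(org.apache.hadoop.hive.conf.HiveConf conf,
                         org.apache.hadoop.hive.conf.HiveConf.ConfVars var) {
  if (conf == null) {
    throw new IllegalStateException("HiveConf is null while fetching a Tez session");
  }
  return org.apache.hadoop.hive.conf.HiveConf.getVarWithoutType(conf, var);
}
{code}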
[jira] [Commented] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects
[ https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128212#comment-16128212 ] Hive QA commented on HIVE-17225: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882037/HIVE17225.1.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10976 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_windowing2] (batchId=10) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[spark_dynamic_partition_pruning_recursive_mapjoin] (batchId=52) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6410/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6410/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6410/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12882037 - PreCommit-HIVE-Build > HoS DPP pruning sink ops can target parallel work objects > - > > Key: HIVE-17225 > URL: https://issues.apache.org/jira/browse/HIVE-17225 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE17225.1.patch > > > Setup: > {code:sql} > SET hive.spark.dynamic.partition.pruning=true; > SET hive.strict.checks.cartesian.product=false; > SET hive.auto.convert.join=true; > CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int); > CREATE TABLE regular_table1 (col int); > CREATE TABLE regular_table2 (col int); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3); > INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3); > SELECT * > FROM partitioned_table1, >regular_table1 rt1, >regular_table2 rt2 > WHERE rt1.col = partitioned_table1.part_col >AND rt2.col = partitioned_table1.part_col; > {code} > Exception: > {code} > 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] > ql.Driver: FAILED: Execution Error, return code 3 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.FileNotFoundException: File > file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5 > does not exist > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at > org.a
[jira] [Updated] (HIVE-17205) add functional support
[ https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17205: -- Attachment: HIVE-17205.09.patch > add functional support > -- > > Key: HIVE-17205 > URL: https://issues.apache.org/jira/browse/HIVE-17205 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17205.01.patch, HIVE-17205.02.patch, > HIVE-17205.03.patch, HIVE-17205.09.patch > > > make sure unbucketed tables can be marked transactional=true > make insert/update/delete/compaction work -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17321) HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan is not specified
[ https://issues.apache.org/jira/browse/HIVE-17321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128187#comment-16128187 ] liyunzhang_intel commented on HIVE-17321: - [~lirui]: for ORC, we need not compute raw data size using noscan/partialscan, because the raw data size statistic is written to the metastore when the data load finishes. For more detail about how the raw data statistic is collected, see HIVE-17018. > HoS: analyze ORC table doesn't compute raw data size when noscan/partialscan > is not specified > - > > Key: HIVE-17321 > URL: https://issues.apache.org/jira/browse/HIVE-17321 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Rui Li >Assignee: Rui Li >Priority: Minor > Attachments: HIVE-17321.1.patch > > > Need to implement HIVE-9560 for Spark. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128185#comment-16128185 ] Hive QA commented on HIVE-17327: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882026/HIVE-17327.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 47 failed/errored test(s), 10413 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[setop_no_distinct] (batchId=77) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=145) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=146) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=147) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=148) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=149) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=150) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=152) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=153) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=154) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=155) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=158) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=159) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=160) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=161) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=162) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=163) 
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver (batchId=99) org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistCpAs (batchId=250) org.apache.hadoop.hive.common.TestFileUtils.testCopyWithDistcp (batchId=250) org.apache.hadoop.hive.ql.TestAcidOnTez.testMapJoinOnTez (batchId=215) org.apache.hadoop.hive.ql.TestAcidOnTez.testMergeJoinOnTez (batchId=215) org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMapJoinOnTez (batchId=219) org.apache.hadoop.hive.ql.TestAcidOnTezWithSplitUpdate.testMergeJoinOnTez (batchId=219) org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testGetNonDefaultSession (batchId=277) org.apache.hadoop.hive.ql.exec.tez.TestTezSessionPool.testSessionReopen (batchId=277) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=222) org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgressWithHiveServer2ProgressBarDisabled (batchId=222) org.apache.hive.hc
[jira] [Updated] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17089: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Patch 16 committed to master. Thanks, Sergey, for the review. cc [~saketj] > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 3.0.0 > > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch, HIVE-17089.16.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > tables take place since the start of compaction (need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then the upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
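As a rough illustration of the U=D+I model described above (directory names schematic, transaction IDs elided):

{noformat}
UPDATE t SET b = 5 WHERE a = 1;

acid 1.x: delta_.../bucket_0        <- one UPDATE event rewriting the row
acid 2.0: delete_delta_.../bucket_0 <- DELETE event keyed by the old row's
                                       (original txn, bucket, rowId)
          delta_.../bucket_0        <- INSERT event carrying the new row
{noformat}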
[jira] [Commented] (HIVE-17065) You can not successfully deploy hive clusters with Hive guidance documents
[ https://issues.apache.org/jira/browse/HIVE-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128164#comment-16128164 ] ZhangBing Lin commented on HIVE-17065: -- [~xuefuz], sorry, E-mail is not convenient, so I did not modify it on the wiki > You can not successfully deploy hive clusters with Hive guidance documents > -- > > Key: HIVE-17065 > URL: https://issues.apache.org/jira/browse/HIVE-17065 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: ZhangBing Lin >Priority: Minor > Attachments: screenshot-1.png > > > When I followed the official document from cwiki > [https://cwiki.apache.org/confluence/display/Hive/GettingStarted] to build a > Hive 2.1.1 single-node service, I encountered several problems: > 1. The following command to create the Hive warehouse directory needs to be modified: > A. $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse > B. $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse > Using B instead of A might be better. > 2. The positions of the following two descriptions need to be adjusted: > A. Running Hive CLI > To use the Hive command line interface (CLI) from the shell: > $ $HIVE_HOME/bin/hive > B. Running HiveServer2 and Beeline > Starting from Hive 2.1, we need to run the schematool command below as an > initialization step. For example, we can use "derby" as the db type. > $ $HIVE_HOME/bin/schematool -dbType <db type> -initSchema > When I execute the $HIVE_HOME/bin/hive command, the following error occurs: > !screenshot-1.png! > The problem is solved when I first execute the following command and then run hive: > $ $HIVE_HOME/bin/schematool -dbType derby -initSchema -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()
[ https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128140#comment-16128140 ] Hive QA commented on HIVE-17169: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882012/HIVE-17169.1-branch-2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10583 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=103) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6408/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6408/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6408/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12882012 - PreCommit-HIVE-Build > Avoid extra call to KeyProvider::getMetadata() > -- > > Key: HIVE-17169 > URL: https://issues.apache.org/jira/browse/HIVE-17169 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17169.1-branch-2.patch, HIVE-17169.1.patch > > > Here's the code from {{Hadoop23Shims}}: > {code:title=Hadoop23Shims.java|borderStyle=solid} > @Override > public int comparePathKeyStrength(Path path1, Path path2) throws > IOException { > EncryptionZone zone1, zone2; > zone1 = hdfsAdmin.getEncryptionZoneForPath(path1); > zone2 = hdfsAdmin.getEncryptionZoneForPath(path2); > if (zone1 == null && zone2 == null) { > return 0; > } else if (zone1 == null) { > return -1; > } else if (zone2 == null) { > return 1; > } > return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName()); > } > private int compareKeyStrength(String keyname1, String keyname2) throws > IOException { > KeyProvider.Metadata meta1, meta2; > if (keyProvider == null) { > throw new IOException("HDFS security key provider is not configured > on your server."); > } > meta1 = keyProvider.getMetadata(keyname1); > meta2 = keyProvider.getMetadata(keyname2); > if (meta1.getBitLength() < meta2.getBitLength()) { > return -1; > } else if (meta1.getBitLength() == meta2.getBitLength()) { > return 0; > } else { > return 1; > } > } > } > {code} > It turns out that {{EncryptionZone}} already has the cipher's bit-length > stored in a member variable. One shouldn't need an additional name-node call > ({{KeyProvider::getMetadata()}}) only to fetch it again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
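A related shortcut for the code quoted above that needs no new HDFS API at all: when both paths resolve to zones using the same key name, the strengths are equal by definition, so both {{getMetadata()}} calls can be skipped. A drop-in sketch for the {{comparePathKeyStrength()}} body shown in the description (not necessarily what the attached patch does):

{code:java}
// Sketch: short-circuit before any KeyProvider.getMetadata() round-trip.
// EncryptionZone.getKeyName() is already populated on the zone object.
if (zone1.getKeyName().equals(zone2.getKeyName())) {
  return 0;  // same key => same bit-length, nothing to fetch
}
return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
{code}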
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128135#comment-16128135 ] Ashutosh Chauhan commented on HIVE-17308: - +1. Some minor comments on RB. > Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, > HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, > HIVE-17308.6.patch, HIVE-17308.7.patch > > > Currently, during logical planning, join cardinality is estimated assuming no > correlation among join keys (this estimation is done using exponential > backoff). Physical planning, on the other hand, considers correlation for > multiple keys and uses a different estimation. We should consider correlation > during logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
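For reference, the exponential backoff mentioned above is the standard trick for multi-key joins: sort the join keys by decreasing NDV and damp each additional key's contribution by a square root, so partially correlated keys do not multiply out to an absurdly small selectivity. Roughly (a sketch of the usual formulation, not a quote from the patch):

{noformat}
joint NDV      ~= ndv(k1) * ndv(k2)^(1/2) * ndv(k3)^(1/4) * ... * ndv(kn)^(1/2^(n-1))
join row count ~= rows(R) * rows(S) / (joint NDV)
{noformat}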
[jira] [Commented] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128126#comment-16128126 ] Vihang Karajgaonkar commented on HIVE-17272: +1 LGTM. I think the other way to fix this would have been in {{Vectorizer#validateInputFormatAndSchemaEvolution}} and return false if {{pathToPartitionInfo}} is empty. > when hive.vectorized.execution.enabled is true, query on empty partitioned > table fails with NPE > --- > > Key: HIVE-17272 > URL: https://issues.apache.org/jira/browse/HIVE-17272 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.1.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-17272.2.patch > > > {noformat} > set hive.vectorized.execution.enabled=true; > CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet; > select * from tab t1 join tab t2 where t1.x=t2.x; > {noformat} > The query fails with the following exception. > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) > ~[hive-exec-2.3.0.jar:2.3.0] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > ~[hadoop-common-2.6.0.jar:?] > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > ~[hadoop-common-2.6.0.jar:?] > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] 
> at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_101] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_101] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[?:1.8.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[?:1.8.0_101] > at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
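For reference, the alternative fix suggested in the +1 above would amount to a guard of roughly this shape inside {{Vectorizer#validateInputFormatAndSchemaEvolution}}. A fragment only, assuming {{mapWork}} is in scope (types from java.util, org.apache.hadoop.fs, and org.apache.hadoop.hive.ql.plan); a sketch, not the committed patch:

{code:java}
// Sketch: an empty partitioned table produces no partition descriptors, so
// there is nothing to validate -- declare the work non-vectorizable up front
// rather than letting VectorMapOperator hit the NPE later.
Map<Path, PartitionDesc> pathToPartitionInfo = mapWork.getPathToPartitionInfo();
if (pathToPartitionInfo == null || pathToPartitionInfo.isEmpty()) {
  return false;
}
{code}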
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: HIVE-8472.2-branch-2.patch > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Status: Patch Available (was: Open) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Status: Open (was: Patch Available) Resubmitting a trivial change, to get a baseline for {{branch-2}} failures. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.2-branch-2.patch, HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128100#comment-16128100 ] Siddharth Seth commented on HIVE-17256: --- I did actually mean TaskExecutorService tests, but you say that is already covered. +1. (A short writeup on the overall plan would be useful for reference.) > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.01.patch, HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as no one uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority than, and > preempts, a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
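In scheduling terms, the two levels reduce to one comparison that dominates whatever ordering already exists. A sketch with hypothetical accessor names, not the patch's actual types:

{code:java}
public class GuaranteedOrderSketch {
  interface TaskInfo {            // hypothetical accessors, for illustration
    boolean isGuaranteed();
    int priority();
  }

  // "Guaranteed" dominates every other criterion; within a level the
  // pre-existing priority applies. A newly arriving guaranteed task may
  // therefore preempt a running speculative one.
  static int compare(TaskInfo a, TaskInfo b) {
    if (a.isGuaranteed() != b.isGuaranteed()) {
      return a.isGuaranteed() ? -1 : 1;   // guaranteed task schedules first
    }
    return Integer.compare(a.priority(), b.priority());
  }
}
{code}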
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Attachment: HIVE-17181.1-branch-2.patch Rebased patch for {{branch-2}}. > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1-branch-2.patch, HIVE-17181.1.patch, > HIVE-17181.2.patch, HIVE-17181.3.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17181: Attachment: (was: HIVE-17181.branch-2.patch) > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1-branch-2.patch, HIVE-17181.1.patch, > HIVE-17181.2.patch, HIVE-17181.3.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128062#comment-16128062 ] Hive QA commented on HIVE-8472: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882005/HIVE-8472.1-branch-2.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10588 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=144) org.apache.hadoop.hive.cli.TestSparkCliDriver.org.apache.hadoop.hive.cli.TestSparkCliDriver (batchId=103) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6407/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6407/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6407/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12882005 - PreCommit-HIVE-Build > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning
[ https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128012#comment-16128012 ] Mithun Radhakrishnan commented on HIVE-17275: - Still +1. The tests-failures are the usual suspects (HIVE-15058 + HIVE-16908). > Auto-merge fails on writes of UNION ALL output to ORC file with dynamic > partitioning > > > Key: HIVE-17275 > URL: https://issues.apache.org/jira/browse/HIVE-17275 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 2.2.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-17275.2-branch-2.2.patch, > HIVE-17275.2-branch-2.patch, HIVE-17275.2.patch, HIVE-17275-branch-2.2.patch, > HIVE-17275-branch-2.patch, HIVE-17275.patch > > > If dynamic partitioning is used to write the output of UNION or UNION ALL > queries into ORC files with hive.merge.tezfiles=true, the merge step fails as > follows: > {noformat} > 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] > SessionState: Vertex failed, vertexName=File Merge, > vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, > taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 > failed, info=[Error: Error while running task ( failure ) : > attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: > java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: Multiple partitions for one merge mapper: > hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1 > NOT EQUAL TO > hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2 > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: > Multiple partitions for one merge mapper: > hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1 > NOT EQUAL TO > hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2 > at > 
org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:225) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185) > ... 14 more > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.IOException: Multiple partitions for one merge mapper: > hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1 > NOT EQUAL TO > hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2 > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169) > at > org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72) > at > org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:216) > ... 16 more > Cau
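The failure mode above is worth spelling out: each leg of the UNION writes its output into its own numbered subdirectory under the partition directory (.../part1=2014/1 and .../part1=2014/2 in the trace), so one merge mapper is handed files whose parent directories differ and concludes it is looking at two partitions. A rough restatement of the guard that fires, with hypothetical names rather than Hive's actual OrcFileMergeOperator code:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

// Illustrative sketch only: approximates the check behind the
// "Multiple partitions for one merge mapper" error above.
public final class MergePartitionGuard {
  private Path seenPartitionDir; // parent dir of the first input file

  void checkSameDir(Path inputFile) throws IOException {
    Path dir = inputFile.getParent();
    if (seenPartitionDir == null) {
      seenPartitionDir = dir;
    } else if (!seenPartitionDir.equals(dir)) {
      throw new IOException("Multiple partitions for one merge mapper: "
          + seenPartitionDir + " NOT EQUAL TO " + dir);
    }
  }
}
{code}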
[jira] [Updated] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects
[ https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-17225: --- Status: Patch Available (was: In Progress) > HoS DPP pruning sink ops can target parallel work objects > - > > Key: HIVE-17225 > URL: https://issues.apache.org/jira/browse/HIVE-17225 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE17225.1.patch > > > Setup: > {code:sql} > SET hive.spark.dynamic.partition.pruning=true; > SET hive.strict.checks.cartesian.product=false; > SET hive.auto.convert.join=true; > CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int); > CREATE TABLE regular_table1 (col int); > CREATE TABLE regular_table2 (col int); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3); > INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3); > SELECT * > FROM partitioned_table1, >regular_table1 rt1, >regular_table2 rt2 > WHERE rt1.col = partitioned_table1.part_col >AND rt2.col = partitioned_table1.part_col; > {code} > Exception: > {code} > 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] > ql.Driver: FAILED: Execution Error, return code 3 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.FileNotFoundException: File > file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5 > does not exist > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) >
[jira] [Work started] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects
[ https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17225 started by Janaki Lahorani. -- > HoS DPP pruning sink ops can target parallel work objects > - > > Key: HIVE-17225 > URL: https://issues.apache.org/jira/browse/HIVE-17225 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE17225.1.patch > > > Setup: > {code:sql} > SET hive.spark.dynamic.partition.pruning=true; > SET hive.strict.checks.cartesian.product=false; > SET hive.auto.convert.join=true; > CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int); > CREATE TABLE regular_table1 (col int); > CREATE TABLE regular_table2 (col int); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3); > INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3); > SELECT * > FROM partitioned_table1, >regular_table1 rt1, >regular_table2 rt2 > WHERE rt1.col = partitioned_table1.part_col >AND rt2.col = partitioned_table1.part_col; > {code} > Exception: > {code} > 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] > ql.Driver: FAILED: Execution Error, return code 3 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.FileNotFoundException: File > file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5 > does not exist > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.R
[jira] [Updated] (HIVE-17225) HoS DPP pruning sink ops can target parallel work objects
[ https://issues.apache.org/jira/browse/HIVE-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-17225: --- Attachment: HIVE17225.1.patch > HoS DPP pruning sink ops can target parallel work objects > - > > Key: HIVE-17225 > URL: https://issues.apache.org/jira/browse/HIVE-17225 > Project: Hive > Issue Type: Sub-task > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE17225.1.patch > > > Setup: > {code:sql} > SET hive.spark.dynamic.partition.pruning=true; > SET hive.strict.checks.cartesian.product=false; > SET hive.auto.convert.join=true; > CREATE TABLE partitioned_table1 (col int) PARTITIONED BY (part_col int); > CREATE TABLE regular_table1 (col int); > CREATE TABLE regular_table2 (col int); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 1); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 2); > ALTER TABLE partitioned_table1 ADD PARTITION (part_col = 3); > INSERT INTO table regular_table1 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO table regular_table2 VALUES (1), (2), (3), (4), (5), (6); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 1) VALUES (1); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 2) VALUES (2); > INSERT INTO TABLE partitioned_table1 PARTITION (part_col = 3) VALUES (3); > SELECT * > FROM partitioned_table1, >regular_table1 rt1, >regular_table2 rt2 > WHERE rt1.col = partitioned_table1.part_col >AND rt2.col = partitioned_table1.part_col; > {code} > Exception: > {code} > 2017-08-01T13:27:47,483 ERROR [b0d354a8-4cdb-4ba9-acec-27d14926aaf4 main] > ql.Driver: FAILED: Execution Error, return code 3 from > org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.lang.RuntimeException: > org.apache.hadoop.hive.ql.metadata.HiveException: > java.io.FileNotFoundException: File > file:/Users/stakiar/Documents/idea/apache-hive/itests/qtest-spark/target/tmp/scratchdir/stakiar/b0d354a8-4cdb-4ba9-acec-27d14926aaf4/hive_2017-08-01_13-27-45_553_1088589686371686526-1/-mr-10004/3/5 > does not exist > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.init(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:498) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at 
org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:246) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:82) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:82) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246) > at scala.Option.getOrElse(Option.scala:121) > at org.apach
[jira] [Commented] (HIVE-17326) Insert into HBase tables fails if hive.llap.execution.mode is set to only
[ https://issues.apache.org/jira/browse/HIVE-17326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128003#comment-16128003 ] Sergey Shelukhin commented on HIVE-17326: - Likely a duplicate of HIVE-16703 > Insert into HBase tables fails if hive.llap.execution.mode is set to only > - > > Key: HIVE-17326 > URL: https://issues.apache.org/jira/browse/HIVE-17326 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 2.1.0 > Environment: HDP 2.6.x >Reporter: Sailaja Navvluru > > Inserting into a table created using HBase storage handler errors out if > hive.llap.execution.mode=only. Works if the hive.llap.execution.mode value is > none or auto or with MR execution engine. > Simple repro script > CREATE TABLE hbase_table_sai(id int, name string) > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' > WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name") > TBLPROPERTIES ("hbase.table.name" = "sai"); > create table hive_tab1(c1 int, c2 string); > insert into hive_tab1 values(1,'abc'); > 0: jdbc:hive2://localhost:10500/default> insert overwrite table > hbase_table_sai select * from hive_tab1; > INFO : Compiling > command(queryId=hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a): > insert overwrite table hbase_table_sai select * from hive_tab1 > INFO : We are setting the hadoop caller context from > HIVE_SSN_ID:7114abad-2ba2-410d-ad73-40d473a647af to > hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: > Schema(fieldSchemas:[FieldSchema(name:hive_tab1.c1, type:int, comment:null), > FieldSchema(name:hive_tab1.c2, type:string, comment:null)], properties:null) > INFO : Completed compiling > command(queryId=hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a); > Time taken: 0.36 seconds > INFO : We are resetting the hadoop caller context to > HIVE_SSN_ID:7114abad-2ba2-410d-ad73-40d473a647af > INFO : Concurrency mode is disabled, not creating a lock manager > INFO : Setting caller context to query id > hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a > INFO : Executing > command(queryId=hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a): > insert overwrite table hbase_table_sai select * from hive_tab1 > INFO : Query ID = hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a > INFO : Total jobs = 1 > INFO : Starting task [Stage-0:DDL] in serial mode > INFO : Starting task [Stage-1:DDL] in serial mode > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-3:MAPRED] in serial mode > INFO : Session is already open > INFO : Tez session missing resources, adding additional necessary resources > INFO : Dag name: insert overwrite table hbase_tab...hive_tab1(Stage-3) > INFO : Dag submit failed due to There is conflicting local resource > (guava-14.0.1.jar) between dag local resource and vertex Map 1 local resource. 
> Resource of dag : resource { scheme: "hdfs" host: "ulcer1" port: 8020 file: > "/tmp/hive/hive/7114abad-2ba2-410d-ad73-40d473a647af/hive_2017-08-08_12-54-31_225_8109820757632121978-7/hive/_tez_scratch_dir/guava-14.0.1.jar" > } size: 2189117 timestamp: 150072247 type: FILE visibility: PRIVATE > Resource of vertex: resource { scheme: "hdfs" host: "ulcer1" port: 8020 file: > "/tmp/hive/hive/_tez_session_dir/8a93f7fd-b925-4684-a6b1-6561b5c8e344/guava-14.0.1.jar" > } size: 2189117 timestamp: 1502211657919 type: FILE visibility: PRIVATE > stack trace: [org.apache.tez.dag.api.DAG.verify(DAG.java:695), > org.apache.tez.dag.api.DAG.createDag(DAG.java:796), > org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:718), > org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:555), > org.apache.tez.client.TezClient.submitDAG(TezClient.java:522), > org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:506), > org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:188), > org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197), > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100), > org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1905), > org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1607), > org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1354), > org.apache.hadoop.hive.ql.Driver.run(Driver.java:1123), > org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116), > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:242), > > org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91), > > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334), > java.security.AccessController.doPriv
[jira] [Assigned] (HIVE-17329) ensure acid side file is not overwritten
[ https://issues.apache.org/jira/browse/HIVE-17329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17329: - > ensure acid side file is not overwritten > > > Key: HIVE-17329 > URL: https://issues.apache.org/jira/browse/HIVE-17329 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Fix For: 3.0.0 > > > OrcRecordUpdater() has > {noformat} > flushLengths = fs.create(OrcAcidUtils.getSideFile(this.path), true, 8, > options.getReporter()); > {noformat} > this should be the only place where the side file is created, but to be safe > we should set the "overwrite" parameter to false. If this file already exists > that means there are 2 OrcRecordUpdaters trying to write the same (primary) > file - never ok. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
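A minimal sketch of the proposed hardening, reusing the names from the snippet quoted above (this is not the committed patch): flipping the second argument of FileSystem.create() from true to false makes the call throw if the side file already exists, which is exactly the two-writers condition described.

{code:java}
// Sketch only: 'fs', 'path', 'flushLengths' and 'options' stand in for the
// OrcRecordUpdater fields used in the quoted snippet.
Path sideFile = OrcAcidUtils.getSideFile(this.path);
try {
  // overwrite=false: fail fast instead of clobbering a side file that
  // another OrcRecordUpdater may still be writing.
  flushLengths = fs.create(sideFile, false, 8, options.getReporter());
} catch (IOException e) {
  throw new IOException("ACID side file " + sideFile
      + " already exists; two writers for one primary file is never ok", e);
}
{code}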
[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification
[ https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127989#comment-16127989 ] Chris Drome commented on HIVE-13989: [~vgumashta], I've done a bunch of testing and rewriting the unittests to ensure they are testing the correct things. I've incorporated your comments about permissions on OTHER getting converted to none. However, your first comment will not work. The problem is that data gets written to a temp directory relative to the table root and then moved to the final location. So the data in the temp directory will inherit permissions/acls from the table directory, which might be different from that of the destination. {{FolderPermissionBase.testInsertSingleDynamicPartition}} tests this use case. Without the additional {{setfacl}} call after the move, the part file acls are in an inconsistent state relative to the parent (partition) directory. I'm in the middle of cleaning things up, so I should have a new patch to review shortly. > Extended ACLs are not handled according to specification > > > Key: HIVE-13989 > URL: https://issues.apache.org/jira/browse/HIVE-13989 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1, 2.0.0 >Reporter: Chris Drome >Assignee: Chris Drome > Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, > HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, > HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch > > > Hive takes two approaches to working with extended ACLs depending on whether > data is being produced via a Hive query or HCatalog APIs. A Hive query will > run an FsShell command to recursively set the extended ACLs for a directory > sub-tree. HCatalog APIs will attempt to build up the directory sub-tree > programmatically and runs some code to set the ACLs to match the parent > directory. > Some incorrect assumptions were made when implementing the extended ACLs > support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the > design documents of extended ACLs in HDFS. These documents model the > implementation after the POSIX implementation on Linux, which can be found at > http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html. > The code for setting extended ACLs via HCatalog APIs is found in > HdfsUtils.java: > {code} > if (aclEnabled) { > aclStatus = sourceStatus.getAclStatus(); > if (aclStatus != null) { > LOG.trace(aclStatus.toString()); > aclEntries = aclStatus.getEntries(); > removeBaseAclEntries(aclEntries); > //the ACL api's also expect the tradition user/group/other permission > in the form of ACL > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, > sourcePerm.getUserAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, > sourcePerm.getGroupAction())); > aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, > sourcePerm.getOtherAction())); > } > } > {code} > We found that DEFAULT extended ACL rules were not being inherited properly by > the directory sub-tree, so the above code is incomplete because it > effectively drops the DEFAULT rules. The second problem is with the call to > {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended > ACLs. When extended ACLs are used the GROUP permission is replaced with the > extended ACL mask. So the above code will apply the wrong permissions to the > GROUP. 
Instead the correct GROUP permissions now need to be pulled from the > AclEntry as returned by {{getAclStatus().getEntries()}}. See the > implementation of the new method {{getDefaultAclEntries}} for details. > Similar issues exist with the HCatalog API. None of the API accounts for > setting extended ACLs on the directory sub-tree. The changes to the HCatalog > API allow the extended ACLs to be passed into the required methods similar to > how basic permissions are passed in. When building the directory sub-tree the > extended ACLs of the table directory are inherited by all sub-directories, > including the DEFAULT rules. > Replicating the problem: > Create a table to write data into (I will use acl_test as the destination and > words_text as the source) and set the ACLs as follows: > {noformat} > $ hdfs dfs -setfacl -m > default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx > /user/cdrome/hive/acl_test > $ hdfs dfs -ls -d /user/cdrome/hive/acl_test > drwxrwx---+ - cdrome hdfs 0 2016-07-13 20:36 > /user/cdrome/hive/acl_test > $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test > # file: /user/cdrome/hive/acl_test > # owner: cdrome > # group: hdfs > user::r
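To make the GROUP-versus-mask point concrete: when an extended ACL is present, FsPermission.getGroupAction() actually returns the ACL mask, so the owning group's real permission has to be read from the unnamed GROUP entry of the AclStatus. A hedged helper illustrating this (illustrative only, not the attached patch):

{code:java}
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.AclStatus;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

// Illustrative only: find the owning group's effective permission.
final class AclPermissionHelper {
  static FsAction effectiveGroupAction(AclStatus aclStatus, FsPermission perm) {
    for (AclEntry e : aclStatus.getEntries()) {
      if (e.getScope() == AclEntryScope.ACCESS
          && e.getType() == AclEntryType.GROUP
          && e.getName() == null) {       // unnamed entry = owning group
        return e.getPermission();
      }
    }
    return perm.getGroupAction();         // minimal ACL: group bits are real
  }
}
{code}

DEFAULT-scope entries need the same care: they must be carried over to child directories explicitly, or the sub-tree stops inheriting them.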
[jira] [Commented] (HIVE-17012) ACID Table: Number of reduce tasks should be computed correctly when sort.dynamic.partition is enabled
[ https://issues.apache.org/jira/browse/HIVE-17012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127983#comment-16127983 ] Eugene Koifman commented on HIVE-17012: --- Not sure if this is related but AbstractCorrelationProcCtx sets hive.optimize.reducededuplication.min.reducer=1 for acid > ACID Table: Number of reduce tasks should be computed correctly when > sort.dynamic.partition is enabled > -- > > Key: HIVE-17012 > URL: https://issues.apache.org/jira/browse/HIVE-17012 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Rajesh Balamohan > Labels: performance > Attachments: plan.txt > > > {code} > Map 1: 446/446 Reducer 2: 2/2 Reducer 3: 2/2 > -- > Compile Query 0.24s > Prepare Plan 0.35s > Submit Plan 0.18s > Start DAG 0.21s > Run DAG 32332.27s > -- > Task Execution Summary > -- > VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS > OUTPUT_RECORDS > -- > Map 1 1390343.00 0 0 2,879,987,999 > 2,879,987,999 > Reducer 2 31281225.00 0 0 2,750,387,156 > 0 > Reducer 3 751498.00 0 0 129,600,843 > 0 > -- > {code} > Time taken: 32438.42 seconds to insert <3B rows with > {code} > create table store_sales > ( > ss_sold_time_sk bigint, > ss_item_sk bigint, > ss_customer_sk bigint, > ss_cdemo_sk bigint, > ss_hdemo_sk bigint, > ss_addr_sk bigint, > ss_store_sk bigint, > ss_promo_sk bigint, > ss_ticket_number bigint, > ss_quantity int, > ss_wholesale_cost double, > ss_list_price double, > ss_sales_price double, > ss_ext_discount_amt double, > ss_ext_sales_price double, > ss_ext_wholesale_cost double, > ss_ext_list_price double, > ss_ext_tax double, > ss_coupon_amt double, > ss_net_paid double, > ss_net_paid_inc_tax double, > ss_net_profit double > ) > partitioned by (ss_sold_date_sk bigint) > CLUSTERED BY (ss_ticket_number) INTO 2 BUCKETS > STORED AS ORC > TBLPROPERTIES ('transactional'='true', 'transactional_properties'='default') > ; > from tpcds_text_1000.store_sales ss > insert into table store_sales partition (ss_sold_date_sk) > select > ss.ss_sold_time_sk, > ss.ss_item_sk, > ss.ss_customer_sk, > ss.ss_cdemo_sk, > ss.ss_hdemo_sk, > ss.ss_addr_sk, > ss.ss_store_sk, > ss.ss_promo_sk, > ss.ss_ticket_number, > ss.ss_quantity, > ss.ss_wholesale_cost, > ss.ss_list_price, > ss.ss_sales_price, > ss.ss_ext_discount_amt, > ss.ss_ext_sales_price, > ss.ss_ext_wholesale_cost, > ss.ss_ext_list_price, > ss.ss_ext_tax, > ss.ss_coupon_amt, > ss.ss_net_paid, > ss.ss_net_paid_inc_tax, > ss.ss_net_profit, > ss.ss_sold_date_sk > where ss.ss_sold_date_sk is not null > insert into table store_sales partition (ss_sold_date_sk) > select > ss.ss_sold_time_sk, > ss.ss_item_sk, > ss.ss_customer_sk, > ss.ss_cdemo_sk, > ss.ss_hdemo_sk, > ss.ss_addr_sk, > ss.ss_store_sk, > ss.ss_promo_sk, > ss.ss_ticket_number, > ss.ss_quantity, > ss.ss_wholesale_cost, > ss.ss_list_price, > ss.ss_sales_price, > ss.ss_ext_discount_amt, > ss.ss_ext_sales_price, > ss.ss_ext_wholesale_cost, > ss.ss_ext_list_price, > ss.ss_ext_tax, > ss.ss_coupon_amt, > ss.ss_net_paid, > ss.ss_net_paid_inc_tax, > ss.ss_net_profit, > ss.ss_sold_date_sk > where ss.ss_sold_date_sk is null > ; > {code} > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
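For reference, Hive's generic reducer-count heuristic (driven by hive.exec.reducers.bytes.per.reducer and capped by hive.exec.reducers.max) behaves roughly like the sketch below; the report here is that the acid plus sort.dynamic.partition path ends up with only 2 reducers for ~3B input rows, far below what that heuristic would pick. A rough restatement, not Hive's exact code:

{code:java}
// One reducer per bytesPerReducer of input, at least 1, at most maxReducers.
final class ReducerEstimate {
  static int estimateReducers(long totalInputBytes, long bytesPerReducer,
      int maxReducers) {
    long wanted = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceil
    return (int) Math.min(maxReducers, Math.max(1L, wanted));
  }
}
{code}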
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127981#comment-16127981 ] Hive QA commented on HIVE-17089: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12882001/HIVE-17089.16.patch {color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10974 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6406/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6406/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6406/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12882001 - PreCommit-HIVE-Build > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch, HIVE-17089.16.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > table take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127972#comment-16127972 ] Sergey Shelukhin commented on HIVE-17089: - +1 > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch, HIVE-17089.16.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > table take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17328) Remove special handling for Acid tables wherever possible
[ https://issues.apache.org/jira/browse/HIVE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17328: -- Description: There are various places in the code that do something like {noformat} if(acid update or delete) { do something } else { do something else } {noformat} this complicates the code and makes it so that the acid code path is not properly tested in many new non-acid features or bug fixes. Some work to simplify this was done in HIVE-15844. _SortedDynPartitionOptimizer_ has some special logic _ReduceSinkOperator_ relies on partitioning columns for update/delete to be _UDFToInteger(RecordIdentifier)_ which is set up in _SemanticAnalyzer_. Consequently _SemanticAnalyzer_ has special logic to set it up. _FileSinkOperator_ has some specialization. _AbstractCorrelationProcCtx_ makes changes specific to acid writes setting hive.optimize.reducededuplication.min.reducer=1 With acid 2.0 (HIVE-17089) a lot more of it can be simplified/removed. Generally, Acid Insert follows the same code path as regular insert except that the writer in _FileSinkOperator_ is Acid specific. So all the specialization is to route Update/Delete events to the right place. We can do the U=D+I early in the operator pipeline so that an Update is a Hive multi-insert with 1 leg being the Insert leg and the other being the Delete leg (like Merge stmt). The Delete events themselves don't need to be routed in any particular way if we always ship all delete_delta files for each split. This is ok since delete events are very small and highly compressible. What is shipped is independent of what needs to be loaded into memory. This would allow removing almost all special code paths. If need be we can also have the compactor rewrite the delete files so that the name of the file matches the contents and make it as if they were bucketed properly and use it to reduce what needs to be shipped for each split. This may help with some extreme cases where someone updates 1B rows. was: There are various places in the code that do something like if(acid update or delete) { do something } else { do something else } this complicates the code and makes it so that the acid code path is not properly tested in many new non-acid features or bug fixes. Some work to simplify this was done in HIVE-15844. SortedDynPartitionOptimizer has some special logic ReduceSinkOperator relies on partitioning columns for update/delete to be UDFToInteger(RecordIdentifier) which is set up in SemanticAnalyzer. Consequently SemanticAnalyzer has special logic to set it up. FileSinkOperator has some specialization. AbstractCorrelationProcCtx makes changes specific to acid writes setting hive.optimize.reducededuplication.min.reducer=1 With acid 2.0 (HIVE-17089) a lot more of it can be simplified/removed. Generally, Acid Insert follows the same code path as regular insert except that the writer in FileSinkOperator is Acid specific. So all the specialization is to route Update/Delete events to the right place. We can do the U=D+I early in the operator pipeline so that an Update is a Hive multi-insert with 1 leg being the Insert leg and the other being the Delete leg (like Merge stmt). The Delete events themselves don't need to be routed in any particular way if we always ship all delete_delta files for each split. This is ok since delete events are very small and highly compressible. What is shipped is independent of what needs to be loaded into memory. This would allow removing almost all special code paths.
If need be we can also have the compactor rewrite the delete files so that the name of the file matches the contents and make it as if they were bucketed properly and use it to reduce what needs to be shipped for each split. This may help with some extreme cases where someone updates 1B rows. > Remove special handling for Acid tables wherever possible > - > > Key: HIVE-17328 > URL: https://issues.apache.org/jira/browse/HIVE-17328 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > There are various places in the code that do something like > {noformat} > if(acid update or delete) { > do something > } > else { > do something else > } > {noformat} > this complicates the code and makes it so that the acid code path is not properly > tested in many new non-acid features or bug fixes. > Some work to simplify this was done in HIVE-15844. > _SortedDynPartitionOptimizer_ has some special logic > _ReduceSinkOperator_ relies on partitioning columns for update/delete to be > _UDFToInteger(RecordIdentifier)_ which is set up in _SemanticAnalyzer_. > Consequently _SemanticAnalyzer_ has special logic to set it up. > _FileSinkOperator_ has some specializat
[jira] [Assigned] (HIVE-17328) Remove special handling for Acid tables wherever possible
[ https://issues.apache.org/jira/browse/HIVE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-17328: - > Remove special handling for Acid tables wherever possible > - > > Key: HIVE-17328 > URL: https://issues.apache.org/jira/browse/HIVE-17328 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > There are various places in the code that do something like > if(acid update or delete) { > do something > } > else { > do something else > } > this complicates the code and makes it so that the acid code path is not properly > tested in many new non-acid features or bug fixes. > Some work to simplify this was done in HIVE-15844. > SortedDynPartitionOptimizer has some special logic > ReduceSinkOperator relies on partitioning columns for update/delete to be > UDFToInteger(RecordIdentifier) which is set up in SemanticAnalyzer. > Consequently SemanticAnalyzer has special logic to set it up. > FileSinkOperator has some specialization. > AbstractCorrelationProcCtx makes changes specific to acid writes setting > hive.optimize.reducededuplication.min.reducer=1 > With acid 2.0 (HIVE-17089) a lot more of it can be simplified/removed. > Generally, Acid Insert follows the same code path as regular insert except > that the writer in FileSinkOperator is Acid specific. > So all the specialization is to route Update/Delete events to the right place. > We can do the U=D+I early in the operator pipeline so that an Update is a > Hive multi-insert with 1 leg being the Insert leg and the other being the > Delete leg (like Merge stmt). > The Delete events themselves don't need to be routed in any particular way if > we always ship all delete_delta files for each split. This is ok since > delete events are very small and highly compressible. What is shipped is > independent of what needs to be loaded into memory. > This would allow removing almost all special code paths. > If need be we can also have the compactor rewrite the delete files so that > the name of the file matches the contents and make it as if they were > bucketed properly and use it to reduce what needs to be shipped for each split. > This may help with some extreme cases where someone updates 1B rows. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
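As a toy illustration of the U=D+I rewrite described above (every type here is invented for the example; real acid events are rows written by the ORC acid writers): an update never exists as a distinct event, it is always the delete/insert pair below.

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical event model, purely to illustrate U=D+I.
enum Op { INSERT, DELETE }

class AcidEvent {
  final Op op;
  final long writeTxn;  // transaction performing this operation
  final long origTxn;   // transaction that created the row being touched
  final long rowId;     // row id within (origTxn, bucket)

  AcidEvent(Op op, long writeTxn, long origTxn, long rowId) {
    this.op = op; this.writeTxn = writeTxn;
    this.origTxn = origTxn; this.rowId = rowId;
  }

  // An UPDATE of row (origTxn, rowId) in transaction txn becomes two events:
  // delete the old version, insert the new image owned by txn.
  static List<AcidEvent> updateAsDeletePlusInsert(long txn, long origTxn,
      long rowId, long newRowId) {
    return Arrays.asList(
        new AcidEvent(Op.DELETE, txn, origTxn, rowId),
        new AcidEvent(Op.INSERT, txn, txn, newRowId));
  }
}
{code}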
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127964#comment-16127964 ] Sergey Shelukhin commented on HIVE-17006: - * The fix is not specific to this patch. I noticed it while working on the patch. * Uncopyfying is implied in HIVE-15665, otherwise class names/etc. would collide so it won't be committable without that. BB put will also be added there. * Which error handling? > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.01.patch, HIVE-17006.02.patch, > HIVE-17006.patch, HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
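A purely hypothetical sketch of what option (3) amounts to: cache the raw column-chunk bytes keyed by file ID plus chunk offset, so re-reading the same chunk skips the disk. All names below are invented for the example; LLAP's real cache is the buffer-based low-level cache, not a map.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Toy stand-in for column-chunk-level caching (option 3 above).
final class ColumnChunkCache {
  // Key: fileId + ":" + chunk start offset; value: chunk bytes as on disk.
  private final ConcurrentHashMap<String, byte[]> chunks =
      new ConcurrentHashMap<>();

  byte[] getOrLoad(long fileId, long chunkOffset, Supplier<byte[]> diskRead) {
    return chunks.computeIfAbsent(fileId + ":" + chunkOffset,
        k -> diskRead.get());
  }
}
{code}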
[jira] [Commented] (HIVE-17256) add a notion of a guaranteed task to LLAP
[ https://issues.apache.org/jira/browse/HIVE-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127959#comment-16127959 ] Sergey Shelukhin commented on HIVE-17256: - [~sseth] ping? For the scheduler tests, see the next patch > add a notion of a guaranteed task to LLAP > - > > Key: HIVE-17256 > URL: https://issues.apache.org/jira/browse/HIVE-17256 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17256.01.patch, HIVE-17256.patch > > > Tasks are basically on two levels, guaranteed and speculative, with > speculative being the default. As long as noone uses the new flag, the tasks > behave the same. > All the tasks that do have the flag also behave the same with regard to each > other. > The difference is that a guaranteed task is always higher priority, and > preempts, a speculative task. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
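A hedged sketch of the two-level ordering described here (TaskInfo and its fields are stand-ins, not LLAP's actual scheduler classes): the new flag only decides between levels; within a level the existing priority comparison is unchanged.

{code:java}
import java.util.Comparator;

// Illustrative model: guaranteed beats speculative, then priority decides.
class TaskInfo {
  boolean guaranteed;  // new flag; false (speculative) is the default
  int priority;        // smaller value = more important, as in Tez
}

class TaskOrder {
  static final Comparator<TaskInfo> ORDER = (a, b) -> {
    if (a.guaranteed != b.guaranteed) {
      return a.guaranteed ? -1 : 1; // guaranteed always outranks, may preempt
    }
    return Integer.compare(a.priority, b.priority);
  };
}
{code}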
[jira] [Updated] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17327: Summary: LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions when HDFS federation is used (was: LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions with HDFS federation) > LLAP IO: restrict native file ID usage to default FS to avoid hypothetical > collisions when HDFS federation is used > -- > > Key: HIVE-17327 > URL: https://issues.apache.org/jira/browse/HIVE-17327 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-17327.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
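A sketch of the guard the summary implies, under stated assumptions (the helper and policy are illustrative, not the attached patch): only trust an inode-based file ID when the file's scheme and authority match the default FS, because two federated namespaces can each hand out the same inode number; everything else falls back to a synthetic ID.

{code:java}
import java.net.URI;
import java.util.Objects;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative: native file IDs are only collision-free within one namespace.
final class FileIdPolicy {
  static boolean canUseNativeFileId(Configuration conf, Path path) {
    URI defaultUri = FileSystem.getDefaultUri(conf);
    URI fileUri = path.toUri();
    return fileUri.getScheme() == null   // relative paths resolve to default FS
        || (defaultUri.getScheme().equalsIgnoreCase(fileUri.getScheme())
            && Objects.equals(defaultUri.getAuthority(), fileUri.getAuthority()));
  }
}
{code}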
[jira] [Updated] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal collisions with HDFS federation
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17327: Status: Patch Available (was: Open) > LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal > collisions with HDFS federation > --- > > Key: HIVE-17327 > URL: https://issues.apache.org/jira/browse/HIVE-17327 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-17327.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal collisions with HDFS federation
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17327: Attachment: HIVE-17327.patch The patch. [~gopalv] can you take a look? > LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal > collisions with HDFS federation > --- > > Key: HIVE-17327 > URL: https://issues.apache.org/jira/browse/HIVE-17327 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-17327.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions with HDFS federation
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17327: Summary: LLAP IO: restrict native file ID usage to default FS to avoid hypothetical collisions with HDFS federation (was: LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal collisions with HDFS federation) > LLAP IO: restrict native file ID usage to default FS to avoid hypothetical > collisions with HDFS federation > -- > > Key: HIVE-17327 > URL: https://issues.apache.org/jira/browse/HIVE-17327 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-17327.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17327) LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal collisions with HDFS federation
[ https://issues.apache.org/jira/browse/HIVE-17327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-17327: --- > LLAP IO: restrict native file ID usage to default FS to avoid hypothetiocal > collisions with HDFS federation > --- > > Key: HIVE-17327 > URL: https://issues.apache.org/jira/browse/HIVE-17327 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17325) Clean up intermittently failing unit tests
[ https://issues.apache.org/jira/browse/HIVE-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127894#comment-16127894 ] Alan Gates commented on HIVE-17325: --- In the last 10 CI runs, the following tests have failed: * TestBeeLineDriver.testCliDriver.insert_overwrite_local_directory_1 6 times * TestCliDriver.testCliDriver.union36 3 times * TestMiniLlapCliDriver.testCliDriver.orc_ppd_basic 3 times * TestMiniLlapLocalCliDriver.testCliDriver.vector_if_expr 3 times * TestPerfCliDriver.testCliDriver.query14 7 times * TestPerfCliDriver.testCliDriver.query16 3 times * TestPerfCliDriver.testCliDriver.query23 5 times * TestPerfCliDriver.testCliDriver.query94 3 times * TestBlobstoreCliDriver.testCliDriver.insert_overwrite_dynamic_partitions_merge_move 6 times * TestBlobstoreCliDriver.testCliDriver.insert_overwrite_dynamic_partitions_merge_only 6 times * TestBlobstoreCliDriver.testCliDriver.insert_overwrite_dynamic_partitions_move_only 6 times * TestMiniSparkOnYarnCliDriver.testCliDriver.spark_dynamic_partition_pruning_mapjoin_only 6 times * TestMiniSparkOnYarnCliDriver.testCliDriver.spark_vectorized_dynamic_partition_pruning 7 times * TestHCatClient.testPartitionRegistrationWithCustomSchema 7 times * TestHCatClient.testPartitionSpecRegistrationWithCustomSchema 7 times * TestHCatClient.testTableSchemaPropagation 7 times All of these should be disabled until the reason for their flakiness can be determined. > Clean up intermittently failing unit tests > - > > Key: HIVE-17325 > URL: https://issues.apache.org/jira/browse/HIVE-17325 > Project: Hive > Issue Type: Test > Components: Tests >Reporter: Alan Gates >Assignee: Alan Gates > > We have a number of intermittently failing tests. I propose to disable these > so that we can get clean (or at least cleaner) CI runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
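For concreteness, disabling one flaky JUnit case is a one-annotation change like the sketch below (class and method names are placeholders, not the real drivers); keeping the JIRA key in the @Ignore reason makes the test easy to find and re-enable later.

{code:java}
import org.junit.Ignore;
import org.junit.Test;

public class TestSomeCliDriver {  // placeholder name
  @Ignore("Intermittently failing, disabled pending investigation; see HIVE-17325")
  @Test
  public void testCliDriverFlakyCase() throws Exception {
    // body unchanged; the test simply stops running in CI until re-enabled
  }
}
{code}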
[jira] [Commented] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127879#comment-16127879 ] Hive QA commented on HIVE-17308: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881980/HIVE-17308.7.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11010 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=240) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=240) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=159) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6405/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6405/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6405/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881980 - PreCommit-HIVE-Build > Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, > HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, > HIVE-17308.6.patch, HIVE-17308.7.patch > > > Currently during logical planning join cardinality is estimated assuming no > correlation among join keys (This estimation is done using exponential > backoff). Physical planning on the other hand consider correlation for multi > keys and uses different estimation. We should consider correlation during > logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
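For background, the "exponential backoff" named in the description dampens each additional join key's selectivity contribution instead of assuming full key independence: the largest-NDV key counts fully, the next at its square root, the next at its fourth root, and so on. A sketch of the general technique, not Hive's exact implementation:

{code:java}
import java.util.Arrays;

// Generic multi-key inner-join cardinality estimate with exponential backoff.
public final class JoinCardinalityEstimate {
  static double estimate(double leftRows, double rightRows, double[] keyNdvs) {
    double[] ndvs = keyNdvs.clone();
    Arrays.sort(ndvs);                           // ascending
    double denominator = 1.0;
    double exponent = 1.0;
    for (int i = ndvs.length - 1; i >= 0; i--) { // largest NDV first
      denominator *= Math.pow(ndvs[i], exponent);
      exponent /= 2.0;                           // 1, 1/2, 1/4, ...
    }
    return leftRows * rightRows / Math.max(denominator, 1.0);
  }
}
{code}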
[jira] [Updated] (HIVE-17326) Insert into HBase tables fails if hive.llap.execution.mode is set to only
[ https://issues.apache.org/jira/browse/HIVE-17326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sailaja Navvluru updated HIVE-17326: Description: Inserting into a table created using HBase storage handler errors out if hive.llap.execution.mode=only. Works if the hive.llap.execution.mode value is none or auto or with MR execution engine. Simple repro script CREATE TABLE hbase_table_sai(id int, name string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:name") TBLPROPERTIES ("hbase.table.name" = "sai"); create table hive_tab1(c1 int, c2 string); insert into hive_tab1 values(1,'abc'); 0: jdbc:hive2://localhost:10500/default> insert overwrite table hbase_table_sai select * from hive_tab1; INFO : Compiling command(queryId=hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a): insert overwrite table hbase_table_sai select * from hive_tab1 INFO : We are setting the hadoop caller context from HIVE_SSN_ID:7114abad-2ba2-410d-ad73-40d473a647af to hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a INFO : Semantic Analysis Completed INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:hive_tab1.c1, type:int, comment:null), FieldSchema(name:hive_tab1.c2, type:string, comment:null)], properties:null) INFO : Completed compiling command(queryId=hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a); Time taken: 0.36 seconds INFO : We are resetting the hadoop caller context to HIVE_SSN_ID:7114abad-2ba2-410d-ad73-40d473a647af INFO : Concurrency mode is disabled, not creating a lock manager INFO : Setting caller context to query id hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a INFO : Executing command(queryId=hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a): insert overwrite table hbase_table_sai select * from hive_tab1 INFO : Query ID = hive_20170808125431_652dbcde-96d5-4afd-9359-bd71bfd6b01a INFO : Total jobs = 1 INFO : Starting task [Stage-0:DDL] in serial mode INFO : Starting task [Stage-1:DDL] in serial mode INFO : Launching Job 1 out of 1 INFO : Starting task [Stage-3:MAPRED] in serial mode INFO : Session is already open INFO : Tez session missing resources, adding additional necessary resources INFO : Dag name: insert overwrite table hbase_tab...hive_tab1(Stage-3) INFO : Dag submit failed due to There is conflicting local resource (guava-14.0.1.jar) between dag local resource and vertex Map 1 local resource. 
Resource of dag : resource { scheme: "hdfs" host: "ulcer1" port: 8020 file: "/tmp/hive/hive/7114abad-2ba2-410d-ad73-40d473a647af/hive_2017-08-08_12-54-31_225_8109820757632121978-7/hive/_tez_scratch_dir/guava-14.0.1.jar" } size: 2189117 timestamp: 150072247 type: FILE visibility: PRIVATE Resource of vertex: resource { scheme: "hdfs" host: "ulcer1" port: 8020 file: "/tmp/hive/hive/_tez_session_dir/8a93f7fd-b925-4684-a6b1-6561b5c8e344/guava-14.0.1.jar" } size: 2189117 timestamp: 1502211657919 type: FILE visibility: PRIVATE stack trace: [org.apache.tez.dag.api.DAG.verify(DAG.java:695), org.apache.tez.dag.api.DAG.createDag(DAG.java:796), org.apache.tez.client.TezClientUtils.prepareAndCreateDAGPlan(TezClientUtils.java:718), org.apache.tez.client.TezClient.submitDAGSession(TezClient.java:555), org.apache.tez.client.TezClient.submitDAG(TezClient.java:522), org.apache.hadoop.hive.ql.exec.tez.TezTask.submit(TezTask.java:506), org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:188), org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197), org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100), org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1905), org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1607), org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1354), org.apache.hadoop.hive.ql.Driver.run(Driver.java:1123), org.apache.hadoop.hive.ql.Driver.run(Driver.java:1116), org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:242), org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91), org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:334), java.security.AccessController.doPrivileged(Native Method), javax.security.auth.Subject.doAs(Subject.java:422), org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866), org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:348), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511), java.util.concurrent.FutureTask.run(FutureTask.java:266), java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149), java.util.concurrent.ThreadPoolExecutor$Worker.run(
[jira] [Updated] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()
[ https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-17169: Attachment: HIVE-17169.1-branch-2.patch Patch for {{branch-2}}. > Avoid extra call to KeyProvider::getMetadata() > -- > > Key: HIVE-17169 > URL: https://issues.apache.org/jira/browse/HIVE-17169 > Project: Hive > Issue Type: Bug > Components: Shims >Affects Versions: 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17169.1-branch-2.patch, HIVE-17169.1.patch > > > Here's the code from {{Hadoop23Shims}}: > {code:title=Hadoop23Shims.java|borderStyle=solid} > @Override > public int comparePathKeyStrength(Path path1, Path path2) throws > IOException { > EncryptionZone zone1, zone2; > zone1 = hdfsAdmin.getEncryptionZoneForPath(path1); > zone2 = hdfsAdmin.getEncryptionZoneForPath(path2); > if (zone1 == null && zone2 == null) { > return 0; > } else if (zone1 == null) { > return -1; > } else if (zone2 == null) { > return 1; > } > return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName()); > } > private int compareKeyStrength(String keyname1, String keyname2) throws > IOException { > KeyProvider.Metadata meta1, meta2; > if (keyProvider == null) { > throw new IOException("HDFS security key provider is not configured > on your server."); > } > meta1 = keyProvider.getMetadata(keyname1); > meta2 = keyProvider.getMetadata(keyname2); > if (meta1.getBitLength() < meta2.getBitLength()) { > return -1; > } else if (meta1.getBitLength() == meta2.getBitLength()) { > return 0; > } else { > return 1; > } > } > } > {code} > It turns out that {{EncryptionZone}} already has the cipher's bit-length > stored in a member variable. One shouldn't need an additional name-node call > ({{KeyProvider::getMetadata()}}) only to fetch it again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
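For illustration, a minimal sketch of the suggested shortcut, assuming the strength can be derived from the zone's {{CipherSuite}}. The accessor below ({{getAlgorithmBlockSize()}}) stands in for whichever bit-length member the report refers to, so treat the exact getter as an assumption rather than the final patch:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.EncryptionZone;

// Sketch: compare key strength using data already carried by the
// EncryptionZone, avoiding the extra KeyProvider::getMetadata() call.
public int comparePathKeyStrength(Path path1, Path path2) throws IOException {
  EncryptionZone zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
  EncryptionZone zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
  if (zone1 == null && zone2 == null) {
    return 0;
  } else if (zone1 == null) {
    return -1;
  } else if (zone2 == null) {
    return 1;
  }
  // No name-node round trip: the cipher parameters travel with the zone.
  // getAlgorithmBlockSize() is a stand-in for the bit-length member.
  return Integer.compare(zone1.getSuite().getAlgorithmBlockSize(),
      zone2.getSuite().getAlgorithmBlockSize());
}
{code}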
[jira] [Assigned] (HIVE-17325) Clean up intermittently failing unit tests
[ https://issues.apache.org/jira/browse/HIVE-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-17325: - > Clean up intermittently failing unit tests > - > > Key: HIVE-17325 > URL: https://issues.apache.org/jira/browse/HIVE-17325 > Project: Hive > Issue Type: Test > Components: Tests >Reporter: Alan Gates >Assignee: Alan Gates > > We have a number of intermittently failing tests. I propose to disable these > so that we can get clean (or at least cleaner) CI runs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17214) check/fix conversion of non-acid to acid
[ https://issues.apache.org/jira/browse/HIVE-17214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127857#comment-16127857 ] Eugene Koifman commented on HIVE-17214: --- Currently in HIVE-17205 conversion is blocked in _TransactionalValidationListener.conformToAcid()_ > check/fix conversion of non-acid to acid > > > Key: HIVE-17214 > URL: https://issues.apache.org/jira/browse/HIVE-17214 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > > bucketed tables have stricter rules for file layout on disk - bucket files > are direct children of a partition directory. > for un-bucketed tables I'm not sure there are any rules. > for example, CTAS with Tez + Union operator creates 1 directory for each leg > of the union. > Supposedly Hive can read a table by picking up all files recursively. > Can it also write (other than the CTAS example above) arbitrarily? > Does it mean an Acid write can also write anywhere? > Figure out what can be supported and how the existing layout can be checked? > Examining a full "ls -l -R" for a large table could be expensive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127840#comment-16127840 ] Mithun Radhakrishnan commented on HIVE-8472: P.S. I have [updated the documentation|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Alter/UseDatabase] as per instruction. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
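For reference, the new DDL in action over JDBC; a minimal sketch with a placeholder URL, database name, and path. Note that, per the documentation above, changing a database's location only affects the default parent directory for tables created afterwards; existing table and partition locations are not moved:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AlterDbLocationExample {
  public static void main(String[] args) throws Exception {
    // Placeholder HiveServer2 URL; adjust host, port and credentials.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hive2://localhost:10000/default", "hive", "");
         Statement stmt = conn.createStatement()) {
      // Analogous to ALTER TABLE tablename SET LOCATION, but for a database.
      stmt.execute("ALTER DATABASE my_db SET LOCATION "
          + "'hdfs://nameservice1/apps/hive/warehouse/my_db.db'");
    }
  }
}
{code}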
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Affects Version/s: 2.4.0 > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0, 2.4.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Status: Patch Available (was: Open) > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Attachment: HIVE-8472.1-branch-2.patch Patch for branch-2. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8472) Add ALTER DATABASE SET LOCATION
[ https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mithun Radhakrishnan updated HIVE-8472: --- Target Version/s: 2.4.0 Status: Open (was: Patch Available) Resubmitting for branch-2. > Add ALTER DATABASE SET LOCATION > --- > > Key: HIVE-8472 > URL: https://issues.apache.org/jira/browse/HIVE-8472 > Project: Hive > Issue Type: Improvement > Components: Database/Schema >Affects Versions: 2.2.0, 3.0.0 >Reporter: Jeremy Beard >Assignee: Mithun Radhakrishnan > Attachments: HIVE-8472.1-branch-2.patch, HIVE-8472.1.patch, > HIVE-8472.3.patch > > > Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there > was an equivalent for databases. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17089: -- Attachment: HIVE-17089.16.patch > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch, HIVE-17089.16.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > tables take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127768#comment-16127768 ] Eugene Koifman commented on HIVE-17089: --- patch 16 - address RB comments > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch, HIVE-17089.16.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > tables take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17316) Use String.contains for the hidden configuration variables
[ https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127746#comment-16127746 ] Hive QA commented on HIVE-17316: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881966/HIVE-17316.02.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11009 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=240) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=100) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[set_hiveconf_internal_variable0] (batchId=89) org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[set_hiveconf_internal_variable1] (batchId=89) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6404/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6404/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6404/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881966 - PreCommit-HIVE-Build > Use String.contains for the hidden configuration variables > -- > > Key: HIVE-17316 > URL: https://issues.apache.org/jira/browse/HIVE-17316 > Project: Hive > Issue Type: Sub-task >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17316.01.patch, HIVE-17316.02.patch > > > Currently HiveConf variables which should not be displayed to the user need > to be enumerated. We should enhance this to be able to hide configuration > variables by substring not just full equality. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17289) EXPORT and IMPORT shouldn't perform distcp with doAs privileged user.
[ https://issues.apache.org/jira/browse/HIVE-17289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-17289: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch pushed to master. > EXPORT and IMPORT shouldn't perform distcp with doAs privileged user. > - > > Key: HIVE-17289 > URL: https://issues.apache.org/jira/browse/HIVE-17289 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, repl >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Export, Import, replication > Fix For: 3.0.0 > > Attachments: HIVE-17289.01.patch > > > Currently, EXPORT uses distcp to dump data files to the dump directory and IMPORT > uses distcp to copy the larger files/large number of files from the dump > directory to the table staging directory. But this copy fails, as distcp is > always done with the doAs user specified in hive.distcp.privileged.doAs, which is > "hdfs" by default. > Need to remove usage of the doAs user when trying to distcp from the EXPORT/IMPORT flow. > Privileged-user-based distcp should be done only for REPL DUMP/LOAD commands. > Also, need to set the default config for hive.distcp.privileged.doAs to > "hive" as the "hdfs" super-user is never allowed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
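The distinction, as a minimal sketch using Hadoop's {{UserGroupInformation}} (names here are illustrative, not Hive's actual copy-path code): the privileged doAs wrapper is kept for REPL DUMP/LOAD only, while EXPORT/IMPORT copy as the session user.

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch: only replication commands run the copy as the privileged user.
static void runCopy(boolean isReplCommand, String privilegedUser,
    PrivilegedExceptionAction<Void> copyAction) throws Exception {
  if (isReplCommand) {
    // REPL DUMP/LOAD: impersonate the configured privileged user.
    UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
        privilegedUser, UserGroupInformation.getLoginUser());
    proxyUgi.doAs(copyAction);
  } else {
    // EXPORT/IMPORT: distcp runs with the caller's own credentials.
    copyAction.run();
  }
}
{code}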
[jira] [Commented] (HIVE-17296) Acid tests with multiple splits
[ https://issues.apache.org/jira/browse/HIVE-17296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127719#comment-16127719 ] Eugene Koifman commented on HIVE-17296: --- ORC-228 is in ORC 1.5. Note that MemoryManager is a ThreadLocal, so changing this property may affect other tests. See if this will actually work before backporting. > Acid tests with multiple splits > --- > > Key: HIVE-17296 > URL: https://issues.apache.org/jira/browse/HIVE-17296 > Project: Hive > Issue Type: Test > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > > data files in an Acid table are ORC files which may have multiple stripes; > such files in base/ or delta/ (and original files with non-acid to acid > conversion) are split by OrcInputFormat into multiple (stripe-sized) chunks. > There is additional logic in OrcRawRecordMerger > (discoverKeyBounds/discoverOriginalKeyBounds) that is not tested by any E2E > tests since none of them have enough data to generate multiple stripes in a > single file. > testRecordReaderOldBaseAndDelta/testRecordReaderNewBaseAndDelta/testOriginalReaderPair > in TestOrcRawRecordMerger has some logic to test this but it really needs e2e > tests. > With ORC-228 it will be possible to write such tests. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
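As a rough sketch of what such a test could do, a single ORC file can be forced to contain several stripes by shrinking the writer's stripe size (standard ORC {{Writer}} API; the row count and sizes below are arbitrary, not the eventual test's values):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

// Sketch: write enough rows with a deliberately small stripe size so a
// single ORC file ends up with multiple stripes, exercising the
// discoverKeyBounds logic on realistic splits.
static void writeMultiStripeFile(Configuration conf, Path file) throws Exception {
  TypeDescription schema = TypeDescription.fromString("struct<x:bigint>");
  Writer writer = OrcFile.createWriter(file,
      OrcFile.writerOptions(conf).setSchema(schema).stripeSize(64 * 1024));
  VectorizedRowBatch batch = schema.createRowBatch();
  LongColumnVector x = (LongColumnVector) batch.cols[0];
  for (long i = 0; i < 1_000_000; i++) {
    x.vector[batch.size++] = i;
    if (batch.size == batch.getMaxSize()) {
      writer.addRowBatch(batch);
      batch.reset();
    }
  }
  if (batch.size > 0) {
    writer.addRowBatch(batch);
  }
  writer.close();
}
{code}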
[jira] [Commented] (HIVE-17065) You cannot successfully deploy Hive clusters with the Hive guidance documents
[ https://issues.apache.org/jira/browse/HIVE-17065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127690#comment-16127690 ] Xuefu Zhang commented on HIVE-17065: Sorry for replying late on this, but [~linzhangbing], are you able to modify the wiki now? > You cannot successfully deploy Hive clusters with the Hive guidance documents > -- > > Key: HIVE-17065 > URL: https://issues.apache.org/jira/browse/HIVE-17065 > Project: Hive > Issue Type: Improvement > Components: Documentation >Reporter: ZhangBing Lin >Priority: Minor > Attachments: screenshot-1.png > > > When I followed the official document from cwiki > [https://cwiki.apache.org/confluence/display/Hive/GettingStarted] to build a > Hive 2.1.1 single-node service, I encountered several problems: > 1. The following command to create the Hive warehouse directory needs to be modified: > A. $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse > B. $HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse > Using B instead of A might be better. > 2. The following two sections need their order adjusted: > A. Running Hive CLI > To use the Hive command line interface (CLI) from the shell: > $ $HIVE_HOME/bin/hive > B. Running HiveServer2 and Beeline > Starting from Hive 2.1, we need to run the schematool command below as an > initialization step. For example, we can use "derby" as the db type. > $ $HIVE_HOME/bin/schematool -dbType <db type> -initSchema > When I execute the $HIVE_HOME/bin/hive command, the following error occurs: > !screenshot-1.png! > After I execute the following command, the hive command then works: > $ $HIVE_HOME/bin/schematool -dbType derby -initSchema -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127668#comment-16127668 ] Eugene Koifman commented on HIVE-17089: --- no related failures for patch 15 > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > tables take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17089) make acid 2.0 the default
[ https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127663#comment-16127663 ] Hive QA commented on HIVE-17089: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881954/HIVE-17089.15.patch {color:green}SUCCESS:{color} +1 due to 13 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10969 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6403/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6403/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6403/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881954 - PreCommit-HIVE-Build > make acid 2.0 the default > - > > Key: HIVE-17089 > URL: https://issues.apache.org/jira/browse/HIVE-17089 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, > HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch, > HIVE-17089.10.patch, HIVE-17089.10.patch, HIVE-17089.11.patch, > HIVE-17089.12.patch, HIVE-17089.13.patch, HIVE-17089.14.patch, > HIVE-17089.15.patch > > > acid 2.0 is introduced in HIVE-14035. It replaces Update events with a > combination of Delete + Insert events. This now makes U=D+I the default (and > only) supported acid table type in Hive 3.0. > The expectation for upgrade is that Major compaction has to be run on all > acid tables in the existing Hive cluster and that no new writes to these > tables take place since the start of compaction (Need to add a mechanism to > put a table in read-only mode - this way it can still be read while it's > being compacted). Then upgrade to Hive 3.0 can take place. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17181) HCatOutputFormat should expose complete output-schema (including partition-keys) for dynamic-partitioning MR jobs
[ https://issues.apache.org/jira/browse/HIVE-17181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127642#comment-16127642 ] Mithun Radhakrishnan commented on HIVE-17181: - Yes, sir. I'm lining the commits up right now. I'd like to repeat the {{branch-2}} tests before I commit there. > HCatOutputFormat should expose complete output-schema (including > partition-keys) for dynamic-partitioning MR jobs > - > > Key: HIVE-17181 > URL: https://issues.apache.org/jira/browse/HIVE-17181 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17181.1.patch, HIVE-17181.2.patch, > HIVE-17181.3.patch, HIVE-17181.branch-2.patch > > > Map/Reduce jobs that use HCatalog APIs to write to Hive tables using Dynamic > partitioning are expected to call the following API methods: > # {{HCatOutputFormat.setOutput()}} to indicate which table/partitions to > write to. This call populates the {{OutputJobInfo}} with details fetched from > the Metastore. > # {{HCatOutputFormat.setSchema()}} to indicate the output-schema for the data > being written. > It is a common mistake to invoke {{HCatOutputFormat.setSchema()}} as follows: > {code:java} > HCatOutputFormat.setSchema(conf, HCatOutputFormat.getTableSchema(conf)); > {code} > Unfortunately, {{getTableSchema()}} returns only the record-schema, not the > entire table's schema. We'll need a better API for use in M/R jobs to get the > complete table-schema. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
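For context, a sketch of what a dynamic-partitioning job has to do by hand today: append the partition keys to the record schema before calling {{setSchema()}}. The API shapes below are as I recall them from HCatalog; treat this as illustrative, since the patch is what adds a proper way to get the complete schema directly:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hive.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hive.hcatalog.data.schema.HCatSchema;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hive.hcatalog.mapreduce.OutputJobInfo;

// Sketch: build the complete output schema (record columns + partition
// keys) instead of passing getTableSchema() straight back to setSchema().
static void setFullSchema(Configuration conf) throws Exception {
  HCatSchema schema = HCatOutputFormat.getTableSchema(conf); // record columns only
  OutputJobInfo jobInfo = HCatOutputFormat.getJobInfo(conf);
  for (HCatFieldSchema partKey
      : jobInfo.getTableInfo().getPartitionColumns().getFields()) {
    schema.append(partKey); // dynamic-partition keys must be in the schema
  }
  HCatOutputFormat.setSchema(conf, schema);
}
{code}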
[jira] [Commented] (HIVE-17218) Canonical-ize hostnames for Hive metastore, and HS2 servers.
[ https://issues.apache.org/jira/browse/HIVE-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127633#comment-16127633 ] Mithun Radhakrishnan commented on HIVE-17218: - Certainly, sir. Thank you for the review. > Canonical-ize hostnames for Hive metastore, and HS2 servers. > > > Key: HIVE-17218 > URL: https://issues.apache.org/jira/browse/HIVE-17218 > Project: Hive > Issue Type: Bug > Components: HiveServer2, Metastore, Security >Affects Versions: 1.2.2, 2.2.0, 3.0.0 >Reporter: Mithun Radhakrishnan >Assignee: Mithun Radhakrishnan > Attachments: HIVE-17218.1.patch > > > Currently, the {{HiveMetastoreClient}} and {{HiveConnection}} do not > canonical-ize the hostnames of the metastore/HS2 servers. In deployments > where there are multiple such servers behind a VIP, this causes a number of > inconveniences: > # The client-side configuration (e.g. {{hive.metastore.uris}} in > {{hive-site.xml}}) needs to specify the VIP's hostname, and cannot use a > simplified CNAME, in the thrift URL. If the > {{hive.metastore.kerberos.principal}} is specified using {{_HOST}}, one sees > GSS failures as follows: > {noformat} > hive --hiveconf hive.metastore.kerberos.principal=hive/_h...@grid.myth.net > --hiveconf > hive.metastore.uris="thrift://simplified-hcat-cname.grid.myth.net:56789" > ... > Exception in thread "main" java.lang.RuntimeException: > java.lang.RuntimeException: Unable to instantiate > org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient > at > org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:542) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621) > ... > {noformat} > This is because {{_HOST}} is filled in with the CNAME, and not the > canonicalized name. > # Oozie workflows that use HCat {{}} have to always use the VIP > hostname, and can't use {{_HOST}}-based service principals, if the CNAME > differs from the VIP name. > If the client-code simply canonical-ized the hostnames, it would enable the > use of both simplified CNAMEs, and _HOST in service principals. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
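The gist of the proposed fix, as a minimal sketch: resolve the configured hostname to its canonical form with standard {{java.net}} resolution before substituting {{_HOST}} (the real patch applies this inside the metastore and HS2 client connection paths):

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: a CNAME such as "simplified-hcat-cname.grid.myth.net" resolves
// to the VIP's canonical name, so hive/_HOST@REALM is filled in with the
// hostname the server actually authenticates as.
static String canonicalize(String host) throws UnknownHostException {
  return InetAddress.getByName(host).getCanonicalHostName();
}
{code}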
[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17308: --- Status: Patch Available (was: Open) > Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, > HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, > HIVE-17308.6.patch, HIVE-17308.7.patch > > > Currently during logical planning join cardinality is estimated assuming no > correlation among join keys (This estimation is done using exponential > backoff). Physical planning on the other hand considers correlation for multiple > keys and uses a different estimation. We should consider correlation during > logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
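For context, the exponential backoff the description mentions combines per-key selectivities with exponentially decaying weights rather than assuming full independence; a minimal illustrative sketch (not the planner's actual code):

{code:java}
// Sketch: multi-key join selectivity with exponential backoff.
// With per-key NDVs d1 >= d2 >= d3 >= ..., the combined selectivity is
// 1/d1 * 1/sqrt(d2) * 1/sqrt(sqrt(d3)) * ... instead of the full
// independence assumption 1/(d1 * d2 * d3 * ...).
static double backoffSelectivity(long[] ndvsDescending) {
  double selectivity = 1.0;
  double exponent = 1.0;
  for (long ndv : ndvsDescending) {
    selectivity *= Math.pow(ndv, -exponent);
    exponent /= 2.0;
  }
  return selectivity;
}
{code}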
[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17308: --- Status: Open (was: Patch Available) > Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, > HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, > HIVE-17308.6.patch, HIVE-17308.7.patch > > > Currently during logical planning join cardinality is estimated assuming no > correlation among join keys (This estimation is done using exponential > backoff). Physical planning on the other hand considers correlation for multiple > keys and uses a different estimation. We should consider correlation during > logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17308) Improvement in join cardinality estimation
[ https://issues.apache.org/jira/browse/HIVE-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17308: --- Attachment: HIVE-17308.7.patch > Improvement in join cardinality estimation > -- > > Key: HIVE-17308 > URL: https://issues.apache.org/jira/browse/HIVE-17308 > Project: Hive > Issue Type: Improvement > Components: Query Planning >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17308.1.patch, HIVE-17308.2.patch, > HIVE-17308.3.patch, HIVE-17308.4.patch, HIVE-17308.5.patch, > HIVE-17308.6.patch, HIVE-17308.7.patch > > > Currently during logical planning join cardinality is estimated assuming no > correlation among join keys (This estimation is done using exponential > backoff). Physical planning on the other hand considers correlation for multiple > keys and uses a different estimation. We should consider correlation during > logical planning as well. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-17323 started by Deepak Jaiswal. - > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, while traversing the query tree, it should also consider a dynamic > partition pruning edge like a semijoin edge, without removing it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17323) Improve upon HIVE-16260
[ https://issues.apache.org/jira/browse/HIVE-17323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal reassigned HIVE-17323: - > Improve upon HIVE-16260 > --- > > Key: HIVE-17323 > URL: https://issues.apache.org/jira/browse/HIVE-17323 > Project: Hive > Issue Type: Bug >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > > HIVE-16260 allows removal of parallel edges of semijoin with mapjoins. > https://issues.apache.org/jira/browse/HIVE-16260 > However, while traversing the query tree, it should also consider a dynamic > partition pruning edge like a semijoin edge, without removing it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17292) Change TestMiniSparkOnYarnCliDriver test configuration to use the configured cores
[ https://issues.apache.org/jira/browse/HIVE-17292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127554#comment-16127554 ] Hive QA commented on HIVE-17292: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881950/HIVE-17292.5.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11005 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.testCliDriver[spark_job_max_tasks] (batchId=242) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.testCliDriver[spark_stage_max_tasks] (batchId=242) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6402/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6402/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6402/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881950 - PreCommit-HIVE-Build > Change TestMiniSparkOnYarnCliDriver test configuration to use the configured > cores > -- > > Key: HIVE-17292 > URL: https://issues.apache.org/jira/browse/HIVE-17292 > Project: Hive > Issue Type: Sub-task > Components: Spark, Test >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-17292.1.patch, HIVE-17292.2.patch, > HIVE-17292.3.patch, HIVE-17292.5.patch > > > Currently the {{hive-site.xml}} for the {{TestMiniSparkOnYarnCliDriver}} test > defines 2 cores, and 2 executors, but only 1 is used, because the MiniCluster > does not allow the creation of the 3rd container. > The FairScheduler uses 1GB increments for memory, but the containers request > only 512MB. We should change the fairscheduler configuration to > use only the requested 512MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127515#comment-16127515 ] Aihua Xu commented on HIVE-17272: - patch-2: handle the case where vectorPartDesc is null, to avoid the NPE. This can happen for an empty table, for which Hive internally generates an empty file. > when hive.vectorized.execution.enabled is true, query on empty partitioned > table fails with NPE > --- > > Key: HIVE-17272 > URL: https://issues.apache.org/jira/browse/HIVE-17272 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.1.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-17272.2.patch > > > {noformat} > set hive.vectorized.execution.enabled=true; > CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet; > select * from tab t1 join tab t2 where t1.x=t2.x; > {noformat} > The query fails with the following exception. > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) > ~[hive-exec-2.3.0.jar:2.3.0] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > ~[hadoop-common-2.6.0.jar:?] > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > ~[hadoop-common-2.6.0.jar:?] > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] 
> at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_101] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_101] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[?:1.8.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[?:1.8.0_101] > at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
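The shape of the patch-2 guard, roughly (a sketch around {{VectorMapOperator.createAndInitPartitionContext}}; the exact placement in the committed patch may differ):

{code:java}
// Sketch: an empty table/partition can surface a PartitionDesc whose
// VectorPartitionDesc was never filled in; skip it instead of
// dereferencing null at VectorMapOperator.java:386.
VectorPartitionDesc vectorPartDesc = partDesc.getVectorPartitionDesc();
if (vectorPartDesc == null) {
  return null; // nothing to vectorize for the synthetic empty file
}
{code}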
[jira] [Updated] (HIVE-17272) when hive.vectorized.execution.enabled is true, query on empty partitioned table fails with NPE
[ https://issues.apache.org/jira/browse/HIVE-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-17272: Attachment: (was: HIVE-17272.1.patch) > when hive.vectorized.execution.enabled is true, query on empty partitioned > table fails with NPE > --- > > Key: HIVE-17272 > URL: https://issues.apache.org/jira/browse/HIVE-17272 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.1.1 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-17272.2.patch > > > {noformat} > set hive.vectorized.execution.enabled=true; > CREATE TABLE `tab`(`x` int) PARTITIONED BY ( `y` int) stored as parquet; > select * from tab t1 join tab t2 where t1.x=t2.x; > {noformat} > The query fails with the following exception. > {noformat} > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.createAndInitPartitionContext(VectorMapOperator.java:386) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.internalSetChildren(VectorMapOperator.java:559) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.setChildren(VectorMapOperator.java:474) > ~[hive-exec-2.3.0.jar:2.3.0] > at > org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:106) > ~[hive-exec-2.3.0.jar:2.3.0] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > ~[hadoop-common-2.6.0.jar:?] > at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) > ~[hadoop-common-2.6.0.jar:?] > at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) > ~[hadoop-common-2.6.0.jar:?] > at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:413) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] > at > org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:268) > ~[hadoop-core-2.6.0-mr1-cdh5.4.2.jar:?] 
> at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_101] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_101] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > ~[?:1.8.0_101] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > ~[?:1.8.0_101] > at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_101] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17316) Use String.contains for the hidden configuration variables
[ https://issues.apache.org/jira/browse/HIVE-17316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-17316: --- Attachment: HIVE-17316.02.patch Made a small change: instead of checking String.contains, I now use String.startsWith, to reduce the number of accidentally restricted parameters. Also fixed the failing unit and q tests. > Use String.contains for the hidden configuration variables > -- > > Key: HIVE-17316 > URL: https://issues.apache.org/jira/browse/HIVE-17316 > Project: Hive > Issue Type: Sub-task >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Attachments: HIVE-17316.01.patch, HIVE-17316.02.patch > > > Currently HiveConf variables which should not be displayed to the user need > to be enumerated. We should enhance this to be able to hide configuration > variables by substring not just full equality. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
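A minimal sketch of the prefix-based check (variable and method names below are placeholders, not the patch's exact code; the two entries are illustrative hidden-list values):

{code:java}
import java.util.Arrays;
import java.util.List;

// Sketch: hide a variable if its name starts with any configured prefix,
// instead of requiring an exact match against an enumerated list.
static final List<String> HIDDEN_PREFIXES = Arrays.asList(
    "javax.jdo.option.ConnectionPassword",
    "hive.server2.keystore.password");

static boolean isHiddenConfig(String name) {
  return HIDDEN_PREFIXES.stream().anyMatch(name::startsWith);
}
{code}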
[jira] [Updated] (HIVE-17305) New insert overwrite dynamic partitions qtest needs to have the golden file regenerated
[ https://issues.apache.org/jira/browse/HIVE-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17305: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~zsombor.klara] for the patch! > New insert overwrite dynamic partitions qtest needs to have the golden file > regenerated > -- > > Key: HIVE-17305 > URL: https://issues.apache.org/jira/browse/HIVE-17305 > Project: Hive > Issue Type: Bug > Components: Tests >Affects Versions: 3.0.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-17305.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17322) Serialise BeeLine qtest execution to prevent flakiness
[ https://issues.apache.org/jira/browse/HIVE-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barna Zsombor Klara updated HIVE-17322: --- Attachment: HIVE-17322.04.patch > Serialise BeeLine qtest execution to prevent flakiness > -- > > Key: HIVE-17322 > URL: https://issues.apache.org/jira/browse/HIVE-17322 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara >Priority: Minor > Attachments: HIVE-17322.01.patch, HIVE-17322.02.patch, > HIVE-17322.03.patch, HIVE-17322.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17322) Serialise BeeLine qtest execution to prevent flakiness
[ https://issues.apache.org/jira/browse/HIVE-17322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127444#comment-16127444 ] Hive QA commented on HIVE-17322: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12881935/HIVE-17322.03.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11004 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only] (batchId=243) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only] (batchId=243) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only] (batchId=170) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=169) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=180) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=180) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6401/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6401/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6401/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12881935 - PreCommit-HIVE-Build > Serialise BeeLine qtest execution to prevent flakiness > -- > > Key: HIVE-17322 > URL: https://issues.apache.org/jira/browse/HIVE-17322 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara >Priority: Minor > Attachments: HIVE-17322.01.patch, HIVE-17322.02.patch, > HIVE-17322.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17268) WebUI / QueryPlan: query plan is sometimes null when explain output conf is on
[ https://issues.apache.org/jira/browse/HIVE-17268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17268: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks for your contribution [~klcopp]! > WebUI / QueryPlan: query plan is sometimes null when explain output conf is on > -- > > Key: HIVE-17268 > URL: https://issues.apache.org/jira/browse/HIVE-17268 > Project: Hive > Issue Type: Bug >Reporter: Karen Coppage >Assignee: Karen Coppage >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17268.2.patch, HIVE-17268.3.patch, HIVE-17268.patch > > > The Hive WebUI's Query Plan tab displays "SET hive.log.explain.output TO true > TO VIEW PLAN" when the query cannot be compiled, even if hive.log.explain.output > is set to true, because the plan is null in that case. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17311) Numeric overflow in the HiveConf
[ https://issues.apache.org/jira/browse/HIVE-17311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-17311: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks for the patch [~olegd]! > Numeric overflow in the HiveConf > > > Key: HIVE-17311 > URL: https://issues.apache.org/jira/browse/HIVE-17311 > Project: Hive > Issue Type: Bug >Reporter: Oleg Danilov >Assignee: Oleg Danilov >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17311.patch > > > The multiplierFor() method contains a typo that causes incorrect parsing of the > rarer suffixes ('tb' and 'pb'). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
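For reference, a sketch of a suffix-to-multiplier mapping that sidesteps both the typo and the related overflow hazard by staying in {{long}} arithmetic ('tb' is 2^40 and 'pb' is 2^50, both beyond a 32-bit int). HiveConf's actual method differs in detail; this is illustrative only:

{code:java}
// Sketch: map a size suffix to its byte multiplier using long literals.
static long multiplierFor(String unit) {
  switch (unit.trim().toLowerCase()) {
    case "b": case "bytes": return 1L;
    case "kb": return 1L << 10;
    case "mb": return 1L << 20;
    case "gb": return 1L << 30;
    case "tb": return 1L << 40;
    case "pb": return 1L << 50;
    default:
      throw new IllegalArgumentException("Invalid size unit: " + unit);
  }
}
{code}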