[jira] [Created] (HIVE-17700) Update committer list

2017-10-04 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-17700:
---

 Summary: Update committer list
 Key: HIVE-17700
 URL: https://issues.apache.org/jira/browse/HIVE-17700
 Project: Hive
  Issue Type: Bug
Reporter: Aihua Xu
Assignee: Aihua Xu
Priority: Minor


Please update committer list:
Name: Aihua Xu
Apache ID: aihuaxu
Organization: Cloudera

Name: Yongzhi Chen
Apache ID: ychena
Organization: Cloudera



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17095) Long chain repl loads do not complete in a timely fashion

2017-07-13 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-17095:
---

 Summary: Long chain repl loads do not complete in a timely fashion
 Key: HIVE-17095
 URL: https://issues.apache.org/jira/browse/HIVE-17095
 Project: Hive
  Issue Type: Bug
  Components: Query Planning, repl
Reporter: sapin amin
Assignee: Sushanth Sowmyan


Per performance testing done by [~sapinamin] (thus, I'm setting him as 
reporter), we were able to discover an important bug affecting replication. It 
has the potential to affect other large DAGs of Tasks that hive generates as 
well, if those DAGs have multiple paths to child Task nodes.

Basically, we find that incremental REPL LOAD does not finish in a timely 
fashion. The test, in this case, was to add 400 partitions and replicate them. 
Associated with each partition, there was an ADD PTN and an ALTER PTN. For each 
of the ADD PTN tasks, we'd generate a DDLTask, a CopyTask and a MoveTask. For 
each ALTER PTN, there'd be a single DDLTask. And since order of execution is 
important, dependency collection tasks would be chained in between phases.

Trying to root cause this shows us that it seems to stall forever at the Driver 
instantiation time, and it almost looks like the thread doesn't proceed past 
that point.

Looking at the logs, it seems that, the way this is written, the code walks 
every task in every subtree of every node without checking for duplicates, and 
this is done simply to get the number of execution tasks!

And thus, the task visitor will visit every subtree of every node, which is 
fine if you have graphs that look like open trees, but is horrible for us, 
since we have dependency collection tasks between each phase. Effectively, this 
is what's happening:

We have a DAG, say, like this:

4 tasks in parallel -> DEP col -> 4 tasks in parallel -> DEP col -> ...

This means that for each of the 4 root tasks, we will do a full traversal of 
every subgraph (not just every node) past the DEP col, and this happens 
recursively, leading to exponential growth in the number of task visits as the 
length and breadth of the graph increase. In our case, we had about 800 tasks 
in the graph, with a width of roughly 2-3 and about 200 stages, each with a dep 
collection before and after, which meant that leaf nodes of this DAG could be 
reached in something like 2^200 to 3^200 distinct ways, and we'd visit them via 
every one of those paths. And all of this simply to count the number of tasks 
to schedule - we would then revisit this function multiple more times: once per 
hook, once for the MapReduceCompiler and once for the TaskCompiler.

We have not been sending such large DAGs to the Driver, so this has not yet 
been a problem, and there are upcoming changes to reduce the number of tasks 
replication generates (as part of a memory addressing issue), but we should 
still fix the way we do Task traversal so that a large DAG cannot cripple us.
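
As an illustration of the intended fix (a minimal sketch with a hypothetical Task class, not the actual Driver code), counting tasks with a visited set makes the traversal linear in the number of nodes instead of exponential in the number of paths:

{code}
// Sketch: count tasks in a DAG while tracking visited nodes, so each node is
// counted once no matter how many paths lead to it.
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Task {
  List<Task> children;   // e.g. tasks past a dependency collection task
  Task(List<Task> children) { this.children = children; }
}

class TaskCounter {
  static int countTasks(List<Task> roots) {
    Set<Task> visited = new HashSet<>();       // de-duplicates repeat visits
    Deque<Task> pending = new ArrayDeque<>(roots);
    while (!pending.isEmpty()) {
      Task t = pending.pop();
      if (!visited.add(t)) {
        continue;                              // already counted via another path
      }
      if (t.children != null) {
        pending.addAll(t.children);
      }
    }
    return visited.size();
  }
}
{code}

With de-duplication like this, the ~800-task DAG above costs roughly 800 visits rather than something on the order of 2^200.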



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-17005) Ensure REPL DUMP and REPL LOAD are authorized properly

2017-06-30 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-17005:
---

 Summary: Ensure REPL DUMP and REPL LOAD are authorized properly
 Key: HIVE-17005
 URL: https://issues.apache.org/jira/browse/HIVE-17005
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Currently, we piggyback REPL DUMP and REPL LOAD on EXPORT and IMPORT auth 
privileges. However, work is underway to stop populating all the relevant 
objects in inputObjs and outputObjs, which then requires that REPL DUMP and 
REPL LOAD be authorized at a higher level and simply require ADMIN_PRIV to run.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16918) Skip ReplCopyTask distcp for _metadata copying. Also enable -pb for distcp

2017-06-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-16918:
---

 Summary: Skip ReplCopyTask distcp for _metadata copying. Also 
enable -pb for distcp
 Key: HIVE-16918
 URL: https://issues.apache.org/jira/browse/HIVE-16918
 Project: Hive
  Issue Type: Bug
  Components: repl
Affects Versions: 3.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


With HIVE-16686, we switched ReplCopyTask to always use a privileged DistCp. 
This, however, is incorrect for copying _metadata generated from a temporary 
scratch directory to HDFS. We need to change that so that this case routes to a 
regular CopyTask instead.

Also, as a follow-up to HIVE-16686, we missed adding "-pb" as a default for 
invocations of distcp from hive. Adding that in. This would not be necessary if 
HADOOP-8143 had made it in, but until it does, we need it.
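
For reference, a sketch of what a distcp invocation with block-size preservation looks like (paths hypothetical):

{noformat}
hadoop distcp -pb hdfs://source-nn/apps/hive/repl/dump_1/data hdfs://dest-nn/apps/hive/repl/staging/data
{noformat}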



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HIVE-16860) HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 breaks at runtime.

2017-06-08 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-16860:
---

 Summary: HostUtil.getTaskLogUrl change between hadoop 2.3 and 2.4 
breaks at runtime.
 Key: HIVE-16860
 URL: https://issues.apache.org/jira/browse/HIVE-16860
 Project: Hive
  Issue Type: Bug
  Components: Shims
Affects Versions: 0.13.0, 0.14.0
Reporter: Chris Drome
Assignee: Jason Dere
 Fix For: 0.14.0


The signature for HostUtil.getTaskLogUrl has changed between Hadoop-2.3 and 
Hadoop-2.4.

Code in 
shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java works 
with Hadoop-2.3 method and causes compilation failure with Hadoop-2.4.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16686) repli invocations of distcp needs additional handling

2017-05-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-16686:
---

 Summary: repli invocations of distcp needs additional handling
 Key: HIVE-16686
 URL: https://issues.apache.org/jira/browse/HIVE-16686
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


When REPL LOAD invokes distcp, there needs to be a way for the user invoking 
REPL LOAD to pass on arguments to distcp. In addition, there is sometimes a 
need for distcp to be invoked from within an impersonated context, such as 
running as user "hdfs", asking distcp to preserve ownerships of individual 
files.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-16642) New Events created as part of replv2 potentially break replv1

2017-05-11 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-16642:
---

 Summary: New Events created as part of replv2 potentially break 
replv1
 Key: HIVE-16642
 URL: https://issues.apache.org/jira/browse/HIVE-16642
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


A couple of new events, such as {CREATE,DROP}{INDEX,FUNCTION}, have been 
introduced since replv1, but they do not have a replv1 ReplicationTask 
associated with them.

Thus, for users like Falcon, we potentially wind up throwing an 
IllegalStateException if replv1-based HiveDR is running on a cluster with these 
updated events.

Thus, we should be more graceful when encountering them, returning a 
NoopReplicationTask equivalent that they can make use of, or ignore, for such 
newer events.

In addition, we should add test cases that track whether the creation of these 
events introduces any backward incompatibility. If any of the events should 
later change in a way that introduces such an incompatibility, these tests 
should fail and alert us to that possibility.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (HIVE-15668) change REPL DUMP syntax to use "LIMIT" instead of "BATCH" keyword

2017-01-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15668:
---

 Summary: change REPL DUMP syntax to use "LIMIT" instead of "BATCH" 
keyword
 Key: HIVE-15668
 URL: https://issues.apache.org/jira/browse/HIVE-15668
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Currently, REPL DUMP syntax goes:

{noformat}
REPL DUMP <dbname>[.<tablename>] [FROM <eventid> [BATCH <batchSize>]]
{noformat}

The BATCH directive says that when doing an event dump, we should not dump out 
more than _batchSize_ events. However, there is a clearer keyword for the same 
effect, and that is LIMIT. Thus, rephrasing the syntax as follows makes it 
clearer:

{noformat}
REPL DUMP <dbname>[.<tablename>] [FROM <eventid> [LIMIT <batchSize>]]
{noformat}
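
For example, with the proposed syntax, dumping at most 50 events starting from event id 100 would look something like this (db name and ids hypothetical):

{noformat}
REPL DUMP salesdb FROM 100 LIMIT 50;
{noformat}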




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15652) Optimize(reduce) the number of alter calls made to fix repl.last.id

2017-01-17 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15652:
---

 Summary: Optimize(reduce) the number of alter calls made to fix 
repl.last.id
 Key: HIVE-15652
 URL: https://issues.apache.org/jira/browse/HIVE-15652
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Per code review from HIVE-15534, we might be doing alters to parent objects to 
set repl.last.id when it is not necessary, since some future event might make 
this alter redundant.

There are 3 cases where this might happen:

a) After a CREATE_TABLE event - any prior reference to that table does not need 
an ALTER, since CREATE_TABLE will have a repl.last.id come with it.
b) After a DROP_TABLE event - any prior reference to that table is irrelevant, 
and thus, no alter is needed.
c) After an ALTER_TABLE event, since that dump will itself do a metadata update 
that will get the latest repl.last.id along with this event.

In each of these cases, we can eliminate the otherwise-needed alter call.
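
A minimal sketch of the kind of check intended, with hypothetical event-type names (not the actual ReplicationSemanticAnalyzer code):

{code}
// Sketch: decide whether a trailing repl.last.id ALTER is still needed for a
// table, given the event types that follow it in the dumped batch.
import java.util.Arrays;
import java.util.List;

class ReplIdAlterCheck {
  static boolean needsReplIdAlter(List<String> subsequentEventTypes) {
    for (String eventType : subsequentEventTypes) {
      switch (eventType) {
        case "CREATE_TABLE":  // repl.last.id comes with the create itself
        case "DROP_TABLE":    // prior references to the table are irrelevant
        case "ALTER_TABLE":   // the alter's own metadata update carries repl.last.id
          return false;
        default:
          break;              // other events don't make the alter redundant
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(needsReplIdAlter(Arrays.asList("INSERT", "ALTER_TABLE"))); // false
    System.out.println(needsReplIdAlter(Arrays.asList("INSERT")));                // true
  }
}
{code}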



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15536) Tests failing due to unexpected q.out outputs : udf_coalesce,case_sensitivity,input_testxpath,

2017-01-03 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15536:
---

 Summary: Tests failing due to unexpected q.out outputs : 
udf_coalesce,case_sensitivity,input_testxpath,
 Key: HIVE-15536
 URL: https://issues.apache.org/jira/browse/HIVE-15536
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


All of these tests seem to be failing based on a q.out diff:

{noformat}
Running: diff -a 
/home/hiveptest/162.222.183.40-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/input_testxpath.q.out
 
/home/hiveptest/162.222.183.40-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/input_testxpath.q.out
32a33
> Pruned Column Paths: lintstring.mystring
{noformat}

{noformat}
Running: diff -a 
/home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/case_sensitivity.q.out
 
/home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/case_sensitivity.q.out
32a33
> Pruned Column Paths: lintstring.mystring
{noformat}

{noformat}
Running: diff -a 
/home/hiveptest/104.197.172.185-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/udf_coalesce.q.out
 
/home/hiveptest/104.197.172.185-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/udf_coalesce.q.out
142a143
>   Pruned Column Paths: lintstring.mystring
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15535) Flaky test : TestHS2HttpServer.testContextRootUrlRewrite

2017-01-03 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15535:
---

 Summary: Flaky test : TestHS2HttpServer.testContextRootUrlRewrite
 Key: HIVE-15535
 URL: https://issues.apache.org/jira/browse/HIVE-15535
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


Per recent test failure : 
https://builds.apache.org/job/PreCommit-HIVE-Build/2766/testReport/org.apache.hive.service.server/TestHS2HttpServer/testContextRootUrlRewrite/

{noformat}
Stacktrace

org.junit.ComparisonFailure: 
expected:<...d>Tue Jan 03 11:54:4[6] PST 2017
 ...> but was:<...d>Tue Jan 03 11:54:4[7] PST 2017
 ...>
at org.junit.Assert.assertEquals(Assert.java:115)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hive.service.server.TestHS2HttpServer.testContextRootUrlRewrite(TestHS2HttpServer.java:99)
{noformat}

Looks like the test is overly picky: it does an exact string match on a field 
containing a timestamp, which can differ by a second.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15534) Update db/table repl.last.id at the end of REPL LOAD of a batch of events

2017-01-03 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15534:
---

 Summary: Update db/table repl.last.id at the end of REPL LOAD of a 
batch of events
 Key: HIVE-15534
 URL: https://issues.apache.org/jira/browse/HIVE-15534
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Tracking TODO task in ReplSemanticAnalyzer :

{noformat}
// TODO : Over here, we need to track a Map 
for every db updated
// and update repl.last.id for each, if this is a wh-level load, and if 
it is a db-level load,
// then a single repl.last.id update, and if this is a tbl-lvl load 
which does not alter the
// table itself, we'll need to update repl.last.id for that as well.
{noformat}
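
A minimal sketch of the bookkeeping this TODO describes, using hypothetical names (not the actual ReplicationSemanticAnalyzer code):

{code}
// Sketch: accumulate the highest event id seen per database while loading a
// batch of events, then issue a single repl.last.id update per db at the end.
import java.util.HashMap;
import java.util.Map;

class ReplLastIdTracker {
  private final Map<String, Long> dbToLastEventId = new HashMap<>();

  void recordEvent(String dbName, long eventId) {
    dbToLastEventId.merge(dbName, eventId, Math::max);
  }

  void flush() {
    for (Map.Entry<String, Long> e : dbToLastEventId.entrySet()) {
      // stand-in for generating the task that updates the db's repl.last.id
      System.out.printf("ALTER DATABASE %s SET DBPROPERTIES ('repl.last.id'='%d')%n",
          e.getKey(), e.getValue());
    }
    dbToLastEventId.clear();
  }
}
{code}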



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15533) Repl rename support adds unnecessary duplication for non-rename alters

2017-01-03 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15533:
---

 Summary: Repl rename support adds unnecessary duplication for 
non-rename alters
 Key: HIVE-15533
 URL: https://issues.apache.org/jira/browse/HIVE-15533
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Currently, the rename events contain both a before and an after object. For 
non-rename cases, we only apply the "after" object, and thus have no need of 
the "before" object. Thus, we might want to minimize wastage by not 
materializing "before" in the non-rename case.

Also worth considering: in the rename case, do we really need the before 
object, or just the before and after names?

Having before & after objects is good in that it allows us flexibility, but we 
might not need that much info. From a perf viewpoint, we might want to trim 
things a bit here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15532) Refactor/cleanup TestReplicationScenario

2017-01-03 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15532:
---

 Summary: Refactor/cleanup TestReplicationScenario
 Key: HIVE-15532
 URL: https://issues.apache.org/jira/browse/HIVE-15532
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


TestReplicationScenarios could use a bit of cleanup, based on comments from 
reviews:

a) Separate out the "setup" phase of each test, so that we don't run 
unnecessary verifications which aren't testing replication itself, but are only 
verifying that the environment is set up correctly before testing replication. 
This can be flag-gated so that it can be turned on at test-dev time and off 
during build/commit unit-test runs.

b) Better comments inside the tests for what is being set up / tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15522) REPL LOAD & DUMP support for incremental ALTER_TABLE/ALTER_PTN including renames

2016-12-28 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15522:
---

 Summary: REPL LOAD & DUMP support for incremental 
ALTER_TABLE/ALTER_PTN including renames
 Key: HIVE-15522
 URL: https://issues.apache.org/jira/browse/HIVE-15522
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15480) Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_1

2016-12-20 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15480:
---

 Summary: Failing test : TestMiniTezCliDriver.testCliDriver : 
explainanalyze_1
 Key: HIVE-15480
 URL: https://issues.apache.org/jira/browse/HIVE-15480
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


See recent ptest failure : 
https://builds.apache.org/job/PreCommit-HIVE-Build/2642/testReport/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_explainanalyze_1_/

{noformat}
Standard Output

Running: diff -a 
/home/hiveptest/104.154.92.121-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainanalyze_1.q.out
 
/home/hiveptest/104.154.92.121-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainanalyze_1.q.out
248c248
< Group By Operator [GBY_2] (rows=205/500 width=95)
---
> Group By Operator [GBY_2] (rows=205/309 width=95)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15469) Fix REPL DUMP/LOAD DROP_PTN so it works on non-string-ptn-key tables

2016-12-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15469:
---

 Summary: Fix REPL DUMP/LOAD DROP_PTN so it works on 
non-string-ptn-key tables
 Key: HIVE-15469
 URL: https://issues.apache.org/jira/browse/HIVE-15469
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


The current implementation of REPL DUMP/REPL LOAD for DROP_PTN is limited to 
dropping partitions whose key types are strings. This needs the tableObj to be 
available in the DropPartitionMessage before it can be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15466) REPL LOAD & DUMP support for incremental DROP_TABLE/DROP_PTN

2016-12-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15466:
---

 Summary: REPL LOAD & DUMP support for incremental 
DROP_TABLE/DROP_PTN
 Key: HIVE-15466
 URL: https://issues.apache.org/jira/browse/HIVE-15466
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15455) Flaky test : TestHS2HttpServer.testContextRootUrlRewrite

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15455:
---

 Summary: Flaky test : TestHS2HttpServer.testContextRootUrlRewrite
 Key: HIVE-15455
 URL: https://issues.apache.org/jira/browse/HIVE-15455
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


This test failed in ptest when testing HIVE-15426 but seems to succeed locally. 
I'm not able to find another recent run which had this test fail as well, and 
the test logs for HIVE-15426 have been rotated out. Creating this jira anyway, 
to track it if it pops up again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15454) Failing test : TestMiniTezCliDriver.testCliDriver : explainanalyze_2

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15454:
---

 Summary: Failing test : TestMiniTezCliDriver.testCliDriver : 
explainanalyze_2
 Key: HIVE-15454
 URL: https://issues.apache.org/jira/browse/HIVE-15454
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


This test has failed on some recent ptest runs.

Example : 
https://builds.apache.org/job/PreCommit-HIVE-Build/2611/testReport/junit/org.apache.hadoop.hive.cli/TestMiniTezCliDriver/testCliDriver_explainanalyze_2_/

{noformat}
Standard Output

Running: diff -a 
/home/hiveptest/104.197.114.29-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/explainanalyze_2.q.out
 
/home/hiveptest/104.197.114.29-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/tez/explainanalyze_2.q.out
2095c2095
<   Group By Operator [GBY_16] (rows=500/760 width=280)
---
>   Group By Operator [GBY_16] (rows=500/619 width=280)
2105c2105
<   Group By Operator [GBY_22] (rows=1001/760 width=464)
---
>   Group By Operator [GBY_22] (rows=1001/619 width=464)
2111c2111
<   Group By Operator [GBY_16] (rows=500/760 width=280)
---
>   Group By Operator [GBY_16] (rows=500/619 width=280)
2119c2119
<   Group By Operator [GBY_22] (rows=1001/760 width=464)
---
>   Group By Operator [GBY_22] (rows=1001/619 width=464)
2125c2125
<   Group By Operator [GBY_16] (rows=500/760 width=280)
---
>   Group By Operator [GBY_16] (rows=500/619 width=280)
2142c2142
<   Group By Operator [GBY_22] (rows=1001/760 width=464)
---
>   Group By Operator [GBY_22] (rows=1001/619 width=464)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15453) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : stats_based_fetch_decision

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15453:
---

 Summary: Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
stats_based_fetch_decision
 Key: HIVE-15453
 URL: https://issues.apache.org/jira/browse/HIVE-15453
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


This test has been failing in a couple of ptest runs of late. A recent example 
is in 
https://builds.apache.org/job/PreCommit-HIVE-Build/2612/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_stats_based_fetch_decision_/

{noformat}
2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_239
2016-12-16 09:42:14 Completed running task attempt: 
attempt_1481909974530_0001_239_00_00_0
2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_239
2016-12-16 09:42:14 Running Dag: dag_1481909974530_0001_240
2016-12-16 09:42:14 Completed running task attempt: 
attempt_1481909974530_0001_240_00_00_0
2016-12-16 09:42:14 Completed Dag: dag_1481909974530_0001_240
Running: diff -a 
/home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/stats_based_fetch_decision.q.out
 
/home/hiveptest/104.154.196.58-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/stats_based_fetch_decision.q.out
153c153
<   Statistics: Num rows: 2000 Data size: 1092000 Basic stats: 
COMPLETE Column stats: PARTIAL
---
>   Statistics: Num rows: 2000 Data size: 1092000 Basic stats: 
> COMPLETE Column stats: COMPLETE
156c156
< Statistics: Num rows: 1 Data size: 546 Basic stats: 
COMPLETE Column stats: PARTIAL
---
> Statistics: Num rows: 1 Data size: 546 Basic stats: 
> COMPLETE Column stats: COMPLETE
160c160
<   Statistics: Num rows: 1 Data size: 543 Basic stats: 
COMPLETE Column stats: PARTIAL
---
>   Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: COMPLETE
163c163
< Statistics: Num rows: 1 Data size: 543 Basic stats: 
COMPLETE Column stats: PARTIAL
---
> Statistics: Num rows: 1 Data size: 543 Basic stats: 
> COMPLETE Column stats: COMPLETE
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15452) Failing test : TestMiniLlapLocalCliDriver.testCliDriver : metadataonly1

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15452:
---

 Summary: Failing test : TestMiniLlapLocalCliDriver.testCliDriver : 
metadataonly1
 Key: HIVE-15452
 URL: https://issues.apache.org/jira/browse/HIVE-15452
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


Test seems to be failing on recent ptest runs.

See 
https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapLocalCliDriver/testCliDriver_metadataonly1_/
 for recent example.

{noformat}
Running: diff -a 
/home/hiveptest/104.154.236.143-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/metadataonly1.q.out
 
/home/hiveptest/104.154.236.143-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/llap/metadataonly1.q.out
148c148
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
240c240
< NULL
---
> 1
287c287
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
379c379
< 0
---
> 1
971c971
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1016c1016
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1061c1061
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1160a1161
> 1 3
1448c1449
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1492c1493
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1587c1588
< NULL
---
> 2
1690c1691
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1735c1736
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1780c1781
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1825c1826
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1870c1871
<   input format: 
org.apache.hadoop.hive.ql.io.ZeroRowsInputFormat
---
>   input format: 
> org.apache.hadoop.hive.ql.io.OneNullRowInputFormat
1975a1977,1979
> 01:10:10  1
> 01:10:20  1
> 1 3
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15451) Failing test : TestMiniLlapCliDriver.testCliDriver : transform_ppr2

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15451:
---

 Summary: Failing test : TestMiniLlapCliDriver.testCliDriver :  
transform_ppr2
 Key: HIVE-15451
 URL: https://issues.apache.org/jira/browse/HIVE-15451
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


This test has been failing on ptest of late.

See 
https://builds.apache.org/job/PreCommit-HIVE-Build/2615/testReport/junit/org.apache.hadoop.hive.cli/TestMiniLlapCliDriver/testCliDriver_transform_ppr2_/
 for a recent example

Fails on stdout diff:
{noformat}
2016-12-16 12:20:11 Completed running task attempt: 
attempt_1481919437560_0001_177_01_00_0
Running: diff -a 
/home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/transform_ppr2.q.out
 
/home/hiveptest/35.184.94.117-hiveptest-0/apache-github-source-source/ql/src/test/results/clientpositive/llap/transform_ppr2.q.out
41c41
<   Statistics: Num rows: 1000 Data size: 178000 Basic stats: 
COMPLETE Column stats: PARTIAL
---
>   Statistics: Num rows: 1000 Data size: 178000 Basic stats: 
> COMPLETE Column stats: COMPLETE
46c46
< Statistics: Num rows: 1000 Data size: 272000 Basic stats: 
COMPLETE Column stats: PARTIAL
---
> Statistics: Num rows: 1000 Data size: 272000 Basic stats: 
> COMPLETE Column stats: COMPLETE
59c59
<   Statistics: Num rows: 1000 Data size: 272000 Basic 
stats: COMPLETE Column stats: PARTIAL
---
>   Statistics: Num rows: 1000 Data size: 272000 Basic 
> stats: COMPLETE Column stats: COMPLETE
63c63
< Statistics: Num rows: 333 Data size: 2664 Basic 
stats: COMPLETE Column stats: PARTIAL
---
> Statistics: Num rows: 333 Data size: 2664 Basic 
> stats: COMPLETE Column stats: COMPLETE
69c69
<   Statistics: Num rows: 333 Data size: 2664 Basic 
stats: COMPLETE Column stats: PARTIAL
---
>   Statistics: Num rows: 333 Data size: 2664 Basic 
> stats: COMPLETE Column stats: COMPLETE
178c178
< Statistics: Num rows: 333 Data size: 2664 Basic stats: 
COMPLETE Column stats: PARTIAL
---
> Statistics: Num rows: 333 Data size: 2664 Basic stats: 
> COMPLETE Column stats: COMPLETE
184c184
<   Statistics: Num rows: 333 Data size: 2664 Basic stats: 
COMPLETE Column stats: PARTIAL
---
>   Statistics: Num rows: 333 Data size: 2664 Basic stats: 
> COMPLETE Column stats: COMPLETE
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15450) Flaky tests : testCliDriver.sample[24679]

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15450:
---

 Summary: Flaky tests : testCliDriver.sample[24679]
 Key: HIVE-15450
 URL: https://issues.apache.org/jira/browse/HIVE-15450
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


Noted during ptest runs, the .q.out comparison seems to be failing.

There seems to be a difference in ordering of output that is causing this 
failure.

See https://builds.apache.org/job/PreCommit-HIVE-Build/2615/#showFailuresLink 
for a new-ish job with these failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15449) Failing test : TestVectorizedColumnReaderBase (possibly slow)

2016-12-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15449:
---

 Summary: Failing test : TestVectorizedColumnReaderBase (possibly 
slow)
 Key: HIVE-15449
 URL: https://issues.apache.org/jira/browse/HIVE-15449
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


Got the following error from a ptest run:

TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely 
timed out) (batchId=251)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15426) Fix order guarantee of event executions for REPL LOAD

2016-12-13 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15426:
---

 Summary: Fix order guarantee of event executions for REPL LOAD
 Key: HIVE-15426
 URL: https://issues.apache.org/jira/browse/HIVE-15426
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15332) REPL LOAD & DUMP support for incremental CREATE_TABLE/ADD_PTN

2016-12-01 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15332:
---

 Summary: REPL LOAD & DUMP support for incremental 
CREATE_TABLE/ADD_PTN
 Key: HIVE-15332
 URL: https://issues.apache.org/jira/browse/HIVE-15332
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


We need to add in support for REPL LOAD and REPL DUMP of incremental events, 
and we need to be able to replicate creates, for a start. This jira tracks the 
inclusion of CREATE_TABLE/ADD_PARTITION event support to REPL DUMP & LOAD.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15284) Add junit test to test replication scenarios

2016-11-24 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15284:
---

 Summary: Add junit test to test replication scenarios
 Key: HIVE-15284
 URL: https://issues.apache.org/jira/browse/HIVE-15284
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15151) Bootstrap support for replv2

2016-11-08 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-15151:
---

 Summary: Bootstrap support for replv2
 Key: HIVE-15151
 URL: https://issues.apache.org/jira/browse/HIVE-15151
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


We need to support the ability to bootstrap an initial state, dumping out 
currently existing dbs/tables, etc, so that incremental replication can take 
over from that point. To this end, we should implement commands such as REPL 
DUMP, REPL LOAD, REPL STATUS, as described over at 
https://cwiki.apache.org/confluence/display/Hive/HiveReplicationv2Development



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14841) Replication - Phase 2

2016-09-26 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14841:
---

 Summary: Replication - Phase 2
 Key: HIVE-14841
 URL: https://issues.apache.org/jira/browse/HIVE-14841
 Project: Hive
  Issue Type: New Feature
  Components: repl
Affects Versions: 2.1.0
Reporter: Sushanth Sowmyan


Per email sent out to the dev list, the current implementation of replication 
in hive has certain drawbacks, for instance :

* Replication follows a rubberbanding pattern, wherein different
tables/ptns can be in a different/mixed state on the destination, so
that unless all events are caught up on, we do not have an equivalent
warehouse. Thus, this only satisfies DR cases, not load-balancing
use cases, and the secondary warehouse is really only seen as a backup
rather than as a live warehouse that trails the primary.
* The base implementation is naive and has several performance
problems, including a large amount of duplication of data for
subsequent events (as mentioned in HIVE-13348) and having to copy out
entire partitions/tables when just a delta of files might be
sufficient. Also, using EXPORT/IMPORT allows us a simple
implementation, but at the cost of tons of temporary space, much of
which is not actually applied at the destination.

Thus, to track this, we now create a new branch (repl2) and an uber-jira (this 
one) to track experimental development towards improving this situation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14766) ObjectStore.initialize() needs retry mechanisms in case of connection failures

2016-09-15 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14766:
---

 Summary: ObjectStore.initialize() needs retry mechanisms in case 
of connection failures
 Key: HIVE-14766
 URL: https://issues.apache.org/jira/browse/HIVE-14766
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


RetryingHMSHandler handles retries for most HMSHandler calls. However, one area 
where we do not have retries is the very instantiation of ObjectStore. The lack 
of retries here sometimes means that a flaky db connection around the time the 
metastore is started yields an unresponsive metastore.
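
A minimal sketch of the retry-with-backoff shape intended here, with hypothetical names (not the actual ObjectStore code):

{code}
// Sketch: retry a flaky initialization a bounded number of times with backoff
// before giving up, so a transient db connection failure at startup does not
// leave the metastore permanently unresponsive.
class RetryingInit {
  interface Initializer {
    void initialize() throws Exception;     // stand-in for ObjectStore.initialize()
  }

  static void initWithRetries(Initializer init, int maxAttempts, long sleepMillis)
      throws Exception {
    Exception last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        init.initialize();
        return;                              // success
      } catch (Exception e) {
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis * attempt);   // simple linear backoff
        }
      }
    }
    throw last;                              // retries exhausted
  }
}
{code}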



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14449) Expand HiveReplication doc as a admin/user-facing doc

2016-08-05 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14449:
---

 Summary: Expand HiveReplication doc as a admin/user-facing doc
 Key: HIVE-14449
 URL: https://issues.apache.org/jira/browse/HIVE-14449
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Sushanth Sowmyan


https://cwiki.apache.org/confluence/display/Hive/Replication is a good 
user-facing/admin-facing doc for replication, in contrast to the 
https://cwiki.apache.org/confluence/display/Hive/HiveReplicationDevelopment 
which was intended to talk more about the design of it.

We should expand this further with all the knobs that exist, what APIs exist 
for other programs to take advantage of replication, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14394) Reduce excessive INFO level logging

2016-07-31 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14394:
---

 Summary: Reduce excessive INFO level logging
 Key: HIVE-14394
 URL: https://issues.apache.org/jira/browse/HIVE-14394
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


We need to cut down on the number of unneeded log messages we generate in HMS 
and HS2.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14365) Simplify logic for check introduced in HIVE-10022

2016-07-27 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14365:
---

 Summary: Simplify logic for check introduced in HIVE-10022
 Key: HIVE-14365
 URL: https://issues.apache.org/jira/browse/HIVE-14365
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan


We introduced a parent-check/glob-check/file-check in SQLAuthorizationUtils in 
HIVE-10022, but the logic for that is more convoluted than it needs to be. 
Taking a cue from RANGER-1126, we should simplify this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-14207) Strip HiveConf hidden params in webui conf

2016-07-11 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-14207:
---

 Summary: Strip HiveConf hidden params in webui conf
 Key: HIVE-14207
 URL: https://issues.apache.org/jira/browse/HIVE-14207
 Project: Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


HIVE-12338 introduced a new web UI, which has a page that displays the current 
HiveConf being used by HS2. However, it does not strip entries that are 
considered "hidden" conf parameters before displaying that config, thus 
exposing those values through the HS2 web UI. We need to add stripping to this.
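
A minimal sketch of the stripping intended, assuming the list of hidden parameter names is available (helper names hypothetical; the real fix would reuse HiveConf's existing hidden-config handling from HIVE-9013):

{code}
// Sketch: copy the config for display and drop any entry whose key is marked
// hidden, so secrets never reach the web UI page.
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ConfStripper {
  static Map<String, String> stripHidden(Map<String, String> conf, Set<String> hiddenKeys) {
    Map<String, String> safeToDisplay = new TreeMap<>(conf);
    for (String key : hiddenKeys) {
      safeToDisplay.remove(key);
    }
    return safeToDisplay;
  }
}
{code}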



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13949) Investigate why Filter mechanism does not work for XSRF filtering from HS2

2016-06-05 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13949:
---

 Summary: Investigate why Filter mechanism does not work for XSRF 
filtering from HS2
 Key: HIVE-13949
 URL: https://issues.apache.org/jira/browse/HIVE-13949
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2
Reporter: Sushanth Sowmyan


While working on HIVE-13853, it was found that simply using the constructed 
Filter as-is from ThriftHttpCLIService was not working, and thus, explicit 
calling of the filtering method from ThriftHttpServlet was needed.

We should investigate why that approach did not work and make it fall in line 
with standard filter usage, so as to not need to call functions inside the 
filter directly. Also, this is a prerequisite for eventually getting rid of our 
shim if we later update to always expect hadoop versions that contain the 
filter.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13941) Improve errors returned from SchemaTool

2016-06-03 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13941:
---

 Summary: Improve errors returned from SchemaTool
 Key: HIVE-13941
 URL: https://issues.apache.org/jira/browse/HIVE-13941
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


We've had feedback from Ambari folks on Schematool usage being opaque on errors.

While, yes, the underlying error is present (hidden in the stacktrace) if you 
run with --verbose, that output is often unwieldy and hard to use. And without 
--verbose, there is no indication of what actually went wrong.

Thus, we need to fix this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13931) Add support for HikariCP and replace BoneCP usage with HikariCP

2016-06-02 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13931:
---

 Summary: Add support for HikariCP and replace BoneCP usage with 
HikariCP
 Key: HIVE-13931
 URL: https://issues.apache.org/jira/browse/HIVE-13931
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Currently, we use BoneCP as our primary connection pooling mechanism 
(overridable by users). However, BoneCP is no longer being actively developed, 
and is considered deprecated, replaced by HikariCP.

Thus, we should add support for HikariCP, and try to replace our primary usage 
of BoneCP with it.
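
For illustration, a minimal HikariCP setup looks roughly like this (connection details hypothetical; the real change would wire this into the metastore's existing connection-pool configuration):

{code}
// Sketch: build a HikariCP pool from JDBC connection settings.
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

class HikariPoolSketch {
  static HikariDataSource createPool(String jdbcUrl, String user, String password, int maxPoolSize) {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl(jdbcUrl);
    config.setUsername(user);
    config.setPassword(password);
    config.setMaximumPoolSize(maxPoolSize);
    return new HikariDataSource(config);
  }
}
{code}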



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13853) Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat

2016-05-25 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13853:
---

 Summary: Add X-XSRF-Header filter to HS2 HTTP mode and WebHCat
 Key: HIVE-13853
 URL: https://issues.apache.org/jira/browse/HIVE-13853
 Project: Hive
  Issue Type: Bug
  Components: HiveServer2, WebHCat
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


There is a possibility of a CSRF-based attack on various hadoop components, and 
thus, there is an effort to block all incoming http requests that do not 
contain an X-XSRF-Header header. (See HADOOP-12691 for motivation.)

This has the potential to affect HS2 when running in thrift-over-http mode (if 
cookie-based auth is used), and WebHCat.

We introduce new flags to determine whether or not we're using the filter, and 
if we are, we will automatically reject any http requests which do not contain 
this header.

To allow this to work, we also need to change our JDBC driver to automatically 
inject this header into any requests it makes. Also, any client-side 
programs/APIs not using the JDBC driver directly will need to add an 
X-XSRF-Header header to their requests to HS2/WebHCat if this filter is 
enabled.
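
For clients not using the JDBC driver, the change amounts to adding the header on each request; a minimal sketch using plain HttpURLConnection (endpoint hypothetical):

{code}
// Sketch: add an X-XSRF-Header header to an HTTP request so the request is not
// rejected when the anti-CSRF filter is enabled; the filter only checks that
// the header is present.
import java.net.HttpURLConnection;
import java.net.URL;

class XsrfHeaderSketch {
  static HttpURLConnection openWithXsrfHeader(String endpoint) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("X-XSRF-Header", "true");
    return conn;
  }
}
{code}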



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13738) Bump up httpcomponent.*.version deps in branch-1.2 to 4.4

2016-05-11 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13738:
---

 Summary: Bump up httpcomponent.*.version deps in branch-1.2 to 4.4
 Key: HIVE-13738
 URL: https://issues.apache.org/jira/browse/HIVE-13738
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1, 1.2.2
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


apache-httpcomponents has had certain security issues (see HADOOP-12767) due to 
which upgrading to a newer dep version is recommended.

We've already upped the dep. version to 4.4 in other branches of hive, we 
should do so here as well if we are going to do a new update of 1.2.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13670) Improve Beeline reconnect semantics

2016-05-02 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13670:
---

 Summary: Improve Beeline reconnect semantics
 Key: HIVE-13670
 URL: https://issues.apache.org/jira/browse/HIVE-13670
 Project: Hive
  Issue Type: Improvement
Affects Versions: 2.1.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


For most users of beeline, chances are that they will be using it with a single 
HS2 instance most of the time. In this scenario, having them type out a jdbc 
uri for HS2 every single time they !connect can get tiresome. Thus, we should 
improve the semantics: if a user does a successful !connect, we store the 
last-connected-to url, so that after a !close followed by a !reconnect, 
!reconnect attempts to connect to the last successfully used url.

Also, if they then do a !save, then that last-successfully-used url must be 
saved, so that in subsequent sessions, they can simply do !reconnect rather 
than specifying a url for !connect.
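
To illustrate the proposed flow (url illustrative):

{noformat}
beeline> !connect jdbc:hive2://hs2host:10000/default
  ... successful connect; the url is remembered as the last-connected-to url
beeline> !close
beeline> !reconnect
  ... reconnects to the last successfully used url
beeline> !save
  ... persists it, so future sessions can simply !reconnect
{noformat}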



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13645) Beeline needs null-guard around hiveVars and hiveConfVars read

2016-04-28 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13645:
---

 Summary: Beeline needs null-guard around hiveVars and hiveConfVars 
read
 Key: HIVE-13645
 URL: https://issues.apache.org/jira/browse/HIVE-13645
 Project: Hive
  Issue Type: Bug
  Components: Beeline
Affects Versions: 2.1.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Beeline has a bug wherein, if a user ever does a !save, then on the next load, 
if beeline.hiveVariables or beeline.hiveconfvariables are empty (i.e. {} or 
unspecified), they are loaded as null; on the next connect, there is no 
null-check on these variables, leading to an NPE.
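
A minimal sketch of the guard needed (names hypothetical, not the actual Beeline code):

{code}
// Sketch: treat a null saved-variables map as empty instead of dereferencing it.
import java.util.HashMap;
import java.util.Map;

class HiveVarsGuard {
  static Map<String, String> orEmpty(Map<String, String> savedHiveVars) {
    return (savedHiveVars == null) ? new HashMap<String, String>() : savedHiveVars;
  }
}
{code}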



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13480) Add hadoop2 metrics reporter for Codahale metrics

2016-04-11 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13480:
---

 Summary: Add hadoop2 metrics reporter for Codahale metrics
 Key: HIVE-13480
 URL: https://issues.apache.org/jira/browse/HIVE-13480
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan


Multiple other Apache components can send metrics to Hadoop2 metrics, which 
allows monitoring solutions like Ambari Metrics Server to show metrics for all 
components in one place. Our Codahale metrics work very well, so ideally we 
would like to bridge the two: add a Hadoop2 reporter to Codahale that lets us 
continue to use Codahale metrics (i.e. not write another custom metrics impl) 
but report using Hadoop2.

Apache Phoenix recently had a similar use case and was in the process of adding 
a stub piece that allows this forwarding. We should use the same reporter to 
minimize redundancy while pushing metrics to a centralized solution like 
Hadoop2 Metrics/AMS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13370) Add test for HIVE-11470

2016-03-28 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13370:
---

 Summary: Add test for HIVE-11470
 Key: HIVE-13370
 URL: https://issues.apache.org/jira/browse/HIVE-13370
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-13348) Add Event Nullification support for Replication

2016-03-23 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-13348:
---

 Summary: Add Event Nullification support for Replication
 Key: HIVE-13348
 URL: https://issues.apache.org/jira/browse/HIVE-13348
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


Replication, as implemented by HIVE-7973 works as follows:

a) For every single modification to the hive metastore, an event gets triggered 
that logs a notification object.
b) Replication tools such as falcon can consume these notification objects as a 
HCatReplicationTaskIterator from HCatClient.getReplicationTasks(lastEventId, 
maxEvents, dbName, tableName).
c) For each event,  we generate statements and distcp requirements for falcon 
to export, distcp and import to do the replication (along with requisite 
changes to export and import that would allow state management).

The big thing missing from this picture is that while it works, it is pretty 
dumb about how it works in that it will exhaustively process every single event 
generated, and will try to do the export-distcp-import cycle for all 
modifications, irrespective of whether or not that will actually get used at 
import time.

We need to build some sort of filtering logic which can process a batch of 
events to identify events that will result in effective no-ops, and to nullify 
those events from the stream before passing them on. The goal is to minimize 
the number of events that the tools like Falcon would actually have to process.

Examples of cases where event nullification would take place:

a) CREATE-DROP cases: If an object is being created in event#34 that will 
eventually get dropped in event#47, then there is no point in replicating this 
along. We simply null out both these events, and also, any other event that 
references this object between event#34 and event#47.

b) APPEND-APPEND : Some objects are replicated wholesale, which means every 
APPEND that occurs would cause a full export of the object in question. At this 
point, the prior APPENDS would all be supplanted by the last APPEND. Thus, we 
could nullify all the prior such events. 

Additional such cases can be inferred by analysis of the Export-Import relay 
protocol definition at 
https://issues.apache.org/jira/secure/attachment/12725999/EXIMReplicationReplayProtocol.pdf
 or by reasoning out various event processing orders possible.

Replication, as implemented by HIVE-7973 is merely a first step for functional 
support. This work is needed for replication to be efficient at all, and thus, 
usable.
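
A minimal sketch of the CREATE-DROP nullification case described above, using a hypothetical in-memory event representation (the real filter would operate on the notification event stream):

{code}
// Sketch: when an object is created and later dropped within the same batch,
// null out the CREATE, the DROP, and every intermediate event on that object.
import java.util.List;

class EventNullifier {
  static class Event {
    final long id;
    final String type;          // e.g. "CREATE_TABLE", "DROP_TABLE", ...
    final String objectName;
    boolean nullified;
    Event(long id, String type, String objectName) {
      this.id = id; this.type = type; this.objectName = objectName;
    }
  }

  static void nullifyCreateDropPairs(List<Event> batch) {
    for (int i = 0; i < batch.size(); i++) {
      Event create = batch.get(i);
      if (create.nullified || !"CREATE_TABLE".equals(create.type)) continue;
      for (int j = i + 1; j < batch.size(); j++) {
        Event later = batch.get(j);
        if (!create.objectName.equals(later.objectName)) continue;
        if ("DROP_TABLE".equals(later.type)) {
          // the object never survives the batch: nullify everything touching it
          for (int k = i; k <= j; k++) {
            if (create.objectName.equals(batch.get(k).objectName)) {
              batch.get(k).nullified = true;
            }
          }
          break;
        }
      }
    }
  }
}
{code}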



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12937) DbNotificationListener unable to clean up old notification events

2016-01-26 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-12937:
---

 Summary: DbNotificationListener unable to clean up old 
notification events
 Key: HIVE-12937
 URL: https://issues.apache.org/jira/browse/HIVE-12937
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.1, 1.3.0, 2.0.0, 2.1.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


There is a bug in ObjectStore where we use pm.deletePersistent instead of 
pm.deletePersistentAll, which causes the PersistenceManager to try to delete a 
org.datanucleus.store.rdbms.query.ForwardQueryResult instead of the associated 
org.apache.hadoop.hive.metastore.model.MNotificationLog objects.

This results in an error that looks like this:

{noformat}
Exception in thread "CleanerThread" 
org.datanucleus.api.jdo.exceptions.ClassNotPersistenceCapableException: The 
class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not 
persistable. This means that it either hasnt been enhanced, or that the 
enhanced version of the file is not in the CLASSPATH (or is hidden by an 
unenhanced version), or the Meta-Data/annotations for the class are not found.
at 
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:380)
at 
org.datanucleus.api.jdo.JDOPersistenceManager.jdoDeletePersistent(JDOPersistenceManager.java:807)
at 
org.datanucleus.api.jdo.JDOPersistenceManager.deletePersistent(JDOPersistenceManager.java:820)
at 
org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:7149)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy0.cleanNotificationEvents(Unknown Source)
at 
org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:277)
NestedThrowablesStackTrace:
The class "org.datanucleus.store.rdbms.query.ForwardQueryResult" is not 
persistable. This means that it either hasnt been enhanced, or that the 
enhanced version of the file is not in the CLASSPATH (or is hidden by an 
unenhanced version), or the Meta-Data/annotations for the class are not found.
org.datanucleus.exceptions.ClassNotPersistableException: The class 
"org.datanucleus.store.rdbms.query.ForwardQueryResult" is not persistable. This 
means that it either hasnt been enhanced, or that the enhanced version of the 
file is not in the CLASSPATH (or is hidden by an unenhanced version), or the 
Meta-Data/annotations for the class are not found.
at 
org.datanucleus.ExecutionContextImpl.assertClassPersistable(ExecutionContextImpl.java:5698)
at 
org.datanucleus.ExecutionContextImpl.deleteObjectInternal(ExecutionContextImpl.java:2495)
at 
org.datanucleus.ExecutionContextImpl.deleteObjectWork(ExecutionContextImpl.java:2466)
at 
org.datanucleus.ExecutionContextImpl.deleteObject(ExecutionContextImpl.java:2417)
at 
org.datanucleus.ExecutionContextThreadedImpl.deleteObject(ExecutionContextThreadedImpl.java:245)
at 
org.datanucleus.api.jdo.JDOPersistenceManager.jdoDeletePersistent(JDOPersistenceManager.java:802)
at 
org.datanucleus.api.jdo.JDOPersistenceManager.deletePersistent(JDOPersistenceManager.java:820)
at 
org.apache.hadoop.hive.metastore.ObjectStore.cleanNotificationEvents(ObjectStore.java:7149)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
at com.sun.proxy.$Proxy0.cleanNotificationEvents(Unknown Source)
at 
org.apache.hive.hcatalog.listener.DbNotificationListener$CleanerThread.run(DbNotificationListener.java:277)
{noformat}

The end result of this bug is that users of DbNotificationListener will have an 
ever-growing number of notification events that are never cleaned up as they 
age. This is an easy enough fix, but it also shows that we have a lack of code 
coverage here.
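
A minimal sketch of the shape of the fix (names hypothetical): the collection returned by the JDO query should be passed to deletePersistentAll, rather than handing the result object itself to deletePersistent.

{code}
// Sketch: delete the matched MNotificationLog rows, not the query-result
// wrapper. deletePersistent treats the ForwardQueryResult itself as the
// persistent object and fails; deletePersistentAll deletes each element.
import java.util.Collection;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;

class NotificationCleanerSketch {
  static void cleanOldEvents(PersistenceManager pm, Query query, int olderThanSeconds) {
    Collection<?> toBeRemoved = (Collection<?>) query.execute(olderThanSeconds);
    if (toBeRemoved != null && !toBeRemoved.isEmpty()) {
      pm.deletePersistentAll(toBeRemoved);   // instead of pm.deletePersistent(toBeRemoved)
    }
  }
}
{code}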



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-14 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-12875:
---

 Summary: Verify sem.getInputs() and sem.getOutputs()
 Key: HIVE-12875
 URL: https://issues.apache.org/jira/browse/HIVE-12875
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


For every partition entity object present in sem.getInputs() and 
sem.getOutputs(), we must ensure that the appropriate Table is also added to 
the list of entities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12630) Import should create a new WriteEntity for the new table it's creating to mimic CREATETABLE behaviour

2015-12-09 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-12630:
---

 Summary: Import should create a new WriteEntity for the new table 
it's creating to mimic CREATETABLE behaviour
 Key: HIVE-12630
 URL: https://issues.apache.org/jira/browse/HIVE-12630
 Project: Hive
  Issue Type: Bug
  Components: Authorization, Import/Export
Affects Versions: 1.2.0, 1.3.0, 2.0.0, 2.1.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


CREATE-TABLE creates a new WriteEntity for the new table being created, whereas 
IMPORT does not mimic that behaviour.

While SQLStandardAuth itself does not care about this difference, external 
authorizers such as Ranger can and do make a distinction here, and can have 
policies set up on patterns for objects that do not yet exist. Thus, we must 
emit a WriteEntity for the yet-to-be-created table as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12345) Followup for HIVE-9013 : Hidden commands still visible through beeline

2015-11-05 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-12345:
---

 Summary: Followup for HIVE-9013 : Hidden commands still visible 
through beeline
 Key: HIVE-12345
 URL: https://issues.apache.org/jira/browse/HIVE-12345
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


HIVE-9013 introduced the ability to hide certain conf variables when they are 
output through the "set" command. However, one further bug remains that causes 
these variables to still be visible through beeline connecting to HS2: HS2 
exposes hidden variables, such as its metastore password, when "set" is run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-12083) HIVE-10965 introduces thrift error if partNames or colNames are empty

2015-10-09 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-12083:
---

 Summary: HIVE-10965 introduces thrift error if partNames or 
colNames are empty
 Key: HIVE-12083
 URL: https://issues.apache.org/jira/browse/HIVE-12083
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


In the fix for HIVE-10965, there is a short-circuit path that causes an empty 
AggrStats object to be returned if partNames is empty or colNames is empty:

{code}
diff --git metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
index 0a56bac..ed810d2 100644
--- metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
+++ metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
@@ -1100,6 +1100,7 @@ public ColumnStatistics getTableStats(
   public AggrStats aggrColStatsForPartitions(String dbName, String tableName,
       List<String> partNames, List<String> colNames, boolean useDensityFunctionForNDVEstimation)
       throws MetaException {
+    if (colNames.isEmpty() || partNames.isEmpty()) return new AggrStats(); // Nothing to aggregate.
     long partsFound = partsFoundForPartitions(dbName, tableName, partNames, colNames);
     List<ColumnStatisticsObj> colStatsList;
     // Try to read from the cache first
{code}

This runs afoul of thrift requirements that AggrStats have required fields:

{code}
struct AggrStats {
1: required list<ColumnStatisticsObj> colStats,
2: required i64 partsFound // number of partitions for which stats were found
}
{code}

Thus, we get errors as follows:

{noformat}
2015-10-08 00:00:25,413 ERROR server.TThreadPoolServer 
(TThreadPoolServer.java:run(213)) - Thrift error occurred during processing of 
message.
org.apache.thrift.protocol.TProtocolException: Required field 'colStats' is 
unset! Struct:AggrStats(colStats:null, partsFound:0)
at 
org.apache.hadoop.hive.metastore.api.AggrStats.validate(AggrStats.java:389)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.validate(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result$get_aggr_stats_for_resultStandardScheme.write(ThriftHiveMetastore.java)
at 
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_aggr_stats_for_result.write(ThriftHiveMetastore.java)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:53)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:536)
at 
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

Normally, this would not occur, since HIVE-10965 also includes a client-side 
guard for colNames.isEmpty() that skips the metastore call altogether. However, 
there is no such guard for partNames being empty, and the error would still 
occur on the metastore side if the thrift call were made directly, as would 
happen if the client is from an older version that predates this patch.
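A sketch of a short-circuit that keeps the thrift contract intact (assuming the thrift-generated constructor that takes all required fields):

{code}
if (colNames.isEmpty() || partNames.isEmpty()) {
  // Nothing to aggregate, but colStats is a required field: return an empty,
  // fully-initialized AggrStats instead of one with colStats left unset.
  return new AggrStats(new ArrayList<ColumnStatisticsObj>(), 0);
}
{code}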



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11936) Support SQLAnywhere as a backing DB for the hive metastore

2015-09-23 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11936:
---

 Summary: Support SQLAnywhere as a backing DB for the hive metastore
 Key: HIVE-11936
 URL: https://issues.apache.org/jira/browse/HIVE-11936
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


I've had pings from people interested in enabling the metastore to work on top 
of SQLAnywhere (17+), so I'm opening this jira to track the changes needed in 
hive to make SQLAnywhere work as a backing db for the metastore.

I have it working and passing all tests currently in my setup, and will upload 
patches as I'm able to.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11852) numRows and rawDataSize table properties are not replicated

2015-09-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11852:
---

 Summary: numRows and rawDataSize table properties are not 
replicated
 Key: HIVE-11852
 URL: https://issues.apache.org/jira/browse/HIVE-11852
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 1.2.1
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


numRows and rawDataSize table properties are not replicated when exported for 
replication and re-imported.

{code}
Table drdbnonreplicatabletable.vanillatable has different TblProps from 
drdbnonreplicatabletable.vanillatable expected [{numFiles=1, numRows=2, 
totalSize=560, rawDataSize=440}] but found [{numFiles=1, totalSize=560}]
java.lang.AssertionError: Table drdbnonreplicatabletable.vanillatable has 
different TblProps from drdbnonreplicatabletable.vanillatable expected 
[{numFiles=1, numRows=2, totalSize=560, rawDataSize=440}] but found 
[{numFiles=1, totalSize=560}]
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11697) Add Unit Test to test serializability/deserializability of HCatSplits

2015-08-31 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11697:
---

 Summary: Add Unit Test to test serializability/deserializability 
of HCatSplits
 Key: HIVE-11697
 URL: https://issues.apache.org/jira/browse/HIVE-11697
 Project: Hive
  Issue Type: Test
Reporter: Sushanth Sowmyan


As HIVE-11344 found, we should have unit tests for this scenario, and we need 
to add one in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11585) Explicitly set pmf.setDetachAllOnCommit on metastore unless configured otherwise

2015-08-17 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11585:
---

 Summary: Explicitly set pmf.setDetachAllOnCommit on metastore 
unless configured otherwise
 Key: HIVE-11585
 URL: https://issues.apache.org/jira/browse/HIVE-11585
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


datanucleus.detachAllOnCommit has a default value of false. However, we've 
observed a number of objects (especially FieldSchema objects) being retained, 
which causes OOM issues on the metastore. Hive should instead default 
datanucleus.detachAllOnCommit to true, unless explicitly overridden by users.
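A minimal sketch of that default, assuming it is applied wherever ObjectStore assembles its JDO properties (the surrounding property-copying is elided and illustrative; this is not the actual patch):

{code}
Properties prop = new Properties();
// ... copy datanucleus.* / javax.jdo.* settings from HiveConf as before ...
if (!prop.containsKey("datanucleus.detachAllOnCommit")) {
  // Default to true so committed objects (e.g. FieldSchema) are detached and
  // become eligible for GC, unless the user explicitly overrides it.
  prop.setProperty("datanucleus.detachAllOnCommit", "true");
}
PersistenceManagerFactory pmf = JDOHelper.getPersistenceManagerFactory(prop);
{code}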



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11344) HIVE-9845 makes HCatSplit.write modify the split so that PartitionInfo objects are unusable after it

2015-07-22 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11344:
---

 Summary: HIVE-9845 makes HCatSplit.write modify the split so that 
PartitionInfo objects are unusable after it
 Key: HIVE-11344
 URL: https://issues.apache.org/jira/browse/HIVE-11344
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


HIVE-9845 introduced a notion of compression for HCatSplits so that when 
serializing, it finds commonalities between PartInfo and TableInfo objects, and 
if the two are identical, it nulls out that field in PartInfo, thus making sure 
that when PartInfo is then serialized, info is not repeated.

This, however, has the side effect of making the PartInfo object unusable if 
HCatSplit.write has been called.

This does not affect M/R directly, since M/R does not know about the PartInfo 
objects, and once serialized, the HCatSplit object is recreated by deserializing 
on the backend, which restores the split and its PartInfo objects. It does, 
however, affect framework users of HCat that try to mimic M/R and then use the 
PartInfo objects to instantiate distinct readers.

Thus, we need to make it so that PartInfo is still usable after HCatSplit.write 
is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11059) hcatalog-server-extensions tests scope should depend on hive-exec

2015-06-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11059:
---

 Summary: hcatalog-server-extensions tests scope should depend on 
hive-exec
 Key: HIVE-11059
 URL: https://issues.apache.org/jira/browse/HIVE-11059
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.1
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11047) Update versions of branch-1.2 to 1.2.1

2015-06-18 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11047:
---

 Summary: Update versions of branch-1.2 to 1.2.1
 Key: HIVE-11047
 URL: https://issues.apache.org/jira/browse/HIVE-11047
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11039) Write a tool to allow people with datanucleus.identifierFactory=datanucleus2 to migrate their metastore to datanucleus1 naming

2015-06-17 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11039:
---

 Summary: Write a tool to allow people with 
datanucleus.identifierFactory=datanucleus2 to migrate their metastore to 
datanucleus1 naming
 Key: HIVE-11039
 URL: https://issues.apache.org/jira/browse/HIVE-11039
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Critical


We hit an interesting bug in a case where datanucleus.identifierFactory = 
datanucleus2 .

The problem is that directSql handgenerates SQL strings assuming datanucleus1 
naming scheme. If a user has their metastore JDO managed by 
datanucleus.identifierFactory = datanucleus2 , the SQL strings we generate are 
incorrect.

One simple example of what this results in is the following: whenever DN 
persists a field which is held as a List<T>, it winds up storing each T as a 
separate line in the appropriate mapping table, with a column called 
INTEGER_IDX that holds the position in the list. Then, upon reading, it 
automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which 
results in the list retaining its order. In the DN2 naming scheme, the column is 
called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool 
upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX 
and IDX.

Whenever they use JDO, such as with all writes, it will then use the IDX field, 
and when they do any sort of optimized reads, such as through directSQL, it 
will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is 
stored as a List<FieldSchema>, and while IDX holds 0,1,2,3,..., INTEGER_IDX 
will contain 0,0,0,0,... Thus, any attempt to describe the table or fetch its 
schema can come back in the table's native hashing order rather than sorted by 
the index.

This can then result in the schema ordering being different from the actual 
table. For e.g., if a user has (a:int, b:string, c:string), a describe on this 
may return (c:string, a:int, b:string), and thus, queries which insert after 
selecting from another table can hit ClassCastExceptions when trying to insert 
data in the wrong order - this is how we discovered this bug. The problem, 
however, can be far worse if there are no type mismatches - it is possible, for 
e.g., that if a, b and c were all strings, the insert query would succeed but 
mix up the order, which then results in user table data being mixed up. This has 
the potential to be very bad.

We should write a tool to help convert metastores that use datanucleus2 to 
datanucleus1 (more difficult, needs more one-time testing), or change directSql 
to support both (easier to code, but it increases the test-coverage matrix 
significantly and we should really then be testing against both schemes). In the 
short term, however, we should disable directSql if we see that the identifier 
factory is datanucleus2.
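A sketch of that short-term check (the conf lookup, logger and the isCompatibleDatastore flag are assumptions about how MetaStoreDirectSql is wired, not a verified patch):

{code}
String idFactory = conf.get("datanucleus.identifierFactory", "datanucleus1").trim();
if ("datanucleus2".equalsIgnoreCase(idFactory)) {
  LOG.warn("Disabling directSQL: hand-written SQL assumes datanucleus1 column names"
      + " (e.g. INTEGER_IDX), which do not match the datanucleus2 naming scheme.");
  isCompatibleDatastore = false;  // fall back to JDO for all optimized read paths
}
{code}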



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11023) Disable directSQL if datanucleus.identifierFactory = datanucleus2

2015-06-16 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-11023:
---

 Summary: Disable directSQL if datanucleus.identifierFactory = 
datanucleus2
 Key: HIVE-11023
 URL: https://issues.apache.org/jira/browse/HIVE-11023
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 1.3.0, 1.2.1, 2.0.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


We hit an interesting bug in a case where datanucleus.identifierFactory = 
datanucleus2 .

The problem is that directSql handgenerates SQL strings assuming datanucleus1 
naming scheme. If a user has their metastore JDO managed by 
datanucleus.identifierFactory = datanucleus2 , the SQL strings we generate are 
incorrect.

One simple example of what this results in is the following: whenever DN 
persists a field which is held as a List<T>, it winds up storing each T as a 
separate line in the appropriate mapping table, with a column called 
INTEGER_IDX that holds the position in the list. Then, upon reading, it 
automatically reads all relevant lines with an ORDER BY INTEGER_IDX, which 
results in the list retaining its order. In the DN2 naming scheme, the column is 
called IDX instead of INTEGER_IDX. If the user has run the appropriate metatool 
upgrade scripts, it is highly likely that they have both columns, INTEGER_IDX 
and IDX.

Whenever they use JDO, such as with all writes, it will then use the IDX field, 
and when they do any sort of optimized reads, such as through directSQL, it 
will ORDER BY INTEGER_IDX.

An immediate danger is seen when we consider that the schema of a table is 
stored as a List<FieldSchema>, and while IDX holds 0,1,2,3,..., INTEGER_IDX 
will contain 0,0,0,0,... Thus, any attempt to describe the table or fetch its 
schema can come back in the table's native hashing order rather than sorted by 
the index.

This can then result in the schema ordering being different from the actual 
table. For e.g., if a user has (a:int, b:string, c:string), a describe on this 
may return (c:string, a:int, b:string), and thus, queries which insert after 
selecting from another table can hit ClassCastExceptions when trying to insert 
data in the wrong order - this is how we discovered this bug. The problem, 
however, can be far worse if there are no type mismatches - it is possible, for 
e.g., that if a, b and c were all strings, the insert query would succeed but 
mix up the order, which then results in user table data being mixed up. This has 
the potential to be very bad.

We should write a tool to help convert metastores that use datanucleus2 to 
datanucleus1 (more difficult, needs more one-time testing), or change directSql 
to support both (easier to code, but it increases the test-coverage matrix 
significantly and we should really then be testing against both schemes). In the 
short term, however, we should disable directSql if we see that the identifier 
factory is datanucleus2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10892) TestHCatClient should not accept external metastore param from -Dhive.metastore.uris

2015-06-02 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10892:
---

 Summary: TestHCatClient should not accept external metastore param 
from -Dhive.metastore.uris
 Key: HIVE-10892
 URL: https://issues.apache.org/jira/browse/HIVE-10892
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


HIVE-10074 added the ability to specify a -Dhive.metastore.uris from the 
commandline, so as to run the test against a deployed metastore.

However, because of the way HiveConf is written, that parameter always overrides 
any value specified in the conf passed in at instantiation time, since HiveConf 
accepts system variable overrides. This causes some tests, notably those that 
attempt to connect between two metastores (such as 
TestHCatClient#testPartitionRegistrationWithCustomSchema), to fail.

Fixing this in HiveConf is not a good idea, since that behaviour is desired for 
HiveConf. Fixing it in HCatUtil.getHiveConf doesn't really work either, since 
that is a utility wrapper on HiveConf and is supposed to behave similarly. The 
fix would then have to be applied in every testcase where we instantiate 
Configuration objects, so it seems more appropriate to change the parameter used 
to pass test settings than to change each config object.

Thus, we should change the semantics for running this test against an external 
metastore by specifying the override in a different parameter name, say 
test.hive.metastore.uris, instead of hive.metastore.uris, which has a specific 
meaning.
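A sketch of the proposed test-side wiring (the parameter name test.hive.metastore.uris is the one proposed here; the surrounding setup code is illustrative):

{code}
HiveConf conf = new HiveConf();
String testUris = System.getProperty("test.hive.metastore.uris");
if (testUris != null) {
  // Only the tests honour this override; hive.metastore.uris keeps its usual meaning.
  conf.setVar(HiveConf.ConfVars.METASTOREURIS, testUris);
}
{code}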



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10715) RAT failures - many files do not have ASF licenses

2015-05-14 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10715:
---

 Summary: RAT failures - many files do not have ASF licenses
 Key: HIVE-10715
 URL: https://issues.apache.org/jira/browse/HIVE-10715
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Lots of files do not have proper ASF headers included in. We should add them in.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10674) HIVE-9302 introduces 2 jars in the source control repo

2015-05-11 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10674:
---

 Summary: HIVE-9302 introduces 2 jars in the source control repo
 Key: HIVE-10674
 URL: https://issues.apache.org/jira/browse/HIVE-10674
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0, 1.3.0
Reporter: Sushanth Sowmyan
Assignee: Ferdinand Xu
Priority: Blocker


The 2 jars added by HIVE-9302 run afoul of the source package generation as 
part of the 1.2 release, since a source package is not supposed to contain any 
binaries. If we have binaries, they're supposed to be brought in as a download 
step during the compile or test-compile phase from a well-known published 
location such as a maven repository. The postgres jar we can depend on as a 
download, and it is an open source product that is compatible with the Apache 
License, but DummyDriver is worse, because there is no source attached to it 
either, which makes it not okay to include in the binary release of hive either.

Thus, for branch-1.2, I am going to do a git rm of those two jars right away. 
This, unfortunately, might cause a few tests added here to fail for branch-1.2, 
but this should be acceptable for the time being.

I'm opening this jira to track the following:

a) git rm of the postgres and DummyDriver jar from master
b) adding source code for DummyDriver into master, and changing the build so we 
depend on it being compiled, rather than included from test-resources.
c) changing the postgres inclusion to a download.

This should also be applied to branch-1.2 after release, preferably before 
1.2.1, so that future updates of 1.2 have this fixed as well.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10676) Update Hive's README to mention spark, and to remove jdk1.6

2015-05-11 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10676:
---

 Summary: Update Hive's README to mention spark, and to remove 
jdk1.6
 Key: HIVE-10676
 URL: https://issues.apache.org/jira/browse/HIVE-10676
 Project: Hive
  Issue Type: Task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
Priority: Trivial


a) Hive's README file mentions only 2 execution frameworks, and does not 
mention spark. We should add that in.

b) We should remove jdk1.6 from the README, since hive no longer supports or 
even compiles under jdk1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10638) HIVE-9736 introduces issues with Hadoop23Shims.checkFileAccess

2015-05-06 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10638:
---

 Summary: HIVE-9736 introduces issues with 
Hadoop23Shims.checkFileAccess
 Key: HIVE-10638
 URL: https://issues.apache.org/jira/browse/HIVE-10638
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan


Copy-pasting [~spena]'s comment in HIVE-9736:

Hi [~mithun]

This patch is causing the above tests to fail due to the change on 
{{Hadoop23Shims.checkFileAccess(FileSystem fs, Iterator<FileStatus> statuses, 
EnumSet<FsAction> actions)}}.

The line that fails is {{accessMethod.invoke(fs, statuses.next(), 
combine(actions));}}

I am running hadoop 2.6.0, and the FileSystem.access() method accepts a Path 
and an FsAction. When I run the code that checks path permissions, I get this 
error:
{noformat}
hive explain select * from a join b on a.id = b.id;
FAILED: SemanticException Unable to determine if 
hdfs://localhost:9000/user/hive/warehouse/a is read only: 
java.lang.IllegalArgumentException: argument type mismatch
{noformat}

Is there a follow-up jira for this error?








--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10562) Add version column to NOTIFICATION_LOG table and DbNotificationListener

2015-04-30 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10562:
---

 Summary: Add version column to NOTIFICATION_LOG table and 
DbNotificationListener
 Key: HIVE-10562
 URL: https://issues.apache.org/jira/browse/HIVE-10562
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan


Currently, we have a JSON encoded message being stored in the NOTIFICATION_LOG 
table.

If we want to be future proof, we need to allow for versioning of this message, 
since we might change what gets stored in the message. A prime example of what 
we'd want to change is as in HIVE-10393.

MessageFactory already has stubs to allow for versioning of messages, and we 
could expand on this further in the future. NotificationListener currently 
encodes the message version into the header for the JMS message it sends, which 
seems to be the right place for a message version (instead of being contained 
in the message, for eg.).

So, we should have a similar ability for DbNotificationListener as well, and the 
place this makes the most sense is to add a version column to the 
NOTIFICATION_LOG table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10536) HIVE-10223 breaks -Phadoop-1

2015-04-29 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10536:
---

 Summary: HIVE-10223 breaks -Phadoop-1
 Key: HIVE-10536
 URL: https://issues.apache.org/jira/browse/HIVE-10536
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan


It looks like HIVE-10223 broke compilation under -Phadoop-1. We need to fix it. 
Even if we decide to drop support for -Phadoop-1 in master, we should fix it for 
branch-1.2.

{noformat}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] 
/Users/sush/dev/hive.git/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java:[515,19]
 cannot find symbol
  symbol:   method isFile()
  location: variable fileStatus of type org.apache.hadoop.fs.FileStatus
[ERROR] 
/Users/sush/dev/hive.git/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java:[545,26]
 cannot find symbol
  symbol:   method isDirectory()
  location: variable fileStatus of type org.apache.hadoop.fs.FileStatus
[INFO] 2 errors 
[INFO] -
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10509) Bump trunk version to 1.3 as branch-1.2 has been created.

2015-04-27 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10509:
---

 Summary: Bump trunk version to 1.3 as branch-1.2 has been created.
 Key: HIVE-10509
 URL: https://issues.apache.org/jira/browse/HIVE-10509
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10510) Change 1.2.0-SNAPSHOT to 1.2.0 in branch-1.2

2015-04-27 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10510:
---

 Summary: Change 1.2.0-SNAPSHOT to 1.2.0 in branch-1.2
 Key: HIVE-10510
 URL: https://issues.apache.org/jira/browse/HIVE-10510
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10517) HCatPartition should not be created with "" as location in tests

2015-04-27 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10517:
---

 Summary: HCatPartition should not be created with "" as location 
in tests
 Key: HIVE-10517
 URL: https://issues.apache.org/jira/browse/HIVE-10517
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Tests in TestHCatClient and TestCommands wind up instantiating HCatPartition 
with a dummy empty String ("") as the location. This causes test failures when 
run against an existing metastore, a mode of running introduced by HIVE-10074.

We need to instantiate actual values instead of dummy "" strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10426) Rework/simplify ReplicationTaskFactory instantiation

2015-04-21 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10426:
---

 Summary: Rework/simplify ReplicationTaskFactory instantiation
 Key: HIVE-10426
 URL: https://issues.apache.org/jira/browse/HIVE-10426
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Creating a new jira to continue discussions of what ReplicationTask.Factory 
instantiation should look like.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10393) Make AddPartitionMessage and DropPartitionMessage leaner

2015-04-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10393:
---

 Summary: Make AddPartitionMessage and DropPartitionMessage leaner
 Key: HIVE-10393
 URL: https://issues.apache.org/jira/browse/HIVE-10393
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan


AddPartitionMessage and DropPartitionMessage currently contain a 
List<Map<String, String>> ptnKeyValues to store a list of partitions, each by 
its set of key-values. This results in a lot of duplication, since the 
partition keys are the same across them.

So, we should split that into two getters:

a) List<String> getPtnKeys()
b) List<List<String>> getPtnPartVals()

That way we still store the entire info, but for larger messages we cut the 
storage required nearly by half. A sketch of the proposed shape follows.
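Assumed method signatures, plus how a consumer could rebuild the old per-partition map view (illustrative only):

{code}
// On AddPartitionMessage / DropPartitionMessage:
public abstract List<String> getPtnKeys();            // partition key names, stored once
public abstract List<List<String>> getPtnPartVals();  // one value list per partition, in key order

// Consumer-side reconstruction of the old List<Map<String, String>> view:
List<Map<String, String>> ptnKeyValues = new ArrayList<>();
List<String> keys = msg.getPtnKeys();
for (List<String> vals : msg.getPtnPartVals()) {
  Map<String, String> kv = new LinkedHashMap<>();
  for (int i = 0; i < keys.size(); i++) {
    kv.put(keys.get(i), vals.get(i));
  }
  ptnKeyValues.add(kv);
}
{code}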



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10395) PreAddPartitionEvent should also be updated to use Iterator semantics

2015-04-19 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10395:
---

 Summary: PreAddPartitionEvent should also be updated to use 
Iterator semantics
 Key: HIVE-10395
 URL: https://issues.apache.org/jira/browse/HIVE-10395
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan


HIVE-9609 added Iterator semantics (for better memory footprints where 
possible) for AddPartitionEvent, but not PreAddPartitionEvent. HIVE-9674 adds 
it for DropPartitionEvent and PreDropPartitionEvent. We should update 
PreAddPartitionEvent as well, to include these semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10381) Allow pushing of property-key-value based predicate filter to Metastore dropPartitions

2015-04-17 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10381:
---

 Summary: Allow pushing of property-key-value based predicate 
filter to Metastore dropPartitions
 Key: HIVE-10381
 URL: https://issues.apache.org/jira/browse/HIVE-10381
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan


After HIVE-10228, we assume, based on our knowledge of how replication will 
work, that DROP_PARTITION replication will act at a per-partition level.

To be robust, however, we also handle the case where this assumption does not 
hold and the generated DROP_PARTITION replication command works across multiple 
partitions. That handling is very inefficient if ever used, since it has to 
fetch each of those partitions from the metastore, decide whether or not it 
matches the ReplicationSpec.allowEventReplacementInto filter, and then drop it.

Ideally, we should allow pushing this filter to the metastore, and let the 
metastore do a smart drop based on that.

{code}
ReplicationSpec replicationSpec = dropTbl.getReplicationSpec();
if (replicationSpec.isInReplicationScope()){
  // TODO: Current implementation of replication will result in 
DROP_PARTITION under replication
  // scope being called per-partition instead of multiple partitions. 
However, to be robust, we
  // must still handle the case of multiple partitions in case this 
assumption changes in the
  // future. However, if this assumption changes, we will not be very 
performant if we fetch
  // each partition one-by-one, and then decide on inspection whether or 
not this is a candidate
  // for dropping. Thus, we need a way to push this filter 
(replicationSpec.allowEventReplacementInto)
  // to the  metastore to allow it to do drop a partition or not, depending 
on a Predicate on the
  // parameter key values.
  for (DropTableDesc.PartSpec partSpec : dropTbl.getPartSpecs()){
try {
  for (Partition p : Iterables.filter(
  db.getPartitionsByFilter(tbl, 
partSpec.getPartSpec().getExprString()),
  replicationSpec.allowEventReplacementInto())){

db.dropPartition(tbl.getDbName(),tbl.getTableName(),p.getValues(),true);
  }
} catch (NoSuchObjectException e){
  // ignore NSOE because that means there's nothing to drop.
} catch (Exception e) {
  throw new HiveException(e.getMessage(), e);
}
  }
  return;
}
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10264) Document Replication support on wiki

2015-04-08 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10264:
---

 Summary: Document Replication support on wiki
 Key: HIVE-10264
 URL: https://issues.apache.org/jira/browse/HIVE-10264
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10267) HIVE-9664 makes hive depend on ivysettings.xml

2015-04-08 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10267:
---

 Summary: HIVE-9664 makes hive depend on ivysettings.xml
 Key: HIVE-10267
 URL: https://issues.apache.org/jira/browse/HIVE-10267
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Anant Nag


HIVE-9664 makes hive depend on the existence of ivysettings.xml, and if it is 
not present, it makes hive NPE when instantiating a CLISessionState.

{noformat}
java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.session.DependencyResolver.<init>(DependencyResolver.java:61)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:343)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:334)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
{noformat}

This happens because of the following bit:

{noformat}
// If HIVE_HOME is not defined or file is not found in HIVE_HOME/conf then load
// default ivysettings.xml from class loader
if (ivysettingsPath == null || !(new File(ivysettingsPath).exists())) {
  ivysettingsPath = ClassLoader.getSystemResource("ivysettings.xml").getFile();
  _console.printInfo("ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR, "
      + ivysettingsPath + " will be used");
}
{noformat}

This makes it so that an attempt to instantiate CliSessionState without an 
ivysettings.xml file will cause hive to fail with an NPE. Hive should not have 
a hard dependency on an ivysettings.xml being present, and this feature should 
fail gracefully in that case instead.
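One possible graceful-degradation sketch (not the committed fix; it only illustrates avoiding the NPE when neither HIVE_HOME/conf nor the classpath provides an ivysettings.xml):

{code}
if (ivysettingsPath == null || !(new File(ivysettingsPath).exists())) {
  URL defaultSettings = ClassLoader.getSystemResource("ivysettings.xml");
  if (defaultSettings == null) {
    // Degrade gracefully: ivy-based dependency resolution is unavailable, but
    // CliSessionState construction should not NPE because of it.
    _console.printInfo("ivysettings.xml not found in HIVE_HOME, HIVE_CONF_DIR or on the classpath;"
        + " disabling ivy dependency resolution");
    return;
  }
  ivysettingsPath = defaultSettings.getFile();
}
{code}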





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10272) Some HCat tests fail under windows

2015-04-08 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10272:
---

 Summary: Some HCat tests fail under windows
 Key: HIVE-10272
 URL: https://issues.apache.org/jira/browse/HIVE-10272
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


Some HCat tests fail under windows with errors like this:

{noformat}
java.lang.RuntimeException: java.lang.IllegalArgumentException: Pathname 
/D:/w/hv/hcatalog/hcatalog-pig-adapter/target/tmp/scratchdir from 
D:/w/hv/hcatalog/hcatalog-pig-adapter/target/tmp/scratchdir is not a valid DFS 
filename.
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
at 
org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:594)
at 
org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:552)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
at 
org.apache.hive.hcatalog.pig.TestHCatLoaderEncryption.setup(TestHCatLoaderEncryption.java:185)
{noformat}

We need to sanitize HiveConf objects with 
WindowsPathUtil.convertPathsFromWindowsToHdfs when running under windows, before 
we use them to instantiate a SessionState/Driver.
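A sketch of what that guard could look like in a test's setup (WindowsPathUtil.convertPathsFromWindowsToHdfs is the existing helper named above; the rest of the setup shown is illustrative):

{code}
HiveConf hiveConf = new HiveConf();
if (Shell.WINDOWS) {
  // Rewrite D:/... style paths (scratchdir, warehouse, etc.) into valid HDFS paths
  // before they are used to create session directories.
  WindowsPathUtil.convertPathsFromWindowsToHdfs(hiveConf);
}
SessionState.start(new CliSessionState(hiveConf));
{code}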



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10251) HIVE-9664 makes hive depend on ivysettings.xml

2015-04-07 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10251:
---

 Summary: HIVE-9664 makes hive depend on ivysettings.xml
 Key: HIVE-10251
 URL: https://issues.apache.org/jira/browse/HIVE-10251
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan


HIVE-9664 makes hive depend on the existence of ivysettings.xml, and if it is 
not present, it makes hive NPE when instantiating a CLISessionState.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10227) Concrete implementation of Export/Import based ReplicationTaskFactory

2015-04-06 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10227:
---

 Summary: Concrete implementation of Export/Import based 
ReplicationTaskFactory
 Key: HIVE-10227
 URL: https://issues.apache.org/jira/browse/HIVE-10227
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-10228) Changes to Hive Export/Import/DropTable/DropPartition to support replication semantics

2015-04-06 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-10228:
---

 Summary: Changes to Hive Export/Import/DropTable/DropPartition to 
support replication semantics
 Key: HIVE-10228
 URL: https://issues.apache.org/jira/browse/HIVE-10228
 Project: Hive
  Issue Type: Sub-task
Reporter: Sushanth Sowmyan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-09 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14313359#comment-14313359
 ] 

Sushanth Sowmyan commented on HIVE-9609:


Oh, one more thing. Changing HiveMetaStore.fireMetaStoreAddPartitionEvent() to 
eschew List<Partition> does not help us unless we remove add_partitions itself, 
since we were already sent the List<Partition>, and that is a call on which we 
should not break backward compatibility.

I'm retaining AddPartitionEvent as effectively a union of 
List<Partition>/PartitionSpec as it currently stands - but I am changing it so 
that the one true way of accessing AddPartitionEvent is through 
getPartitionIterator, which works correctly for both cases. In addition, I'm 
outright removing the getPartitions() method that was there earlier and forcing 
use of the iterator, since that will not cause a PartitionSpec-based 
add_partitions to fail when the event is processed.

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9609.2.patch, HIVE-9609.patch


 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-09 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9609:
---
Status: Open  (was: Patch Available)

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9609.patch


 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-09 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9609:
---
Attachment: HIVE-9609.2.patch

Attaching a more aggressive v2: it removes the getPartitions() call altogether, 
and removes PartitionSpecProxy usage and List<Partition> usage from MessageFactory.

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9609.2.patch, HIVE-9609.patch


 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9609:
---
Attachment: HIVE-9609.patch

Patch attached.

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9609.patch


 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9609:
---
Status: Patch Available  (was: Open)

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9609.patch


 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-07 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311094#comment-14311094
 ] 

Sushanth Sowmyan commented on HIVE-9609:


Ideally, I'd like to fix this by changing AddPartitionEvent.getPartitions() to 
return an Iterable<Partition> and make that work either way, but I'm not 
certain if I will tread on any toes if I change that, since this has been a 
public interface for a while - also, I'm not certain whether any code relies on 
getPartitions returning null to determine whether this is a 
List<Partition>-based or PartitionSpec-based AddPartitionEvent. So, I've not 
messed with the current implementation of getPartitions. That said, [~mithun], 
could you please comment if you're okay with me fixing getPartitions so that it 
doesn't return null in the case where it has been instantiated from a 
PartitionSpec? I could at the very least do that.

Also, to handle the base problem, we should fix 
AddPartitionEvent.getPartitionIterator to work correctly in both cases - this 
should at least not be controversial.

After that, we should change MessageFactory.buildAddPartitionMessage to work on 
an Iterator<Partition> rather than a List<Partition> - this is trivially 
fixable - and have JSONMessageFactory use that instead, thereby solving our 
initial problem of the getPartitions call from AddPartitionEvent not being 
usable for events fired with a PartitionSpec rather than a List<Partition>.
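A sketch of the kind of helper that lets message building consume one partition at a time, regardless of how the event was constructed (names here are illustrative, not the committed code):

{code}
static Map<String, String> getPartitionKeyValues(Table table, Partition partition) {
  // Zip the table's partition key names with this partition's values so message
  // building no longer needs a pre-materialized List<Partition>.
  Map<String, String> partitionKeys = new LinkedHashMap<>();
  for (int i = 0; i < table.getPartitionKeysSize(); ++i) {
    partitionKeys.put(table.getPartitionKeys().get(i).getName(),
        partition.getValues().get(i));
  }
  return partitionKeys;
}
{code}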

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan

 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9585) AlterPartitionMessage should return getKeyValues instead of getValues

2015-02-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9585:
---
Status: Patch Available  (was: Open)

 AlterPartitionMessage should return getKeyValues instead of getValues
 -

 Key: HIVE-9585
 URL: https://issues.apache.org/jira/browse/HIVE-9585
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9585.patch


 HIVE-9175 added in AlterPartitionMessage to use in notification events. 
 However, on trying to write a ReplicationTask implementation on top of that 
 event, I see that I need the key-values from the message, and from a context 
 where I might not have access to a hive client to fetch it myself.
 Thus, the AlterPartitionMessage needs to be changed so as to return 
 getKeyValues as a primary, and we can remove getValues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-07 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9609:
---
Affects Version/s: 1.2.0

 AddPartitionMessage.getPartitions() can return null
 ---

 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9609.patch


 DbNotificationListener and NotificationListener both depend on 
 AddPartitionEvent.getPartitions() to get their partitions to trigger a 
 message, but this can be null if an AddPartitionEvent was initialized on a 
 PartitionSpec rather than a List<Partition>.
 Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
 only if instantiated on a List<Partition>, and getPartitionIterator() works 
 only if instantiated on a PartitionSpec.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9609) AddPartitionMessage.getPartitions() can return null

2015-02-07 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-9609:
--

 Summary: AddPartitionMessage.getPartitions() can return null
 Key: HIVE-9609
 URL: https://issues.apache.org/jira/browse/HIVE-9609
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


DbNotificationListener and NotificationListener both depend on 
AddPartitionEvent.getPartitions() to get their partitions to trigger a message, 
but this can be null if an AddPartitionEvent was initialized on a PartitionSpec 
rather than a List<Partition>.

Also, AddPartitionEvent seems to have a duality, where getPartitions() works 
only if instantiated on a List<Partition>, and getPartitionIterator() works 
only if instantiated on a PartitionSpec.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9273) Add option to fire metastore event on insert

2015-02-06 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9273:
---
Labels: TODOC1.2  (was: )

 Add option to fire metastore event on insert
 

 Key: HIVE-9273
 URL: https://issues.apache.org/jira/browse/HIVE-9273
 Project: Hive
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Alan Gates
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9273.2.patch, HIVE-9273.patch


 HIVE-9271 adds the ability for the client to request firing metastore events. 
  This can be used in the MoveTask to fire events when an insert is done that 
 does not add partitions to a table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9273) Add option to fire metastore event on insert

2015-02-06 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310028#comment-14310028
 ] 

Sushanth Sowmyan commented on HIVE-9273:


Ah, true, it does add that parameter.

I figured that documentation for this would belong to the replication 
subsystem, and this parameter does not make much sense to override from a 
public point of view. But I guess we should have a line of documentation there 
saying that this enables event firing for DML operations, and that users can 
leave it at its default value unless they want the replication subsystem to 
replicate appends.

Thanks for the catch!

 Add option to fire metastore event on insert
 

 Key: HIVE-9273
 URL: https://issues.apache.org/jira/browse/HIVE-9273
 Project: Hive
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Alan Gates
  Labels: TODOC1.2
 Fix For: 1.2.0

 Attachments: HIVE-9273.2.patch, HIVE-9273.patch


 HIVE-9271 adds the ability for the client to request firing metastore events. 
  This can be used in the MoveTask to fire events when an insert is done that 
 does not add partitions to a table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9577) Fire insert event on HCatalog appends

2015-02-04 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-9577:
--

 Summary: Fire insert event on HCatalog appends
 Key: HIVE-9577
 URL: https://issues.apache.org/jira/browse/HIVE-9577
 Project: Hive
  Issue Type: Bug
  Components: HCatalog
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan


HIVE-9271 added the ability to fire a listener event on the metastore to notify 
on inserts, and HIVE-9273 added this ability to register data appends in hive 
to trigger this event. HCatalog appends should also trigger this event.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9273) Add option to fire metastore event on insert

2015-02-04 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9273:
---
   Resolution: Fixed
Fix Version/s: 1.2.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Alan!

(Doc note: no docs required for this either; it is an addendum to HIVE-9271 and 
adds the implementation for the cases where this event is fired.)

 Add option to fire metastore event on insert
 

 Key: HIVE-9273
 URL: https://issues.apache.org/jira/browse/HIVE-9273
 Project: Hive
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 1.2.0

 Attachments: HIVE-9273.2.patch, HIVE-9273.patch


 HIVE-9271 adds the ability for the client to request firing metastore events. 
  This can be used in the MoveTask to fire events when an insert is done that 
 does not add partitions to a table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9456) Make Hive support unicode with MSSQL as Metastore backend

2015-02-04 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305776#comment-14305776
 ] 

Sushanth Sowmyan commented on HIVE-9456:


Xiaobing, could you please update the patch accordingly to make the 0.14-0.15 
upgrade script a 1.1-1.2 upgrade script?

 Make Hive support unicode with MSSQL as Metastore backend
 -

 Key: HIVE-9456
 URL: https://issues.apache.org/jira/browse/HIVE-9456
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.14.0
Reporter: Xiaobing Zhou
Assignee: Xiaobing Zhou
 Attachments: HIVE-9456.1.patch


 There are significant issues when Hive uses MSSQL as metastore backend to 
 support unicode, since MSSQL handles varchar and nvarchar datatypes 
 differently. Hive 0.14 metastore mssql script DDL was using varchar as 
 datatype, which can't handle multi-bytes/unicode characters, e.g., Chinese 
 chars. This JIRA is going to track implementation of unicode support in that 
 case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9273) Add option to fire metastore event on insert

2015-02-04 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14305795#comment-14305795
 ] 

Sushanth Sowmyan commented on HIVE-9273:


I've filed HIVE-9577 for follow-up work required on HCatalog's end for this.

 Add option to fire metastore event on insert
 

 Key: HIVE-9273
 URL: https://issues.apache.org/jira/browse/HIVE-9273
 Project: Hive
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Alan Gates
 Fix For: 1.2.0

 Attachments: HIVE-9273.2.patch, HIVE-9273.patch


 HIVE-9271 adds the ability for the client to request firing metastore events. 
  This can be used in the MoveTask to fire events when an insert is done that 
 does not add partitions to a table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9585) AlterPartitionMessage should return getKeyValues instead of getValues

2015-02-04 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-9585:
--

 Summary: AlterPartitionMessage should return getKeyValues instead 
of getValues
 Key: HIVE-9585
 URL: https://issues.apache.org/jira/browse/HIVE-9585
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan


HIVE-9175 added in AlterPartitionMessage to use in notification events. 
However, on trying to write a ReplicationTask implementation on top of that 
event, I see that I need the key-values from the message, and from a context 
where I might not have access to a hive client to fetch it myself.

Thus, the AlterPartitionMessage needs to be changed so as to return 
getKeyValues as a primary, and we can remove getValues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9585) AlterPartitionMessage should return getKeyValues instead of getValues

2015-02-04 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9585:
---
Attachment: HIVE-9585.patch

Patch attached.

 AlterPartitionMessage should return getKeyValues instead of getValues
 -

 Key: HIVE-9585
 URL: https://issues.apache.org/jira/browse/HIVE-9585
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9585.patch


 HIVE-9175 added in AlterPartitionMessage to use in notification events. 
 However, on trying to write a ReplicationTask implementation on top of that 
 event, I see that I need the key-values from the message, and from a context 
 where I might not have access to a hive client to fetch it myself.
 Thus, the AlterPartitionMessage needs to be changed so as to return 
 getKeyValues as a primary, and we can remove getValues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9273) Add option to fire metastore event on insert

2015-02-03 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14303881#comment-14303881
 ] 

Sushanth Sowmyan commented on HIVE-9273:


+1

 Add option to fire metastore event on insert
 

 Key: HIVE-9273
 URL: https://issues.apache.org/jira/browse/HIVE-9273
 Project: Hive
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Alan Gates
 Attachments: HIVE-9273.2.patch, HIVE-9273.patch


 HIVE-9271 adds the ability for the client to request firing metastore events. 
  This can be used in the MoveTask to fire events when an insert is done that 
 does not add partitions to a table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-9550) ObjectStore.getNextNotification() can return events inside NotificationEventResponse as null which conflicts with its thrift required tag

2015-02-02 Thread Sushanth Sowmyan (JIRA)
Sushanth Sowmyan created HIVE-9550:
--

 Summary: ObjectStore.getNextNotification() can return events 
inside NotificationEventResponse as null which conflicts with its thrift 
required tag
 Key: HIVE-9550
 URL: https://issues.apache.org/jira/browse/HIVE-9550
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan


Per hive_metastore.thrift, the events list inside NotificationEventResponse 
is a required field that cannot be null.

{code}
struct NotificationEventResponse {
  1: required list<NotificationEvent> events,
}
{code}

However, per ObjectStore.java, this events field can be left uninitialized (and 
thus null) if the set of events retrieved from the metastore is empty:

{code}
  NotificationEventResponse result = new NotificationEventResponse();
  int maxEvents = rqst.getMaxEvents() > 0 ? rqst.getMaxEvents() : Integer.MAX_VALUE;
  int numEvents = 0;
  while (i.hasNext() && numEvents++ < maxEvents) {
    result.addToEvents(translateDbToThrift(i.next()));
  }
  return result;
{code}

The fix is simple enough - we need to call result.setEvents(new 
ArrayList<NotificationEvent>()) before we begin the iteration that does 
result.addToEvents(...).
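
For clarity, the same loop with that one-line fix applied would look roughly like this (variable names follow the snippet above):

{code}
  NotificationEventResponse result = new NotificationEventResponse();
  // Initialize the thrift-required 'events' field up front, so an empty result
  // is an empty list rather than an unset (null) field.
  result.setEvents(new ArrayList<NotificationEvent>());
  int maxEvents = rqst.getMaxEvents() > 0 ? rqst.getMaxEvents() : Integer.MAX_VALUE;
  int numEvents = 0;
  while (i.hasNext() && numEvents++ < maxEvents) {
    result.addToEvents(translateDbToThrift(i.next()));
  }
  return result;
{code}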



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9550) ObjectStore.getNextNotification() can return events inside NotificationEventResponse as null which conflicts with its thrift required tag

2015-02-02 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9550:
---
Attachment: HIVE-9550.patch

Patch attached. [~alangates], could you please have a look?

 ObjectStore.getNextNotification() can return events inside 
 NotificationEventResponse as null which conflicts with its thrift required 
 tag
 ---

 Key: HIVE-9550
 URL: https://issues.apache.org/jira/browse/HIVE-9550
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
 Attachments: HIVE-9550.patch


 Per hive_metastore.thrift, the events list inside NotificationEventResponse 
 is a required field that cannot be null.
 {code}
 struct NotificationEventResponse {
   1: required list<NotificationEvent> events,
 }
 {code}
 However, per ObjectStore.java, this events field can be left uninitialized (and 
 thus null) if the set of events retrieved from the metastore is empty:
 {code}
   NotificationEventResponse result = new NotificationEventResponse();
   int maxEvents = rqst.getMaxEvents() > 0 ? rqst.getMaxEvents() : Integer.MAX_VALUE;
   int numEvents = 0;
   while (i.hasNext() && numEvents++ < maxEvents) {
     result.addToEvents(translateDbToThrift(i.next()));
   }
   return result;
 {code}
 The fix is simple enough - we need to call result.setEvents(new 
 ArrayList<NotificationEvent>()) before we begin the iteration that does 
 result.addToEvents(...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9550) ObjectStore.getNextNotification() can return events inside NotificationEventResponse as null which conflicts with its thrift required tag

2015-02-02 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-9550:
---
Assignee: Sushanth Sowmyan
  Status: Patch Available  (was: Open)

 ObjectStore.getNextNotification() can return events inside 
 NotificationEventResponse as null which conflicts with its thrift required 
 tag
 ---

 Key: HIVE-9550
 URL: https://issues.apache.org/jira/browse/HIVE-9550
 Project: Hive
  Issue Type: Bug
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-9550.patch


 Per hive_metastore.thrift, the events list inside NotificationEventResponse 
 is a required field that cannot be null.
 {code}
 struct NotificationEventResponse {
   1: required list<NotificationEvent> events,
 }
 {code}
 However, per ObjectStore.java, this events field can be left uninitialized (and 
 thus null) if the set of events retrieved from the metastore is empty:
 {code}
   NotificationEventResponse result = new NotificationEventResponse();
   int maxEvents = rqst.getMaxEvents() > 0 ? rqst.getMaxEvents() : Integer.MAX_VALUE;
   int numEvents = 0;
   while (i.hasNext() && numEvents++ < maxEvents) {
     result.addToEvents(translateDbToThrift(i.next()));
   }
   return result;
 {code}
 The fix is simple enough - we need to call result.setEvents(new 
 ArrayList<NotificationEvent>()) before we begin the iteration that does 
 result.addToEvents(...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

