[jira] [Commented] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916968#comment-13916968
 ] 

Sergey Shelukhin commented on HIVE-6429:


tez tests pass for me

> MapJoinKey has large memory overhead in typical cases
> -
>
> Key: HIVE-6429
> URL: https://issues.apache.org/jira/browse/HIVE-6429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
> HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
> HIVE-6429.09.patch, HIVE-6429.10.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. Will probably speed up 
> compares too.
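
A minimal sketch of the idea described above, assuming a fixed (int, long)
key layout chosen purely for illustration. ByteArrayKeySketch and its of()
factory are hypothetical names; this is not the MapJoinKeyBytes class from the
attached patches, whose actual encoding is not shown in this thread.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical illustration only: a join key backed by a single byte array. */
final class ByteArrayKeySketch {
    private final byte[] bytes; // all key columns packed into one array
    private final int hash;     // cached, since the array is never modified

    private ByteArrayKeySketch(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    /** Packs an (int, long) key into 12 bytes; the layout is assumed fixed per table. */
    static ByteArrayKeySketch of(int k1, long k2) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 8);
        buf.putInt(k1).putLong(k2);
        return new ByteArrayKeySketch(buf.array());
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ByteArrayKeySketch)) {
            return false;
        }
        // One array comparison replaces a per-column comparison of writables.
        return Arrays.equals(bytes, ((ByteArrayKeySketch) o).bytes);
    }
}
```

Compared with an array of writables, each key here carries one object header
and one array instead of one object per column, which is where the memory
saving would be expected to come from.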



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HIVE-6530) JDK 7 trunk build fails after HIVE-6418 patch

2014-02-28 Thread Prasad Mujumdar (JIRA)
Prasad Mujumdar created HIVE-6530:
-

 Summary: JDK 7 trunk build fails after HIVE-6418 patch
 Key: HIVE-6530
 URL: https://issues.apache.org/jira/browse/HIVE-6530
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0
Reporter: Prasad Mujumdar
Priority: Blocker


The JDK 7 build fails with the following error:
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on 
project hive-exec: Compilation failure
[ERROR] 
/home/prasadm/repos/apache/hive-trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/LazyFlatRowContainer.java:[118,15]
 name clash: add(java.util.List) in 
org.apache.hadoop.hive.ql.exec.persistence.LazyFlatRowContainer overrides a 
method whose erasure is the same as another method, yet neither overrides the 
other
[ERROR] first method:  add(E) in java.util.AbstractCollection
[ERROR] second method: add(ROW) in 
org.apache.hadoop.hive.ql.exec.persistence.AbstractRowContainer
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hive-exec
{noformat}

LazyFlatRowContainer.java is a new file added as part of the HIVE-6418 
patch. It extends AbstractCollection and implements AbstractRowContainer. 
It looks like both of these have an add() method, and the two conflict.
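
The clash comes from type erasure: add(E) in AbstractCollection and add(ROW)
in AbstractRowContainer both erase to add(Object). The sketch below shows one
possible way to avoid it, by giving the interface method a different name. It
is an illustration under assumptions, not necessarily the fix applied to
Hive: RowContainerSketch, FlatRowsSketch and addRow() are hypothetical
stand-ins for the real classes.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for AbstractRowContainer<ROW>. If this method were
// named add(ROW), its erasure add(Object) would collide with
// AbstractCollection.add(E), which is the error JDK 7 javac reports above.
interface RowContainerSketch<ROW> {
    void addRow(ROW row);
}

// Hypothetical stand-in for LazyFlatRowContainer: a flat collection of
// column values that also accepts whole rows.
final class FlatRowsSketch extends AbstractCollection<Object>
        implements RowContainerSketch<List<Object>> {
    private final List<Object> values = new ArrayList<Object>();

    @Override
    public void addRow(List<Object> row) {
        values.addAll(row); // flatten the row into the value list
    }

    @Override
    public boolean add(Object value) {
        return values.add(value);
    }

    @Override
    public Iterator<Object> iterator() {
        return values.iterator();
    }

    @Override
    public int size() {
        return values.size();
    }
}
```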





[jira] [Commented] (HIVE-6521) WebHCat cannot fetch correct percentComplete for Hive jobs

2014-02-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916931#comment-13916931
 ] 

Hive QA commented on HIVE-6521:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12631797/HIVE-6521.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5185 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1560/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1560/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12631797

> WebHCat cannot fetch correct percentComplete for Hive jobs
> --
>
> Key: HIVE-6521
> URL: https://issues.apache.org/jira/browse/HIVE-6521
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.13.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Attachments: HIVE-6521.2.patch, HIVE-6521.patch
>
>
> WebHCat E2E test TestHive_7 failed because percentComplete wasn't returned as 
> expected.
> {noformat}
> check_job_percent_complete failed. got percentComplete "map 0% reduce 0%",  
> expected  "map 100% reduce 100%"
> {noformat}
> So, there are two problems here.
> # The log parsing is broken for status of percentComplete. In the stderr of 
> the job we see:
> {noformat}
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_1393486488858_0691, Tracking URL = 
> http://ambari-sec-1393480847-others-2-4.cs1cloud.internal:8088/proxy/application_1393486488858_0691/
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1393486488858_0691
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
> 2014-02-27 18:40:50,166 Stage-1 map = 0%,  reduce = 0%
> 2014-02-27 18:40:56,599 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 
> sec
> 2014-02-27 18:40:57,656 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 
> sec
> 2014-02-27 18:40:58,706 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 
> sec
> MapReduce Total cumulative CPU time: 870 msec
> Ended Job = job_1393486488858_0691
> MapReduce Jobs Launched: 
> Job 0: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 305 HDFS Write: 0 
> SUCCESS
> Total MapReduce CPU Time Spent: 870 msec
> {noformat}
> The assumption in the code is that the line containing the percent status 
> will end after "reduce = \d+%" but that fails with the above.
> # The last status from Hive job is "map = 100%,  reduce = 0%" instead of 
> expected "map = 100%,  reduce = 100%".
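
For the first problem, one way to avoid depending on where the line ends is to
search for the progress fragment anywhere in the line. The sketch below is
illustrative only: PercentCompleteSketch and extract() are hypothetical names,
not WebHCat code, and the output format is taken from the test expectation
quoted above. It does nothing about the second problem (a map-only job
finishing at reduce = 0%).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only: tolerant parsing of Hive progress lines. */
final class PercentCompleteSketch {
    // No end-of-line anchor, so a trailing ", Cumulative CPU ..." is tolerated.
    private static final Pattern PROGRESS =
        Pattern.compile("map\\s*=\\s*(\\d+)%,\\s*reduce\\s*=\\s*(\\d+)%");

    /** Returns a string such as "map 100% reduce 0%", or null if the line has no progress. */
    static String extract(String line) {
        Matcher m = PROGRESS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return "map " + m.group(1) + "% reduce " + m.group(2) + "%";
    }
}
```

Using find() rather than matches() is the relevant design choice here: the
pattern only has to occur somewhere in the line, so the timestamp prefix and
the CPU suffix are both ignored.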





[jira] [Updated] (HIVE-6529) Tez output files are out of date

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6529:
---

Status: Patch Available  (was: Open)

just ran minitez test on trunk

> Tez output files are out of date
> 
>
> Key: HIVE-6529
> URL: https://issues.apache.org/jira/browse/HIVE-6529
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-6529.patch
>
>






[jira] [Updated] (HIVE-6529) Tez output files are out of date

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6529:
---

Attachment: HIVE-6529.patch

> Tez output files are out of date
> 
>
> Key: HIVE-6529
> URL: https://issues.apache.org/jira/browse/HIVE-6529
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-6529.patch
>
>






[jira] [Created] (HIVE-6529) Tez output files are out of date

2014-02-28 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-6529:
--

 Summary: Tez output files are out of date
 Key: HIVE-6529
 URL: https://issues.apache.org/jira/browse/HIVE-6529
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-6529.patch







[jira] [Updated] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Selina Zhang updated HIVE-6492:
---

Attachment: HIVE-6492.2.patch.txt

> limit partition number involved in a table scan
> ---
>
> Key: HIVE-6492
> URL: https://issues.apache.org/jira/browse/HIVE-6492
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configuration variable 
> "hive.limit.query.max.table.partition" is added to the Hive configuration to
> limit the number of table partitions involved in a table scan. 
> The default value will be -1, which means there is no limit by default. 
> This variable will not affect "metadata only" queries.
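
The intended semantics can be sketched as a simple guard. This is a
hypothetical illustration, not code from the patch: PartitionLimitSketch and
check() are invented names, and treating every negative value like -1 is an
assumption made here for simplicity.

```java
/** Hypothetical illustration only: enforcing a per-scan partition limit. */
final class PartitionLimitSketch {
    /**
     * Throws if a table scan touches more partitions than allowed.
     * maxPartitions stands in for the value of
     * hive.limit.query.max.table.partition; -1 means no limit.
     */
    static void check(int partitionsScanned, int maxPartitions) {
        if (maxPartitions < 0) {
            return; // assumption: any negative value is treated like -1
        }
        if (partitionsScanned > maxPartitions) {
            throw new IllegalStateException(
                "Table scan covers " + partitionsScanned
                + " partitions, limit is " + maxPartitions);
        }
    }
}
```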





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916765#comment-13916765
 ] 

Selina Zhang commented on HIVE-6492:


The original patch actually included two tasks:
1. Limit the number of partitions when a table scan happens.
2. A hack to identify queries from Tableau and handle them specially.
As we discussed, the second task is just a hack and committing it to trunk 
would probably not be helpful, so I created a new patch that contains only the 
first task. 
The reason for introducing this configuration variable is that we want to limit 
the number of partitions involved in a table scan. As for metadata-only 
queries, HIVE-1003 already optimizes this type of query, so the table scan is 
no longer a problem. 



> limit partition number involved in a table scan
> ---
>
> Key: HIVE-6492
> URL: https://issues.apache.org/jira/browse/HIVE-6492
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-6492.1.patch.txt, HIVE-6492.2.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configuration variable 
> "hive.limit.query.max.table.partition" is added to the Hive configuration to
> limit the number of table partitions involved in a table scan. 
> The default value will be -1, which means there is no limit by default. 
> This variable will not affect "metadata only" queries.





[jira] [Commented] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916755#comment-13916755
 ] 

Gunther Hagleitner commented on HIVE-6429:
--

+1 (assuming tests pass)

> MapJoinKey has large memory overhead in typical cases
> -
>
> Key: HIVE-6429
> URL: https://issues.apache.org/jira/browse/HIVE-6429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
> HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
> HIVE-6429.09.patch, HIVE-6429.10.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. Will probably speed up 
> compares too.





Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18230/
---

(Updated March 1, 2014, 3:08 a.m.)


Review request for hive, Gunther Hagleitner and Jitendra Pandey.


Repository: hive-git


Description
---

See JIRA


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6802b4d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java 
3cfaacf 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java 988cc57 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 8b25300 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 46e37c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 9948583 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 5cf347b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 
2ac0928 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyObject.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
 0279f7c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 295854d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java
 581046e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
2466a3b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java 
997202f 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
 d17b656 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinKey.java 
a103a51 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
 40bf006 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 22eca50 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
 fcded96 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java
 7bfe473 
  serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 
67cb1e8 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
f9b4031 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
c583ae2 

Diff: https://reviews.apache.org/r/18230/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-5568) count(*) on ORC tables with predicate pushdown on partition columns fail

2014-02-28 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5568:
-

Status: Open  (was: Patch Available)

[~owen.omalley] can you take a look at the test failure? That seemed real.

> count(*) on ORC tables with predicate pushdown on partition columns fail
> 
>
> Key: HIVE-5568
> URL: https://issues.apache.org/jira/browse/HIVE-5568
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.12.1
>
> Attachments: D13485.1.patch, D13485.2.patch, D13485.3.patch
>
>
> If the query is:
> {code}
> select count(*) from orc_table where x = 10;
> {code}
> where x is a partition column and predicate pushdown is enabled, you'll get 
> an array out of bounds exception.





[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.10.patch

RB feedback + some internal discussion; mostly moving some key-specific stuff 
to the key, and changing the vectorization path to go through the elaborate 
writer/writable/oi path rather than raw values. A few Tez tests appear to pass; 
I'll run the rest.

> MapJoinKey has large memory overhead in typical cases
> -
>
> Key: HIVE-6429
> URL: https://issues.apache.org/jira/browse/HIVE-6429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
> HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
> HIVE-6429.09.patch, HIVE-6429.10.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. Will probably speed up 
> compares too.





Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin


> On Feb. 28, 2014, 11:26 p.m., Gunther Hagleitner wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java,
> >  line 309
> > 
> >
> > this doesn't seem to belong in the serde. this is a helper for the map 
> > join key only. (e.g.: field < 8, etc) you should be able to just use the 
> > existing public interface, right?

I will have to add at least one static method. But yeah, made it a simple 
pass-thru to already existing private static method; moved all the key-specific 
stuff to keys


> On Feb. 28, 2014, 11:26 p.m., Gunther Hagleitner wrote:
> > serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java,
> >  line 236
> > 
> >
> > this doesn't seem to belong here. it's not a general purpose serde 
> > method... in the vectorizedreducesink we seem to just break the row group 
> > into rows and serialize with the unchanged serde. can we do this here too?

I can see if it works... it looks convoluted from a perf perspective: a 
writable is created, then the writer does a bunch of stuff to get back the raw 
value. If it works, I guess we can keep it and speed it up by getting the raw 
value later, if needed.


- Sergey


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18230/#review35866
---


On Feb. 28, 2014, 10:04 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18230/
> ---
> 
> (Updated Feb. 28, 2014, 10:04 p.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Jitendra Pandey.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6802b4d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java 
> 3cfaacf 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java 988cc57 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 
> 8b25300 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 46e37c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 9948583 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 5cf347b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 
> 2ac0928 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyObject.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
>  0279f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 295854d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java
>  581046e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
> 2466a3b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
>  997202f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
>  d17b656 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinKey.java 
> a103a51 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
>  40bf006 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 
> 22eca50 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
>  fcded96 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java
>  7bfe473 
>   serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 
> 67cb1e8 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
> f9b4031 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
> c583ae2 
> 
> Diff: https://reviews.apache.org/r/18230/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Updated] (HIVE-5193) Columnar Pushdown for RC/ORC File not happening in HCatLoader

2014-02-28 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated HIVE-5193:
-

Attachment: HIVE-5193.2.patch

Updated patch to remove the changes in org.apache.hcatalog

> Columnar Pushdown for RC/ORC File not happening in HCatLoader 
> --
>
> Key: HIVE-5193
> URL: https://issues.apache.org/jira/browse/HIVE-5193
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
>Reporter: Viraj Bhat
>Assignee: Viraj Bhat
>  Labels: hcatalog
> Fix For: 0.13.0
>
> Attachments: HIVE-5193.2.patch, HIVE-5193.patch
>
>
> Currently the HCatLoader is not taking advantage of 
> ColumnProjectionUtils, with which it could skip columns during read. The 
> information is available in Pig; it just needs to get to the Readers.
> Viraj





[jira] [Resolved] (HIVE-6484) HiveServer2 doAs should be session aware both for secured and unsecured session implementation.

2014-02-28 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-6484.


Resolution: Duplicate

> HiveServer2 doAs should be session aware both for secured and unsecured 
> session implementation.
> ---
>
> Key: HIVE-6484
> URL: https://issues.apache.org/jira/browse/HIVE-6484
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
>
> Currently in unsecured case, the doAs is performed by decorating 
> TProcessor.process method. This has been causing cleanup issues as we end up 
> creating a new clientUgi for each request rather than for each session. This 
> also cleans up the code.
> [~thejas] Probably you can add more if you've seen other issues related to 
> this.





[jira] [Commented] (HIVE-5193) Columnar Pushdown for RC/ORC File not happening in HCatLoader

2014-02-28 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916711#comment-13916711
 ] 

Viraj Bhat commented on HIVE-5193:
--

Hi Sushanth,
 Thanks for your comments. Removed any changes in the org.apache.hcatalog.* 
classes and reattaching the patch.
Viraj

> Columnar Pushdown for RC/ORC File not happening in HCatLoader 
> --
>
> Key: HIVE-5193
> URL: https://issues.apache.org/jira/browse/HIVE-5193
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
>Reporter: Viraj Bhat
>Assignee: Viraj Bhat
>  Labels: hcatalog
> Fix For: 0.13.0
>
> Attachments: HIVE-5193.patch
>
>
> Currently the HCatLoader is not taking advantage of 
> ColumnProjectionUtils, with which it could skip columns during read. The 
> information is available in Pig; it just needs to get to the Readers.
> Viraj





[jira] [Updated] (HIVE-5193) Columnar Pushdown for RC/ORC File not happening in HCatLoader

2014-02-28 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated HIVE-5193:
-

Attachment: (was: HIVE-5193-2.patch)

> Columnar Pushdown for RC/ORC File not happening in HCatLoader 
> --
>
> Key: HIVE-5193
> URL: https://issues.apache.org/jira/browse/HIVE-5193
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
>Reporter: Viraj Bhat
>Assignee: Viraj Bhat
>  Labels: hcatalog
> Fix For: 0.13.0
>
> Attachments: HIVE-5193.patch
>
>
> Currently the HCatLoader is not taking advantage of 
> ColumnProjectionUtils, with which it could skip columns during read. The 
> information is available in Pig; it just needs to get to the Readers.
> Viraj





Re: Review Request 18592: HIVE-6137

2014-02-28 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18592/#review35888
---



http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java


I think it's better to catch just IOException here; otherwise this may 
result in a misleading exception message.


- Ashutosh Chauhan


On Feb. 28, 2014, 7:37 p.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18592/
> ---
> 
> (Updated Feb. 28, 2014, 7:37 p.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Bugs: HIVE-6137
> https://issues.apache.org/jira/browse/HIVE-6137
> 
> 
> Repository: hive
> 
> 
> Description
> ---
> 
> HIVE-6137 Improve error message when file is not found while creating a table.
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  1573040 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/external1.q
>  1573040 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/external1.q.out
>  1573040 
> 
> Diff: https://reviews.apache.org/r/18592/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



Re: Review Request 18065: HIVE-6024 Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Carl Steinbach


> On Feb. 28, 2014, 10:11 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/load_local_dir_test.q, line 2
> > 
> >
> > This test passes with or without the rest of the patch. It doesn't seem 
> > to demonstrate any change in behavior.
> 
> Mohammad Islam wrote:
> Yes. This JIRA is to change the hive internal data movement for HQL 'LOAD 
> LOCAL ...' -- no new feature is added.
> I understood Ashutosh's concern to verify whether this new internal 
> change would break the HQL like LOAD LOCAL from a *directory*. I didn't find 
> any existing .q file that covered this test. Therefore added a new one to 
> make sure existing behavior doesn't break.
>

Ok, makes sense. Thanks for the explanation.


- Carl


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18065/#review35771
---


On Feb. 28, 2014, 8:03 a.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18065/
> ---
> 
> (Updated Feb. 28, 2014, 8:03 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-6024
> https://issues.apache.org/jira/browse/HIVE-6024
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Excerpt from the JIRA:
> "Load data command creates an additional copy task only when its loading from 
> local It doesn't create this additional copy task while loading from DFS 
> though."
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  8beef09 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java ed7787d 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 05a2da7 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 8318be1 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
> 59aeb96 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 407450e 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 5991aae 
>   ql/src/test/queries/clientpositive/load_local_dir_test.q PRE-CREATION 
>   ql/src/test/results/clientpositive/load_local_dir_test.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18065/diff/
> 
> 
> Testing
> ---
> 
> Ran some existing q tests with "LOAD DATA LOCAL INPATH".
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>



[jira] [Updated] (HIVE-6519) Allow optional "as" in subquery definition

2014-02-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6519:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

> Allow optional "as" in subquery definition
> --
>
> Key: HIVE-6519
> URL: https://issues.apache.org/jira/browse/HIVE-6519
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-6519.1.patch
>
>
> Allow both:
> select * from (select * from foo) bar 
> select * from (select * from foo) as bar 





[jira] [Commented] (HIVE-6290) Add support for hbase filters for composite keys

2014-02-28 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916654#comment-13916654
 ] 

Swarnim Kulkarni commented on HIVE-6290:


[~brocknoland][~xuefuz] Unless there is something else you would like me to 
look at in this patch, it should be ready to be merged.

> Add support for hbase filters for composite keys
> 
>
> Key: HIVE-6290
> URL: https://issues.apache.org/jira/browse/HIVE-6290
> Project: Hive
>  Issue Type: Sub-task
>  Components: HBase Handler
>Affects Versions: 0.12.0
>Reporter: Swarnim Kulkarni
>Assignee: Swarnim Kulkarni
> Attachments: HIVE-6290.1.patch.txt, HIVE-6290.2.patch.txt, 
> HIVE-6290.3.patch.txt
>
>
> Add support for filters to be provided via the composite key class





[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6429:
---

Status: Patch Available  (was: Open)

> MapJoinKey has large memory overhead in typical cases
> -
>
> Key: HIVE-6429
> URL: https://issues.apache.org/jira/browse/HIVE-6429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
> HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
> HIVE-6429.09.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. Will probably speed up 
> compares too.
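
A minimal sketch of the idea described above, for illustration only. It is 
not the code from the attached patches: the class name BytesKey, the 
(int, long) key layout and the cached hash are all assumptions made to keep 
the example short. It shows a key that holds one packed byte array and 
implements only hashCode and equals:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration; this is not the MapJoinKey class from the patch.
final class BytesKey {
  private final byte[] bytes; // all key columns packed back to back
  private final int hash;     // computed once, since the key is immutable

  private BytesKey(byte[] bytes) {
    this.bytes = bytes;
    this.hash = Arrays.hashCode(bytes);
  }

  /** Packs a fixed-structure (int, long) key into 12 bytes. */
  static BytesKey of(int first, long second) {
    byte[] packed = ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
        .putInt(first)
        .putLong(second)
        .array();
    return new BytesKey(packed);
  }

  @Override
  public int hashCode() {
    return hash;
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof BytesKey
        && Arrays.equals(bytes, ((BytesKey) other).bytes);
  }
}
```

Two keys built from the same column values compare equal and hash alike, so 
such a key can be used directly in a HashMap. Variable-length columns (for 
example strings) would need a length prefix or similar, which this sketch 
leaves out.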





Re: Review Request 18464: Support secure Subject.doAs() in HiveServer2 JDBC client

2014-02-28 Thread Shivaraju Gowda


> On Feb. 28, 2014, 12:59 a.m., Vaibhav Gumashta wrote:
> > service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java, line 
> > 68
> > 
> >
> > Can you push this to 
> > HadoopThriftAuthBridge.Client#createClientTransport just like the way the 
> > else portion does instead of the createSubjectAssumedTransport method? From 
> > within the method you can return the TSubjectAssumingTransport.
> 
> Shivaraju Gowda wrote:
> Again, this was in my first cut. I was passing the value as the "tokenStrForm" 
> parameter to keep the method signature the same. I later moved away from it since 
> it was not elegant, and changing the method signature had broader 
> implications. I felt this functionality didn't belong in the Hadoop shim layer. 
> Having the change there also meant one more jar being 
> affected (hive-exec.jar).
>

Another issue was the dependency on hadoop.core.jar. The calls 
AuthMethod.valueOf(AuthMethod.class, methodStr) and 
SaslRpcServer.splitKerberosName(serverPrincipal) in 
HadoopThriftAuthBridge.Client#createClientTransport are from hadoop.core.jar.


- Shivaraju


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18464/#review35730
---


On Feb. 25, 2014, 2:50 p.m., Kevin Minder wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18464/
> ---
> 
> (Updated Feb. 25, 2014, 2:50 p.m.)
> 
> 
> Review request for hive, Kevin Minder and Vaibhav Gumashta.
> 
> 
> Bugs: HIVE-6486
> https://issues.apache.org/jira/browse/HIVE-6486
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support secure Subject.doAs() in HiveServer2 JDBC client
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 17b4d39 
>   service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java 
> 379dafb 
>   
> service/src/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18464/diff/
> 
> 
> Testing
> ---
> 
> Manual testing
> 
> 
> Thanks,
> 
> Kevin Minder
> 
>



[jira] [Updated] (HIVE-5232) Make JDBC use the new HiveServer2 async execution API by default

2014-02-28 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-5232:
---

Status: Patch Available  (was: Open)

> Make JDBC use the new HiveServer2 async execution API by default
> 
>
> Key: HIVE-5232
> URL: https://issues.apache.org/jira/browse/HIVE-5232
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-5232.1.patch, HIVE-5232.2.patch, HIVE-5232.3.patch
>
>
> HIVE-4617 provides support for async execution in HS2. There are some 
> proposed improvements in followup JIRAs:
> HIVE-5217
> HIVE-5229
> HIVE-5230
> HIVE-5441
> There is also [HIVE-5060], which assumes execute to be asynchronous by 
> default.
>  
> Once they are in, we can think of using the async API as the default for 
> JDBC. This can enable the server to report errors back to the client sooner. 
> It can also be useful in cases where a statement.cancel is done in a 
> different thread - the original thread will now be able to detect the cancel, 
> as opposed to the use of the blocking execute calls, in which 
> statement.cancel will be a no-op. 





[jira] [Updated] (HIVE-5232) Make JDBC use the new HiveServer2 async execution API by default

2014-02-28 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-5232:
---

Status: Open  (was: Patch Available)

> Make JDBC use the new HiveServer2 async execution API by default
> 
>
> Key: HIVE-5232
> URL: https://issues.apache.org/jira/browse/HIVE-5232
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 0.13.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.13.0
>
> Attachments: HIVE-5232.1.patch, HIVE-5232.2.patch, HIVE-5232.3.patch
>
>
> HIVE-4617 provides support for async execution in HS2. There are some 
> proposed improvements in followup JIRAs:
> HIVE-5217
> HIVE-5229
> HIVE-5230
> HIVE-5441
> There is also [HIVE-5060], which assumes execute to be asynchronous by 
> default.
>  
> Once they are in, we can think of using the async API as the default for 
> JDBC. This can enable the server to report errors back to the client sooner. 
> It can also be useful in cases where a statement.cancel is done in a 
> different thread - the original thread will now be able to detect the cancel, 
> as opposed to the use of the blocking execute calls, in which 
> statement.cancel will be a no-op. 





[jira] [Updated] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.

2014-02-28 Thread Shivaraju Gowda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaraju Gowda updated HIVE-6486:
--

Fix Version/s: 0.13.0
   Status: Patch Available  (was: Open)

> Support secure Subject.doAs() in HiveServer2 JDBC client.
> -
>
> Key: HIVE-6486
> URL: https://issues.apache.org/jira/browse/HIVE-6486
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.12.0, 0.11.0
>Reporter: Shivaraju Gowda
> Fix For: 0.13.0
>
> Attachments: HIVE-6486.1.patch, Hive_011_Support-Subject_doAS.patch, 
> TestHive_SujectDoAs.java
>
>
> HIVE-5155 addresses the problem of Kerberos authentication in a multi-user 
> middleware server using a proxy user. In this mode the principal used by the 
> middleware server has privileges to impersonate selected users in 
> Hive/Hadoop. 
> This enhancement is to support Subject.doAs() authentication in the Hive JDBC 
> layer so that the end user's Kerberos Subject is passed through in the 
> middleware server. With this improvement there won't be any additional setup in 
> the server to grant proxy privileges to some users and there won't be a need to 
> specify a proxy user in the JDBC client. This version should also be more 
> secure since it won't require principals with the privileges to impersonate 
> other users in the Hive/Hadoop setup.
>  





Re: Review Request 14950: Make JDBC use the new HiveServer2 async execution API by default

2014-02-28 Thread Vaibhav Gumashta

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14950/
---

(Updated March 1, 2014, 12:43 a.m.)


Review request for hive, Prasad Mujumdar and Thejas Nair.


Bugs: HIVE-5232
https://issues.apache.org/jira/browse/HIVE-5232


Repository: hive-git


Description
---

Should be applied on top of:
HIVE-5217 [Add long polling to asynchronous execution in HiveServer2]
HIVE-5229 [Better thread management for HiveServer2 async threads]
HIVE-5230 [Better error reporting by async threads in HiveServer2]
HIVE-5441 [Async query execution doesn't return resultset status] 


Diffs
-

  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java f0d0c77 

Diff: https://reviews.apache.org/r/14950/diff/


Testing
---

TestJdbcDriver2


Thanks,

Vaibhav Gumashta



[jira] [Updated] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.

2014-02-28 Thread Shivaraju Gowda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaraju Gowda updated HIVE-6486:
--

Attachment: HIVE-6486.1.patch

Rebased the patch to trunk with the following additions compared to the 
previous patch.
+ Change url property "identityContext=fromKerberosSubject" to 
"auth=fromKerberosSubject"
+ Pass SaslProps for creating the client transport. 
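
For readers unfamiliar with the mechanism, here is a rough sketch of how a 
middleware server might run a piece of work under an end user's Subject. It 
is not taken from the patch: the class DoAsSketch and the helper runAs are 
hypothetical, and the Subject is assumed to have been obtained elsewhere 
(for example from a JAAS login). Only the standard 
javax.security.auth.Subject.doAs call is used; the JDBC connection itself is 
left as a comment.

```java
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

// Hypothetical helper, not part of Hive or of the attached patch.
final class DoAsSketch {
  /** Runs the action with the given Subject associated with the call. */
  static <T> T runAs(Subject subject, PrivilegedExceptionAction<T> action)
      throws Exception {
    try {
      // In the middleware, the action would open the JDBC connection here,
      // using a URL that carries the auth=fromKerberosSubject property
      // mentioned in the comment above.
      return Subject.doAs(subject, action);
    } catch (PrivilegedActionException e) {
      // Unwrap so callers see the exception the action actually threw.
      throw e.getException();
    }
  }
}
```

Note that Subject.doAs is deprecated in recent JDK releases in favour of 
Subject.callAs; the older call is shown because it is the one this issue is 
named after.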

> Support secure Subject.doAs() in HiveServer2 JDBC client.
> -
>
> Key: HIVE-6486
> URL: https://issues.apache.org/jira/browse/HIVE-6486
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Shivaraju Gowda
> Attachments: HIVE-6486.1.patch, Hive_011_Support-Subject_doAS.patch, 
> TestHive_SujectDoAs.java
>
>
> HIVE-5155 addresses the problem of Kerberos authentication in a multi-user 
> middleware server using a proxy user. In this mode the principal used by the 
> middleware server has privileges to impersonate selected users in 
> Hive/Hadoop. 
> This enhancement is to support Subject.doAs() authentication in the Hive JDBC 
> layer so that the end user's Kerberos Subject is passed through in the 
> middleware server. With this improvement there won't be any additional setup in 
> the server to grant proxy privileges to some users and there won't be a need to 
> specify a proxy user in the JDBC client. This version should also be more 
> secure since it won't require principals with the privileges to impersonate 
> other users in the Hive/Hadoop setup.
>  





Re: Review Request 14950: Make JDBC use the new HiveServer2 async execution API by default

2014-02-28 Thread Vaibhav Gumashta


> On Feb. 18, 2014, 6:41 p.m., Prasad Mujumdar wrote:
> > @Vaibhav, thanks for the patch and following up on the overall async 
> > execution line items!
> > 
> > The patch itself looks fine on the first pass. I do have a high level 
> > question on the approach. The patch enables async execution by default on 
> > the client side and adds a synchronous wait on top of that. This IMHO defeats 
> > the purpose of the async execution. From the application point of view, there's 
> > no difference in the behavior with or without this patch. The execution 
> > will block till the query execution is complete.
> > The rationale of async execution is to return control back to the client 
> > immediately so that the client has an option to perform alternate 
> > foreground work while the query is being processed. Have you considered 
> > blocking in fetch for queries with a resultset? That in turn would give more 
> > time for the server to process the query in parallel while the client examines 
> > the resultset description and possibly interacts with the end user, etc.
> 
> Vaibhav Gumashta wrote:
> @Prasad: Thanks a lot for taking a look! I was basically going for a use 
> case when an error can be detected sooner. For example consider the scenario 
> when a stmt.cancel is called from a separate thread:
> Blocking calls:
> When stmt.execute is called, internally HiveStatement sets a variable 
> stmtHandle, to refer to the Operation object that it created for the query 
> execution. However, for a blocking call, the stmtHandle is set only when the 
> call to Operation.run returns. If we call a stmt.cancel (in a different 
> thread) during this, the internal code will check for the value of stmtHandle 
> and return immediately if it is null, essentially doing a no-op. 
> Async calls:
> Since an async call immediately returns from Operation.run, the 
> stmtHandle is set in HiveStatement while the query is still executing. This 
> means, that a stmt.cancel can actually cancel the underlying operation and 
> the original stmt.execute will throw an error when we poll for the operation 
> status (since the underlying operation handle is gone). This can return a 
> useful response to the client sooner.
> 
> However, I agree blocking on fetch would be more efficient. Do you think 
> I can take that up in a follow up jira since I may not have a lot of time to 
> work on it for Hive 13 release?
> 
> Thanks again!
> 
> Thejas Nair wrote:
> Yes, I think it is ok to address that in a follow up jira. This is an 
> optimization, and the changes are useful even without that. What do you think 
> Prasad ?
>
> 
> Prasad Mujumdar wrote:
> Sounds good. 
> @Vaibhav, are you planning to log a follow-up ticket, or should I do that? 
> This will be especially useful for beeline and normal single-threaded clients. 
> Thanks!

@Prasad: I can do that, but feel free to create one if you're planning to work 
on it sooner. Thanks a lot for reviewing the patch. 
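
To make the blocking vs. async cancel behaviour discussed above concrete, 
here is a small single-threaded model. It is only a sketch of the reasoning 
in this thread, not HiveStatement: the class StatementModel, its methods and 
the status strings are made up for the illustration, and real thread-safety 
concerns are left out.

```java
// Hypothetical model of the behaviour described in this thread; it is not
// HiveStatement, and every name below is invented for the illustration.
final class StatementModel {
  private String stmtHandle; // stays null until the server call has returned
  private boolean cancelled;

  /** Async style: the call returns right away, so the handle is set early. */
  void executeAsync() {
    stmtHandle = "operation-1";
  }

  /** Returns false (a no-op) when there is no handle to cancel yet. */
  boolean cancel() {
    if (stmtHandle == null) {
      return false;
    }
    cancelled = true;
    return true;
  }

  /** What a status poll from the executing thread would observe. */
  String pollStatus() {
    if (stmtHandle == null) {
      return "NOT_SUBMITTED";
    }
    return cancelled ? "CANCELED" : "RUNNING";
  }
}
```

In the blocking case the handle is still null while the query runs, so 
cancel() does nothing. In the async case the handle is already set, cancel() 
takes effect, and the next status poll sees it.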


- Vaibhav


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14950/#review34728
---


On Feb. 20, 2014, 9:18 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14950/
> ---
> 
> (Updated Feb. 20, 2014, 9:18 a.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Should be applied on top of:
> HIVE-5217 [Add long polling to asynchronous execution in HiveServer2]
> HIVE-5229 [Better thread management for HiveServer2 async threads]
> HIVE-5230 [Better error reporting by async threads in HiveServer2]
> HIVE-5441 [Async query execution doesn't return resultset status] 
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java f0d0c77 
> 
> Diff: https://reviews.apache.org/r/14950/diff/
> 
> 
> Testing
> ---
> 
> TestJdbcDriver2
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



Re: Review Request 14950: Make JDBC use the new HiveServer2 async execution API by default

2014-02-28 Thread Vaibhav Gumashta


> On Feb. 23, 2014, 12:36 a.m., Thejas Nair wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java, line 271
> > 
> >
> > I think we should retry on exceptions that indicate a network 
> > connection error. TProtocolException seems to be the exception that is 
> > thrown in such cases.
> 
> Thejas Nair wrote:
> I think we can address this in a follow up patch. Connection error in 
> case of the current synchronous execution also would result in failure.
>

I'll create a new jira for that. Thanks for taking a look!
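
For the follow-up jira, a rough sketch of what such a retry could look like. 
This is an assumption-laden illustration, not code from any patch: 
RetrySketch and callWithRetry are hypothetical names, and 
java.io.IOException stands in for whichever Thrift exception turns out to 
signal a connection error.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper; IOException is a stand-in for the real
// connection-error exception, which this sketch does not assume.
final class RetrySketch {
  /** Retries the call on IOException, up to maxAttempts attempts in total. */
  static <T> T callWithRetry(Callable<T> call, int maxAttempts)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        last = e; // connection-style failure: remember it and try again
      }
    }
    throw last; // every attempt failed with a connection-style error
  }
}
```

Any other exception type propagates immediately, so only the failures 
classed as connection errors are retried. A real version would probably 
also want a delay between attempts.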


- Vaibhav


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14950/#review35234
---


On Feb. 20, 2014, 9:18 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14950/
> ---
> 
> (Updated Feb. 20, 2014, 9:18 a.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Should be applied on top of:
> HIVE-5217 [Add long polling to asynchronous execution in HiveServer2]
> HIVE-5229 [Better thread management for HiveServer2 async threads]
> HIVE-5230 [Better error reporting by async threads in HiveServer2]
> HIVE-5441 [Async query execution doesn't return resultset status] 
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java f0d0c77 
> 
> Diff: https://reviews.apache.org/r/14950/diff/
> 
> 
> Testing
> ---
> 
> TestJdbcDriver2
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Commented] (HIVE-5155) Support secure proxy user access to HiveServer2

2014-02-28 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916642#comment-13916642
 ] 

Vaibhav Gumashta commented on HIVE-5155:


[~thejas] [~prasadm] I agree it will be very useful in 0.13. 

Prasad, let me know if you'd like me to pitch in; I have some free cycles. 
Thanks! 

> Support secure proxy user access to HiveServer2
> ---
>
> Key: HIVE-5155
> URL: https://issues.apache.org/jira/browse/HIVE-5155
> Project: Hive
>  Issue Type: Improvement
>  Components: Authentication, HiveServer2, JDBC
>Affects Versions: 0.12.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-5155-1-nothrift.patch, HIVE-5155-noThrift.2.patch, 
> HIVE-5155-noThrift.4.patch, HIVE-5155-noThrift.5.patch, 
> HIVE-5155-noThrift.6.patch, HIVE-5155.1.patch, HIVE-5155.2.patch, 
> HIVE-5155.3.patch, ProxyAuth.java, ProxyAuth.out, TestKERBEROS_Hive_JDBC.java
>
>
> HiveServer2 can authenticate a client via Kerberos and impersonate 
> the connecting user with the underlying secure hadoop. This becomes a gateway for 
> a remote client to access a secure hadoop cluster. This works fine when 
> the client obtains a Kerberos ticket and directly connects to HiveServer2. 
> There's another big use case for middleware tools where the end user wants to 
> access Hive via another server. For example, an Oozie action or Hue submitting 
> queries, or a BI tool server accessing HiveServer2. In these cases, the 
> third party server doesn't have the end user's Kerberos credentials and hence it 
> can't submit queries to HiveServer2 on behalf of the end user.
> This ticket is for enabling proxy access to HiveServer2 for third party tools 
> on behalf of end users. There are two parts of the solution proposed in this 
> ticket:
> 1) Delegation token based connection for Oozie (OOZIE-1457)
> This is the common mechanism for Hadoop ecosystem components. Hive Remote 
> Metastore and HCatalog already support this. This is suitable for tool like 
> Oozie that submits the MR jobs as actions on behalf of its client. Oozie 
> already uses similar mechanism for Metastore/HCatalog access.
> 2) Direct proxy access for privileged hadoop users
> The delegation token implementation can be a challenge for non-hadoop 
> (especially non-java) components. This second part enables a privileged user 
> to directly specify an alternate session user during the connection. If the 
> connecting user has hadoop level privilege to impersonate the requested 
> userid, then HiveServer2 will run the session as that requested user. For 
> example, user Hue is allowed to impersonate user Bob (via core-site.xml proxy 
> user configuration). Then user Hue can connect to HiveServer2 and specify Bob 
> as session user via a session property. HiveServer2 will verify Hue's proxy 
> user privilege and then impersonate user Bob instead of Hue. This will enable 
> any third party tool to impersonate alternate userid without having to 
> implement delegation token connection.





[jira] [Updated] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.

2014-02-28 Thread Shivaraju Gowda (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaraju Gowda updated HIVE-6486:
--

Status: Open  (was: Patch Available)

> Support secure Subject.doAs() in HiveServer2 JDBC client.
> -
>
> Key: HIVE-6486
> URL: https://issues.apache.org/jira/browse/HIVE-6486
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.12.0, 0.11.0
>Reporter: Shivaraju Gowda
> Attachments: Hive_011_Support-Subject_doAS.patch, 
> TestHive_SujectDoAs.java
>
>
> HIVE-5155 addresses the problem of Kerberos authentication in a multi-user 
> middleware server using a proxy user. In this mode the principal used by the 
> middleware server has privileges to impersonate selected users in 
> Hive/Hadoop. 
> This enhancement is to support Subject.doAs() authentication in the Hive JDBC 
> layer so that the end user's Kerberos Subject is passed through in the 
> middleware server. With this improvement there won't be any additional setup in 
> the server to grant proxy privileges to some users and there won't be a need to 
> specify a proxy user in the JDBC client. This version should also be more 
> secure since it won't require principals with the privileges to impersonate 
> other users in the Hive/Hadoop setup.
>  





[jira] [Commented] (HIVE-6528) Add maven compiler plugin to ptest2 pom

2014-02-28 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916640#comment-13916640
 ] 

Brock Noland commented on HIVE-6528:


+1

> Add maven compiler plugin to ptest2 pom
> ---
>
> Key: HIVE-6528
> URL: https://issues.apache.org/jira/browse/HIVE-6528
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6528.patch
>
>
> NO PRECOMMIT TESTS
> Maven-compiler-plugin and java versions need to be added to the ptest2 pom.
> Without this, the build will pick up a random version of javac when 
> building this project.





[jira] [Updated] (HIVE-6475) Implement support for appending to mutable tables in HCatalog

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6475:
---

Attachment: 6475.log.hadoop2

Also attaching the log of tests run against -Phadoop-2 (after also applying 
HIVE-6514.3.patch, which allows the pig unit tests to run).

> Implement support for appending to mutable tables in HCatalog
> -
>
> Key: HIVE-6475
> URL: https://issues.apache.org/jira/browse/HIVE-6475
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore, Query Processor, Thrift API
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: 6475.log, 6475.log.hadoop2, HIVE-6475.2.patch, 
> HIVE-6475.patch
>
>
> Part of HIVE-6405, this is the implementation of the append feature on the 
> HCatalog side. If a table is mutable, we must support being able to append to 
> existing data instead of erroring out as a duplicate publish.





[jira] [Updated] (HIVE-6499) Using Metastore-side Auth errors on non-resolvable IF/OF/SerDe

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6499:
---

Status: Patch Available  (was: Open)

> Using Metastore-side Auth errors on non-resolvable IF/OF/SerDe
> --
>
> Key: HIVE-6499
> URL: https://issues.apache.org/jira/browse/HIVE-6499
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6499.patch
>
>
> In cases where a user needs to use a custom IF/OF/SerDe that is not 
> accessible from the metastore, calls like msc.createTable and msc.dropTable 
> should still work without being able to load the class. This is possible as 
> long as one does not enable MetaStore-side authorization, at which point this 
> becomes impossible, erroring out with a ClassNotFoundException.
> The reason this happens is that since the AuthorizationProvider interface is 
> defined against a ql.metadata.Table, we wind up needing to instantiate a 
> ql.metadata.Table object, which, in its constructor tries to instantiate 
> IF/OF/SerDe elements in an attempt to pre-load those fields. And if we do not 
> have access to those classes in the metastore, this is when that fails. The 
> constructor/initialize methods of Table and Partition do not really need to 
> pre-initialize these fields, since the fields are accessed only through the 
> accessor, and will be instantiated on first-use.
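
The "instantiate on first use" approach described above can be sketched 
roughly as follows. This is an illustration only and not the patch itself: 
LazyClassHolder is a hypothetical stand-in for the relevant fields of Table 
and Partition, and the class name passed in is just a string.

```java
// Hypothetical stand-in for a field such as the SerDe or input format
// class; it is not the ql.metadata.Table code.
final class LazyClassHolder {
  private final String className;
  private Class<?> loaded; // resolved on first access, not in the constructor

  /** Only records the name; nothing is loaded here, so this cannot fail. */
  LazyClassHolder(String className) {
    this.className = className;
  }

  /** Loads the class the first time it is actually needed. */
  synchronized Class<?> get() throws ClassNotFoundException {
    if (loaded == null) {
      loaded = Class.forName(className);
    }
    return loaded;
  }
}
```

With this shape, constructing the object on the metastore side works even 
when the class is not on its classpath. The ClassNotFoundException only 
shows up if something there really calls the accessor.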





[jira] [Updated] (HIVE-6499) Using Metastore-side Auth errors on non-resolvable IF/OF/SerDe

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6499:
---

Status: Open  (was: Patch Available)

canceling patch and re-marking as available to make the precommit tests pick it 
up, now that it's running again.

> Using Metastore-side Auth errors on non-resolvable IF/OF/SerDe
> --
>
> Key: HIVE-6499
> URL: https://issues.apache.org/jira/browse/HIVE-6499
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Security
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6499.patch
>
>
> In cases where a user needs to use a custom IF/OF/SerDe that is not 
> accessible from the metastore, calls like msc.createTable and msc.dropTable 
> should still work without being able to load the class. This is possible as 
> long as one does not enable MetaStore-side authorization, at which point this 
> becomes impossible, erroring out with a ClassNotFoundException.
> The reason this happens is that since the AuthorizationProvider interface is 
> defined against a ql.metadata.Table, we wind up needing to instantiate a 
> ql.metadata.Table object, which, in its constructor tries to instantiate 
> IF/OF/SerDe elements in an attempt to pre-load those fields. And if we do not 
> have access to those classes in the metastore, this is when that fails. The 
> constructor/initialize methods of Table and Partition do not really need to 
> pre-initialize these fields, since the fields are accessed only through the 
> accessor, and will be instantiated on first-use.





[jira] [Updated] (HIVE-6528) Add maven compiler plugin to ptest2 pom

2014-02-28 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6528:


Description: 
NO PRECOMMIT TESTS

Maven-compiler-plugin and java versions needs to be added to ptest2 pom.

Without this, will pick up random version of javac when trying to build this 
project.

  was:
Maven-compiler-plugin and java versions needs to be added to ptest2 pom.

Without this, will pick up random version of javac when trying to build this 
project.


> Add maven compiler plugin to ptest2 pom
> ---
>
> Key: HIVE-6528
> URL: https://issues.apache.org/jira/browse/HIVE-6528
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6528.patch
>
>
> NO PRECOMMIT TESTS
> Maven-compiler-plugin and java versions need to be added to the ptest2 pom.
> Without this, the build will pick up a random version of javac when 
> building this project.





[jira] [Commented] (HIVE-6024) Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916614#comment-13916614
 ] 

Mohammad Kamrul Islam commented on HIVE-6024:
-

I didn't find any existing .q file that covered this. Made a comment in RB as 
well.

> Load data local inpath unnecessarily creates a copy task
> 
>
> Key: HIVE-6024
> URL: https://issues.apache.org/jira/browse/HIVE-6024
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-6024.1.patch, HIVE-6024.2.patch, HIVE-6024.3.patch
>
>
> The load data command creates an additional copy task only when it's loading 
> from {{local}}. It doesn't create this additional copy task while loading from 
> DFS, though.





[jira] [Updated] (HIVE-6528) Add maven compiler plugin to ptest2 pom

2014-02-28 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6528:


Affects Version/s: 0.13.0
   Status: Patch Available  (was: Open)

> Add maven compiler plugin to ptest2 pom
> ---
>
> Key: HIVE-6528
> URL: https://issues.apache.org/jira/browse/HIVE-6528
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6528.patch
>
>
> NO PRECOMMIT TESTS
> Maven-compiler-plugin and java versions need to be added to the ptest2 pom.
> Without this, the build will pick up a random version of javac when 
> building this project.





Re: Review Request 18065: HIVE-6024 Load data local inpath unnecessarily creates a copy task

2014-02-28 Thread Mohammad Islam


> On Feb. 28, 2014, 10:11 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/load_local_dir_test.q, line 2
> > 
> >
> > This test passes with or without the rest of the patch. It doesn't seem 
> > to demonstrate any change in behavior.

Yes. This JIRA changes Hive's internal data movement for HQL 'LOAD LOCAL 
...' -- no new feature is added.
I understood Ashutosh's concern to be verifying that this internal change 
does not break HQL such as LOAD LOCAL from a *directory*. I didn't find any 
existing .q file that covered this case, so I added a new one to make sure 
existing behavior doesn't break.


- Mohammad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18065/#review35771
---


On Feb. 28, 2014, 8:03 a.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18065/
> ---
> 
> (Updated Feb. 28, 2014, 8:03 a.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-6024
> https://issues.apache.org/jira/browse/HIVE-6024
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Excerpt from the JIRA:
> "Load data command creates an additional copy task only when its loading from 
> local It doesn't create this additional copy task while loading from DFS 
> though."
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
>  8beef09 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java ed7787d 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 05a2da7 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 8318be1 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/LoadSemanticAnalyzer.java 
> 59aeb96 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 407450e 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 5991aae 
>   ql/src/test/queries/clientpositive/load_local_dir_test.q PRE-CREATION 
>   ql/src/test/results/clientpositive/load_local_dir_test.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18065/diff/
> 
> 
> Testing
> ---
> 
> Ran some existing q tests with "LOAD DATA LOCAL INPATH".
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>



[jira] [Updated] (HIVE-6528) Add maven compiler plugin to ptest2 pom

2014-02-28 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-6528:


Attachment: HIVE-6528.patch

> Add maven compiler plugin to ptest2 pom
> ---
>
> Key: HIVE-6528
> URL: https://issues.apache.org/jira/browse/HIVE-6528
> Project: Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-6528.patch
>
>
> The maven-compiler-plugin and Java versions need to be added to the ptest2 pom.
> Without this, the build will pick up an arbitrary version of javac when trying 
> to build this project.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-6411) Support more generic way of using composite key for HBaseHandler

2014-02-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916603#comment-13916603
 ] 

Hive QA commented on HIVE-6411:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12631672/HIVE-6411.2.patch.txt

{color:green}SUCCESS:{color} +1 5181 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1559/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1559/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12631672

> Support more generic way of using composite key for HBaseHandler
> 
>
> Key: HIVE-6411
> URL: https://issues.apache.org/jira/browse/HIVE-6411
> Project: Hive
>  Issue Type: Improvement
>  Components: HBase Handler
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-6411.1.patch.txt, HIVE-6411.2.patch.txt
>
>
> HIVE-2599 introduced the use of a custom object for the row key, but it forces 
> key objects to extend HBaseCompositeKey, which is itself an extension of 
> LazyStruct. If the user provides a proper Object and OI, we can replace the 
> internal key and keyOI with those. 
> The initial implementation is based on a factory interface.
> {code}
> public interface HBaseKeyFactory {
>   void init(SerDeParameters parameters, Properties properties) throws SerDeException;
>   ObjectInspector createObjectInspector(TypeInfo type) throws SerDeException;
>   LazyObjectBase createObject(ObjectInspector inspector) throws SerDeException;
> }
> {code}





[jira] [Created] (HIVE-6528) Add maven compiler plugin to ptest2 pom

2014-02-28 Thread Szehon Ho (JIRA)
Szehon Ho created HIVE-6528:
---

 Summary: Add maven compiler plugin to ptest2 pom
 Key: HIVE-6528
 URL: https://issues.apache.org/jira/browse/HIVE-6528
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Szehon Ho
Assignee: Szehon Ho


The maven-compiler-plugin and Java versions need to be added to the ptest2 pom.

Without this, the build will pick up an arbitrary version of javac when trying 
to build this project.





RE: need advice on debugging into TempletonJobController.java

2014-02-28 Thread Eric Hanson (BIG DATA)
Hey, I found the solution. You need to add this to webhcat-site.xml. -Eric


To attach the debugger to the templeton controller MR job started by the 
templeton service, go to %hcatalog_home%\conf\webhcat-site.xml and add the 
following block (copied from etc\webhcat\webhcat-default.xml, and enhanced with 
the highlighted options for debugging).
  
  <property>
    <name>templeton.controller.mr.child.opts</name>
    <value>-Xdebug -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=8004,server=y,suspend=y -server -Xmx256m -Djava.net.preferIPv4Stack=true</value>
    <description>Java options to be passed to templeton controller map task.
      The default value of mapreduce child "-Xmx" (heap memory limit)
      might be close to what is allowed for a map task.
      Even if templeton controller map task does not need much
      memory, the jvm (with -server option?)
      allocates the max memory when it starts. This along with the
      memory used by pig/hive client it starts can end up exceeding
      the max memory configured to be allowed for a map task
      Use this option to set -Xmx to lower value
    </description>
  </property>

  

-Original Message-
From: Eric Hanson (BIG DATA) [mailto:eric.n.han...@microsoft.com] 
Sent: Friday, February 28, 2014 12:06 PM
To: dev@hive.apache.org
Subject: need advice on debugging into TempletonJobController.java

I want to attach a debugger to TempletonJobController.java (code that runs in a 
map job started by templeton service, that in turn will start another job). 
Does anybody know how to make the job wait for a debugger to attach? i.e. what 
file to modify to change the java opts?

Eric

Details of what I tried:

I tried adding it in %hadoop_home%/conf/mapred-site.xml but it didn't work:

  
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xdebug -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=8004,server=y,suspend=y -Xmx1024m</value>
  </property>
  

I also tried this, in:

%hcatalog_home%\etc\webhcat\webhcat-default.xml

Adding:


  <property>
    <name>templeton.controller.mr.child.opts</name>
    <value>-Xdebug -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=8004,server=y,suspend=y -server -Xmx256m -Djava.net.preferIPv4Stack=true</value>
    <description>Java options to be passed to templeton controller map task.
      The default value of mapreduce child "-Xmx" (heap memory limit)
      might be close to what is allowed for a map task.
      Even if templeton controller map task does not need much
      memory, the jvm (with -server option?)
      allocates the max memory when it starts. This along with the
      memory used by pig/hive client it starts can end up exceeding
      the max memory configured to be allowed for a map task
      Use this option to set -Xmx to lower value
    </description>
  </property>

  

But the job doesn't appear to wait, and I keep seeing this in my job config:

mapred.child.java.opts

-server -Xmx256m -Djava.net.preferIPv4Stack=true




[jira] [Updated] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-02-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-6518:
--

Attachment: HIVE-6518.2-tez.patch

Add DEBUG lines

> Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
> triggered
> 
>
> Key: HIVE-6518
> URL: https://issues.apache.org/jira/browse/HIVE-6518
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-6518.1-tez.patch, HIVE-6518.2-tez.patch
>
>
> The current VectorGroupByOperator implementation flushes the in-memory hashes 
> when the maximum number of entries or the memory fraction is hit.
> This works for most cases, but there are some corner cases where we hit GC 
> overhead limits or heap size limits before either of those conditions is 
> reached, due to the rest of the pipeline.
> This patch adds a SoftReference as a GC canary. If the soft reference is 
> dead, then a full GC pass happened sometime in the recent past and the 
> aggregation hashtables should be flushed immediately before another full GC 
> is triggered.
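
The canary idea in the description above can be sketched in a few lines of 
Java. This is only an illustration under stated assumptions, not the code from 
the attached patches: GcCanary, CanaryFlushingBuffer and simulateCollection() 
are hypothetical names, and a plain HashMap stands in for the aggregation 
hashtables. The one JVM guarantee relied on is that softly reachable objects 
are cleared before an OutOfMemoryError is thrown.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not taken from the HIVE-6518 patch.
final class GcCanary {
  // Softly reachable sentinel; the collector may clear it under memory pressure.
  private SoftReference<Object> sentinel = new SoftReference<Object>(new Object());

  // True once the sentinel has been cleared since the last reset.
  boolean isDead() {
    return sentinel.get() == null;
  }

  // Installs a fresh sentinel, typically right after a flush.
  void reset() {
    sentinel = new SoftReference<Object>(new Object());
  }

  // Test hook: clears the reference by hand to mimic what the collector would do.
  void simulateCollection() {
    sentinel.clear();
  }
}

// Hypothetical stand-in for an operator that keeps aggregation state in memory.
final class CanaryFlushingBuffer {
  private final Map<String, Long> counts = new HashMap<String, Long>();
  private final GcCanary canary;
  private int flushCount = 0;

  CanaryFlushingBuffer(GcCanary canary) {
    this.canary = canary;
  }

  // Aggregates one key, then flushes if the canary reports a collection.
  void add(String key) {
    counts.merge(key, 1L, Long::sum);
    if (canary.isDead()) {
      flush();
    }
  }

  // Drops the in-memory state and re-arms the canary.
  void flush() {
    counts.clear();
    canary.reset();
    flushCount++;
  }

  int size() {
    return counts.size();
  }

  int flushCount() {
    return flushCount;
  }
}
```

The canary check costs a single reference read per row, so it can sit next to 
the existing entry-count and memory-fraction checks rather than replace them.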





Re: Review Request 18464: Support secure Subject.doAs() in HiveServer2 JDBC client

2014-02-28 Thread Shivaraju Gowda


> On Feb. 28, 2014, 12:59 a.m., Vaibhav Gumashta wrote:
> > jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java, line 136
> > 
> >
> > I think, instead of having to do identityContext equals 
> > "fromKerberosSubject", we can just use assumeSubject equals true/false, 
> > keeping the default to false.
> 
> Shivaraju Gowda wrote:
> Passing it as an "assumeSubject" boolean URL property was my first cut. 
> However, I thought "assumeSubject" doesn't convey its intended use in and 
> of itself (one needs to refer to the documentation), and making it a 
> key-value pair might give it some more meaning. There is also a 
> possibility of it later being used for other use cases (say, hypothetically, 
> the value could be fromKeyTab, fromTicketCache or fromLogin, etc.).

Do you think it might be better to use the auth property here, i.e. 
auth=fromKerberosSubject? Right now the only value for auth is noSasl.
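
To make the proposal concrete, here is a rough sketch of how a client could 
read such a key-value property from the connection URL. It is an assumption 
for illustration only: SessionVars and its methods are hypothetical, this is 
not the parsing code in HiveConnection, and the URL layout (";key=value" pairs 
appended to the URL, with nothing after them) is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not the parsing code used by HiveConnection.
final class SessionVars {
  // Collects the ";key=value" pairs that follow the base URL.
  static Map<String, String> parse(String url) {
    Map<String, String> vars = new LinkedHashMap<String, String>();
    String[] parts = url.split(";");
    // parts[0] is the base URL; every later part is expected to be key=value.
    for (int i = 1; i < parts.length; i++) {
      int eq = parts[i].indexOf('=');
      if (eq > 0) {
        vars.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
      }
    }
    return vars;
  }

  // True only when the URL explicitly asks for the caller's Kerberos subject.
  static boolean useKerberosSubject(Map<String, String> vars) {
    return "fromKerberosSubject".equals(vars.get("auth"));
  }
}
```

With this shape, a URL without the property, or with auth=noSasl, keeps the 
current behaviour, and further values such as fromKeyTab could be added later 
without introducing a new property name.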


- Shivaraju


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18464/#review35730
---


On Feb. 25, 2014, 2:50 p.m., Kevin Minder wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18464/
> ---
> 
> (Updated Feb. 25, 2014, 2:50 p.m.)
> 
> 
> Review request for hive, Kevin Minder and Vaibhav Gumashta.
> 
> 
> Bugs: HIVE-6486
> https://issues.apache.org/jira/browse/HIVE-6486
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support secure Subject.doAs() in HiveServer2 JDBC client
> 
> 
> Diffs
> -
> 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 17b4d39 
>   service/src/java/org/apache/hive/service/auth/KerberosSaslHelper.java 
> 379dafb 
>   
> service/src/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18464/diff/
> 
> 
> Testing
> ---
> 
> Manual testing
> 
> 
> Thanks,
> 
> Kevin Minder
> 
>



Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Gunther Hagleitner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18230/#review35866
---



ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java


doesn't look like keyobject is used?



ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java


looks like unused import.



ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java


this still violates the coding standard as far as i can tell.



ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java


same here.



serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java


this doesn't seem to belong here. it's not a general purpose serde 
method... in the vectorizedreducesink we seem to just break the row group into 
rows and serialize with the unchanged serde. can we do this here too?



serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java


this doesn't seem to belong in the serde. this is a helper for the map join 
key only. (e.g.: field < 8, etc) you should be able to just use the existing 
public interface, right?


- Gunther Hagleitner


On Feb. 28, 2014, 10:04 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18230/
> ---
> 
> (Updated Feb. 28, 2014, 10:04 p.m.)
> 
> 
> Review request for hive, Gunther Hagleitner and Jitendra Pandey.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> See JIRA
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6802b4d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java 
> 3cfaacf 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java 988cc57 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 
> 8b25300 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 46e37c2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 9948583 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 5cf347b 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 
> 2ac0928 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyObject.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
>  0279f7c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 295854d 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java
>  581046e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
> 2466a3b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java
>  997202f 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
>  d17b656 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinKey.java 
> a103a51 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
>  40bf006 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 
> 22eca50 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
>  fcded96 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java
>  7bfe473 
>   serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 
> 67cb1e8 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
> f9b4031 
>   
> serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
> c583ae2 
> 
> Diff: https://reviews.apache.org/r/18230/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Commented] (HIVE-5193) Columnar Pushdown for RC/ORC File not happening in HCatLoader

2014-02-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916570#comment-13916570
 ] 

Sushanth Sowmyan commented on HIVE-5193:


Hi Viraj,

A couple of quick notes:

a) If you cancel your patch, rename it to HIVE-5193.2.patch, re-upload it and 
mark it as patch-available, then the pre-commit tests will pick it up.
b) Please do not modify any of the org.apache.hcatalog.* classes - they are 
deprecated and are being maintained in parity with the state they were in as of 
hive-0.11. The only reason we'd change them is if they do not build any more 
due to issues with hadoop or hive/etc. Bugfixes are to go on to 
org.apache.hive.hcatalog.* only for now.

> Columnar Pushdown for RC/ORC File not happening in HCatLoader 
> --
>
> Key: HIVE-5193
> URL: https://issues.apache.org/jira/browse/HIVE-5193
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.10.0, 0.11.0, 0.12.0
>Reporter: Viraj Bhat
>Assignee: Viraj Bhat
>  Labels: hcatalog
> Fix For: 0.13.0
>
> Attachments: HIVE-5193-2.patch, HIVE-5193.patch
>
>
> Currently the HCatLoader is not taking advantage of 
> ColumnProjectionUtils, with which it could skip columns during read. The 
> information is available in Pig; it just needs to get to the Readers.
> Viraj





[jira] [Commented] (HIVE-6409) FileOutputCommitterContainer::commitJob() cancels delegation tokens too early.

2014-02-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916568#comment-13916568
 ] 

Sushanth Sowmyan commented on HIVE-6409:


Also, please do not patch any of the org.apache.hcatalog.* classes - they are 
deprecated and are being maintained in parity with the state they were in as of 
hive-0.11. The only reason we'd change them is if they do not build any more 
due to issues with hadoop or hive/etc. Bugfixes are to go on to 
org.apache.hive.hcatalog.* only for now.

> FileOutputCommitterContainer::commitJob() cancels delegation tokens too early.
> --
>
> Key: HIVE-6409
> URL: https://issues.apache.org/jira/browse/HIVE-6409
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-6409.trunk.patch
>
>
> When HCatalog's FileOutputCommitterContainer::commitJob() is run, it calls 
> the underlying OutputCommitter and then attempts to register partitions in 
> HCatalog.
> If the commit fails (for example, because of HIVE-4996), commitJob() cancels 
> delegation tokens retrieved from HCatalog before the exception is rethrown.
> {code}
> java.io.IOException: java.lang.reflect.InvocationTargetException
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:185)
> at
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:249)
> at
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:212)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:183)
> ... 5 more
> Caused by: org.apache.hcatalog.common.HCatException : 2006 : Error adding
> partition to metastore. Cause :
> MetaException(message:java.lang.RuntimeException: commitTransaction was called
> but openTransactionCalls = 0. This probably indicates that there are 
> unbalanced
> calls to openTransaction/commitTransaction)
> at
> org.apache.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:712)
> {code}
> The problem is that this happens before abortJob() has had a chance to run, 
> thus yielding the following error:
> {code}
> MetaException(message:Could not connect to meta store using any of the URIs 
> provided. Most recent failure: 
> org.apache.thrift.transport.TTransportException: Peer indicated failure: 
> DIGEST-MD5: IO error acquiring password
>   at 
> org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:190)
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:266)
>   at 
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
>   at 
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
>   at java.security.AccessController.doPrivileged(Native Method)
> ...
> {code}
> I'll have a patch out that only cancels delegation tokens if the commitJob() 
> has succeeded.
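
The ordering proposed in the last sentence of the description can be sketched 
as follows. This is a minimal illustration, not the actual patch: Committer, 
TokenCanceller and CommitFlow are hypothetical stand-ins, not the HCatalog or 
Hadoop types.

```java
import java.io.IOException;

// Hypothetical stand-in for the wrapped OutputCommitter plus partition registration.
interface Committer {
  void commitJob() throws IOException;
}

// Hypothetical stand-in for whatever cancels the HCatalog delegation tokens.
interface TokenCanceller {
  void cancelDelegationTokens();
}

final class CommitFlow {
  // Commits first and cancels tokens only if the commit did not throw.
  static void commitThenCancel(Committer committer, TokenCanceller tokens) throws IOException {
    // If this throws, the exception propagates and the tokens stay valid,
    // so a later abortJob() can still reach the metastore.
    committer.commitJob();
    // Reached only after a successful commit.
    tokens.cancelDelegationTokens();
  }
}
```

The point of the ordering is that cancellation is not placed in a finally 
block; on failure the tokens are left for the abort path to use.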





[jira] [Commented] (HIVE-6409) FileOutputCommitterContainer::commitJob() cancels delegation tokens too early.

2014-02-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916564#comment-13916564
 ] 

Sushanth Sowmyan commented on HIVE-6409:


Also, actually, you may have to regenerate this patch real quick off trunk now, 
since FileOutputCommitterContainer has changed a bit.






[jira] [Commented] (HIVE-6409) FileOutputCommitterContainer::commitJob() cancels delegation tokens too early.

2014-02-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916562#comment-13916562
 ] 

Sushanth Sowmyan commented on HIVE-6409:


Hi Mithun, could you please cancel your patch, rename it to 
HIVE-6409.patch, and mark it as patch available again? That helps the 
pre-commit tests pick up the patch and run against it.






[jira] [Updated] (HIVE-6475) Implement support for appending to mutable tables in HCatalog

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6475:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Implement support for appending to mutable tables in HCatalog
> -
>
> Key: HIVE-6475
> URL: https://issues.apache.org/jira/browse/HIVE-6475
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog, Metastore, Query Processor, Thrift API
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: 6475.log, HIVE-6475.2.patch, HIVE-6475.patch
>
>
> Part of HIVE-6405, this is the implementation of the append feature on the 
> HCatalog side. If a table is mutable, we must support being able to append to 
> existing data instead of erroring out as  a duplicate publish.





[jira] [Commented] (HIVE-6475) Implement support for appending to mutable tables in HCatalog

2014-02-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916549#comment-13916549
 ] 

Sushanth Sowmyan commented on HIVE-6475:


Committed. Thanks, Daniel.






[jira] [Commented] (HIVE-6360) Hadoop 2.3 + Tez 0.3

2014-02-28 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916532#comment-13916532
 ] 

Vikram Dixit K commented on HIVE-6360:
--

LGTM +1. Ran tests on my machine. 

> Hadoop 2.3 + Tez 0.3
> 
>
> Key: HIVE-6360
> URL: https://issues.apache.org/jira/browse/HIVE-6360
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-6360.1.patch, HIVE-6360.2.patch
>
>
> There are some things pending that rely on hadoop 2.3 or tez 0.3. These are 
> not released yet, but will be soon. I'm proposing to collect these in the tez 
> branch and do a merge back once these components have been released at that 
> version.
> The things depending on 0.3 or hadoop 2.3 are:
> - Zero Copy read for ORC
> - Unions in Tez
> - Tez on secure clusters
> - Changes to DagUtils to reflect tez 0.2 -> 0.3
> - Prewarm containers





[jira] [Updated] (HIVE-6475) Implement support for appending to mutable tables in HCatalog

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6475:
---

Attachment: 6475.log

Since the pre-commit tests were stalled for a couple of days, I've run the 
tests for hcatalog locally with -Phadoop-1, and am attaching the output of the 
run.

With all tests passing, and Daniel's +1, I'm going to go ahead with committing 
this.






[jira] [Created] (HIVE-6527) Child jvm hive logs missing

2014-02-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-6527:
---

 Summary: Child jvm hive logs missing
 Key: HIVE-6527
 URL: https://issues.apache.org/jira/browse/HIVE-6527
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan


2014-02-17 16:38:52,774 ERROR exec.Task (SessionState.java:printError(524)) -
Task failed!
Task ID:
  Stage-1

In some places in the Hive code, hadoop jar commands are spawned as new 
processes from the shell. Since the newly invoked hadoop jar processes are part 
of the running query, their output should be present in the hive.log for the 
executing query. However, the logs are currently missing for the spawned child 
JVMs.
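
One possible way to get a child's output into the parent's log is sketched 
below. It is an assumption for illustration, not a proposed fix for this 
issue: ChildLogRelay is a hypothetical name, and the Consumer stands in for 
whatever logger the parent uses. Only standard ProcessBuilder calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper; not part of Hive.
final class ChildLogRelay {
  // Runs the command, forwards every output line to parentLog, returns the exit code.
  static int runAndRelay(List<String> command, Consumer<String> parentLog)
      throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    // Merge stderr into stdout so a single reader sees everything the child prints.
    builder.redirectErrorStream(true);
    Process child = builder.start();
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        parentLog.accept(line);
      }
    }
    // The stream is drained before waiting, so the child cannot block on a full pipe.
    return child.waitFor();
  }
}
```

This only covers what the child writes to stdout and stderr; a child that logs 
to its own file would need its logging configuration pointed at the same 
destination instead.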





Re: Review Request 18592: HIVE-6137

2014-02-28 Thread Mohammad Islam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18592/#review35863
---


+1 after minor formatting.


http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java


nit: pls remove space.


- Mohammad Islam


On Feb. 28, 2014, 7:37 p.m., Hari Sankar Sivarama Subramaniyan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18592/
> ---
> 
> (Updated Feb. 28, 2014, 7:37 p.m.)
> 
> 
> Review request for hive and Thejas Nair.
> 
> 
> Bugs: HIVE-6137
> https://issues.apache.org/jira/browse/HIVE-6137
> 
> 
> Repository: hive
> 
> 
> Description
> ---
> 
> HIVE-6137 Improve error message when file is not found while creating a table.
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  1573040 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/queries/clientnegative/external1.q
>  1573040 
>   
> http://svn.apache.org/repos/asf/hive/trunk/ql/src/test/results/clientnegative/external1.q.out
>  1573040 
> 
> Diff: https://reviews.apache.org/r/18592/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Hari Sankar Sivarama Subramaniyan
> 
>



[jira] [Commented] (HIVE-6360) Hadoop 2.3 + Tez 0.3

2014-02-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916527#comment-13916527
 ] 

Gunther Hagleitner commented on HIVE-6360:
--

Ran tests locally. All passed except for the minimr test udf_using, which 
fails with or without the patch.

> Hadoop 2.3 + Tez 0.3
> 
>
> Key: HIVE-6360
> URL: https://issues.apache.org/jira/browse/HIVE-6360
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-6360.1.patch, HIVE-6360.2.patch
>
>
> There are some things pending that rely on hadoop 2.3 or tez 0.3. These are 
> not released yet, but will be soon. I'm proposing to collect these in the tez 
> branch and do a merge back once these components have been released at that 
> version.
> The things depending on 0.3 or hadoop 2.3 are:
> - Zero Copy read for ORC
> - Unions in Tez
> - Tez on secure clusters
> - Changes to DagUtils to reflect tez 0.2 -> 0.3
> - Prewarm containers





[jira] [Updated] (HIVE-6360) Hadoop 2.3 + Tez 0.3

2014-02-28 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6360:
-

Status: Patch Available  (was: Open)

> Hadoop 2.3 + Tez 0.3
> 
>
> Key: HIVE-6360
> URL: https://issues.apache.org/jira/browse/HIVE-6360
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-6360.1.patch, HIVE-6360.2.patch
>
>
> There are some things pending that rely on hadoop 2.3 or tez 0.3. These are 
> not released yet, but will be soon. I'm proposing to collect these in the tez 
> branch and do a merge back once these components have been released at that 
> version.
> The things depending on 0.3 or hadoop 2.3 are:
> - Zero Copy read for ORC
> - Unions in Tez
> - Tez on secure clusters
> - Changes to DagUtils to reflect tez 0.2 -> 0.3
> - Prewarm containers





[jira] [Updated] (HIVE-6360) Hadoop 2.3 + Tez 0.3

2014-02-28 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6360:
-

Attachment: HIVE-6360.2.patch

.2 addresses review comments and adds a change to the golden file for FileDumpTest.

> Hadoop 2.3 + Tez 0.3
> 
>
> Key: HIVE-6360
> URL: https://issues.apache.org/jira/browse/HIVE-6360
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-6360.1.patch, HIVE-6360.2.patch
>
>
> There are some things pending that rely on hadoop 2.3 or tez 0.3. These are 
> not released yet, but will be soon. I'm proposing to collect these in the tez 
> branch and do a merge back once these components have been released at that 
> version.
> The things depending on 0.3 or hadoop 2.3 are:
> - Zero Copy read for ORC
> - Unions in Tez
> - Tez on secure clusters
> - Changes to DagUtils to reflect tez 0.2 -> 0.3
> - Prewarm containers





[jira] [Updated] (HIVE-5504) OrcOutputFormat honors compression properties only from within hive

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5504:
---

Status: Patch Available  (was: Open)

> OrcOutputFormat honors compression properties only from within hive
> -
>
> Key: HIVE-5504
> URL: https://issues.apache.org/jira/browse/HIVE-5504
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0, 0.11.0, 0.13.0
>Reporter: Venkat Ranganathan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-5504.2.patch, HIVE-5504.patch
>
>
> When we import data into an HCatalog table created with the following storage 
> description
> .. stored as orc tblproperties ("orc.compress"="SNAPPY") 
> the resulting ORC file still uses the default zlib compression.
> It looks like HCatOutputFormat is ignoring the tblproperties specified. 
> show tblproperties shows that the table does have the properties properly 
> saved.
> An insert/select into the table produces an ORC file that honors the table 
> property.





[jira] [Updated] (HIVE-5504) OrcOutputFormat honors compression properties only from within hive

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-5504:
---

Affects Version/s: 0.13.0
   Status: Open  (was: Patch Available)

Canceling the patch and resubmitting so that the precommit tests pick it up, 
now that they are running again.

> OrcOutputFormat honors compression properties only from within hive
> -
>
> Key: HIVE-5504
> URL: https://issues.apache.org/jira/browse/HIVE-5504
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0, 0.11.0, 0.13.0
>Reporter: Venkat Ranganathan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-5504.2.patch, HIVE-5504.patch
>
>
> When we import data into an HCatalog table created with the following storage 
> description
> .. stored as orc tblproperties ("orc.compress"="SNAPPY") 
> the resulting ORC file still uses the default zlib compression.
> It looks like HCatOutputFormat is ignoring the tblproperties specified. 
> show tblproperties shows that the table does have the properties properly 
> saved.
> An insert/select into the table produces an ORC file that honors the table 
> property.





[jira] [Commented] (HIVE-6325) Enable using multiple concurrent sessions in tez

2014-02-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916505#comment-13916505
 ] 

Hive QA commented on HIVE-6325:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12631652/HIVE-6325.9.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5183 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_auto_sortmerge_join_16
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hive.service.cli.TestEmbeddedThriftBinaryCLIService.testExecuteStatementAsync
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1557/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1557/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12631652

> Enable using multiple concurrent sessions in tez
> 
>
> Key: HIVE-6325
> URL: https://issues.apache.org/jira/browse/HIVE-6325
> Project: Hive
>  Issue Type: Improvement
>  Components: Tez
>Affects Versions: 0.13.0
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-6325.1.patch, HIVE-6325.2.patch, HIVE-6325.3.patch, 
> HIVE-6325.4.patch, HIVE-6325.5.patch, HIVE-6325.6.patch, HIVE-6325.7.patch, 
> HIVE-6325.8.patch, HIVE-6325.9.patch
>
>
> We would like to enable multiple concurrent sessions in tez via hive server 
> 2. This will enable users to make efficient use of the cluster when it has 
> been partitioned using yarn queues.





[jira] [Updated] (HIVE-6455) Scalable dynamic partitioning and bucketing optimization

2014-02-28 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-6455:
-

Attachment: HIVE-6455.9.patch

Reuploading for HIVE QA to pick it up.

> Scalable dynamic partitioning and bucketing optimization
> 
>
> Key: HIVE-6455
> URL: https://issues.apache.org/jira/browse/HIVE-6455
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: optimization
> Attachments: HIVE-6455.1.patch, HIVE-6455.1.patch, 
> HIVE-6455.10.patch, HIVE-6455.2.patch, HIVE-6455.3.patch, HIVE-6455.4.patch, 
> HIVE-6455.4.patch, HIVE-6455.5.patch, HIVE-6455.6.patch, HIVE-6455.7.patch, 
> HIVE-6455.8.patch, HIVE-6455.9.patch, HIVE-6455.9.patch
>
>
> The current implementation of dynamic partitioning keeps at least one 
> record writer open per dynamic partition directory. With bucketing there can 
> be multispray file writers, which further increase the number of open record 
> writers. The record writers of column-oriented file formats (ORC, RCFile, 
> etc.) keep in-memory buffers (value buffers or compression buffers) open 
> all the time to buffer rows and compress them before flushing to disk. 
> Since these buffers are maintained per column, the amount of constant memory 
> required at runtime grows with the number of partitions and the number of 
> columns per partition. This often leads to OutOfMemory (OOM) exceptions in 
> mappers or reducers, depending on the number of open record writers. Users 
> often tune the JVM heap size (runtime memory) to get around such OOM issues. 
> With this optimization, the dynamic partition columns and bucketing columns 
> (in case of bucketed tables) are sorted before being fed to the reducers. 
> Since the partitioning and bucketing columns are sorted, each reducer needs 
> to keep only one record writer open at any time, reducing the memory 
> pressure on the reducers. This optimization scales well as the number of 
> partitions and the number of columns per partition increase, at the cost of 
> sorting the columns.
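
The single-open-writer idea in the description above can be sketched roughly 
as follows. This is a minimal illustration, not code from the patch: the class 
and method names are hypothetical, rows are reduced to their partition key, 
and the "writer" is only an event log. It assumes the input is already sorted 
by partition key, which is what the optimization arranges for the reducers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Hive.
final class SortedPartitionWriterSketch {

  // Walks rows that are already sorted by partition key and records when a
  // writer would be opened or closed. Because equal keys are adjacent, the
  // previous writer can be closed before the next one is opened.
  static List<String> process(List<String> sortedPartitionKeys) {
    List<String> events = new ArrayList<>();
    String current = null;
    for (String key : sortedPartitionKeys) {
      if (!key.equals(current)) {
        if (current != null) {
          events.add("close " + current);
        }
        events.add("open " + key);
        current = key;
      }
      // A real writer would append the row to the single open writer here.
    }
    if (current != null) {
      events.add("close " + current);
    }
    return events;
  }
}
```

With unsorted input the same loop would reopen a partition each time its key 
reappears, which is why the sort is the price paid for the lower memory use.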





[jira] [Commented] (HIVE-6459) Change the precision/scale for intermediate sum result in the avg() udf

2014-02-28 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916476#comment-13916476
 ] 

Prasad Mujumdar commented on HIVE-6459:
---

+1


> Change the precision/scale for intermediate sum result in the avg() udf 
> ---
>
> Key: HIVE-6459
> URL: https://issues.apache.org/jira/browse/HIVE-6459
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6459.1.patch, HIVE-6459.2.patch, HIVE-6459.3.patch, 
> HIVE-6459.4.patch, HIVE-6459.patch
>
>
> The avg() udf, when applied to a decimal column, selects the precision/scale 
> of the intermediate sum field as (p+4, s+4), which is the same as the 
> precision/scale of the avg() result. However, the additional scale increase 
> is unnecessary, and data overflow may occur. The requested change is to set 
> the precision/scale of the intermediate sum result to (p+10, s), which is 
> consistent with the sum() udf. The avg() result still keeps its 
> precision/scale.
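
The arithmetic behind the two rules above can be checked with a small sketch. 
The class and method names are hypothetical and this is not the UDF code. It 
only applies the (p+4, s+4) and (p+10, s) formulas from the description and 
ignores any cap on maximum precision.

```java
// Hypothetical sketch of the two type rules; not taken from the Hive source.
final class AvgSumTypeSketch {

  // Current rule from the description: {precision, scale} = (p+4, s+4).
  static int[] currentSumType(int p, int s) {
    return new int[] {p + 4, s + 4};
  }

  // Proposed rule from the description: {precision, scale} = (p+10, s).
  static int[] proposedSumType(int p, int s) {
    return new int[] {p + 10, s};
  }

  // Digits available to the left of the decimal point.
  static int integerDigits(int[] type) {
    return type[0] - type[1];
  }
}
```

For a decimal(10,2) column the current rule gives (14,6), which leaves 8 
integer digits, the same as the input column, so a sum over many rows has no 
extra headroom. The proposed rule gives (20,2), leaving 18 integer digits.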





[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.09.patch

removed leftover BSS changes, filed HIVE-6526.
Should be ready to go... would be nice to have HiveQA too

> MapJoinKey has large memory overhead in typical cases
> -
>
> Key: HIVE-6429
> URL: https://issues.apache.org/jira/browse/HIVE-6429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
> HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
> HIVE-6429.09.patch, HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. Will probably speed up 
> compares too.
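
A byte-array key of the kind described above might look roughly like this. 
It is a hedged sketch only: the class name, the two-column (int, long) key 
layout and the use of ByteBuffer are assumptions for illustration, and this 
is not the MapJoinKeyBytes class from the patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch; not the implementation attached to HIVE-6429.
final class BytesKeySketch {
  private final byte[] bytes;

  private BytesKeySketch(byte[] bytes) {
    this.bytes = bytes;
  }

  // Packs two primitive key columns into one array: 4 bytes + 8 bytes.
  // Assumes every key of the table has this same structure.
  static BytesKeySketch of(int col1, long col2) {
    byte[] packed = ByteBuffer.allocate(4 + 8).putInt(col1).putLong(col2).array();
    return new BytesKeySketch(packed);
  }

  int length() {
    return bytes.length;
  }

  @Override
  public int hashCode() {
    return Arrays.hashCode(bytes);
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof BytesKeySketch
        && Arrays.equals(bytes, ((BytesKeySketch) other).bytes);
  }
}
```

Equality and hashing then work on one flat array per key instead of one 
object per key column, which is where the memory saving would come from.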





Re: Review Request 18230: HIVE-6429 MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18230/
---

(Updated Feb. 28, 2014, 10:04 p.m.)


Review request for hive, Gunther Hagleitner and Jitendra Pandey.


Repository: hive-git


Description
---

See JIRA


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6802b4d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractMapJoinOperator.java 
3cfaacf 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableLoader.java 988cc57 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java 8b25300 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java 46e37c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 9948583 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HashTableLoader.java 5cf347b 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKey.java 
2ac0928 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyBytes.java 
PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinKeyObject.java 
PRE-CREATION 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java
 0279f7c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HashTableLoader.java 295854d 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java
 581046e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
2466a3b 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSMBMapJoinOperator.java 
997202f 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinEqualityTableContainer.java
 d17b656 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinKey.java 
a103a51 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/TestMapJoinTableContainer.java
 40bf006 
  ql/src/test/org/apache/hadoop/hive/ql/exec/persistence/Utilities.java 22eca50 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/BinarySortableSerDe.java
 fcded96 
  
serde/src/java/org/apache/hadoop/hive/serde2/binarysortable/OutputByteBuffer.java
 7bfe473 
  serde/src/java/org/apache/hadoop/hive/serde2/io/HiveDecimalWritable.java 
67cb1e8 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinarySerDe.java 
f9b4031 
  serde/src/java/org/apache/hadoop/hive/serde2/lazybinary/LazyBinaryUtils.java 
c583ae2 

Diff: https://reviews.apache.org/r/18230/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6429:
---

Status: Open  (was: Patch Available)

> MapJoinKey has large memory overhead in typical cases
> -
>
> Key: HIVE-6429
> URL: https://issues.apache.org/jira/browse/HIVE-6429
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
> HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
> HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
> HIVE-6429.WIP.patch, HIVE-6429.patch
>
>
> The only thing that MJK really needs is hashCode and equals (well, and 
> construction), so there's no need to have an array of writables in there. 
> Assuming all the keys for a table have the same structure, for the common 
> case where keys are primitive types, we can store something like a byte array 
> combination of keys to reduce the memory usage. Will probably speed up 
> compares too.





[jira] [Created] (HIVE-6526) clean up BinarySortableSerde a bit

2014-02-28 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-6526:
--

 Summary: clean up BinarySortableSerde a bit
 Key: HIVE-6526
 URL: https://issues.apache.org/jira/browse/HIVE-6526
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Priority: Minor
 Attachments: HIVE-6526.patch

After another round of rewrite HIVE-6429 ended up with some refactoring in 
BinarySortableSerde.
Unfortunately it breaks some tests (which the original patch that changed BSS 
didn't, so I guess it must be really subtle or really stupid).
I don't have time now to track down that issue, so will submit a patch here, 
and do it later, to unblock HIVE-6429





[jira] [Updated] (HIVE-6526) clean up BinarySortableSerde a bit

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6526:
---

Attachment: HIVE-6526.patch

parking the patch here for now

> clean up BinarySortableSerde a bit
> --
>
> Key: HIVE-6526
> URL: https://issues.apache.org/jira/browse/HIVE-6526
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-6526.patch
>
>
> After another round of rewrite HIVE-6429 ended up with some refactoring in 
> BinarySortableSerde.
> Unfortunately it breaks some tests (which the original patch that changed BSS 
> didn't, so I guess it must be really subtle or really stupid).
> I don't have time now to track down that issue, so will submit a patch here, 
> and do it later, to unblock HIVE-6429





[jira] [Assigned] (HIVE-6526) clean up BinarySortableSerde a bit

2014-02-28 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-6526:
--

Assignee: Sergey Shelukhin

> clean up BinarySortableSerde a bit
> --
>
> Key: HIVE-6526
> URL: https://issues.apache.org/jira/browse/HIVE-6526
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-6526.patch
>
>
> After another round of rewrite HIVE-6429 ended up with some refactoring in 
> BinarySortableSerde.
> Unfortunately it breaks some tests (which the original patch that changed BSS 
> didn't, so I guess it must be really subtle or really stupid).
> I don't have time now to track down that issue, so will submit a patch here, 
> and do it later, to unblock HIVE-6429





[jira] [Updated] (HIVE-6525) Fix some whitespace issues in GenTezUtils

2014-02-28 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6525:
-

Attachment: HIVE-6525.1.patch

> Fix some whitespace issues in GenTezUtils
> -
>
> Key: HIVE-6525
> URL: https://issues.apache.org/jira/browse/HIVE-6525
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-6525.1.patch
>
>






[jira] [Created] (HIVE-6525) Fix some whitespace issues in GenTezUtils

2014-02-28 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-6525:


 Summary: Fix some whitespace issues in GenTezUtils
 Key: HIVE-6525
 URL: https://issues.apache.org/jira/browse/HIVE-6525
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Fix For: tez-branch








Re: Execution log for qfile tests

2014-02-28 Thread Sergey Shelukhin
That doesn't work for me... I will take a look later.


On Mon, Feb 24, 2014 at 2:11 PM, Brock Noland  wrote:

> They always show up in ${system:java.io.tmp}/$USER for me. For example:
>
> $ head -5 $TMPDIR/noland/noland*
> ==>
> /var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T//noland/noland_20140221162424_103e159a-4511-42ae-a84a-01cd95820886.log
> <==
> 2014-02-21 16:24:36,807 WARN  common.LogUtils
> (LogUtils.java:logConfigLocation(145)) - hive-site.xml not found on
> CLASSPATH
> 2014-02-21 16:24:36,816 INFO  mr.ExecDriver
> (SessionState.java:printInfo(417)) - Execution log at:
>
> /var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T//noland/noland_20140221162424_103e159a-4511-42ae-a84a-01cd95820886.log
> 2014-02-21 16:24:36,974 WARN  conf.Configuration
> (Configuration.java:loadProperty(2345)) -
>
> file:/var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T/noland/hive_2014-02-21_16-24-32_176_6585530032622480558-3/-local-10002/jobconf.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 2014-02-21 16:24:36,985 WARN  conf.Configuration
> (Configuration.java:loadProperty(2345)) -
>
> file:/var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T/noland/hive_2014-02-21_16-24-32_176_6585530032622480558-3/-local-10002/jobconf.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2014-02-21 16:24:37,071 INFO  log.PerfLogger
> (PerfLogger.java:PerfLogBegin(97)) -  from=org.apache.hadoop.hive.ql.exec.Utilities>
>
> ==>
> /var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T//noland/noland_20140221162424_5629b8d1-77f3-4180-a000-220c0e551812.log
> <==
> 2014-02-21 16:24:27,004 WARN  common.LogUtils
> (LogUtils.java:logConfigLocation(145)) - hive-site.xml not found on
> CLASSPATH
> 2014-02-21 16:24:27,013 INFO  mr.ExecDriver
> (SessionState.java:printInfo(417)) - Execution log at:
>
> /var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T//noland/noland_20140221162424_5629b8d1-77f3-4180-a000-220c0e551812.log
> 2014-02-21 16:24:27,175 WARN  conf.Configuration
> (Configuration.java:loadProperty(2345)) -
>
> file:/var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T/noland/hive_2014-02-21_16-24-20_980_1410307595308328042-2/-local-10002/jobconf.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 2014-02-21 16:24:27,186 WARN  conf.Configuration
> (Configuration.java:loadProperty(2345)) -
>
> file:/var/folders/6l/2kf3r2pj1t176h2nhdwfpk1rgp/T/noland/hive_2014-02-21_16-24-20_980_1410307595308328042-2/-local-10002/jobconf.xml:an
> attempt to override final parameter:
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> 2014-02-21 16:24:27,279 INFO  log.PerfLogger
> (PerfLogger.java:PerfLogBegin(97)) -  from=org.apache.hadoop.hive.ql.exec.Utilities>
>
> On Mon, Feb 24, 2014 at 4:00 PM, Sergey Shelukhin
>  wrote:
> > That doesn't contain logs from local MR tasks for me...
> >
> > So all there is is
> >
> > 2014-02-24 13:50:35,434 INFO  mr.MapredLocalTask
> > (MapredLocalTask.java:execute(254)) - Executing:
> > /Users/sergey/git/hive/itests/qtest/../../testutils/hadoop jar
> >
> /Users/sergey/.m2/repository/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.jar
> > org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan
> >
> file:/Users/sergey/git/hive/itests/qtest/target/tmp/localscratchdir/hive_2014-02-24_13-50-35_104_1296981405740899980-1/-local-10006/plan.xml
> > -jobconffile
> >
> file:/Users/sergey/git/hive/itests/qtest/target/tmp/localscratchdir/hive_2014-02-24_13-50-35_104_1296981405740899980-1/-local-10007/jobconf.xml
> > ...
> > 2014-02-24 13:50:41,680 ERROR exec.Task
> (SessionState.java:printError(533))
> > - Execution failed with exit status: 2
> > 2014-02-24 13:50:41,680 ERROR exec.Task
> (SessionState.java:printError(533))
> > - Obtaining error information
> > 2014-02-24 13:50:41,681 ERROR exec.Task
> (SessionState.java:printError(533))
> > -
> > Task failed!
> > Task ID:
> >   Stage-6
> >
> > Logs:
> >
> > 2014-02-24 13:50:41,681 ERROR exec.Task
> (SessionState.java:printError(533))
> > - /Users/sergey/git/hive/itests/qtest/target/tmp/log/hive.log
> > 2014-02-24 13:50:41,681 ERROR mr.MapredLocalTask
> > (MapredLocalTask.java:execute(270)) - Execution failed with exit status:
> 2
> >
> > and I can't find the logs anywhere... do you know where they are if
> > anywhere?
> >
> >
> > On Mon, Feb 17, 2014 at 6:51 PM, Lefty Leverenz  >wrote:
> >
> >> Sweet.  Thanks Brock.
> >>
> >> -- Lefty
> >>
> >>
> >> On Mon, Feb 17, 2014 at 6:43 PM, Brock Noland 
> wrote:
> >>
> >> > Good call! I added this here:
> >> > https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
> >> >
> >> > On Mon, Feb 17, 2014 at 8:24 PM, Lefty Leverenz <
> leftylever...@gmail.com
> >> >
> >> > wrote:
> >> > > Should this be mentioned somewhere in the Hive wiki, or is it just a
> >> > > Hadoop doc issue?
> >> > >
> >> > > -- Lefty
> >> > >
> >> > >
> >> > > On Mon

[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files

2014-02-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5998:
---

Status: Patch Available  (was: Open)

The patch.6 failure does not seem to be related to the patch, resubmitting. 

> Add vectorized reader for Parquet files
> ---
>
> Key: HIVE-5998
> URL: https://issues.apache.org/jira/browse/HIVE-5998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers, Vectorization
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: Parquet, vectorization
> Attachments: HIVE-5998.1.patch, HIVE-5998.2.patch, HIVE-5998.3.patch, 
> HIVE-5998.4.patch, HIVE-5998.5.patch, HIVE-5998.6.patch, HIVE-5998.7.patch
>
>
> HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar 
> format, it makes sense to provide a vectorized reader, as the RC and ORC 
> formats have, to benefit from the vectorized execution engine.





[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files

2014-02-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5998:
---

Attachment: HIVE-5998.7.patch

> Add vectorized reader for Parquet files
> ---
>
> Key: HIVE-5998
> URL: https://issues.apache.org/jira/browse/HIVE-5998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers, Vectorization
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: Parquet, vectorization
> Attachments: HIVE-5998.1.patch, HIVE-5998.2.patch, HIVE-5998.3.patch, 
> HIVE-5998.4.patch, HIVE-5998.5.patch, HIVE-5998.6.patch, HIVE-5998.7.patch
>
>
> HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar 
> format, it makes sense to provide a vectorized reader, as the RC and ORC 
> formats have, to benefit from the vectorized execution engine.





[jira] [Updated] (HIVE-5998) Add vectorized reader for Parquet files

2014-02-28 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5998:
---

Status: Open  (was: Patch Available)

> Add vectorized reader for Parquet files
> ---
>
> Key: HIVE-5998
> URL: https://issues.apache.org/jira/browse/HIVE-5998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers, Vectorization
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: Parquet, vectorization
> Attachments: HIVE-5998.1.patch, HIVE-5998.2.patch, HIVE-5998.3.patch, 
> HIVE-5998.4.patch, HIVE-5998.5.patch, HIVE-5998.6.patch
>
>
> HIVE-5783 is adding native Parquet support in Hive. As Parquet is a columnar 
> format, it makes sense to provide a vectorized reader, as the RC and ORC 
> formats have, to benefit from the vectorized execution engine.





[jira] [Updated] (HIVE-6524) Update ORC Filedump stripe sizes to match the memory manager changes

2014-02-28 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-6524:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to branch. Thanks Gopal!

> Update ORC Filedump stripe sizes to match the memory manager changes
> 
>
> Key: HIVE-6524
> URL: https://issues.apache.org/jira/browse/HIVE-6524
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: tez-branch
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-6524.1-tez.patch
>
>
> The MemoryManager in ORC now resets to default whenever we close all open 
> writers. 
> This results in consistent (but different from test golden) stripe sizes for 
> all files being written.
> Fix the goldens.





[jira] [Commented] (HIVE-6523) Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml

2014-02-28 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916386#comment-13916386
 ] 

Sushanth Sowmyan commented on HIVE-6523:


Oh good that there's a patch up for that already. And yes, patching every 
minimr test with this is very much a hacky interim solution. We can ignore if 
that patch gets committed.

> Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml
> --
>
> Key: HIVE-6523
> URL: https://issues.apache.org/jira/browse/HIVE-6523
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
> Environment: Hadoop 2.4.*
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6523.patch
>
>
> With the newer hadoop versions (2.4+) in tests, MiniMRCluster throws an error 
> loading resources if it can't find a yarn-site.xml in its classpath, which 
> affects test runs with -Phadoop-2 and minimrclusters.
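
For anyone trying to confirm this locally, a minimal sketch of a classpath check follows. It only tests whether a resource named yarn-site.xml is visible to the class loader; the class and method names are made up for illustration, and this is not the workaround from HIVE-6523.patch.

```java
import java.net.URL;

// Hypothetical helper, not part of Hive or Hadoop.
class YarnSiteCheck {
    // Returns true when the named resource is visible to this class's class loader.
    static boolean onClasspath(String resourceName) {
        ClassLoader cl = YarnSiteCheck.class.getClassLoader();
        URL url = (cl != null)
                ? cl.getResource(resourceName)
                : ClassLoader.getSystemResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        System.out.println("yarn-site.xml on classpath: " + onClasspath("yarn-site.xml"));
    }
}
```

Running this with the same classpath as the failing test should show whether the file is really missing or whether something else is going on.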





[jira] [Commented] (HIVE-6524) Update ORC Filedump stripe sizes to match the memory manager changes

2014-02-28 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916365#comment-13916365
 ] 

Prasanth J commented on HIVE-6524:
--

LGTM. +1 (non-binding)

> Update ORC Filedump stripe sizes to match the memory manager changes
> 
>
> Key: HIVE-6524
> URL: https://issues.apache.org/jira/browse/HIVE-6524
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: tez-branch
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-6524.1-tez.patch
>
>
> The MemoryManager in ORC now resets to default whenever we close all open 
> writers. 
> This results in consistent (but different from test golden) stripe sizes for 
> all files being written.
> Fix the goldens.





[jira] [Commented] (HIVE-6514) TestExecDriver/HCat Pig tests fails with -Phadoop-2

2014-02-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916377#comment-13916377
 ] 

Hive QA commented on HIVE-6514:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12631629/HIVE-6514.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5180 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1556/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1556/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12631629

> TestExecDriver/HCat Pig tests fails with -Phadoop-2
> ---
>
> Key: HIVE-6514
> URL: https://issues.apache.org/jira/browse/HIVE-6514
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.13.0
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-6514.1.patch, HIVE-6514.2.patch, HIVE-6514.3.patch
>
>
> Running TestExecDriver with -Phadoop-2 results in the error below. Looks like 
> the test isn't able to access LocalClientProtocolProvider.
> {noformat}
> java.io.IOException: Cannot initialize Cluster. Please check your 
> configuration for mapreduce.framework.name and the correspond server 
> addresses.
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:396)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:739)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Job Submission failed with exception 'java.io.IOException(Cannot initialize 
> Cluster. Please check your configuration for mapreduce.framework.name and the 
> correspond server addresses.)'
> Execution failed with exit status: 1
> {noformat}





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916358#comment-13916358
 ] 

Gunther Hagleitner commented on HIVE-6492:
--

Thanks, Selina. Just trying to understand the requirements to see what's the 
best way to get this in.

One question is whether you can deploy different configs in these scenarios. 
E.g.: use a different site file if someone is starting hive on the console vs. 
tools. Or use an alias to add a --hiveconf on the node where users start hive. 
You're trying to protect the cluster from large jobs - in your case you seem to 
want to turn this on for certain interfaces and off for others, but for other 
deployments that might not make much sense (the interface (ODBC/JDBC/CLI) 
doesn't say if it's a human, tool, etc).

But specifically:

1) What's "small"? Sounds like if a query doesn't submit a job you want to 
let it go through? Or only if there's an explicit limit clause?
2) That's the same as 1 - if you just check for "no job started".
3) Aggregation on partition key right now will scan the entire table in a 
massive map-red job. Definitely something that should be fixed - but there's no 
optimization for that yet afaik. Allowing this query seems to defeat the 
purpose of this flag, doesn't it? Seems like again you just want to check 
for "no job started".

With that - it would make sense to update/extend the hive.mapred.mode variable 
to allow for queries that don't actually start a job (and allow jobs only with 
explicit partition pruning). With that change plus different configs for 
different interfaces, you should get all that you want, and it would be 
simpler. Correct?

> limit partition number involved in a table scan
> ---
>
> Key: HIVE-6492
> URL: https://issues.apache.org/jira/browse/HIVE-6492
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-6492.1.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configure variable 
> "hive.limit.query.max.table.partition" is added to hive configuration to
> limit the table partitions involved in a table scan. 
> The default value will be set to -1 which means there is no limit by default. 
> This variable will not affect "metadata only" query.
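
To make the semantics in the description concrete, here is a rough sketch of the check it implies. The class and method are hypothetical and not taken from HIVE-6492.1.patch.txt; treating the limit as inclusive is an assumption, as is passing the "metadata only" decision in as a flag.

```java
// Hypothetical sketch of the limit check, not the code in the attached patch.
class PartitionLimitSketch {
    // Value described as the default for hive.limit.query.max.table.partition.
    static final int NO_LIMIT = -1;

    // Returns true when a scan touching partitionsInScan partitions may run.
    static boolean isAllowed(int partitionsInScan, int maxPartitions, boolean metadataOnly) {
        if (metadataOnly) {
            return true; // "metadata only" queries are not affected by the variable
        }
        if (maxPartitions == NO_LIMIT) {
            return true; // default: no limit
        }
        // Assumption: a scan of exactly maxPartitions partitions is still allowed.
        return partitionsInScan <= maxPartitions;
    }
}
```

With a limit of 100, a scan over 500 partitions would be rejected unless the query is metadata only; with the default of -1 everything passes.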





[jira] [Updated] (HIVE-6524) Update ORC Filedump stripe sizes to match the memory manager changes

2014-02-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-6524:
--

Release Note: Fix orc filedump tests to check for 5k row stripes for all 
but the final stripe
  Status: Patch Available  (was: Open)

> Update ORC Filedump stripe sizes to match the memory manager changes
> 
>
> Key: HIVE-6524
> URL: https://issues.apache.org/jira/browse/HIVE-6524
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: tez-branch
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-6524.1-tez.patch
>
>
> The MemoryManager in ORC now resets to default whenever we close all open 
> writers. 
> This results in consistent (but different from test golden) stripe sizes for 
> all files being written.
> Fix the goldens.





[jira] [Updated] (HIVE-6524) Update ORC Filedump stripe sizes to match the memory manager changes

2014-02-28 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-6524:
--

Attachment: HIVE-6524.1-tez.patch

{code}
 Stripes:
-  Stripe: offset: 3 data: 102311 rows: 4000 tail: 68 index: 224
+  Stripe: offset: 3 data: 144733 rows: 5000 tail: 68 index: 235
{code}

And everything else changes because of the first stripe being 5k rows.

A previous 21k orc writer was causing a leak into the next file, which ended up 
with 4k rows for the 1st stripe instead of the full 5k.

> Update ORC Filedump stripe sizes to match the memory manager changes
> 
>
> Key: HIVE-6524
> URL: https://issues.apache.org/jira/browse/HIVE-6524
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: tez-branch
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-6524.1-tez.patch
>
>
> The MemoryManager in ORC now resets to default whenever we close all open 
> writers. 
> This results in consistent (but different from test golden) stripe sizes for 
> all files being written.
> Fix the goldens.





[jira] [Updated] (HIVE-6459) Change the precison/scale for intermediate sum result in the avg() udf

2014-02-28 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-6459:
--

Attachment: HIVE-6459.4.patch

Patch #4 rebased with trunk and addressed review comments.

> Change the precison/scale for intermediate sum result in the avg() udf 
> ---
>
> Key: HIVE-6459
> URL: https://issues.apache.org/jira/browse/HIVE-6459
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6459.1.patch, HIVE-6459.2.patch, HIVE-6459.3.patch, 
> HIVE-6459.4.patch, HIVE-6459.patch
>
>
> The avg() udf, when applied to a decimal column, selects the precision/scale 
> of the intermediate sum field as (p+4, s+4), which is the same as the 
> precision/scale of the avg() result. However, the additional scale increase 
> is unnecessary, and data overflow may occur. The requested 
> change is that for the intermediate sum result, the precision/scale is set to 
> (p+10, s), which is consistent with the sum() udf. The avg() result still keeps 
> its precision/scale.
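
A small worked example may help show why the extra scale can overflow. The sketch below assumes the decimal precision is capped at 38 and that the cap is applied with a simple min(); the class and method names are invented for illustration and are not from the patch.

```java
// Hypothetical illustration of the two typing rules for the intermediate sum.
class AvgSumTypeSketch {
    // Assumption: maximum decimal precision is 38.
    static final int MAX_PRECISION = 38;

    // Old rule: (p+4, s+4). Returns {precision, scale}.
    static int[] oldSumType(int p, int s) {
        return new int[] { Math.min(p + 4, MAX_PRECISION), Math.min(s + 4, MAX_PRECISION) };
    }

    // Requested rule: (p+10, s), the same shape as for sum().
    static int[] newSumType(int p, int s) {
        return new int[] { Math.min(p + 10, MAX_PRECISION), s };
    }

    // Digits left of the decimal point that the type can hold.
    static int integerDigits(int[] type) {
        return type[0] - type[1];
    }
}
```

For a decimal(36,2) column, the input has 34 integer digits. Under these assumptions the old rule gives (38, 6), which leaves only 32 integer digits for the running sum, while the requested rule gives (38, 2), which leaves 36.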





[jira] [Commented] (HIVE-6523) Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml

2014-02-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916331#comment-13916331
 ] 

Jason Dere commented on HIVE-6523:
--

Not to mention doing the workaround on our end means having to patch every 
minimr test, of which there are many

> Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml
> --
>
> Key: HIVE-6523
> URL: https://issues.apache.org/jira/browse/HIVE-6523
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
> Environment: Hadoop 2.4.*
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6523.patch
>
>
> With the newer hadoop versions (2.4+) in tests, MiniMRCluster throws an error 
> loading resources if it can't find a yarn-site.xml in its classpath, which 
> affects test runs with -Phadoop-2 and minimrclusters.





[jira] [Commented] (HIVE-6523) Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml

2014-02-28 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916325#comment-13916325
 ] 

Jason Dere commented on HIVE-6523:
--

FYI there is also YARN-1758, which should fix the NPE in yarn. If we wait for 
this fix I believe the tests should work again without having to do a fix on 
our end.

> Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml
> --
>
> Key: HIVE-6523
> URL: https://issues.apache.org/jira/browse/HIVE-6523
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
> Environment: Hadoop 2.4.*
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6523.patch
>
>
> With the newer hadoop versions (2.4+) in tests, MiniMRCluster throws an error 
> loading resources if it can't find a yarn-site.xml in its classpath, which 
> affects test runs with -Phadoop-2 and minimrclusters.





[jira] [Created] (HIVE-6524) Update ORC Filedump stripe sizes to match the memory manager changes

2014-02-28 Thread Gopal V (JIRA)
Gopal V created HIVE-6524:
-

 Summary: Update ORC Filedump stripe sizes to match the memory 
manager changes
 Key: HIVE-6524
 URL: https://issues.apache.org/jira/browse/HIVE-6524
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: tez-branch
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
 Fix For: tez-branch


The MemoryManager in ORC now resets to default whenever we close all open 
writers. 

This results in consistent (but different from test golden) stripe sizes for 
all files being written.

Fix the goldens.





[jira] [Commented] (HIVE-6492) limit partition number involved in a table scan

2014-02-28 Thread Selina Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916310#comment-13916310
 ] 

Selina Zhang commented on HIVE-6492:


Strict mode disables types of queries that we cannot afford to disable. We need to:
1. Enable queries on small tables without partition filters;
2. Enable "select * from table" queries issued from Tableau, because Tableau 
must be able to connect to Hive Server directly through the ODBC driver;
3. Enable aggregation on partition keys without partition limits. 
Thanks for reviewing the changes!

> limit partition number involved in a table scan
> ---
>
> Key: HIVE-6492
> URL: https://issues.apache.org/jira/browse/HIVE-6492
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Affects Versions: 0.12.0
>Reporter: Selina Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-6492.1.patch.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> To protect the cluster, a new configure variable 
> "hive.limit.query.max.table.partition" is added to hive configuration to
> limit the table partitions involved in a table scan. 
> The default value will be set to -1 which means there is no limit by default. 
> This variable will not affect "metadata only" query.





[jira] [Commented] (HIVE-6519) Allow optional "as" in subquery definition

2014-02-28 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916295#comment-13916295
 ] 

Gunther Hagleitner commented on HIVE-6519:
--

Test failures are unrelated...

> Allow optional "as" in subquery definition
> --
>
> Key: HIVE-6519
> URL: https://issues.apache.org/jira/browse/HIVE-6519
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Minor
> Attachments: HIVE-6519.1.patch
>
>
> Allow both:
> select * from (select * from foo) bar 
> select * from (select * from foo) as bar 





need advice on debugging into TempletonJobController.java

2014-02-28 Thread Eric Hanson (BIG DATA)
I want to attach a debugger to TempletonJobController.java (code that runs in a 
map job started by templeton service, that in turn will start another job). 
Does anybody know how to make the job wait for a debugger to attach? i.e. what 
file to modify to change the java opts?

Eric

Details of what I tried:

I tried adding it in %hadoop_home%/conf/mapred-site.xml but it didn't work:

  
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xdebug -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=8004,server=y,suspend=y -Xmx1024m</value>
  </property>
  

I also tried this, in:

%hcatalog_home%\etc\webhcat\webhcat-default.xml

Adding:


  <property>
    <name>templeton.controller.mr.child.opts</name>
    <value>-Xdebug -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,address=8004,server=y,suspend=y -server -Xmx256m -Djava.net.preferIPv4Stack=true</value>
    <description>Java options to be passed to templeton controller map task.
      The default value of mapreduce child "-Xmx" (heap memory limit)
      might be close to what is allowed for a map task.
      Even if templeton  controller map task does not need much
      memory, the jvm (with -server option?)
      allocates the max memory when it starts. This along with the
      memory used by pig/hive client it starts can end up exceeding
      the max memory configured to be allowed for a map task
      Use this option to set -Xmx to lower value
    </description>
  </property>

  

But the job doesn't appear to wait, and I keep seeing this in my job config:

mapred.child.java.opts

-server -Xmx256m -Djava.net.preferIPv4Stack=true
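
One way to see which options actually reached a JVM is to print its input arguments from inside the process. The sketch below is a standalone illustration, assuming you can run or log it from the task JVM; the class and method names are hypothetical and nothing here is Templeton code.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper for inspecting JVM arguments; not part of Templeton.
class JdwpArgCheck {
    // True when one of the JVM arguments enables the JDWP debug agent.
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp:") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    // True when the JDWP argument asks the JVM to wait for a debugger.
    static boolean suspendsOnStart(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            boolean isJdwp = arg.startsWith("-Xrunjdwp:") || arg.startsWith("-agentlib:jdwp");
            if (isJdwp && arg.contains("suspend=y")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (excluding main-class args).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM args: " + jvmArgs);
        System.out.println("JDWP agent enabled: " + hasJdwpAgent(jvmArgs));
        System.out.println("Suspends on start: " + suspendsOnStart(jvmArgs));
    }
}
```

For the value shown in the job config above, both checks would come back false, which matches the job not waiting for a debugger.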




[jira] [Commented] (HIVE-6459) Change the precison/scale for intermediate sum result in the avg() udf

2014-02-28 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916285#comment-13916285
 ] 

Xuefu Zhang commented on HIVE-6459:
---

[~prasadm] I don't think the failures are related. However, I will upload a new 
patch to address your comments on RB.

> Change the precison/scale for intermediate sum result in the avg() udf 
> ---
>
> Key: HIVE-6459
> URL: https://issues.apache.org/jira/browse/HIVE-6459
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6459.1.patch, HIVE-6459.2.patch, HIVE-6459.3.patch, 
> HIVE-6459.patch
>
>
> The avg() udf, when applied to a decimal column, selects the precision/scale 
> of the intermediate sum field as (p+4, s+4), which is the same as the 
> precision/scale of the avg() result. However, the additional scale increase 
> is unnecessary, and data overflow may occur. The requested 
> change is that for the intermediate sum result, the precision/scale is set to 
> (p+10, s), which is consistent with the sum() udf. The avg() result still keeps 
> its precision/scale.





[jira] [Commented] (HIVE-6518) Add a GC canary to the VectorGroupByOperator to flush whenever a GC is triggered

2014-02-28 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916282#comment-13916282
 ] 

Remus Rusanu commented on HIVE-6518:


Can you somehow modify the LOG.debug at the top of flush() to call out that the 
flush was triggered by gcCanary.get() == null? I was thinking: keep a count 
of gcCanary allocations and print it in the LOG.debug message. This will tell 
us if the GC is the trigger and also how often it has occurred in the 
operator lifetime, when debugging etc.
+1

> Add a GC canary to the VectorGroupByOperator to flush whenever a GC is 
> triggered
> 
>
> Key: HIVE-6518
> URL: https://issues.apache.org/jira/browse/HIVE-6518
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Minor
> Attachments: HIVE-6518.1-tez.patch
>
>
> The current VectorGroupByOperator implementation flushes the in-memory hashes 
> when the maximum entries or fraction of memory is hit.
> This works for most cases, but there are some corner cases where we hit GC 
> overhead limits or heap size limits before either of those conditions is 
> reached due to the rest of the pipeline.
> This patch adds a SoftReference as a GC canary. If the soft reference is 
> dead, then a full GC pass happened sometime in the near past & the 
> aggregation hashtables should be flushed immediately before another full GC 
> is triggered.
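
For readers following along, the canary idea plus the allocation counter suggested in the comment above could look roughly like the sketch below. This is not the code from HIVE-6518.1-tez.patch; the class, the message text, and the simulateGc() hook are invented for illustration, and the exact point at which a JVM clears soft references depends on the collector.

```java
import java.lang.ref.SoftReference;

// Hypothetical sketch of a SoftReference-based GC canary with a counter.
class GcCanarySketch {
    private SoftReference<Object> gcCanary = new SoftReference<Object>(new Object());
    // Number of canaries allocated so far, to be printed in the debug message.
    private int canaryAllocations = 1;

    // Returns a debug message if the canary was collected (and re-arms it),
    // or null if the canary is still alive and no GC-triggered flush is needed.
    String checkAndRearm() {
        if (gcCanary.get() != null) {
            return null;
        }
        String message = "flush triggered by GC canary, canaries allocated so far: "
                + canaryAllocations;
        gcCanary = new SoftReference<Object>(new Object());
        canaryAllocations++;
        return message;
    }

    // Test hook: clears the reference by hand, standing in for a full GC pass.
    void simulateGc() {
        gcCanary.clear();
    }
}
```

An operator would call checkAndRearm() alongside its existing entry-count and memory checks and flush whenever the result is non-null. The counter in the message then shows how many times the GC has been the trigger over the operator's lifetime.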





Re: Review Request 18478: HIVE-6459: Change the precison/scale for intermediate sum result in the avg() udf

2014-02-28 Thread Xuefu Zhang


> On Feb. 28, 2014, 7:51 p.m., Prasad Mujumdar wrote:
> > ql/src/test/queries/clientpositive/vector_decimal_aggregate.q, line 7
> > 
> >
> > Is the vectorization disabled intentionally ?

I guess it's by accident. Thanks for catching this. I will fix it.


- Xuefu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18478/#review35825
---


On Feb. 25, 2014, 7:37 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18478/
> ---
> 
> (Updated Feb. 25, 2014, 7:37 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-6459
> https://issues.apache.org/jira/browse/HIVE-6459
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Patch addressed the issue by keeping the type of the sum field consistent 
> with that of sum UDF. The type of the final avg result is unchanged.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFAvgDecimal.java
>  6f593f9 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
> abd54be 
>   ql/src/test/queries/clientpositive/vector_decimal_aggregate.q eb9146e 
>   ql/src/test/results/clientpositive/create_genericudaf.q.out 96fe2fa 
>   ql/src/test/results/clientpositive/decimal_precision.q.out a80695c 
>   ql/src/test/results/clientpositive/decimal_udf.q.out 74ae554 
>   ql/src/test/results/clientpositive/groupby10.q.out 341427f 
>   ql/src/test/results/clientpositive/groupby3.q.out a74f2b5 
>   ql/src/test/results/clientpositive/groupby3_map.q.out 9424071 
>   ql/src/test/results/clientpositive/groupby3_map_multi_distinct.q.out 
> 9bcd7c9 
>   ql/src/test/results/clientpositive/groupby3_map_skew.q.out f438f89 
>   ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out 310a202 
>   ql/src/test/results/clientpositive/limit_pushdown.q.out a8add4c 
>   ql/src/test/results/clientpositive/subquery_in.q.out 48be22b 
>   ql/src/test/results/clientpositive/subquery_in_having.q.out ef3dc18 
>   ql/src/test/results/clientpositive/subquery_notin.q.out b2d687b 
>   ql/src/test/results/clientpositive/subquery_notin_having.q.out 5f4d96e 
>   ql/src/test/results/clientpositive/udaf_number_format.q.out 339ef94 
>   ql/src/test/results/clientpositive/udf3.q.out 546f949 
>   ql/src/test/results/clientpositive/udf8.q.out 79c3bff 
>   ql/src/test/results/clientpositive/vector_decimal_aggregate.q.out 8b73971 
>   ql/src/test/results/clientpositive/vectorization_limit.q.out 51a4e81 
>   ql/src/test/results/clientpositive/vectorization_pushdown.q.out df474d6 
>   ql/src/test/results/clientpositive/vectorization_short_regress.q.out 
> 07accb6 
>   ql/src/test/results/clientpositive/vectorized_mapjoin.q.out 9590642 
>   ql/src/test/results/clientpositive/vectorized_shufflejoin.q.out 928bc82 
>   ql/src/test/results/compiler/plan/groupby3.q.xml cc88d5c 
> 
> Diff: https://reviews.apache.org/r/18478/diff/
> 
> 
> Testing
> ---
> 
> Existing tests cover this. Some test output is regenerated due to the output 
> diff.
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



[jira] [Commented] (HIVE-6459) Change the precison/scale for intermediate sum result in the avg() udf

2014-02-28 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916269#comment-13916269
 ] 

Prasad Mujumdar commented on HIVE-6459:
---

There are two failures, do you think those are related to the patch ?

BTW, I approved the RB request.

> Change the precison/scale for intermediate sum result in the avg() udf 
> ---
>
> Key: HIVE-6459
> URL: https://issues.apache.org/jira/browse/HIVE-6459
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Affects Versions: 0.13.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-6459.1.patch, HIVE-6459.2.patch, HIVE-6459.3.patch, 
> HIVE-6459.patch
>
>
> The avg() udf, when applied to a decimal column, selects the precision/scale 
> of the intermediate sum field as (p+4, s+4), which is the same as the 
> precision/scale of the avg() result. However, the additional scale increase 
> is unnecessary, and data overflow may occur. The requested 
> change is that for the intermediate sum result, the precision/scale is set to 
> (p+10, s), which is consistent with the sum() udf. The avg() result still keeps 
> its precision/scale.





RE: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang

2014-02-28 Thread Eric Hanson (BIG DATA)
Congratulations Xuefu!

-Original Message-
From: Remus Rusanu [mailto:rem...@microsoft.com] 
Sent: Friday, February 28, 2014 11:43 AM
To: dev@hive.apache.org; u...@hive.apache.org
Cc: Xuefu Zhang
Subject: RE: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang

Grats!

From: Prasanth Jayachandran 
Sent: Friday, February 28, 2014 9:11 PM
To: dev@hive.apache.org
Cc: u...@hive.apache.org; Xuefu Zhang
Subject: Re: [ANNOUNCE] New Hive PMC Member - Xuefu Zhang

Congratulations Xuefu!

Thanks
Prasanth Jayachandran

On Feb 28, 2014, at 11:04 AM, Vaibhav Gumashta  
wrote:

> Congrats Xuefu!
>
>
> On Fri, Feb 28, 2014 at 9:20 AM, Prasad Mujumdar wrote:
>
>>   Congratulations Xuefu !!
>>
>> thanks
>> Prasad
>>
>>
>>
>> On Fri, Feb 28, 2014 at 1:20 AM, Carl Steinbach  wrote:
>>
>>> I am pleased to announce that Xuefu Zhang has been elected to the 
>>> Hive Project Management Committee. Please join me in congratulating Xuefu!
>>>
>>> Thanks.
>>>
>>> Carl
>>>
>>>
>>
>




Re: Review Request 18478: HIVE-6459: Change the precison/scale for intermediate sum result in the avg() udf

2014-02-28 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18478/#review35825
---

Ship it!


Looks fine to me.
A minor question below.


ql/src/test/queries/clientpositive/vector_decimal_aggregate.q


Is the vectorization disabled intentionally ?


- Prasad Mujumdar


On Feb. 25, 2014, 7:37 p.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18478/
> ---
> 
> (Updated Feb. 25, 2014, 7:37 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-6459
> https://issues.apache.org/jira/browse/HIVE-6459
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Patch addressed the issue by keeping the type of the sum field consistent 
> with that of sum UDF. The type of the final avg result is unchanged.
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/aggregates/VectorUDAFAvgDecimal.java
>  6f593f9 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 
> abd54be 
>   ql/src/test/queries/clientpositive/vector_decimal_aggregate.q eb9146e 
>   ql/src/test/results/clientpositive/create_genericudaf.q.out 96fe2fa 
>   ql/src/test/results/clientpositive/decimal_precision.q.out a80695c 
>   ql/src/test/results/clientpositive/decimal_udf.q.out 74ae554 
>   ql/src/test/results/clientpositive/groupby10.q.out 341427f 
>   ql/src/test/results/clientpositive/groupby3.q.out a74f2b5 
>   ql/src/test/results/clientpositive/groupby3_map.q.out 9424071 
>   ql/src/test/results/clientpositive/groupby3_map_multi_distinct.q.out 
> 9bcd7c9 
>   ql/src/test/results/clientpositive/groupby3_map_skew.q.out f438f89 
>   ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out 310a202 
>   ql/src/test/results/clientpositive/limit_pushdown.q.out a8add4c 
>   ql/src/test/results/clientpositive/subquery_in.q.out 48be22b 
>   ql/src/test/results/clientpositive/subquery_in_having.q.out ef3dc18 
>   ql/src/test/results/clientpositive/subquery_notin.q.out b2d687b 
>   ql/src/test/results/clientpositive/subquery_notin_having.q.out 5f4d96e 
>   ql/src/test/results/clientpositive/udaf_number_format.q.out 339ef94 
>   ql/src/test/results/clientpositive/udf3.q.out 546f949 
>   ql/src/test/results/clientpositive/udf8.q.out 79c3bff 
>   ql/src/test/results/clientpositive/vector_decimal_aggregate.q.out 8b73971 
>   ql/src/test/results/clientpositive/vectorization_limit.q.out 51a4e81 
>   ql/src/test/results/clientpositive/vectorization_pushdown.q.out df474d6 
>   ql/src/test/results/clientpositive/vectorization_short_regress.q.out 
> 07accb6 
>   ql/src/test/results/clientpositive/vectorized_mapjoin.q.out 9590642 
>   ql/src/test/results/clientpositive/vectorized_shufflejoin.q.out 928bc82 
>   ql/src/test/results/compiler/plan/groupby3.q.xml cc88d5c 
> 
> Diff: https://reviews.apache.org/r/18478/diff/
> 
> 
> Testing
> ---
> 
> Existing tests cover this. Some test output is regenerated due to the output 
> diff.
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



[jira] [Commented] (HIVE-6486) Support secure Subject.doAs() in HiveServer2 JDBC client.

2014-02-28 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13916261#comment-13916261
 ] 

Vaibhav Gumashta commented on HIVE-6486:


[~shivshi] Can you rename the patch to follow the format HIVE-6486.1.patch? Thanks.

> Support secure Subject.doAs() in HiveServer2 JDBC client.
> -
>
> Key: HIVE-6486
> URL: https://issues.apache.org/jira/browse/HIVE-6486
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Shivaraju Gowda
> Attachments: Hive_011_Support-Subject_doAS.patch, 
> TestHive_SujectDoAs.java
>
>
> HIVE-5155 addresses the problem of kerberos authentication in a multi-user 
> middleware server using a proxy user. In this mode the principal used by the 
> middleware server has privileges to impersonate selected users in 
> Hive/Hadoop. 
> This enhancement is to support Subject.doAs() authentication in the Hive JDBC 
> layer so that the end user's Kerberos Subject is passed through in the 
> middleware server. With this improvement there won't be any additional setup in the 
> server to grant proxy privileges to some users and there won't be a need to 
> specify a proxy user in the JDBC client. This version should also be more 
> secure since it won't require principals with the privileges to impersonate 
> other users in the Hive/Hadoop setup.
>  
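
For illustration only, the Subject.doAs() pattern that the issue description refers to can be sketched as below. This is not the code from the attached patch: the class name SubjectDoAsSketch, the helpers runAs and connectAs, and the way the JDBC URL is passed in are all hypothetical, and the sketch assumes the caller has already obtained an authenticated javax.security.auth.Subject for the end user (for example through a JAAS login).

```java
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;
import javax.security.auth.Subject;

public class SubjectDoAsSketch {

    // Runs the action with the given Subject associated with the call and
    // rethrows the checked exception that the action itself raised.
    static <T> T runAs(Subject subject, PrivilegedExceptionAction<T> action) throws Exception {
        try {
            return Subject.doAs(subject, action);
        } catch (PrivilegedActionException e) {
            throw e.getException();
        }
    }

    // Hypothetical helper: opens a JDBC connection while acting as the end
    // user. The URL is supplied by the caller; no Hive-specific URL
    // parameters are assumed here.
    static Connection connectAs(Subject endUser, String jdbcUrl) throws Exception {
        return runAs(endUser, () -> DriverManager.getConnection(jdbcUrl));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder Subject with no credentials; a real caller would pass
        // the Subject produced by its Kerberos login.
        Subject subject = new Subject();
        String result = runAs(subject, () -> "ran inside doAs");
        System.out.println(result);
    }
}
```

The point of the wrapper is that the middleware never needs a privileged principal of its own: whatever runs inside the action sees the end user's Subject. On recent JDKs Subject.doAs is deprecated in favor of Subject.callAs, so the same idea would be expressed with that method there.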





[jira] [Updated] (HIVE-6513) Most hcatalog pig tests fail when building for hadoop-2

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6513:
---

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing as duplicate of HIVE-6514, tracking there.

> Most hcatalog pig tests fail when building for hadoop-2
> ---
>
> Key: HIVE-6513
> URL: https://issues.apache.org/jira/browse/HIVE-6513
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-6513.patch
>
>
> Most of the unit tests in hcatalog/hcatalog-pig-adaptor fail when built with 
> -Phadoop-2





[jira] [Updated] (HIVE-6523) Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml

2014-02-28 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-6523:
---

Environment: Hadoop 2.4.*

> Tests with -Phadoop-2 and MiniMRCluster error if it doesn't find yarn-site.xml
> --
>
> Key: HIVE-6523
> URL: https://issues.apache.org/jira/browse/HIVE-6523
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
> Environment: Hadoop 2.4.*
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-6523.patch
>
>
> With the newer Hadoop versions (2.4+) in tests, MiniMRCluster throws an error 
> loading resources if it can't find a yarn-site.xml on its classpath, which 
> affects test runs that use -Phadoop-2 and MiniMRCluster.




