[jira] [Commented] (HIVE-5888) group by after join operation produces no result when hive.optimize.skewjoin = true

2013-11-25 Thread cyril liao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832362#comment-13832362
 ] 

cyril liao commented on HIVE-5888:
--

If hive.optimize.skewjoin is set to false, we get the correct result.
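For reference, a minimal sketch of the failing pattern (table and column names here are hypothetical, not taken from the report):
{code}
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=2;   -- low threshold so the runtime skew-join path is actually taken

-- hypothetical tables: t1(key int, val string) and t2(key int)
select t1.key, count(*)
from t1 join t2 on t1.key = t2.key
group by t1.key;
-- with hive.optimize.skewjoin=false the same query returns the expected groups
{code}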

> group by after join operation produces no result when hive.optimize.skewjoin 
> = true 
> 
>
> Key: HIVE-5888
> URL: https://issues.apache.org/jira/browse/HIVE-5888
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: cyril liao
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5888) group by after join operation produces no result when hive.optimize.skewjoin = true

2013-11-25 Thread cyril liao (JIRA)
cyril liao created HIVE-5888:


 Summary: group by after join operation produces no result when 
hive.optimize.skewjoin = true 
 Key: HIVE-5888
 URL: https://issues.apache.org/jira/browse/HIVE-5888
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: cyril liao






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5672) Insert with custom separator not supported for non-local directory

2013-11-25 Thread Romain Rigaux (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832358#comment-13832358
 ] 

Romain Rigaux commented on HIVE-5672:
-

Any update?

> Insert with custom separator not supported for non-local directory
> --
>
> Key: HIVE-5672
> URL: https://issues.apache.org/jira/browse/HIVE-5672
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0
>Reporter: Romain Rigaux
>Assignee: Xuefu Zhang
>
> https://issues.apache.org/jira/browse/HIVE-3682 is great but non-local 
> directories don't seem to be supported:
> {code}
> insert overwrite directory '/tmp/test-02'
> row format delimited
> FIELDS TERMINATED BY ':'
> select description FROM sample_07
> {code}
> {code}
> Error while compiling statement: FAILED: ParseException line 2:0 cannot 
> recognize input near 'row' 'format' 'delimited' in select clause
> {code}
> This works (with 'local'):
> {code}
> insert overwrite local directory '/tmp/test-02'
> row format delimited
> FIELDS TERMINATED BY ':'
> select code, description FROM sample_07
> {code}
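A hedged workaround sketch for this version (not from the original report): stage the export through a delimited external table whose location is the target directory, so the custom separator comes from the table definition rather than from the directory clause.
{code}
-- hypothetical staging table; its location is the intended output directory
create external table tmp_export (description string)
row format delimited fields terminated by ':'
stored as textfile
location '/tmp/test-02';

insert overwrite table tmp_export
select description from sample_07;
{code}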



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4790:


Status: Patch Available  (was: Open)

> MapredLocalTask task does not make virtual columns
> --
>
> Key: HIVE-4790
> URL: https://issues.apache.org/jira/browse/HIVE-4790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D11511.3.patch, HIVE-4790.D11511.1.patch, 
> HIVE-4790.D11511.2.patch
>
>
> From mailing list, 
> http://www.mail-archive.com/user@hive.apache.org/msg08264.html
> {noformat}
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON 
> b.rownumber = a.number;
> fails with this error:
>  
> > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = 
> a.number;
> Automatically selecting local only mode for query
> Total MapReduce jobs = 1
> setting HADOOP_USER_NAMEpmarron
> 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property 
> hive.metastore.local no longer has any effect. Make sure to provide a valid 
> value for hive.metastore.uris if you are connecting to a remote metastore.
> Execution log at: /tmp/pmarron/.log
> 2013-06-25 10:52:56 Starting to launch local task to process map join;
>   maximum memory = 932118528
> java.lang.RuntimeException: cannot find field block__offset__inside__file 
> from [0:rownumber, 1:offset]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
> at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
> at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Execution failed with exit status: 2
> {noformat}
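A hedged note, assuming the failure is specific to the map-join local task path: forcing the regular common join avoids MapredLocalTask, so the virtual column should be resolvable.
{code}
-- assumption: disabling automatic map-join conversion sidesteps the local task
set hive.auto.convert.join=false;
SELECT *, b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = a.number;
{code}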



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5876) Split elimination in ORC breaks for partitioned tables

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832350#comment-13832350
 ] 

Hive QA commented on HIVE-5876:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615667/HIVE-5876.2.patch

{color:green}SUCCESS:{color} +1 4684 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/443/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/443/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615667

> Split elimination in ORC breaks for partitioned tables
> --
>
> Key: HIVE-5876
> URL: https://issues.apache.org/jira/browse/HIVE-5876
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5876.1.patch, HIVE-5876.2.patch
>
>
> HIVE-5632 eliminates from split computation the ORC stripes that do not satisfy 
> the SARG condition. A SARG expression can also refer to partition columns, but 
> a partition column will not be contained in the column name list of the ORC file. 
> This was causing an ArrayIndexOutOfBoundsException in the split elimination logic 
> when used with partitioned tables. The fix is to ignore evaluation of 
> partition column expressions during split elimination.
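For illustration (hedged; table and column names are hypothetical), this is the kind of query that exercises the broken path: a SARG mixing a partition column with a regular column, pushed into an ORC reader.
{code}
create table sales (id int, amount double)
partitioned by (dt string)
stored as orc;

set hive.optimize.index.filter=true;   -- enables SARG push-down into ORC

-- dt is a partition column, so it is absent from the ORC file's column list
select id, amount
from sales
where dt = '2013-11-25' and amount > 100.0;
{code}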



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns

2013-11-25 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4790:
--

Attachment: D11511.3.patch

navis updated the revision "HIVE-4790 [jira] MapredLocalTask task does not make 
virtual columns".

  Rebased to trunk

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11511

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11511?vs=35637&id=44409#toc

AFFECTED FILES
  ql/pom.xml
  ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/PartitionKeySampler.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsWork.java
  ql/src/test/queries/clientpositive/join_vc.q
  ql/src/test/results/clientpositive/join_vc.q.out

To: JIRA, navis


> MapredLocalTask task does not make virtual columns
> --
>
> Key: HIVE-4790
> URL: https://issues.apache.org/jira/browse/HIVE-4790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D11511.3.patch, HIVE-4790.D11511.1.patch, 
> HIVE-4790.D11511.2.patch
>
>
> From mailing list, 
> http://www.mail-archive.com/user@hive.apache.org/msg08264.html
> {noformat}
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON 
> b.rownumber = a.number;
> fails with this error:
>  
> > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = 
> a.number;
> Automatically selecting local only mode for query
> Total MapReduce jobs = 1
> setting HADOOP_USER_NAMEpmarron
> 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property 
> hive.metastore.local no longer has any effect. Make sure to provide a valid 
> value for hive.metastore.uris if you are connecting to a remote metastore.
> Execution log at: /tmp/pmarron/.log
> 2013-06-25 10:52:56 Starting to launch local task to process map join;
>   maximum memory = 932118528
> java.lang.RuntimeException: cannot find field block__offset__inside__file 
> from [0:rownumber, 1:offset]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
> at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
> at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Execution failed with exit status: 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4790) MapredLocalTask task does not make virtual columns

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4790:


Status: Open  (was: Patch Available)

> MapredLocalTask task does not make virtual columns
> --
>
> Key: HIVE-4790
> URL: https://issues.apache.org/jira/browse/HIVE-4790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4790.D11511.1.patch, HIVE-4790.D11511.2.patch
>
>
> From mailing list, 
> http://www.mail-archive.com/user@hive.apache.org/msg08264.html
> {noformat}
> SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON 
> b.rownumber = a.number;
> fails with this error:
>  
> > SELECT *,b.BLOCK__OFFSET__INSIDE__FILE FROM a JOIN b ON b.rownumber = 
> a.number;
> Automatically selecting local only mode for query
> Total MapReduce jobs = 1
> setting HADOOP_USER_NAMEpmarron
> 13/06/25 10:52:56 WARN conf.HiveConf: DEPRECATED: Configuration property 
> hive.metastore.local no longer has any effect. Make sure to provide a valid 
> value for hive.metastore.uris if you are connecting to a remote metastore.
> Execution log at: /tmp/pmarron/.log
> 2013-06-25 10:52:56 Starting to launch local task to process map join;
>   maximum memory = 932118528
> java.lang.RuntimeException: cannot find field block__offset__inside__file 
> from [0:rownumber, 1:offset]
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:366)
> at 
> org.apache.hadoop.hive.serde2.lazy.objectinspector.LazySimpleStructObjectInspector.getStructFieldRef(LazySimpleStructObjectInspector.java:168)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.DelegatedStructObjectInspector.getStructFieldRef(DelegatedStructObjectInspector.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:57)
> at 
> org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
> at 
> org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:222)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
> at 
> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
> at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:394)
> at 
> org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:277)
> at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:676)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Execution failed with exit status: 2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5581) Implement vectorized year/month/day... etc. for string arguments

2013-11-25 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832318#comment-13832318
 ] 

Teddy Choi commented on HIVE-5581:
--

I wrote vectorized versions of UDFYear, UDFMonth, UDFDayOfMonth, UDFHour, 
UDFMinute, UDFSecond, UDFWeekOfYear.

There are still more functions that use the TimestampWritable type, so I could 
write more of them:
* Date-related: UDFDate, UDFDateSub, UDFDateAdd, UDFDateDiff
* Conversion-related: UDFToBoolean, UDFToByte, UDFToFloat, UDFToInteger, 
UDFToLong, UDFToShort, UDFToString

However, there are no vectorized date/datesub/toboolean... for long arguments 
yet. Should I write them now or later?
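For context, a small usage sketch of the string-argument path these UDFs cover (table and column names are hypothetical):
{code}
set hive.vectorized.execution.enabled=true;

-- ts_str is a STRING column holding values like '2013-11-25 10:52:56'
select year(ts_str), month(ts_str), day(ts_str),
       hour(ts_str), minute(ts_str), second(ts_str), weekofyear(ts_str)
from events_orc;
{code}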

> Implement vectorized year/month/day... etc. for string arguments
> 
>
> Key: HIVE-5581
> URL: https://issues.apache.org/jira/browse/HIVE-5581
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor
>Affects Versions: 0.13.0
>Reporter: Eric Hanson
>Assignee: Teddy Choi
> Attachments: HIVE-5581.1.patch.txt, HIVE-5581.2.patch, 
> HIVE-5581.3.patch, HIVE-5581.4.patch, HIVE-5581.5.patch, HIVE-5581.5.patch, 
> HIVE-5581.6.patch
>
>
> Functions year(), month(), day(), weekofyear(), hour(), minute(), second() 
> need to be implemented for string arguments in vectorized mode. 
> They already work for timestamp arguments.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type

2013-11-25 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832312#comment-13832312
 ] 

Teddy Choi commented on HIVE-5761:
--

I would like to take this issue as a follow-up to HIVE-5581. They have much in 
common. I will create vectorized versions of UDFDate, UDFDateAdd, UDFDateDiff, 
UDFDateSub, UDFDayOfMonth, UDFMonth, UDFToString, UDFWeekOfYear, UDFYear. 

> Implement vectorized support for the DATE data type
> ---
>
> Key: HIVE-5761
> URL: https://issues.apache.org/jira/browse/HIVE-5761
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>
> Add support to allow queries referencing DATE columns and expression results 
> to run efficiently in vectorized mode. This should re-use the code for the 
> integer/timestamp types to the extent possible and beneficial. Include 
> unit tests and end-to-end tests. Consider re-using or extending existing 
> end-to-end tests for vectorized integer and/or timestamp operations.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5706) Move a few numeric UDFs to generic implementations

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832309#comment-13832309
 ] 

Hive QA commented on HIVE-5706:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615781/HIVE-5706.6.patch

{color:green}SUCCESS:{color} +1 4732 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/442/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/442/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615781

> Move a few numeric UDFs to generic implementations
> --
>
> Key: HIVE-5706
> URL: https://issues.apache.org/jira/browse/HIVE-5706
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5706.1.patch, HIVE-5706.2.patch, HIVE-5706.3.patch, 
> HIVE-5706.4.patch, HIVE-5706.5.patch, HIVE-5706.6.patch, HIVE-5706.patch
>
>
> This is a follow-up JIRA for HIVE-5356 to reduce the review scope. It will 
> cover UDFOPPositive, UDFOPNegative, UDFCeil, UDFFloor, and UDFPower.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5887) metastore direct sql doesn't work with oracle

2013-11-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5887:
---

Attachment: HIVE-5887.patch

Tiny patch. Sanity-checked on MySQL and Postgres as well; letting Hive QA run for now.
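For illustration, a hedged sketch of the difference (shown on the metastore PARTITIONS table purely as an example): Oracle rejects the AS keyword before a table alias, while MySQL and Postgres accept it, so dropping it keeps the generated SQL portable.
{code}
-- rejected by Oracle (AS before a table alias), accepted by MySQL/Postgres:
select P.PART_ID from PARTITIONS as P;
-- portable form, accepted by all three:
select P.PART_ID from PARTITIONS P;
{code}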

> metastore direct sql doesn't work with oracle
> -
>
> Key: HIVE-5887
> URL: https://issues.apache.org/jira/browse/HIVE-5887
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5887.patch
>
>
> Looks like "as" keyword is not needed/supported on Oracle. Let me make a 
> quick patch...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5887) metastore direct sql doesn't work with oracle

2013-11-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5887:
---

Status: Patch Available  (was: Open)

> metastore direct sql doesn't work with oracle
> -
>
> Key: HIVE-5887
> URL: https://issues.apache.org/jira/browse/HIVE-5887
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5887.patch
>
>
> Looks like "as" keyword is not needed/supported on Oracle. Let me make a 
> quick patch...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5706) Move a few numeric UDFs to generic implementations

2013-11-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5706:
--

Attachment: HIVE-5706.6.patch

Patch #6 is equivalent to #5. Rebased and re-uploaded to kick off testing after the 
build was fixed.

> Move a few numeric UDFs to generic implementations
> --
>
> Key: HIVE-5706
> URL: https://issues.apache.org/jira/browse/HIVE-5706
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5706.1.patch, HIVE-5706.2.patch, HIVE-5706.3.patch, 
> HIVE-5706.4.patch, HIVE-5706.5.patch, HIVE-5706.6.patch, HIVE-5706.patch
>
>
> This is a follow-up JIRA for HIVE-5356 to reduce the review scope. It will 
> cover UDFOPPositive, UDFOPNegative, UDFCeil, UDFFloor, and UDFPower.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832232#comment-13832232
 ] 

Ashutosh Chauhan commented on HIVE-5817:


We should add the following .q file to the patch:
{code}
set hive.auto.convert.join=true;
create table store(s_store_sk int, s_city string)
stored as orc;
insert overwrite table store
select cint, cstring1
from alltypesorc
where cint not in (
-3728, -563, 762, 6981, 253665376, 528534767, 626923679);
create table store_sales(ss_store_sk int, ss_hdemo_sk int, ss_net_profit double)
stored as orc;
insert overwrite table store_sales
select cint, cint, cdouble
from alltypesorc
where cint not in (
-3728, -563, 762, 6981, 253665376, 528534767, 626923679);
create table household_demographics(hd_demo_sk int)
stored as orc;
insert overwrite table household_demographics
select cint
from alltypesorc
where cint not in (
-3728, -563, 762, 6981, 253665376, 528534767, 626923679);

set hive.vectorized.execution.enabled = true;
select store.s_city, ss_net_profit
from store_sales
JOIN store ON store_sales.ss_store_sk = store.s_store_sk
JOIN household_demographics ON store_sales.ss_hdemo_sk = 
household_demographics.hd_demo_sk
;

set hive.auto.convert.join=false;
set hive.vectorized.execution.enabled = false;
{code}

I tested the patch with the above query and the output returned is correct.

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Build seems broken

2013-11-25 Thread Navis류승우
I forgot to mark the deleted/added files in the patch. It seems to be working now.

Sorry for the inconvenience to all.


2013/11/26 Jarek Jarcec Cecho 

> I pushed something that I didn't want a couple of minutes ago and then
> force-pushed to remove it. I'm not sure whether it's caused by that, though.
>
> Jarcec
>
> On Mon, Nov 25, 2013 at 06:04:34PM -0800, Xuefu Zhang wrote:
> > [INFO] BUILD FAILURE
> > [INFO]
> > 
> > [INFO] Total time: 5.604s
> > [INFO] Finished at: Mon Nov 25 17:53:20 PST 2013
> > [INFO] Final Memory: 29M/283M
> > [INFO]
> > 
> > [ERROR] Failed to execute goal
> > org.apache.maven.plugins:maven-compiler-plugin:3.1:compile
> > (default-compile) on project hive-it-util: Compilation failure:
> Compilation
> > failure:
> > [ERROR]
> >
> /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[45,73]
> > cannot find symbol
> > [ERROR] symbol  : variable HIVEJOBPROGRESS
> > [ERROR] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars
> > [ERROR]
> >
> /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[57,38]
> > cannot find symbol
> > [ERROR] symbol  : method getCounters()
> > [ERROR] location: class
> org.apache.hadoop.hive.ql.exec.Operator > of ? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>
> > [ERROR] -> [Help 1]
> > [ERROR]
> > [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e
> > switch.
> > [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> > [ERROR]
> > [ERROR] For more information about the errors and possible solutions,
> > please read the following articles:
> > [ERROR] [Help 1]
> > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> > [ERROR]
> > [ERROR] After correcting the problems, you can resume the build with the
> > command
> > [ERROR]   mvn  -rf :hive-it-util
>


Re: Build seems broken

2013-11-25 Thread Jarek Jarcec Cecho
I pushed something that I didn't want a couple of minutes ago and then 
force-pushed to remove it. I'm not sure whether it's caused by that, though.

Jarcec

On Mon, Nov 25, 2013 at 06:04:34PM -0800, Xuefu Zhang wrote:
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 5.604s
> [INFO] Finished at: Mon Nov 25 17:53:20 PST 2013
> [INFO] Final Memory: 29M/283M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile
> (default-compile) on project hive-it-util: Compilation failure: Compilation
> failure:
> [ERROR]
> /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[45,73]
> cannot find symbol
> [ERROR] symbol  : variable HIVEJOBPROGRESS
> [ERROR] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars
> [ERROR]
> /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[57,38]
> cannot find symbol
> [ERROR] symbol  : method getCounters()
> [ERROR] location: class org.apache.hadoop.hive.ql.exec.Operator of ? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :hive-it-util




Re: Build seems broken

2013-11-25 Thread Navis류승우
My bad. I should have removed the class when committing HIVE-4518.


2013/11/26 Xuefu Zhang 

> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 5.604s
> [INFO] Finished at: Mon Nov 25 17:53:20 PST 2013
> [INFO] Final Memory: 29M/283M
> [INFO]
> 
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile
> (default-compile) on project hive-it-util: Compilation failure: Compilation
> failure:
> [ERROR]
>
> /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[45,73]
> cannot find symbol
> [ERROR] symbol  : variable HIVEJOBPROGRESS
> [ERROR] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars
> [ERROR]
>
> /home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[57,38]
> cannot find symbol
> [ERROR] symbol  : method getCounters()
> [ERROR] location: class org.apache.hadoop.hive.ql.exec.Operator of ? extends org.apache.hadoop.hive.ql.plan.OperatorDesc>
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :hive-it-util
>


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832197#comment-13832197
 ] 

Ashutosh Chauhan commented on HIVE-5817:


Did some testing and found that testcase vectorization_part_project.q is failing 
after applying this patch, with the following stack trace:
{code}
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:181)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163)
... 4 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Error evaluating 
(cdouble + 2)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:117)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:489)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:827)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 5 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 13
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.gen.DoubleColAddLongScalar.evaluate(DoubleColAddLongScalar.java:57)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:115)
{code}

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Build seems broken

2013-11-25 Thread Xuefu Zhang
[INFO] BUILD FAILURE
[INFO]

[INFO] Total time: 5.604s
[INFO] Finished at: Mon Nov 25 17:53:20 PST 2013
[INFO] Final Memory: 29M/283M
[INFO]

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile
(default-compile) on project hive-it-util: Compilation failure: Compilation
failure:
[ERROR]
/home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[45,73]
cannot find symbol
[ERROR] symbol  : variable HIVEJOBPROGRESS
[ERROR] location: class org.apache.hadoop.hive.conf.HiveConf.ConfVars
[ERROR]
/home/xzhang/apa/hive-commit/itests/util/src/main/java/org/apache/hadoop/hive/ql/hooks/OptrStatGroupByHook.java:[57,38]
cannot find symbol
[ERROR] symbol  : method getCounters()
[ERROR] location: class org.apache.hadoop.hive.ql.exec.Operator
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions,
please read the following articles:
[ERROR] [Help 1]
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the
command
[ERROR]   mvn  -rf :hive-it-util


[jira] [Updated] (HIVE-5839) BytesRefArrayWritable compareTo violates contract

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5839:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Xuefu!

> BytesRefArrayWritable compareTo violates contract
> -
>
> Key: HIVE-5839
> URL: https://issues.apache.org/jira/browse/HIVE-5839
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ian Robertson
>Assignee: Xuefu Zhang
> Attachments: HIVE-5839.1.patch, HIVE-5839.2.patch, HIVE-5839.patch, 
> HIVE-5839.patch
>
>
> BytesRefArrayWritable's compareTo violates the compareTo contract from 
> java.lang.Object. Specifically:
> * The implementor must ensure sgn(x.compareTo( y )) == -sgn(y.compareTo( x )) 
> for all x and y.
> The compareTo implementation on BytesRefArrayWritable does a proper 
> comparison of the sizes of the two instances. However, if the sizes are the 
> same, it proceeds to check whether both arrays have the same contents. If 
> not, it returns 1. This means that if x and y are two BytesRefArrayWritable 
> instances with the same size, but different contents, then x.compareTo( y ) 
> == 1 and y.compareTo( x ) == 1.
> Additionally, the comparison of contents is order-agnostic. This seems wrong, 
> since the order of entries should matter. It is also very inefficient, running in 
> O(n^2), where n is the number of entries.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5885) Add myself and Jitendra to committer list

2013-11-25 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832178#comment-13832178
 ] 

Gunther Hagleitner commented on HIVE-5885:
--

LGTM +1

> Add myself and Jitendra to committer list
> -
>
> Key: HIVE-5885
> URL: https://issues.apache.org/jira/browse/HIVE-5885
> Project: Hive
>  Issue Type: Task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>Priority: Minor
> Attachments: HIVE-5885.1.patch, HIVE-5885.2.patch
>
>
> Update website to include me and Jitendra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3286:


Attachment: HIVE-3286.12.patch.txt

Resubmitting to run tests.

> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, 
> HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, 
> HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But usually we already know that, and even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As a start, I've extended the join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can only be used with common inner equi-joins. And the skew condition should 
> be composed of join keys only.
> Work done till now will be updated shortly after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and first 'true' one decides skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in a group would be 
> assigned. 
> The number of partition slot reserved for each group is decided also at 
> runtime by simple calculation of percentage. If a skew group is "CLUSTER BY 
> 20 PERCENT" and total partition slot (=number of reducer) is 20, that group 
> will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group is dispersed in the range of 
> reserved slots (If there is only one slot for a group, this is meaningless). 
> Currently, three distribution policies are available: RANDOM, KEYS, 
> . 
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the 
> non-driver alias are duplicated for all the slots (default if not specified)
> 2. KEYS : determined by hash value of keys (same with previous)
> 3. expression : determined by hash of object evaluated by user-provided 
> expression
> Only possible with inner, equi, common-joins. Not yet supports join tree 
> merging.
> Might be used by other RS users like "SORT BY" or "GROUP BY"
> If there exists column statistics for the key, it could be possible to apply 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>a.key = '0' CLUSTER BY 10 PERCENT,
>b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by 
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', which does not belong to any skew group : hash(key) % 6
> *expressions in skew condition : 
> 1. all expressions should be made of expression in join condition, which 
> means if join condition is "a.key=b.key", user can make any expression with 
> "a.key" or "b.key". But if join condition is a.key+1=b.key, user cannot make 
> expression with "a.key" solely (should make expression with "a.key+1"). 
> 2. all expressions should reference one and only one side's alias. For 
> example, simple constant expressions or expressions referencing both sides of 
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all functions in the expression should be deterministic and stateless.
> 4. if "DISTRIBUTE BY expression" is used, the distribution expression also should 
> reference the same alias as the skew expression.
> **driver alias :
> 1. driver alias means the sole referenced alias from skew expression, which 
> is important for RANDOM distribution. ro

[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3286:


Status: Open  (was: Patch Available)

> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, 
> HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, 
> HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But usually we already know that, and even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As a start, I've extended the join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can only be used with common inner equi-joins. And the skew condition should 
> be composed of join keys only.
> Work done till now will be updated shortly after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and first 'true' one decides skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in a group would be 
> assigned. 
> The number of partition slot reserved for each group is decided also at 
> runtime by simple calculation of percentage. If a skew group is "CLUSTER BY 
> 20 PERCENT" and total partition slot (=number of reducer) is 20, that group 
> will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group is dispersed in the range of 
> reserved slots (If there is only one slot for a group, this is meaningless). 
> Currently, three distribution policies are available: RANDOM, KEYS, 
> . 
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the 
> non-driver alias are duplicated for all the slots (default if not specified)
> 2. KEYS : determined by hash value of keys (same with previous)
> 3. expression : determined by hash of object evaluated by user-provided 
> expression
> Only possible with inner, equi, common-joins. Not yet supports join tree 
> merging.
> Might be used by other RS users like "SORT BY" or "GROUP BY"
> If there exists column statistics for the key, it could be possible to apply 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>a.key = '0' CLUSTER BY 10 PERCENT,
>b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by 
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', which does not belong to any skew group : hash(key) % 6
> *expressions in skew condition : 
> 1. all expressions should be made of expression in join condition, which 
> means if join condition is "a.key=b.key", user can make any expression with 
> "a.key" or "b.key". But if join condition is a.key+1=b.key, user cannot make 
> expression with "a.key" solely (should make expression with "a.key+1"). 
> 2. all expressions should reference one and only one side's alias. For 
> example, simple constant expressions or expressions referencing both sides of 
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all functions in the expression should be deterministic and stateless.
> 4. if "DISTRIBUTE BY expression" is used, the distribution expression also should 
> reference the same alias as the skew expression.
> **driver alias :
> 1. driver alias means the sole referenced alias from skew expression, which 
> is important for RANDOM distribution. rows of driver alias are as

[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-3286:


Status: Patch Available  (was: Open)

> Explicit skew join on user provided condition
> -
>
> Key: HIVE-3286
> URL: https://issues.apache.org/jira/browse/HIVE-3286
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, 
> HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, 
> HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch
>
>
> A join operation on a table with skewed data spends most of its execution time 
> handling the skewed keys. But usually we already know that, and even know 
> what the skewed keys look like.
> If we can explicitly assign reducer slots for the skewed keys, total 
> execution time could be greatly shortened.
> As a start, I've extended the join grammar something like this.
> {code}
> select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 
> < 100, a.key < 150);
> {code}
> which means if above query is executed by 20 reducers, one reducer for 
> a.key+1 < 50, one reducer for 50 <= a.key+1 < 100, one reducer for 99 <= 
> a.key < 150, and 17 reducers for others (could be extended to assign more 
> than one reducer later)
> This can only be used with common inner equi-joins. And the skew condition should 
> be composed of join keys only.
> Work done till now will be updated shortly after code cleanup.
> 
> Skew expressions* in "SKEW ON (expr, expr, ...)" are evaluated sequentially 
> at runtime, and first 'true' one decides skew group for the row. Each skew 
> group has reserved partition slot(s), to which all rows in a group would be 
> assigned. 
> The number of partition slot reserved for each group is decided also at 
> runtime by simple calculation of percentage. If a skew group is "CLUSTER BY 
> 20 PERCENT" and total partition slot (=number of reducer) is 20, that group 
> will reserve 4 partition slots, etc.
> "DISTRIBUTE BY" decides how the rows in a group is dispersed in the range of 
> reserved slots (If there is only one slot for a group, this is meaningless). 
> Currently, three distribution policies are available: RANDOM, KEYS, 
> . 
> 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of the 
> non-driver alias are duplicated for all the slots (default if not specified)
> 2. KEYS : determined by hash value of keys (same with previous)
> 3. expression : determined by hash of object evaluated by user-provided 
> expression
> Only possible with inner, equi, common-joins. Not yet supports join tree 
> merging.
> Might be used by other RS users like "SORT BY" or "GROUP BY"
> If there exists column statistics for the key, it could be possible to apply 
> automatically.
> For example, if 20 reducers are used for the query below,
> {code}
> select count(*) from src a join src b on a.key=b.key skew on (
>a.key = '0' CLUSTER BY 10 PERCENT,
>b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
>cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
> {code}
> group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will 
> reserve slots 0~5.
> For a row with key='0' from alias a, the row is randomly assigned in the 
> range of 6~7 (driver alias) : 6 or 7
> For a row with key='0' from alias b, the row is distributed to all slots in 
> 6~7 (non-driver alias) : 6 and 7
> For a row with key='50', the row is assigned in the range of 8~11 by hashcode 
> of upper(b.key) : 8 + (hash(upper(key)) % 4)
> For a row with key='500', the row is assigned in the range of 12~19 by 
> hashcode of join key : 12 + (hash(key) % 8)
> For a row with key='200', which does not belong to any skew group : hash(key) % 6
> *expressions in skew condition : 
> 1. all expressions should be made of expression in join condition, which 
> means if join condition is "a.key=b.key", user can make any expression with 
> "a.key" or "b.key". But if join condition is a.key+1=b.key, user cannot make 
> expression with "a.key" solely (should make expression with "a.key+1"). 
> 2. all expressions should reference one and only one side's alias. For 
> example, simple constant expressions or expressions referencing both sides of 
> the join condition ("a.key+b.key<100") are not allowed.
> 3. all functions in the expression should be deterministic and stateless.
> 4. if "DISTRIBUTE BY expression" is used, the distribution expression also should 
> reference the same alias as the skew expression.
> **driver alias :
> 1. driver alias means the sole referenced alias from skew expression, which 
> is important for RANDOM distribution. rows of driver alias are as

[jira] [Commented] (HIVE-5414) The result of show grant is not visible via JDBC

2013-11-25 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832173#comment-13832173
 ] 

Navis commented on HIVE-5414:
-

Strange. I cannot reproduce the test failure.

> The result of show grant is not visible via JDBC
> 
>
> Key: HIVE-5414
> URL: https://issues.apache.org/jira/browse/HIVE-5414
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, JDBC
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D13209.1.patch, D13209.2.patch, D13209.3.patch
>
>
> Currently, show grant / show role grant does not create a fetch task, which 
> is what provides the result schema for JDBC clients.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5706) Move a few numeric UDFs to generic implementations

2013-11-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5706:
--

Attachment: HIVE-5706.5.patch

The above failure was due to bad patch #4, which didn't have all the changes. Patch 
#5 fixed it.

> Move a few numeric UDFs to generic implementations
> --
>
> Key: HIVE-5706
> URL: https://issues.apache.org/jira/browse/HIVE-5706
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5706.1.patch, HIVE-5706.2.patch, HIVE-5706.3.patch, 
> HIVE-5706.4.patch, HIVE-5706.5.patch, HIVE-5706.patch
>
>
> This is a follow-up JIRA for HIVE-5356 to reduce the review scope. It will 
> cover UDFOPPositive, UDFOPNegative, UDFCeil, UDFFloor, and UDFPower.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5886) [Refactor] Remove unused class JobCloseFeedback

2013-11-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5886:
---

Status: Patch Available  (was: Open)

> [Refactor] Remove unused class JobCloseFeedback
> ---
>
> Key: HIVE-5886
> URL: https://issues.apache.org/jira/browse/HIVE-5886
> Project: Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5886.patch
>
>
> It was originally added to aid the dynamic partition code path, but that logic 
> has evolved such that it doesn't use this class anymore. So, now we can get 
> rid of this dead code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5887) metastore direct sql doesn't work with oracle

2013-11-25 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-5887:
--

 Summary: metastore direct sql doesn't work with oracle
 Key: HIVE-5887
 URL: https://issues.apache.org/jira/browse/HIVE-5887
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Looks like "as" keyword is not needed/supported on Oracle. Let me make a quick 
patch...



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5414) The result of show grant is not visible via JDBC

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5414:


Status: Open  (was: Patch Available)

> The result of show grant is not visible via JDBC
> 
>
> Key: HIVE-5414
> URL: https://issues.apache.org/jira/browse/HIVE-5414
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, JDBC
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D13209.1.patch, D13209.2.patch, D13209.3.patch
>
>
> Currently, show grant / show role grant does not create a fetch task, which 
> is what provides the result schema for JDBC clients.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4518) Counter Strike: Operation Operator

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4518:


   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Gunther and Jason!

> Counter Strike: Operation Operator
> --
>
> Key: HIVE-4518
> URL: https://issues.apache.org/jira/browse/HIVE-4518
> Project: Hive
>  Issue Type: Improvement
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.13.0
>
> Attachments: HIVE-4518.1.patch, HIVE-4518.10.patch, 
> HIVE-4518.11.patch, HIVE-4518.2.patch, HIVE-4518.3.patch, HIVE-4518.4.patch, 
> HIVE-4518.5.patch, HIVE-4518.6.patch.txt, HIVE-4518.7.patch, 
> HIVE-4518.8.patch, HIVE-4518.9.patch
>
>
> Queries of the form:
> from foo
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> Generate a huge amount of counters. The reason is that task.progress is 
> turned on for dynamic partitioning queries.
> The counters not only make queries slower than necessary (up to 50%) you will 
> also eventually run out. That's because we're wrapping them in enum values to 
> comply with hadoop 0.17.
> The real reason we turn task.progress on is that we need CREATED_FILES and 
> FATAL counters to ensure dynamic partitioning queries don't go haywire.
> The counters have counter-intuitive names like C1 through C1000 and don't 
> seem really useful by themselves.
> With hadoop 20+ you don't need to wrap the counters anymore, each operator 
> can simply create and increment counters. That should simplify the code a lot.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5025) Column aliases for input argument of GenericUDFs

2013-11-25 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-5025:


Status: Patch Available  (was: Open)

> Column aliases for input argument of GenericUDFs 
> -
>
> Key: HIVE-5025
> URL: https://issues.apache.org/jira/browse/HIVE-5025
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D12093.2.patch, HIVE-5025.D12093.1.patch
>
>
> In some cases, the column aliases of the input arguments are very useful to know. But 
> I am not sure about this, in the sense that UDFs should not be dependent on 
> contextual information like column aliases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5025) Column aliases for input argument of GenericUDFs

2013-11-25 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-5025:
--

Attachment: D12093.2.patch

navis updated the revision "HIVE-5025 [jira] Column aliases for input argument 
of GenericUDFs".

  Rebased to trunk

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12093

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12093?vs=37359&id=44403#toc

AFFECTED FILES
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFColumnNameTest.java
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFColumnNameTest.java
  
itests/util/src/main/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFColumnNameTest.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExprNodeGenericFuncEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/UDTFOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/AggregationDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeGenericFuncDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PTFDeserializer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/UDTFDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ptf/PTFExpressionDef.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFEvaluator.java
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDF.java
  ql/src/test/queries/clientpositive/udf_col_names.q
  ql/src/test/results/clientpositive/udf_col_names.q.out
  
serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java

To: JIRA, navis


> Column aliases for input argument of GenericUDFs 
> -
>
> Key: HIVE-5025
> URL: https://issues.apache.org/jira/browse/HIVE-5025
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D12093.2.patch, HIVE-5025.D12093.1.patch
>
>
> In some cases, the column aliases of the input arguments are very useful to know. But 
> I am not sure about this, in the sense that UDFs should not be dependent on 
> contextual information like column aliases.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5886) [Refactor] Remove unused class JobCloseFeedback

2013-11-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5886:
---

Attachment: HIVE-5886.patch

> [Refactor] Remove unused class JobCloseFeedback
> ---
>
> Key: HIVE-5886
> URL: https://issues.apache.org/jira/browse/HIVE-5886
> Project: Hive
>  Issue Type: Task
>  Components: Query Processor
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-5886.patch
>
>
> It was originally added to aid dynamic partition code path, but that logic 
> has evolved such that it doesn't use this class anymore. So, now we can get 
> rid of this dead code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5886) [Refactor] Remove unused class JobCloseFeedback

2013-11-25 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-5886:
--

 Summary: [Refactor] Remove unused class JobCloseFeedback
 Key: HIVE-5886
 URL: https://issues.apache.org/jira/browse/HIVE-5886
 Project: Hive
  Issue Type: Task
  Components: Query Processor
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: HIVE-5886.patch

It was originally added to aid dynamic partition code path, but that logic has 
evolved such that it doesn't use this class anymore. So, now we can get rid of 
this dead code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5589) perflogger output is hard to associate with queries

2013-11-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832094#comment-13832094
 ] 

Sergey Shelukhin commented on HIVE-5589:


Ping? 

> perflogger output is hard to associate with queries
> ---
>
> Key: HIVE-5589
> URL: https://issues.apache.org/jira/browse/HIVE-5589
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Minor
> Attachments: HIVE-5589.01.patch, HIVE-5589.02.patch
>
>
> It would be nice to dump the query somewhere in output.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5706) Move a few numeric UDFs to generic implementations

2013-11-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5706:
--

Attachment: HIVE-5706.4.patch

Patch #4 updated based on some of the RB comments.

> Move a few numeric UDFs to generic implementations
> --
>
> Key: HIVE-5706
> URL: https://issues.apache.org/jira/browse/HIVE-5706
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5706.1.patch, HIVE-5706.2.patch, HIVE-5706.3.patch, 
> HIVE-5706.4.patch, HIVE-5706.patch
>
>
> This is a follow-up JIRA for HIVE-5356 to reduce the review scope. It will 
> cover UDFOPPositive, UDFOPNegative, UDFCeil, UDFFloor, and UDFPower.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5885) Add myself and Jitendra to committer list

2013-11-25 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-5885:
--

Attachment: HIVE-5885.2.patch

Normalized line endings to Unix line endings in the HTML file (since I'm on Windows).

> Add myself and Jitendra to committer list
> -
>
> Key: HIVE-5885
> URL: https://issues.apache.org/jira/browse/HIVE-5885
> Project: Hive
>  Issue Type: Task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>Priority: Minor
> Attachments: HIVE-5885.1.patch, HIVE-5885.2.patch
>
>
> Update website to include me and Jitendra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5885) Add myself and Jitendra to committer list

2013-11-25 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832058#comment-13832058
 ] 

Eric Hanson commented on HIVE-5885:
---

Added myself and Jitendra to the committer list. Fixed the alphabetical order.

> Add myself and Jitendra to committer list
> -
>
> Key: HIVE-5885
> URL: https://issues.apache.org/jira/browse/HIVE-5885
> Project: Hive
>  Issue Type: Task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>Priority: Minor
> Attachments: HIVE-5885.1.patch
>
>
> Update website to include me and Jitendra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 15777: HIVE-5706: Move a few numeric UDFs to generic implementations

2013-11-25 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15777/#review29406
---



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloorCeilBase.java


Okay. Will fix it.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNegative.java


Runtime exceptions represent problems that are the result of a programming 
error, which seems appropriate for this case. That is, unless I programmed it 
wrong, this shouldn't happen, which is exactly the case here.

I'm not sure that other exception types really buy us anything.
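
For illustration, here is a minimal, hypothetical sketch of the kind of defensive check 
being discussed: a branch that should be unreachable if the argument types were validated 
up front. The class and method names are made up for the example and are not the actual 
patch code.

{code}
// Hypothetical example of guarding an "unreachable" branch in a UDF-style evaluator.
public class NegateExample {
  public Object negate(Object value) {
    if (value instanceof Integer) {
      return -((Integer) value);
    } else if (value instanceof Double) {
      return -((Double) value);
    }
    // Should never happen if argument types were validated in initialize();
    // a RuntimeException (or AssertionError) signals a programming error.
    throw new RuntimeException("Unexpected argument type: " + value.getClass());
  }
}
{code}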


- Xuefu Zhang


On Nov. 22, 2013, 5:52 a.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15777/
> ---
> 
> (Updated Nov. 22, 2013, 5:52 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-5706
> https://issues.apache.org/jira/browse/HIVE-5706
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Replaced the old implementations of power, ceil, floor, positive, and 
> negative with generic UDF implementations.
> 2. Added unit tests for each.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 435d6e6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> 0a79256 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 151c648 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCeil.java a01122e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFloor.java 3fdaf88 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNegative.java bab1105 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPPositive.java ae11d74 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFPower.java 184c5d2 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseUnary.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCeil.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloorCeilBase.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNegative.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPPositive.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFPower.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java
>  73bcee0 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFCeil.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFFloor.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPNegative.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPPositive.java
>  PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFPower.java 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/decimal_udf.q.out ed5bc65 
>   ql/src/test/results/clientpositive/literal_decimal.q.out 78dac31 
>   ql/src/test/results/clientpositive/udf4.q.out 50db96c 
>   ql/src/test/results/clientpositive/udf7.q.out 7316449 
>   ql/src/test/results/clientpositive/vectorization_short_regress.q.out 
> c9296e1 
>   ql/src/test/results/clientpositive/vectorized_math_funcs.q.out 8bb0edf 
>   ql/src/test/results/compiler/plan/udf4.q.xml 145e244 
> 
> Diff: https://reviews.apache.org/r/15777/diff/
> 
> 
> Testing
> ---
> 
> All new tests and old test passed.
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



[jira] [Updated] (HIVE-5885) Add myself and Jitendra to committer list

2013-11-25 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson updated HIVE-5885:
--

Attachment: HIVE-5885.1.patch

> Add myself and Jitendra to committer list
> -
>
> Key: HIVE-5885
> URL: https://issues.apache.org/jira/browse/HIVE-5885
> Project: Hive
>  Issue Type: Task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>Priority: Minor
> Attachments: HIVE-5885.1.patch
>
>
> Update website to include me and Jitendra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5731) Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832048#comment-13832048
 ] 

Ashutosh Chauhan commented on HIVE-5731:


Left a couple of comments on RB. Other than that, the patch looks good. 

> Use new GenericUDF instead of basic UDF for UDFDate* classes 
> -
>
> Key: HIVE-5731
> URL: https://issues.apache.org/jira/browse/HIVE-5731
> Project: Hive
>  Issue Type: Improvement
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5731.1.patch, HIVE-5731.2.patch, HIVE-5731.3.patch, 
> HIVE-5731.4.patch, HIVE-5731.5.patch, HIVE-5731.6.patch
>
>
> GenericUDF class is the latest and recommended base class for any UDFs.
> This JIRA is to change the current UDFDate* classes extended from GenericUDF.
> The general benefit of GenericUDF is described in comments as
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
> accept arguments of complex types, and return complex types. 2. It can 
> accept
> variable length of arguments. 3. It can accept an infinite number of 
> function
> signature - for example, it's easy to write a GenericUDF that accepts
> array, array> and so on (arbitrary levels of nesting). 4. 
> It
> can do short-circuit evaluations using DeferedObject."
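
As a concrete illustration of the pattern described in the quoted comment, here is a 
minimal GenericUDF sketch (not part of this patch; the class name and behavior are 
invented for the example) that accepts a variable number of arguments and uses 
DeferredObject so argument evaluation is only performed when needed:

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Minimal sketch: returns the first non-null argument, rendered as a string.
public class GenericUDFFirstNonNull extends GenericUDF {
  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length < 1) {
      throw new UDFArgumentException("At least one argument is required");
    }
    return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    for (DeferredObject arg : arguments) {
      Object value = arg.get();   // arguments are only materialized when get() is called
      if (value != null) {
        return value.toString();
      }
    }
    return null;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "first_non_null(...)";
  }
}
{code}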



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 15213: HIVE-5731: Use new GenericUDF instead of basic UDF for UDFDate* classes

2013-11-25 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15213/#review29405
---



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java


Instead of converttoDate() creating a new Date() every time, your earlier 
approach of getting a DateWritable via the converter and then obtaining the Date 
from it via dw.get() was better, since the new one requires creating a new Date() 
on every call.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java


Make this result object a class member and then reuse it across function calls 
by doing result.set() on each invocation; that will help with object reuse and 
save a new() on each function invocation.
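
A hedged sketch of the object-reuse pattern suggested above (class, method, and field 
names are illustrative only, not the actual patch code):

{code}
import org.apache.hadoop.hive.serde2.io.DateWritable;
import org.apache.hadoop.io.IntWritable;

// Illustrative only: reuse a single writable as the result object instead of
// allocating a new one on every call.
public class DateDiffSketch {
  private final IntWritable result = new IntWritable();  // class member, reused

  public IntWritable diffInDays(DateWritable start, DateWritable end) {
    // DateWritable stores days since the epoch, so the difference is a subtraction.
    result.set(end.getDays() - start.getDays());
    return result;  // no new IntWritable() per invocation
  }
}
{code}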


- Ashutosh Chauhan


On Nov. 11, 2013, 9:03 p.m., Mohammad Islam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15213/
> ---
> 
> (Updated Nov. 11, 2013, 9:03 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-5731
> https://issues.apache.org/jira/browse/HIVE-5731
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> GenericUDF class is the latest and recommended base class for any UDFs.
> This JIRA is to change the current UDFDate* classes extended from GenericUDF.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 8d3a84f 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDate.java 3df453c 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateAdd.java b1b0bf2 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateDiff.java da14c4f 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFDateSub.java c8a1d1f 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDate.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateAdd.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateDiff.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDateSub.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFDate.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFDateAdd.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFDateDiff.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFDateSub.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFDateAdd.java f0af069 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFDateDiff.java 8a6dbc3 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFDateSub.java fa722a9 
>   ql/src/test/results/clientpositive/udf_to_date.q.out 6ff5ee8 
> 
> Diff: https://reviews.apache.org/r/15213/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Mohammad Islam
> 
>



[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-11-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832042#comment-13832042
 ] 

Sergey Shelukhin commented on HIVE-5317:


I think "the small number of rows" meant above was for the update, not the 
entire partition.
So, large dataset, small number of rows updated. Exporting entire dataset to 
rdbms to perform a query seems excessive in this case

> Implement insert, update, and delete in Hive with full ACID support
> ---
>
> Key: HIVE-5317
> URL: https://issues.apache.org/jira/browse/HIVE-5317
> Project: Hive
>  Issue Type: New Feature
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: InsertUpdatesinHive.pdf
>
>
> Many customers want to be able to insert, update and delete rows from Hive 
> tables with full ACID support. The use cases are varied, but the form of the 
> queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
> ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various 
> dimension tables (eg. customer, inventory, stores) needs to be processed. The 
> dimension tables have primary keys and are typically bucketed and sorted on 
> those keys.
> * Once a day a small set (up to 100k rows) of records need to be deleted for 
> regulatory compliance.
> * Once an hour a log of transactions is exported from an RDBMS and the fact 
> tables need to be updated (up to 1m rows) to reflect the new data. The 
> transactions are a combination of inserts, updates, and deletes. The table is 
> partitioned and bucketed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5885) Add myself and Jitendra to committer list

2013-11-25 Thread Eric Hanson (JIRA)
Eric Hanson created HIVE-5885:
-

 Summary: Add myself and Jitendra to committer list
 Key: HIVE-5885
 URL: https://issues.apache.org/jira/browse/HIVE-5885
 Project: Hive
  Issue Type: Task
Reporter: Eric Hanson
Priority: Minor


Update website to include me and Jitendra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HIVE-5885) Add myself and Jitendra to committer list

2013-11-25 Thread Eric Hanson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Hanson reassigned HIVE-5885:
-

Assignee: Eric Hanson

> Add myself and Jitendra to committer list
> -
>
> Key: HIVE-5885
> URL: https://issues.apache.org/jira/browse/HIVE-5885
> Project: Hive
>  Issue Type: Task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
>Priority: Minor
>
> Update website to include me and Jitendra.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13832020#comment-13832020
 ] 

Sergey Shelukhin commented on HIVE-5817:


About the other operators, what do you mean? In the explain of the original 
query I was looking at, vectorized select does this
{noformat}
Select Operator
  expressions:
expr: _col22
type: string
expr: _col53
type: float
  outputColumnNames: _col0, _col1
{noformat}

In that case, _col22 from the 2 preceding joins happened to collide, but couldn't, 
for example, _col1 from the last join become _col0 of the select? Then the collision 
would happen when _col1 of the select is added and there's already a _col1 from the 
last join.

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5684) Serde support for char

2013-11-25 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5684:
-

Attachment: HIVE-5684.4.patch

re-uploading patch for precommit tests.

> Serde support for char
> --
>
> Key: HIVE-5684
> URL: https://issues.apache.org/jira/browse/HIVE-5684
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers, Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-5684.1.patch, HIVE-5684.2.patch, HIVE-5684.3.patch, 
> HIVE-5684.4.patch
>
>
> Update some of the SerDe's with char support



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5515) Writing to an HBase table throws IllegalArgumentException, failing job submission

2013-11-25 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated HIVE-5515:
-

Attachment: HIVE-5515.1.patch

> Writing to an HBase table throws IllegalArgumentException, failing job 
> submission
> -
>
> Key: HIVE-5515
> URL: https://issues.apache.org/jira/browse/HIVE-5515
> Project: Hive
>  Issue Type: Bug
>  Components: HBase Handler
>Affects Versions: 0.12.0
> Environment: Hadoop2, Hive 0.12.0, HBase-0.96RC
>Reporter: Nick Dimiduk
>Assignee: Viraj Bhat
>  Labels: hbase
> Fix For: 0.13.0
>
> Attachments: HIVE-5515.1.patch, HIVE-5515.patch
>
>
> Inserting data into HBase table via hive query fails with the following 
> message:
> {noformat}
> $ hive -e "FROM pgc INSERT OVERWRITE TABLE pagecounts_hbase SELECT pgc.* 
> WHERE rowkey LIKE 'en/q%' LIMIT 10;"
> ...
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> java.lang.IllegalArgumentException: Property value must not be null
> at 
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:810)
> at org.apache.hadoop.conf.Configuration.set(Configuration.java:792)
> at 
> org.apache.hadoop.hive.ql.exec.Utilities.copyTableJobPropertiesToConf(Utilities.java:2002)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.checkOutputSpecs(FileSinkOperator.java:947)
> at 
> org.apache.hadoop.hive.ql.io.HiveOutputFormatImpl.checkOutputSpecs(HiveOutputFormatImpl.java:67)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:458)
> at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:342)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:425)
> at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:348)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:731)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Job Submission failed with exception 
> 'java.lang.IllegalArgumentException(Property value

[jira] [Resolved] (HIVE-5882) Reduce logging verbosity on Tez

2013-11-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner resolved HIVE-5882.
--

Resolution: Fixed

Committed to branch.

> Reduce logging verbosity on Tez
> ---
>
> Key: HIVE-5882
> URL: https://issues.apache.org/jira/browse/HIVE-5882
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-5882.1.patch
>
>
> Running on Tez with debug level set to INFO is very noisy.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5684) Serde support for char

2013-11-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831985#comment-13831985
 ] 

Xuefu Zhang commented on HIVE-5684:
---

[~jdere] Could you please rebase/reload to allow the tests to re-run? Thanks.

> Serde support for char
> --
>
> Key: HIVE-5684
> URL: https://issues.apache.org/jira/browse/HIVE-5684
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers, Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-5684.1.patch, HIVE-5684.2.patch, HIVE-5684.3.patch
>
>
> Update some of the SerDe's with char support



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831979#comment-13831979
 ] 

Remus Rusanu commented on HIVE-5817:


[~sershe]: not sure about other operators. Aren't they covered by the existing 
addOutputColumn mechanism?

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5884) Mini tez cluster does not work after merging latest changes

2013-11-25 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5884:
-

Attachment: HIVE-5884.1.patch

This patch requires the user to build Hive using the -Phadoop-1 or -Phadoop-2 
flag. The Tez tests need to be run with the -Phadoop-2 flag, as before.

> Mini tez cluster does not work after merging latest changes
> ---
>
> Key: HIVE-5884
> URL: https://issues.apache.org/jira/browse/HIVE-5884
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5884.1.patch
>
>
> After merging the maven changes from trunk to the tez branch, the mini tez 
> tests do not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5884) Mini tez cluster does not work after merging latest changes

2013-11-25 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-5884:
-

Status: Patch Available  (was: Open)

> Mini tez cluster does not work after merging latest changes
> ---
>
> Key: HIVE-5884
> URL: https://issues.apache.org/jira/browse/HIVE-5884
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-5884.1.patch
>
>
> After merging the maven changes from trunk to the tez branch, the mini tez 
> tests do not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5882) Reduce logging verbosity on Tez

2013-11-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5882:
-

Attachment: HIVE-5882.1.patch

> Reduce logging verbosity on Tez
> ---
>
> Key: HIVE-5882
> URL: https://issues.apache.org/jira/browse/HIVE-5882
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
>Priority: Minor
> Fix For: tez-branch
>
> Attachments: HIVE-5882.1.patch
>
>
> Running on Tez with debug level set to INFO is very noisy.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5883) Plan is deserialized more often than necessary on Tez (in container reuse case)

2013-11-25 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-5883:
-

Attachment: HIVE-5883.1.patch

> Plan is deserialized more often than necessary on Tez (in container reuse 
> case)
> ---
>
> Key: HIVE-5883
> URL: https://issues.apache.org/jira/browse/HIVE-5883
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
> Attachments: HIVE-5883.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831972#comment-13831972
 ] 

Remus Rusanu commented on HIVE-5817:


[~prasanth_j]: the reason I wanted to do it via 'vectorization regions' is that I 
believe this concept is needed for more than just name resolution. Currently, 
without 'regions', the same vectorization context is used for the entire query 
tree. This results in very wide (many-column) contexts, because there has to be a 
column in the context for every intermediate column. With joins, this becomes 
quite wide.

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5883) Plan is deserialized more often than necessary on Tez (in container reuse case)

2013-11-25 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831971#comment-13831971
 ] 

Gunther Hagleitner commented on HIVE-5883:
--

When the plan is found in the Tez object registry, we need to add it to the 
global work map. Otherwise the vectorized map operator, context-aware readers, etc. 
will still deserialize the plan one more time.
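
A rough sketch of the intended flow. This is illustrative only: the registry, map, and 
plan types below are hypothetical stand-ins, not the actual Hive/Tez classes.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative container-reuse cache pattern for the plan.
public class PlanCacheSketch {
  private final Map<String, Object> objectRegistry = new ConcurrentHashMap<>(); // survives container reuse
  private final Map<String, Object> globalWorkMap = new ConcurrentHashMap<>();  // what other readers consult

  public Object getOrLoadPlan(String planPath) {
    Object plan = objectRegistry.get(planPath);
    if (plan == null) {
      plan = deserializePlan(planPath);       // expensive; should happen at most once per container
      objectRegistry.put(planPath, plan);
    }
    // The point discussed above: even on a cache hit, register the plan in the
    // global work map so other consumers do not deserialize it again.
    globalWorkMap.put(planPath, plan);
    return plan;
  }

  private Object deserializePlan(String planPath) {
    return new Object(); // placeholder for the real (expensive) deserialization
  }
}
{code}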

> Plan is deserialized more often than necessary on Tez (in container reuse 
> case)
> ---
>
> Key: HIVE-5883
> URL: https://issues.apache.org/jira/browse/HIVE-5883
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: tez-branch
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5884) Mini tez cluster does not work after merging latest changes

2013-11-25 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-5884:


 Summary: Mini tez cluster does not work after merging latest 
changes
 Key: HIVE-5884
 URL: https://issues.apache.org/jira/browse/HIVE-5884
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K


After merging the maven changes from trunk to the tez branch, the mini tez 
tests do not work.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5883) Plan is deserialized more often than necessary on Tez (in container reuse case)

2013-11-25 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-5883:


 Summary: Plan is deserialized more often than necessary on Tez (in 
container reuse case)
 Key: HIVE-5883
 URL: https://issues.apache.org/jira/browse/HIVE-5883
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5882) Reduce logging verbosity on Tez

2013-11-25 Thread Gunther Hagleitner (JIRA)
Gunther Hagleitner created HIVE-5882:


 Summary: Reduce logging verbosity on Tez
 Key: HIVE-5882
 URL: https://issues.apache.org/jira/browse/HIVE-5882
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Fix For: tez-branch


Running on Tez with debug level set to INFO is very noisy.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831968#comment-13831968
 ] 

Remus Rusanu commented on HIVE-5817:


My patch .4 addresses the issue in the following manner:
 
 - Vector operators can implement the optional interface 
VectorizationContextRegion. If they do, they must provide a new vectorization 
context to be used by child operators. In my patch only VectorMapJoinOperator 
does so.
 - The vectorizer walks up the stack of parent nodes to locate the first one (last 
one?) that created a vectorization context, and this is the vectorization 
context used to vectorize the current node. At the root of the stack there is a 
table scan that always creates a vectorization context.
 - I made the VectorMapJoinOperator build the output VectorizedRowBatch using 
the VectorizedRowBatchCtx class, the same as the ORC and RC scanners do. This is 
more consistent and removes the need for the VectorizedRowBatch.buildBatch method 
(which was used only by VMJ).
 - I added a simplified init to VectorizedRowBatchCtx to be used by VMJ (or any 
other operator we decide).

I have not hit 'Submit Patch' yet because more code can be removed (the 
mapper scratch for the vector type map), code that was used only by VMJ to enable 
it to build the output batch. Using VectorizedRowBatchCtx makes all that code 
obsolete.

I tested the repro query and it passes fine, producing 100 rows (I assume they're 
the right ones...). I will do some more testing.
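
A simplified, hypothetical sketch of the two pieces described above (the optional 
interface plus the walk up the parent chain). The type and method names are 
approximations for illustration, not the exact code in the patch.

{code}
// Hypothetical stand-ins for the real VectorizationContext and Operator types.
interface VectorizationContextRegionSketch {
  ContextSketch getOutputVectorizationContext();   // context this operator exposes to its children
}

class ContextSketch { /* column-name -> vector-index mapping would live here */ }

class OperatorSketch {
  OperatorSketch parent;
}

class VectorizerSketch {
  // Walk up the parent chain until we find an operator that created its own
  // vectorization context (e.g. a table scan or a vectorized map join).
  ContextSketch findContext(OperatorSketch op) {
    for (OperatorSketch cur = op.parent; cur != null; cur = cur.parent) {
      if (cur instanceof VectorizationContextRegionSketch) {
        return ((VectorizationContextRegionSketch) cur).getOutputVectorizationContext();
      }
    }
    return null; // in the real code the root table scan always provides one
  }
}
{code}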

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831966#comment-13831966
 ] 

Ashutosh Chauhan commented on HIVE-5817:


[~rusanu] We should add the test case provided by [~ehans] earlier in the comment 
thread as part of this patch.

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831960#comment-13831960
 ] 

Prasanth J commented on HIVE-5817:
--

I came across this exact same issue (column name collision) when I worked on 
HIVE-5369. As Ashutosh pointed out, tabAlias.colName will work; this is what I 
am using for resolving column names from different parent operators. Reference 
code can be found in JoinStatsRule under StatsRulesProcFactory.java.
Another option: every operator has an output row schema, which holds a list of 
ColumnInfo. ColumnInfo internally contains the internal column name and the table 
alias, so when used as the key for the hashmap it should avoid column name 
collisions. Hope this helps.
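
A tiny sketch of the keying idea (illustrative only; class and method names are 
invented): qualify the internal column name with the table alias so entries coming 
from different parent operators cannot collide in the map.

{code}
import java.util.HashMap;
import java.util.Map;

// Illustrative only: keying a column -> vector-index map by "tabAlias.colName"
// instead of the bare internal name avoids collisions between parent operators.
public class ColumnKeySketch {
  private final Map<String, Integer> columnToIndex = new HashMap<>();

  public void addColumn(String tabAlias, String internalName, int vectorIndex) {
    columnToIndex.put(tabAlias + "." + internalName, vectorIndex);
  }

  public Integer lookup(String tabAlias, String internalName) {
    return columnToIndex.get(tabAlias + "." + internalName);
  }
}
{code}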

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831953#comment-13831953
 ] 

Remus Rusanu commented on HIVE-5817:


https://reviews.apache.org/r/15849/

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Review Request 15849: HIVE-5817 column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15849/
---

Review request for hive, Ashutosh Chauhan, Eric Hanson, Jitendra Pandey, and 
Sergey Shelukhin.


Bugs: HIVE-5817
https://issues.apache.org/jira/browse/HIVE-5817


Repository: hive-git


Description
---

See HIVE-5817 for explanation


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java 
af11196 
  
ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContextRegion.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java 
be7cb9f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java 
289757c 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
151c648 

Diff: https://reviews.apache.org/r/15849/diff/


Testing
---


Thanks,

Remus Rusanu



[jira] [Updated] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-5817:
---

Attachment: HIVE-5817.4.patch

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5704) A couple of generic UDFs are not in the right folder/package

2013-11-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831945#comment-13831945
 ] 

Xuefu Zhang commented on HIVE-5704:
---

Here it is!
{code}
svn mv ql/src/java/org/apache/hadoop/hive/ql/udf/GenericUDFDecode.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDecode.java
perl -i -pe 's@org.apache.hadoop.hive.ql.udf;@org.apache.hadoop.hive.ql.udf.generic;@g' ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDecode.java
svn mv ql/src/java/org/apache/hadoop/hive/ql/udf/GenericUDFEncode.java ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEncode.java
perl -i -pe 's@org.apache.hadoop.hive.ql.udf;@org.apache.hadoop.hive.ql.udf.generic;@g' ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFEncode.java
svn mv ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFAbs.java ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFAbs.java
perl -i -pe 's@org.apache.hadoop.hive.ql.udf;@org.apache.hadoop.hive.ql.udf.generic;@g' ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFAbs.java
svn mv ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFDecode.java ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDecode.java
perl -i -pe 's@org.apache.hadoop.hive.ql.udf;@org.apache.hadoop.hive.ql.udf.generic;@g' ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFDecode.java
svn mv ql/src/test/org/apache/hadoop/hive/ql/udf/TestGenericUDFEncode.java ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFEncode.java
perl -i -pe 's@org.apache.hadoop.hive.ql.udf;@org.apache.hadoop.hive.ql.udf.generic;@g' ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFEncode.java
{code}

> A couple of generic UDFs are not in the right folder/package
> 
>
> Key: HIVE-5704
> URL: https://issues.apache.org/jira/browse/HIVE-5704
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Trivial
> Attachments: HIVE-5704-to-be-committed.patch, HIVE-5704.1.patch, 
> HIVE-5704.1.patch, HIVE-5704.patch
>
>
> There are two generic UDFs in the package for non-generic UDFs. I think 
> it's better to be consistent by putting them in the udf.generic package.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 15777: HIVE-5706: Move a few numeric UDFs to generic implementations

2013-11-25 Thread Brock Noland

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15777/#review29393
---


LGTM, minor issues below.


ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloorCeilBase.java


"0 + 1" 

?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNegative.java


Should be an IllegalStateException or AssertionError.


- Brock Noland


On Nov. 22, 2013, 5:52 a.m., Xuefu Zhang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15777/
> ---
> 
> (Updated Nov. 22, 2013, 5:52 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-5706
> https://issues.apache.org/jira/browse/HIVE-5706
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> 1. Replaced the old implementations of power, ceil, floor, positive, and 
> negative with generic UDF implementations.
> 2. Added unit tests for each.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 435d6e6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 
> 0a79256 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 
> 151c648 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFCeil.java a01122e 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFFloor.java 3fdaf88 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPNegative.java bab1105 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFOPPositive.java ae11d74 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/UDFPower.java 184c5d2 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseUnary.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCeil.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloor.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFFloorCeilBase.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPNegative.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPPositive.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFPower.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java
>  73bcee0 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFCeil.java 
> PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFFloor.java 
> PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPNegative.java
>  PRE-CREATION 
>   
> ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFOPPositive.java
>  PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFPower.java 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/decimal_udf.q.out ed5bc65 
>   ql/src/test/results/clientpositive/literal_decimal.q.out 78dac31 
>   ql/src/test/results/clientpositive/udf4.q.out 50db96c 
>   ql/src/test/results/clientpositive/udf7.q.out 7316449 
>   ql/src/test/results/clientpositive/vectorization_short_regress.q.out 
> c9296e1 
>   ql/src/test/results/clientpositive/vectorized_math_funcs.q.out 8bb0edf 
>   ql/src/test/results/compiler/plan/udf4.q.xml 145e244 
> 
> Diff: https://reviews.apache.org/r/15777/diff/
> 
> 
> Testing
> ---
> 
> All new tests and old test passed.
> 
> 
> Thanks,
> 
> Xuefu Zhang
> 
>



[jira] [Commented] (HIVE-5400) Allow admins to disable compile and other commands

2013-11-25 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831930#comment-13831930
 ] 

Thejas M Nair commented on HIVE-5400:
-

bq. but hive.conf.restricted.list is not documented in the wiki.
bq. Should it be in the wiki?
Yes
bq. If so, which release added it?
Hive 0.11 (patch HIVE-2935).
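
For the wiki entry, a minimal sketch of the behavior may help (the property named below is only an illustrative assumption; the actual restricted list is deployment-specific):

{noformat}
-- print the currently configured restricted list from a client session
set hive.conf.restricted.list;

-- assuming hive.security.authorization.enabled has been placed on the restricted
-- list by the admin, an attempt to override it from the session is rejected
set hive.security.authorization.enabled=false;
{noformat}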


> Allow admins to disable compile and other commands
> --
>
> Key: HIVE-5400
> URL: https://issues.apache.org/jira/browse/HIVE-5400
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.13.0
>
> Attachments: HIVE-5400.patch, HIVE-5400.patch, HIVE-5400.patch
>
>
> From here: 
> https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13782220&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782220
>  I think we should afford admins who want to disable this functionality the 
> ability to do so. Since such admins might want to disable other commands such 
> as add or dfs, it wouldn't be much trouble to allow them to do this as well. 
> For example we could have a configuration option "hive.available.commands" 
> (or similar) which specified add,set,delete,reset, etc by default. Then check 
> this value in CommandProcessorFactory. It would probably make sense to add 
> this property to the restrict list.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5704) A couple of generic UDFs are not in the right folder/package

2013-11-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831925#comment-13831925
 ] 

Brock Noland commented on HIVE-5704:


Hi,

Your SVN commands here 
https://issues.apache.org/jira/browse/HIVE-5704?focusedCommentId=13814652&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13814652
 do not have the associated perl commands to update the package declaration in the 
files after the svn mv.

Each file move should be:

1) the svn mv command
2) sed or perl command to update the file appropriately

example:

{noformat}
svn mv MyCLass.java MyClass.java
perl -i -pe 's@MyCLass@MyClass@g' MyClass.java
{noformat}

> A couple of generic UDFs are not in the right folder/package
> 
>
> Key: HIVE-5704
> URL: https://issues.apache.org/jira/browse/HIVE-5704
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Trivial
> Attachments: HIVE-5704-to-be-committed.patch, HIVE-5704.1.patch, 
> HIVE-5704.1.patch, HIVE-5704.patch
>
>
> There are two generic UDFs in the package for non-generic UDFs. I think 
> it's better to be consistent by putting them in the udf.generic package



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5876) Split elimination in ORC breaks for partitioned tables

2013-11-25 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-5876:
-

Attachment: HIVE-5876.2.patch

Added fix for failing test.

> Split elimination in ORC breaks for partitioned tables
> --
>
> Key: HIVE-5876
> URL: https://issues.apache.org/jira/browse/HIVE-5876
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>  Labels: orcfile
> Attachments: HIVE-5876.1.patch, HIVE-5876.2.patch
>
>
> HIVE-5632 eliminates ORC stripes from split computation that do not satisfy 
> SARG condition. SARG expression can also refer to partition columns. But 
> partition column will not be contained in the column names list in ORC file. 
> This was causing ArrayIndexOutOfBoundException in split elimination logic 
> when used with partitioned tables. The fix is to ignore evaluation of 
> partition column expressions in split elimination.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Hive-trunk-h0.21 - Build # 2471 - Still Failing

2013-11-25 Thread Apache Jenkins Server
Changes for Build #2438
[brock] HIVE-5740: Tar files should extract to the directory of the same name 
minus tar.gz (Brock Noland reviewed by Xuefu Zhang)

[brock] HIVE-5611: Add assembly (i.e.) tar creation to pom (Szehon Ho via Brock 
Noland)

[brock] HIVE-5707: Validate values for ConfVar (Navis via Brock Noland)

[brock] HIVE-5721: Incremental build is disabled by MCOMPILER-209 (Navis via 
Brock Noland)


Changes for Build #2439
[brock] HIVE-5354 - Decimal precision/scale support in ORC file (Xuefu Zhang 
via Brock Noland)

[brock] HIVE-4523 - round() function with specified decimal places not 
consistent with mysql (Xuefu Zhang via Brock Noland)

[thejas] HIVE-5542 : Webhcat is failing to run ddl command on a secure cluster 
(Sushanth Sowmyan via Thejas Nair)


Changes for Build #2440
[brock] HIVE-5730: Beeline throws non-terminal NPE upon starting, after 
mavenization (Szehon Ho reviewed by Navis)


Changes for Build #2441
[omalley] HIVE-5425 Provide a configuration option to control the default stripe
size for ORC. (omalley reviewed by gunther)

[omalley] Revert HIVE-5583 since it broke the build.

[hashutosh] HIVE-5583 : Implement support for IN (list-of-constants) filter in 
vectorized mode (Eric Hanson via Ashutosh Chauhan)

[brock] HIVE-5355 - JDBC support for decimal precision/scale


Changes for Build #2443
[brock] HIVE-5351 - Secure-Socket-Layer (SSL) support for HiveServer2 (Prasad 
Mujumdar via Brock Noland)

[hashutosh] HIVE-5583 : Implement support for IN (list-of-constants) filter in 
vectorized mode (Eric Hanson via Ashutosh Chauhan)

[brock] HIVE-5773 - Fix build due to conflict between HIVE-5711 and HIVE-5713

[brock] HIVE-5711 - Fix eclipse:eclipse maven goal (Carl Steinbach via Brock 
Noland)

[brock] HIVE-5752 - log4j properties appear to have been lost in maven upgrade 
(Sergey Shelukhin via Brock Noland)

[brock] HIVE-5713 - Verify versions of libraries post maven merge (Brock Noland 
reviewed by Gunther Hagleitner)

[brock] HIVE-5765 - Beeline throws NPE when -e option is used (Szehon Ho via 
Brock Noland)

[xuefu] HIVE-5726: The DecimalTypeInfo instance associated with a decimal 
constant is not in line with the precision/scale of the constant (reviewed by 
Brock)

[xuefu] HIVE-5655: Hive incorrecly handles divide-by-zero case (reviewed by 
Edward and Brock)

[xuefu] HIVE-5191: Add char data type (Jason via Xuefu)


Changes for Build #2444
[brock] HIVE-5780 - Add the missing declaration of HIVE_CLI_SERVICE_PROTOCOL_V4 
in TCLIService.thrift (Prasad Mujumdar via Brock Noland)


Changes for Build #2445
[gunther] HIVE-5601: NPE in ORC's PPD when using select * from table with where 
predicate (Prasanth J via Owen O'Malley and Gunther Hagleitner)

[gunther] HIVE-5562: Provide stripe level column statistics in ORC (Patch by 
Prasanth J, reviewed by Owen O'Malley, committed by Gunther Hagleitner)

[hashutosh] HIVE-3777 : add a property in the partition to figure out if stats 
are accurate (Ashutosh Chauhan via Thejas Nair)


Changes for Build #2446
[hashutosh] HIVE-5691 : Intermediate columns are incorrectly initialized for 
partitioned tables. (Jitendra Nath Pandey via Gunther Hagleitner)

[hashutosh] HIVE-5779 : Subquery in where clause with distinct fails with 
mapjoin turned on with serialization error. (Ashutosh Chauhan via Harish Butani)

[gunther] HIVE-5632 (partial): Adding test data to data/files to enable 
pre-commit tests to run. (Prasanth J via Gunther Hagleitner)


Changes for Build #2447
[cws] HIVE-5786: Remove HadoopShims methods that were needed for pre-Hadoop 
0.20 (Jason Dere via cws)

[thejas] HIVE-5229 : Better thread management for HiveServer2 async threads 
(Vaibhav Gumashta via Thejas Nair)

[gunther] HIVE-5745: TestHiveLogging is failing (at least on mac) (Gunther 
Hagleitner, reviewed by Ashutosh Chauhan)


Changes for Build #2448
[hashutosh] HIVE-5699 : Add unit test for vectorized BETWEEN for timestamp 
inputs (Eric Hanson via Ashutosh Chauhan)

[hashutosh] HIVE-5767 : in SemanticAnalyzer#doPhase1, handling for TOK_UNION 
falls thru into TOK_INSERT (Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5657 : TopN produces incorrect results with count(distinct) 
(Sergey Shelukhin via Ashutosh Chauhan)


Changes for Build #2450
[hashutosh] HIVE-5683 : JDBC support for char (Jason Dere via Xuefu Zhang)

[hashutosh] HIVE-5626 : enable metastore direct SQL for drop/similar queries 
(Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5700 : enforce single date format for partition column storage 
(Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5753 : Remove collector from Operator base class (Mohammad 
Islam via Ashutosh Chauhan)

[hashutosh] HIVE-5737 : Provide StructObjectInspector for UDTFs rather than 
ObjectInspect[] (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-5790 : maven test build  failure shows wrong error message 
(Mohammad Islam via Ashutosh Chauhan)

[hashutosh] HIVE-5722 : Skip generating vectorization code if possible (Navis 

Hive-trunk-hadoop2 - Build # 570 - Still Failing

2013-11-25 Thread Apache Jenkins Server
Changes for Build #537
[brock] HIVE-5740: Tar files should extract to the directory of the same name 
minus tar.gz (Brock Noland reviewed by Xuefu Zhang)

[brock] HIVE-5611: Add assembly (i.e.) tar creation to pom (Szehon Ho via Brock 
Noland)

[brock] HIVE-5707: Validate values for ConfVar (Navis via Brock Noland)

[brock] HIVE-5721: Incremental build is disabled by MCOMPILER-209 (Navis via 
Brock Noland)


Changes for Build #538
[brock] HIVE-5354 - Decimal precision/scale support in ORC file (Xuefu Zhang 
via Brock Noland)

[brock] HIVE-4523 - round() function with specified decimal places not 
consistent with mysql (Xuefu Zhang via Brock Noland)

[thejas] HIVE-5542 : Webhcat is failing to run ddl command on a secure cluster 
(Sushanth Sowmyan via Thejas Nair)


Changes for Build #539
[brock] HIVE-5730: Beeline throws non-terminal NPE upon starting, after 
mavenization (Szehon Ho reviewed by Navis)


Changes for Build #540
[omalley] HIVE-5425 Provide a configuration option to control the default stripe
size for ORC. (omalley reviewed by gunther)

[omalley] Revert HIVE-5583 since it broke the build.

[hashutosh] HIVE-5583 : Implement support for IN (list-of-constants) filter in 
vectorized mode (Eric Hanson via Ashutosh Chauhan)

[brock] HIVE-5355 - JDBC support for decimal precision/scale


Changes for Build #541
[hashutosh] HIVE-5583 : Implement support for IN (list-of-constants) filter in 
vectorized mode (Eric Hanson via Ashutosh Chauhan)

[brock] HIVE-5773 - Fix build due to conflict between HIVE-5711 and HIVE-5713

[brock] HIVE-5711 - Fix eclipse:eclipse maven goal (Carl Steinbach via Brock 
Noland)

[brock] HIVE-5752 - log4j properties appear to have been lost in maven upgrade 
(Sergey Shelukhin via Brock Noland)

[brock] HIVE-5713 - Verify versions of libraries post maven merge (Brock Noland 
reviewed by Gunther Hagleitner)

[brock] HIVE-5765 - Beeline throws NPE when -e option is used (Szehon Ho via 
Brock Noland)

[xuefu] HIVE-5726: The DecimalTypeInfo instance associated with a decimal 
constant is not in line with the precision/scale of the constant (reviewed by 
Brock)

[xuefu] HIVE-5655: Hive incorrecly handles divide-by-zero case (reviewed by 
Edward and Brock)

[xuefu] HIVE-5191: Add char data type (Jason via Xuefu)


Changes for Build #542
[brock] HIVE-5351 - Secure-Socket-Layer (SSL) support for HiveServer2 (Prasad 
Mujumdar via Brock Noland)


Changes for Build #543
[brock] HIVE-5780 - Add the missing declaration of HIVE_CLI_SERVICE_PROTOCOL_V4 
in TCLIService.thrift (Prasad Mujumdar via Brock Noland)


Changes for Build #544
[gunther] HIVE-5601: NPE in ORC's PPD when using select * from table with where 
predicate (Prasanth J via Owen O'Malley and Gunther Hagleitner)

[gunther] HIVE-5562: Provide stripe level column statistics in ORC (Patch by 
Prasanth J, reviewed by Owen O'Malley, committed by Gunther Hagleitner)

[hashutosh] HIVE-3777 : add a property in the partition to figure out if stats 
are accurate (Ashutosh Chauhan via Thejas Nair)


Changes for Build #545
[hashutosh] HIVE-5691 : Intermediate columns are incorrectly initialized for 
partitioned tables. (Jitendra Nath Pandey via Gunther Hagleitner)

[hashutosh] HIVE-5779 : Subquery in where clause with distinct fails with 
mapjoin turned on with serialization error. (Ashutosh Chauhan via Harish Butani)

[gunther] HIVE-5632 (partial): Adding test data to data/files to enable 
pre-commit tests to run. (Prasanth J via Gunther Hagleitner)


Changes for Build #546
[cws] HIVE-5786: Remove HadoopShims methods that were needed for pre-Hadoop 
0.20 (Jason Dere via cws)

[thejas] HIVE-5229 : Better thread management for HiveServer2 async threads 
(Vaibhav Gumashta via Thejas Nair)

[gunther] HIVE-5745: TestHiveLogging is failing (at least on mac) (Gunther 
Hagleitner, reviewed by Ashutosh Chauhan)


Changes for Build #547
[hashutosh] HIVE-5699 : Add unit test for vectorized BETWEEN for timestamp 
inputs (Eric Hanson via Ashutosh Chauhan)

[hashutosh] HIVE-5767 : in SemanticAnalyzer#doPhase1, handling for TOK_UNION 
falls thru into TOK_INSERT (Sergey Shelukhin via Ashutosh Chauhan)

[hashutosh] HIVE-5657 : TopN produces incorrect results with count(distinct) 
(Sergey Shelukhin via Ashutosh Chauhan)


Changes for Build #549
[hashutosh] HIVE-5753 : Remove collector from Operator base class (Mohammad 
Islam via Ashutosh Chauhan)

[hashutosh] HIVE-5737 : Provide StructObjectInspector for UDTFs rather than 
ObjectInspect[] (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-5790 : maven test build  failure shows wrong error message 
(Mohammad Islam via Ashutosh Chauhan)

[hashutosh] HIVE-5722 : Skip generating vectorization code if possible (Navis 
via Brock Noland)

[hashutosh] HIVE-5697 : Correlation Optimizer may generate wrong plans for 
cases involving outer join (Yin Huai via Ashutosh Chauhan)

[hashutosh] HIVE-4880 : Rearrange explain order of stages simpler (Navis via 
Ashutosh Chauhan)

[xuefu] HIVE-5286: Negative test date_literal1.q fai

[jira] [Commented] (HIVE-5853) Hive Lock Manager leaks zookeeper connections

2013-11-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831871#comment-13831871
 ] 

Brock Noland commented on HIVE-5853:


HS2 is strongly recommended over HS1 and in fact HS1 is considered deprecated.  
I'd suggest you test with HS2. From a SQL perspective it should be compatible 
with HS1.

> Hive Lock Manager leaks zookeeper connections
> -
>
> Key: HIVE-5853
> URL: https://issues.apache.org/jira/browse/HIVE-5853
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Harel Ben Attia
>
> Hive 0.10 leaks zookeeper connections from ZooKeeperHiveLockManager. 
> HIVE-3723 describes a similar issue for cases of semantic errors and 
> failures, but we're experiencing a consistent connection leak per query (even 
> simple successful queries like "select * from dual").
> Workaround: When turning off hive.support.concurrency, everything works fine 
> - no leak (obviously, since the lock manager is not used).
> Details:
> OS: CentOS 5.9
> Hive version: hive-server-0.10.0+67-1.cdh4.2.0.p0.10.el5 and 
> hive-0.10.0+198-1.cdh4.4.0.p0.15.el5
> Hadoop version: CDH4.2
> Namenode uses HA. Hive's zookeeper configuration uses the NN zookeeper.
> The problem occurs both when using the python thrift API, and the java thrift 
> API. 
> The leak happens even when we're running repeated "select * from dual" 
> queries. We've checked the zookeeper connections using "netstat -n | grep 
> 2181 | grep ESTAB | wc -l".
> Eventually, the connections from the client reach the max-connections-per-client 
> limit in ZK, causing new queries to get stuck and never return.
> We'll gladly provide more information if needed.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-1975) "insert overwrite directory" Not able to insert data with multi level directory path

2013-11-25 Thread Vijay Ratnagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831862#comment-13831862
 ] 

Vijay Ratnagiri commented on HIVE-1975:
---

Hey Guys,

I'm using hive 0.11.0 and I just verified that I'm facing this exact problem.

I first tried to ask Hive to create a multi-level path and got: "return code 1 
from org.apache.hadoop.hive.ql.exec.MoveTask".

When I switched to using a simple one-level directory, my query succeeded and I 
could see my data written out.

Can anyone else corroborate?

Thanks!

> "insert overwrite directory" Not able to insert data with multi level 
> directory path
> 
>
> Key: HIVE-1975
> URL: https://issues.apache.org/jira/browse/HIVE-1975
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.5.0
> Environment: Hadoop 0.20.1, Hive0.5.0 and SUSE Linux Enterprise 
> Server 10 SP2 (i586) - Kernel 2.6.16.60-0.21-smp (5).
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Fix For: 0.8.0
>
> Attachments: HIVE-1975.1.patch, HIVE-1975.2.patch, HIVE-1975.3.patch, 
> HIVE-1975.patch
>
>
> Below query execution is failed
> Ex:
> {noformat}
>insert overwrite directory '/HIVEFT25686/chinna/' select * from dept_j;
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5849) Improve the stats of operators based on heuristics in the absence of any column statistics

2013-11-25 Thread Harish Butani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harish Butani updated HIVE-5849:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks Prasanth

> Improve the stats of operators based on heuristics in the absence of any 
> column statistics
> --
>
> Key: HIVE-5849
> URL: https://issues.apache.org/jira/browse/HIVE-5849
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Processor, Statistics
>Reporter: Prasanth J
>Assignee: Prasanth J
> Fix For: 0.13.0
>
> Attachments: HIVE-5849.1.patch.txt, HIVE-5849.2.patch.txt, 
> HIVE-5849.3.patch, HIVE-5849.3.patch.txt, HIVE-5849.4.javaonly.patch, 
> HIVE-5849.5.patch, HIVE-5849.6.patch, HIVE-5849.7.patch
>
>
> In the absence of any column statistics, operators will simply use the 
> statistics from its parents. It is useful to apply some heuristics to update 
> basic statistics (number of rows and data size) in the absence of any column 
> statistics. This will be worst case scenario.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5880) Rename HCatalog HBase Storage Handler artifact id

2013-11-25 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831793#comment-13831793
 ] 

Prasad Mujumdar commented on HIVE-5880:
---

LGTM

+1


> Rename HCatalog HBase Storage Handler artifact id
> -
>
> Key: HIVE-5880
> URL: https://issues.apache.org/jira/browse/HIVE-5880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-5880.patch
>
>
> Currently the HBase storage handler is named hive-hbase-storage-handler. I 
> think we should rename it to hive-hcatalog-hbase-storage-handler to match the 
> other hcatalog artifacts and to differentiate it from the hive-hbase-handler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5881) Integrate Hive with Morphlines

2013-11-25 Thread wolfgang hoschek (JIRA)
wolfgang hoschek created HIVE-5881:
--

 Summary: Integrate Hive with Morphlines
 Key: HIVE-5881
 URL: https://issues.apache.org/jira/browse/HIVE-5881
 Project: Hive
  Issue Type: New Feature
  Components: UDF
Affects Versions: 0.12.0
Reporter: wolfgang hoschek


Integrate Hive with morphlines. 

Specifically, add support to Hive to call a morphline as a UDTF to leverage 
existing morphlines ETL functionality. Often, some flexible massaging needs to 
happen to get the input data into the shape that's desired, and morphline logic 
helps do this in a user-friendly, pluggable, efficient, pipelined manner. This 
issue basically boils down to transforming an input row into a morphline 
record, feeding the record into the morphline processing API, and finally 
converting zero or more morphline output records into corresponding Hive rows.
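
A rough usage sketch only (the UDTF name, class, arguments, morphline config file, and output column below are all assumptions, since no API has been agreed on yet):

{noformat}
ADD JAR hive-morphlines-udtf.jar;
CREATE TEMPORARY FUNCTION morphline AS 'com.example.hive.udtf.MorphlineUDTF';

-- run each input line through the morphline defined in extract_fields.conf
-- and emit zero or more output rows per input row
SELECT t.out_text
FROM raw_logs
LATERAL VIEW morphline(line, 'extract_fields.conf') t AS out_text;
{noformat}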

Some background is here:

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/morphlinesReferenceGuide.html

http://cloudera.github.io/cdk/docs/current/cdk-morphlines/index.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5880) Rename HCatalog HBase Storage Handler artifact id

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831779#comment-13831779
 ] 

Hive QA commented on HIVE-5880:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615630/HIVE-5880.patch

{color:green}SUCCESS:{color} +1 4684 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/437/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/437/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615630

> Rename HCatalog HBase Storage Handler artifact id
> -
>
> Key: HIVE-5880
> URL: https://issues.apache.org/jira/browse/HIVE-5880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-5880.patch
>
>
> Currently the HBase storage handler is named hive-hbase-storage-handler. I 
> think we should rename it to hive-hcatalog-hbase-storage-handler to match the 
> other hcatalog artifacts and to differentiate it from the hive-hbase-handler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice

2013-11-25 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831773#comment-13831773
 ] 

Sushanth Sowmyan commented on HIVE-3682:


[~caofangkun] : Thanks for bringing that up; apologies for not noticing it till 
now. I'll add it to the wiki.

[~vratnagiri] : Well, for writing out to HDFS there already exists a way to do 
this, and that is to write out to a new table at that location. What was 
lacking was the ability to write to a local directory with the features that 
already existed for an HDFS write, and that is what was added here. 
Basically, you can do a CREATE TABLE with whatever format you want, at an 
appropriate HDFS location, and then do an INSERT OVERWRITE into that table with 
the results of whatever SELECT you desire. :)
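
For example, a minimal sketch of that approach (the table name, HDFS path, and separator below are illustrative only):

{noformat}
CREATE TABLE export_target (key STRING, value STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
STORED AS TEXTFILE
LOCATION '/tmp/export_target';

INSERT OVERWRITE TABLE export_target
SELECT key, value FROM src;
{noformat}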

> when output hive table to file,users should could have a separator of their 
> own choice
> --
>
> Key: HIVE-3682
> URL: https://issues.apache.org/jira/browse/HIVE-3682
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Affects Versions: 0.8.1
> Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
> 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
> java version "1.6.0_25"
> hadoop-0.20.2-cdh3u0
> hive-0.8.1
>Reporter: caofangkun
>Assignee: Sushanth Sowmyan
> Fix For: 0.11.0
>
> Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
> HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, 
> HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch
>
>
> By default, when outputting a Hive table to a file, columns of the Hive table are 
> separated by the ^A character (that is, \001).
> But users should have the right to set a separator of their own choice.
> Usage Example:
> create table for_test (key string, value string);
> load data local inpath './in1.txt' into table for_test
> select * from for_test;
> UT-01:default separator is \001 line separator is \n
> insert overwrite local directory './test-01' 
> select * from src ;
> create table array_table (a array, b array)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ',';
> load data local inpath "../hive/examples/files/arraytest.txt" overwrite into 
> table table2;
> CREATE TABLE map_table (foo STRING , bar MAP)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
> UT-02:defined field separator as ':'
> insert overwrite local directory './test-02' 
> row format delimited 
> FIELDS TERMINATED BY ':' 
> select * from src ;
> UT-03: the line separator is NOT ALLOWED to be defined as another separator 
> insert overwrite local directory './test-03' 
> row format delimited 
> FIELDS TERMINATED BY ':' 
> select * from src ;
> UT-04: define map separators 
> insert overwrite local directory './test-04' 
> row format delimited 
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> select * from src;



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831772#comment-13831772
 ] 

Sergey Shelukhin commented on HIVE-5817:


Currently other operators also change the column names. In fact there's a new 
method that gets the source operator for lineage; note that the only place 
where it passes through to the parent is Filter - every operator except Filter 
passes its own column names to child ops (which may or may not be necessary).

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5870) Move TestJDBCDriver2.testNewConnectionConfiguration to TestJDBCWithMiniHS2

2013-11-25 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831671#comment-13831671
 ] 

Xuefu Zhang commented on HIVE-5870:
---

+1

> Move TestJDBCDriver2.testNewConnectionConfiguration to TestJDBCWithMiniHS2
> --
>
> Key: HIVE-5870
> URL: https://issues.apache.org/jira/browse/HIVE-5870
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.13.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-5870.patch
>
>
> TestJDBCDriver2.testNewConnectionConfiguration() attempts to start a 
> Hiveserver2 instance in the test.
> This can cause issues as creating HiveServer2 needs correct environment/path. 
>  This test should be moved to TestJdbcWithMiniHS2, which uses MiniHS2.  
> MiniHS2 is for this purpose (setting all the environment properly before 
> starting HiveServer2 instance).



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5880) Rename HCatalog HBase Storage Handler artifact id

2013-11-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5880:
---

Status: Patch Available  (was: Open)

> Rename HCatalog HBase Storage Handler artifact id
> -
>
> Key: HIVE-5880
> URL: https://issues.apache.org/jira/browse/HIVE-5880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-5880.patch
>
>
> Currently the HBase storage handler is named hive-hbase-storage-handler. I 
> think we should rename it to hive-hcatalog-hbase-storage-handler to match the 
> other hcatalog artifacts and to differentiate it from the hive-hbase-handler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5880) Rename HCatalog HBase Storage Handler artifact id

2013-11-25 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5880:
---

Attachment: HIVE-5880.patch

> Rename HCatalog HBase Storage Handler artifact id
> -
>
> Key: HIVE-5880
> URL: https://issues.apache.org/jira/browse/HIVE-5880
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Attachments: HIVE-5880.patch
>
>
> Currently the HBase storage handler is named hive-hbase-storage-handler. I 
> think we should rename it to hive-hcatalog-hbase-storage-handler to match the 
> other hcatalog artifacts and to differentiate it from the hive-hbase-handler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5880) Rename HCatalog HBase Storage Handler artifact id

2013-11-25 Thread Brock Noland (JIRA)
Brock Noland created HIVE-5880:
--

 Summary: Rename HCatalog HBase Storage Handler artifact id
 Key: HIVE-5880
 URL: https://issues.apache.org/jira/browse/HIVE-5880
 Project: Hive
  Issue Type: Sub-task
Reporter: Brock Noland
Assignee: Brock Noland
Priority: Minor


Currently the HBase storage handler is named hive-hbase-storage-handler. I think 
we should rename it to hive-hcatalog-hbase-storage-handler to match the other 
hcatalog artifacts and to differentiate it from the hive-hbase-handler.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831645#comment-13831645
 ] 

Eric Hanson commented on HIVE-5817:
---

Great! Thanks for jumping in.

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5839) BytesRefArrayWritable compareTo violates contract

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831640#comment-13831640
 ] 

Hive QA commented on HIVE-5839:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615615/HIVE-5839.2.patch

{color:green}SUCCESS:{color} +1 4684 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/436/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/436/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615615

> BytesRefArrayWritable compareTo violates contract
> -
>
> Key: HIVE-5839
> URL: https://issues.apache.org/jira/browse/HIVE-5839
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ian Robertson
>Assignee: Xuefu Zhang
> Attachments: HIVE-5839.1.patch, HIVE-5839.2.patch, HIVE-5839.patch, 
> HIVE-5839.patch
>
>
> BytesRefArrayWritable's compareTo violates the compareTo contract from 
> java.lang.Comparable. Specifically:
> * The implementor must ensure sgn(x.compareTo( y )) == -sgn(y.compareTo( x )) 
> for all x and y.
> The compareTo implementation on BytesRefArrayWritable does a proper 
> comparison of the sizes of the two instances. However, if the sizes are the 
> same, it proceeds to check whether both arrays have the same contents. If 
> not, it returns 1. This means that if x and y are two BytesRefArrayWritable 
> instances with the same size, but different contents, then x.compareTo( y ) 
> == 1 and y.compareTo( x ) == 1.
> Additionally, the comparison of contents is order agnostic. This seems wrong, 
> since order of entries should matter. It is also very inefficient, running at 
> O(n^2), where n is the number of entries.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5793) Update hive-default.xml.template for HIVE-4002

2013-11-25 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5793:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Navis!

> Update hive-default.xml.template for HIVE-4002
> --
>
> Key: HIVE-5793
> URL: https://issues.apache.org/jira/browse/HIVE-5793
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: HIVE-5793.1.patch.txt, HIVE-5793.2.patch.txt, 
> HIVE-5793.3.patch.txt
>
>
> Addressing 
> https://issues.apache.org/jira/browse/HIVE-3990?focusedCommentId=13818388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13818388



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice

2013-11-25 Thread Vijay Ratnagiri (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831625#comment-13831625
 ] 

Vijay Ratnagiri commented on HIVE-3682:
---

Hey Guys,

I was really delighted to find that the export finally supported choosing the 
format, but unfortunately my delight was short-lived when I discovered that 
this feature is supported only for 'insert overwrite LOCAL directory' and not 
when I'm exporting to an HDFS directory.

I get a syntax/parse error when I try to export to an HDFS directory with a 
custom row format.

How come this feature was implemented like this? If this wasn't intentional, 
does this warrant reopening this ticket? 

Thanks!

> when output hive table to file,users should could have a separator of their 
> own choice
> --
>
> Key: HIVE-3682
> URL: https://issues.apache.org/jira/browse/HIVE-3682
> Project: Hive
>  Issue Type: New Feature
>  Components: CLI
>Affects Versions: 0.8.1
> Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 
> 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
> java version "1.6.0_25"
> hadoop-0.20.2-cdh3u0
> hive-0.8.1
>Reporter: caofangkun
>Assignee: Sushanth Sowmyan
> Fix For: 0.11.0
>
> Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, 
> HIVE-3682.D10275.2.patch, HIVE-3682.D10275.3.patch, HIVE-3682.D10275.4.patch, 
> HIVE-3682.D10275.4.patch.for.0.11, HIVE-3682.with.serde.patch
>
>
> By default, when outputting a Hive table to a file, columns of the Hive table are 
> separated by the ^A character (that is, \001).
> But users should have the right to set a separator of their own choice.
> Usage Example:
> create table for_test (key string, value string);
> load data local inpath './in1.txt' into table for_test
> select * from for_test;
> UT-01:default separator is \001 line separator is \n
> insert overwrite local directory './test-01' 
> select * from src ;
> create table array_table (a array, b array)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ',';
> load data local inpath "../hive/examples/files/arraytest.txt" overwrite into 
> table table2;
> CREATE TABLE map_table (foo STRING , bar MAP)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
> UT-02:defined field separator as ':'
> insert overwrite local directory './test-02' 
> row format delimited 
> FIELDS TERMINATED BY ':' 
> select * from src ;
> UT-03: the line separator is NOT ALLOWED to be defined as another separator 
> insert overwrite local directory './test-03' 
> row format delimited 
> FIELDS TERMINATED BY ':' 
> select * from src ;
> UT-04: define map separators 
> insert overwrite local directory './test-04' 
> row format delimited 
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> select * from src;



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5793) Update hive-default.xml.template for HIVE-4002

2013-11-25 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831613#comment-13831613
 ] 

Ashutosh Chauhan commented on HIVE-5793:


+1

> Update hive-default.xml.template for HIVE-4002
> --
>
> Key: HIVE-5793
> URL: https://issues.apache.org/jira/browse/HIVE-5793
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-5793.1.patch.txt, HIVE-5793.2.patch.txt, 
> HIVE-5793.3.patch.txt
>
>
> Addressing 
> https://issues.apache.org/jira/browse/HIVE-3990?focusedCommentId=13818388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13818388



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5833) Remove versions from child module dependencies

2013-11-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831601#comment-13831601
 ] 

Brock Noland commented on HIVE-5833:


+1

> Remove versions from child module dependencies
> --
>
> Key: HIVE-5833
> URL: https://issues.apache.org/jira/browse/HIVE-5833
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
> Attachments: HIVE-5833.2.patch, HIVE-5833.patch
>
>
> HIVE-5741 moved all dependencies to the plugin management section of the 
> parent pom therefore we can remove 
> {noformat}${dep.version}{noformat} from all dependencies 
> in child modules.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5839) BytesRefArrayWritable compareTo violates contract

2013-11-25 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5839:
--

Attachment: HIVE-5839.2.patch

Patch #2 updated with two lines of minor comments based on rb request.

> BytesRefArrayWritable compareTo violates contract
> -
>
> Key: HIVE-5839
> URL: https://issues.apache.org/jira/browse/HIVE-5839
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ian Robertson
>Assignee: Xuefu Zhang
> Attachments: HIVE-5839.1.patch, HIVE-5839.2.patch, HIVE-5839.patch, 
> HIVE-5839.patch
>
>
> BytesRefArrayWritable's compareTo violates the compareTo contract from 
> java.lang.Comparable. Specifically:
> * The implementor must ensure sgn(x.compareTo( y )) == -sgn(y.compareTo( x )) 
> for all x and y.
> The compareTo implementation on BytesRefArrayWritable does a proper 
> comparison of the sizes of the two instances. However, if the sizes are the 
> same, it proceeds to check whether both arrays have the same contents. If 
> not, it returns 1. This means that if x and y are two BytesRefArrayWritable 
> instances with the same size, but different contents, then x.compareTo( y ) 
> == 1 and y.compareTo( x ) == 1.
> Additionally, the comparison of contents is order agnostic. This seems wrong, 
> since order of entries should matter. It is also very inefficient, running at 
> O(n^2), where n is the number of entries.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5879) Fix spelling errors in hive-default.xml

2013-11-25 Thread Brock Noland (JIRA)
Brock Noland created HIVE-5879:
--

 Summary: Fix spelling errors in hive-default.xml
 Key: HIVE-5879
 URL: https://issues.apache.org/jira/browse/HIVE-5879
 Project: Hive
  Issue Type: Improvement
Reporter: Brock Noland
Priority: Trivial


See 
https://issues.apache.org/jira/browse/HIVE-5400?focusedCommentId=13830626&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13830626



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5400) Allow admins to disable compile and other commands

2013-11-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831588#comment-13831588
 ] 

Brock Noland commented on HIVE-5400:


Nice. I have always had trouble with that word.

 I created https://issues.apache.org/jira/browse/HIVE-5879 for that.

> Allow admins to disable compile and other commands
> --
>
> Key: HIVE-5400
> URL: https://issues.apache.org/jira/browse/HIVE-5400
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.13.0
>
> Attachments: HIVE-5400.patch, HIVE-5400.patch, HIVE-5400.patch
>
>
> From here: 
> https://issues.apache.org/jira/browse/HIVE-5253?focusedCommentId=13782220&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13782220
>  I think we should afford admins who want to disable this functionality the 
> ability to do so. Since such admins might want to disable other commands such 
> as add or dfs, it wouldn't be much trouble to allow them to do this as well. 
> For example we could have a configuration option "hive.available.commands" 
> (or similar) which specified add,set,delete,reset, etc by default. Then check 
> this value in CommandProcessorFactory. It would probably make sense to add 
> this property to the restrict list.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4741) Add Hive config API to modify the restrict list

2013-11-25 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831568#comment-13831568
 ] 

Brock Noland commented on HIVE-4741:


Awesome, thank you Navis!

+1

> Add Hive config API to modify the restrict list
> ---
>
> Key: HIVE-4741
> URL: https://issues.apache.org/jira/browse/HIVE-4741
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.11.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-4714.2.patch.txt, HIVE-4741-1.patch
>
>
> HiveConf supports a restrict list configuration, a black list of 
> configurations that cannot be modified through 'set x=y'. This is especially 
> useful for HiveServer2 to restrict clients from overriding some of server 
> configurations.
> Currently the restrict list value can't be changed after the HiveConf object 
> is instantiated. It would be useful for custom code like Hooks to have a way 
> to be able to update that setting.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-11-25 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831541#comment-13831541
 ] 

Remus Rusanu commented on HIVE-5817:


I have the implementation of "vectorization regions" almost done and I know 
how to finish it. I will post a patch tomorrow. 

Using qualified column names (alias + column) will probably work; I don't 
think a query can have duplicate aliases (I'm not up to par on my ANSI reading, 
as one can tell...). The thing is, that will ripple everywhere in the 
vectorization context: we'll have to modify all the expression builders to use 
alias.column as a key. If the "region" concept works, it will have a much more 
contained impact.

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5414) The result of show grant is not visible via JDBC

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831411#comment-13831411
 ] 

Hive QA commented on HIVE-5414:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615544/D13209.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4684 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/435/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/435/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615544

> The result of show grant is not visible via JDBC
> 
>
> Key: HIVE-5414
> URL: https://issues.apache.org/jira/browse/HIVE-5414
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, JDBC
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: D13209.1.patch, D13209.2.patch, D13209.3.patch
>
>
> Currently, show grant / show role grant does not make fetch task, which 
> provides the result schema for jdbc clients.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4741) Add Hive config API to modify the restrict list

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831381#comment-13831381
 ] 

Hive QA commented on HIVE-4741:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615539/HIVE-4714.2.patch.txt

{color:green}SUCCESS:{color} +1 4687 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/433/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/433/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615539

> Add Hive config API to modify the restrict list
> ---
>
> Key: HIVE-4741
> URL: https://issues.apache.org/jira/browse/HIVE-4741
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.11.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-4714.2.patch.txt, HIVE-4741-1.patch
>
>
> HiveConf supports a restrict list configuration, a black list of 
> configurations that cannot be modified through 'set x=y'. This is especially 
> useful for HiveServer2 to restrict clients from overriding some of server 
> configurations.
> Currently the restrict list value can't be changed after the HiveConf object 
> is instantiated. It would be useful for custom code like Hooks to have a way 
> to be able to update that setting.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5851) Hive query taking a lot of time just to launch map-reduce jobs

2013-11-25 Thread Sreenath S Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831357#comment-13831357
 ] 

Sreenath S Kamath commented on HIVE-5851:
-

normal pig input output format


> Hive query taking a lot of time just to launch map-reduce jobs
> --
>
> Key: HIVE-5851
> URL: https://issues.apache.org/jira/browse/HIVE-5851
> Project: Hive
>  Issue Type: Bug
>Reporter: Sreenath S Kamath
>Priority: Critical
>
> We are using Hive for ad-hoc querying and have a Hive table that is 
> partitioned on two fields (date, id). For each date there are around 1400 
> ids, so around that many partitions are added on a single day. The actual 
> data resides in S3. The issue we are facing: if we do a select count(*) for 
> a month from the table, it takes quite a long time (approx. 1 hr 52 min) 
> just to launch the map-reduce job. When I ran the query in Hive verbose mode 
> I could see that it spends this time deciding how many mappers to spawn 
> (calculating splits). Is there any means by which I can reduce this lag 
> before the map-reduce job launches? This is one of the log messages logged 
> during this lag: 13/11/19 07:11:06 INFO mapred.FileInputFormat: Total input paths to 
> process : 1 13/11/19 07:11:06 WARN httpclient.RestS3Service: Response 
> '/Analyze%2F2013%2F10%2F03%2F465' - Unexpected response code 404, expected 200



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5793) Update hive-default.xml.template for HIVE-4002

2013-11-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831315#comment-13831315
 ] 

Hive QA commented on HIVE-5793:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12615534/HIVE-5793.3.patch.txt

{color:green}SUCCESS:{color} +1 4684 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/432/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/432/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12615534

> Update hive-default.xml.template for HIVE-4002
> --
>
> Key: HIVE-5793
> URL: https://issues.apache.org/jira/browse/HIVE-5793
> Project: Hive
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Attachments: HIVE-5793.1.patch.txt, HIVE-5793.2.patch.txt, 
> HIVE-5793.3.patch.txt
>
>
> Addressing 
> https://issues.apache.org/jira/browse/HIVE-3990?focusedCommentId=13818388&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13818388



--
This message was sent by Atlassian JIRA
(v6.1#6144)

