[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database

2013-12-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842090#comment-13842090
 ] 

Hive QA commented on HIVE-4256:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617530/HIVE-4256.2.patch

{color:green}SUCCESS:{color} +1 4460 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/560/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/560/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12617530

> JDBC2 HiveConnection does not use the specified database
> 
>
> Key: HIVE-4256
> URL: https://issues.apache.org/jira/browse/HIVE-4256
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Chris Drome
>Assignee: Anandha L Ranganathan
> Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.patch
>
>
> HiveConnection ignores the database specified in the connection string when 
> configuring the connection.
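
For reference, a minimal sketch of the behavior under discussion (hypothetical host, port, and database names; this needs a running HiveServer2, so it is illustrative rather than a runnable test):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectWithDb {
    public static void main(String[] args) throws Exception {
        // "mydb" after the port is the database this connection asks for;
        // the bug is that HiveConnection parses this segment of the URL
        // but never applies it when configuring the session.
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/mydb", "user", "");
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            // With the fix applied, this lists tables from mydb
            // instead of the "default" database.
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```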



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Prasad Mujumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842068#comment-13842068
 ] 

Prasad Mujumdar commented on HIVE-4395:
---

Attached updated patch. The failed test cascade_dbdrop_hadoop20 passes in my 
setup with the patch.

> Support TFetchOrientation.FIRST for HiveServer2 FetchResults
> 
>
> Key: HIVE-4395
> URL: https://issues.apache.org/jira/browse/HIVE-4395
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch, 
> HIVE-4395.3.patch, HIVE-4395.4.patch, HIVE-4395.5.patch
>
>
> Currently HiveServer2 only supports fetching the next row 
> (TFetchOrientation.NEXT). This ticket is to implement support for 
> TFetchOrientation.FIRST, which resets the fetch position to the beginning of the 
> resultset.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-4395:
--

Attachment: HIVE-4395.5.patch

> Support TFetchOrientation.FIRST for HiveServer2 FetchResults
> 
>
> Key: HIVE-4395
> URL: https://issues.apache.org/jira/browse/HIVE-4395
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch, 
> HIVE-4395.3.patch, HIVE-4395.4.patch, HIVE-4395.5.patch
>
>
> Currently HiveServer2 only supports fetching the next row 
> (TFetchOrientation.NEXT). This ticket is to implement support for 
> TFetchOrientation.FIRST, which resets the fetch position to the beginning of the 
> resultset.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 16063: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16063/
---

(Updated Dec. 7, 2013, 4:24 a.m.)


Review request for hive, Brock Noland and Thejas Nair.


Changes
---

Updates per review feedback.


Bugs: HIVE-4395
https://issues.apache.org/jira/browse/HIVE-4395


Repository: hive-git


Description
---

Support fetch-from-start for HiveServer2 fetch operations:
 - Handled the new fetch orientation in the various HS2 operations.
 - Added support for resetting the read position in the Hive driver.
 - Enabled scroll cursors, with support for positioning the cursor at the start of 
the resultset.
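
A rough usage sketch of the scroll-cursor item above (hypothetical connection details and table name; requires a live HiveServer2, so illustrative only):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchFirstSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "user", "");
        // A scroll-insensitive statement is what enables rewinding.
        Statement stmt = conn.createStatement(
                ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
        ResultSet rs = stmt.executeQuery("SELECT * FROM src");
        while (rs.next()) {
            // first pass over the results
        }
        // beforeFirst() is the call that maps to TFetchOrientation.FIRST
        // on the server side.
        rs.beforeFirst();
        while (rs.next()) {
            // second pass, again from the first row
        }
    }
}
```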


Diffs (updated)
-

  itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
1ba8ad3 
  jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java 96bd724 
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 812ee56 
  jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java fce19bf 
  ql/src/java/org/apache/hadoop/hive/ql/Context.java ed502a7 
  ql/src/java/org/apache/hadoop/hive/ql/Driver.java 30ef73e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 343f760 
  ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c 
  
service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
 581e69c 
  
service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java 
af87a90 
  
service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
 0fe01c0 
  
service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java 
bafe40c 
  
service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
 2be018e 
  
service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 
7e8a06b 
  
service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
 2daa9cd 
  
service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
 a1ac55b 
  service/src/java/org/apache/hive/service/cli/operation/Operation.java 6f4b8dc 
  service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
301187d 

Diff: https://reviews.apache.org/r/16063/diff/


Testing
---

Added new testcases to TestJdbcDriver2


Thanks,

Prasad Mujumdar



Re: Review Request 16063: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Prasad Mujumdar


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1833
> > 
> >
> > comment not applicable ?

ah, CPT errors. Removed.


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1837
> > 
> >
> > why not re-use execFetchFirst here ?

execFetchFirst was more tightly associated with the SQL query. Changed it to be 
more generic and used in the other tests as well.


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1856
> > 
> >
> > comment not applicable ?

removed


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1884
> > 
> >
> > you can use fail("..") instead of assertTrue here.

Done


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1886
> > 
> >
> > I think the error message would need to be updated in the test case.

Yes, thanks for catching that. It looks like the hive-unit tests are getting 
skipped altogether due to some other build changes, hence I didn't notice the 
failures.

Added SQLState in the error and updated the test to check for SQLState instead 
of messages.


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1913
> > 
> >
> > nit - a trailing white space

Removed


> On Dec. 6, 2013, 8:04 p.m., Thejas Nair wrote:
> > itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java, 
> > line 1918
> > 
> >
> > Thanks for these comments!
> > can you extend the comment to - "@param oneRowOnly - read only one row 
> > from result"
> >

Done


- Prasad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16063/#review29900
---


On Dec. 6, 2013, 5:37 a.m., Prasad Mujumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16063/
> ---
> 
> (Updated Dec. 6, 2013, 5:37 a.m.)
> 
> 
> Review request for hive, Brock Noland and Thejas Nair.
> 
> 
> Bugs: HIVE-4395
> https://issues.apache.org/jira/browse/HIVE-4395
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support fetch-from-start for HiveServer2 fetch operations:
>  - Handled the new fetch orientation in the various HS2 operations.
>  - Added support for resetting the read position in the Hive driver.
>  - Enabled scroll cursors, with support for positioning the cursor at the start of 
> the resultset.
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> 7b1c9da 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java ef39573 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 812ee56 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java fce19bf 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ed502a7 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 86db406 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 343f760 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  581e69c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  af87a90 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  0fe01c0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  bafe40c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  2be018e 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  7e8a06b 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  2daa9cd 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  a1ac55b 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 6f4b8dc 
>   service

Re: Number of tests run via Hive QA has decreased

2013-12-06 Thread Prasad Mujumdar
  The hive-unit module is removed from itest/pom.xml as part of HIVE-5755. Are we
skipping all those tests for hadoop-1 as well?

thanks
Prasad



On Fri, Dec 6, 2013 at 3:55 PM, Ashutosh Chauhan wrote:

> It seems like the number of tests run via Hive QA has gone down from ~4650
> tests to ~4450 tests over the last 2 weeks.
> Brock, do you know what could be the reason for it? Also, is there a quick
> way to know how many tests are supposed to run?
>
> Ashutosh
>


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842046#comment-13842046
 ] 

Hive QA commented on HIVE-5951:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617525/HIVE-5951.02.patch

{color:green}SUCCESS:{color} +1 4460 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/559/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/559/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12617525

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are small 
> things like, in the !ifNotExists case, DDLSemanticAnalyzer getting the full 
> partition object for every spec (which is a network call to the metastore) and 
> then discarding it instantly; there's also the general problem that too much 
> processing is done on the client side. DDLSA should analyze the query and make 
> one call to the metastore (or maybe a set of batched calls if there are too many 
> partitions in the command); the metastore should then do the remaining work and 
> insert in batch.
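
The direction sketched in the description might look roughly like this (a hypothetical sketch; `add_partitions` existed on the metastore client at the time, but the exact batching strategy is what the patch works out):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class BatchedAddPartitions {
    // Instead of one getPartition()/add_partition() round trip per spec,
    // build all Partition objects on the client and send them in one call.
    static void addAll(HiveMetaStoreClient client, List<Partition> specs)
            throws Exception {
        List<Partition> batch = new ArrayList<Partition>(specs);
        client.add_partitions(batch); // a single metastore round trip
    }
}
```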



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842016#comment-13842016
 ] 

Thejas M Nair commented on HIVE-4256:
-

+1

> JDBC2 HiveConnection does not use the specified database
> 
>
> Key: HIVE-4256
> URL: https://issues.apache.org/jira/browse/HIVE-4256
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Chris Drome
>Assignee: Anandha L Ranganathan
> Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.patch
>
>
> HiveConnection ignores the database specified in the connection string when 
> configuring the connection.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4256) JDBC2 HiveConnection does not use the specified database

2013-12-06 Thread Anandha L Ranganathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anandha L Ranganathan updated HIVE-4256:


Attachment: HIVE-4256.2.patch

> JDBC2 HiveConnection does not use the specified database
> 
>
> Key: HIVE-4256
> URL: https://issues.apache.org/jira/browse/HIVE-4256
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Chris Drome
>Assignee: Anandha L Ranganathan
> Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.patch
>
>
> HiveConnection ignores the database specified in the connection string when 
> configuring the connection.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-4256) JDBC2 HiveConnection does not use the specified database

2013-12-06 Thread Anandha L Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842014#comment-13842014
 ] 

Anandha L Ranganathan commented on HIVE-4256:
-

Attached the patch with the indentation comments addressed.

> JDBC2 HiveConnection does not use the specified database
> 
>
> Key: HIVE-4256
> URL: https://issues.apache.org/jira/browse/HIVE-4256
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Chris Drome
>Assignee: Anandha L Ranganathan
> Attachments: HIVE-4256.1.patch, HIVE-4256.2.patch, HIVE-4256.patch
>
>
> HiveConnection ignores the database specified in the connection string when 
> configuring the connection.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Work started] (HIVE-5679) add date support to metastore JDO/SQL

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-5679 started by Sergey Shelukhin.

> add date support to metastore JDO/SQL
> -
>
> Key: HIVE-5679
> URL: https://issues.apache.org/jira/browse/HIVE-5679
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> Metastore supports strings and integral types in filters.
> It could also support dates.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations

2013-12-06 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842007#comment-13842007
 ] 

Jason Dere commented on HIVE-5356:
--

So users who are concerned with speeding up queries involving integer 
division (at the cost of using approximate-precision types) may want to 
consider casting values to double prior to division.
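
For example, a hypothetical HiveQL sketch of that workaround (table and column names are illustrative):

```sql
-- int / int now takes the slower decimal path; casting an operand to
-- double keeps the faster double arithmetic, at the cost of exact
-- decimal precision.
SELECT CAST(t.a AS DOUBLE) / t.b FROM t;
```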

> Move arithmatic UDFs to generic UDF implementations
> ---
>
> Key: HIVE-5356
> URL: https://issues.apache.org/jira/browse/HIVE-5356
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are 
> implemented as old-style UDFs and java reflection is used to determine the 
> return type TypeInfos/ObjectInspectors, based on the return type of the 
> evaluate() method chosen for the expression. This works fine for types that 
> don't have type params.
> Hive decimal type participates in these operations just like int or double. 
> Different from double or int, however, decimal has precision and scale, which 
> cannot be determined by just looking at the return type (decimal) of the UDF 
> evaluate() method, even though the operands have certain precision/scale. 
> With the default of "decimal" without precision/scale, (10, 0) will be used as 
> the type params, which is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning an ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for a user UDF implemented in the non-generic way, if 
> the return type of the chosen evaluate() method is decimal, the return type 
> actually has (10, 0) as its precision/scale, which might not be desirable. This 
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842006#comment-13842006
 ] 

Ashutosh Chauhan commented on HIVE-5951:


If they are duplicates then it's fine not to show them.

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are small 
> things like, in the !ifNotExists case, DDLSemanticAnalyzer getting the full 
> partition object for every spec (which is a network call to the metastore) and 
> then discarding it instantly; there's also the general problem that too much 
> processing is done on the client side. DDLSA should analyze the query and make 
> one call to the metastore (or maybe a set of batched calls if there are too many 
> partitions in the command); the metastore should then do the remaining work and 
> insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842003#comment-13842003
 ] 

Sergey Shelukhin commented on HIVE-5951:


That is about not showing them in hooks, because the new code does not put 
the partitions that were duplicates into the output... that is easy to fix, but I 
wonder why they are needed.

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are small 
> things like, in the !ifNotExists case, DDLSemanticAnalyzer getting the full 
> partition object for every spec (which is a network call to the metastore) and 
> then discarding it instantly; there's also the general problem that too much 
> processing is done on the client side. DDLSA should analyze the query and make 
> one call to the metastore (or maybe a set of batched calls if there are too many 
> partitions in the command); the metastore should then do the remaining work and 
> insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations

2013-12-06 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842001#comment-13842001
 ] 

Jason Dere commented on HIVE-5356:
--

One big effect of changing int / int => decimal is the performance impact, 
since decimal arithmetic is quite a bit slower. I ran a test similar to the 
division unit tests, running GenericUDFOPDivide.evaluate() in a loop with both 
double args and decimal args. On my laptop, the loop with decimal 
division was over 50x slower than with double division.

Time in ms, 10M iterations:
double: 260
decimal: 13993

The loop I ran for double is below; I had a similar function for decimal:
{code:java}
  public static long testDivideDouble(double a, double b, int iterations)
      throws HiveException {
    GenericUDFOPDivide udf = new GenericUDFOPDivide();

    DoubleWritable left = new DoubleWritable(a);
    DoubleWritable right = new DoubleWritable(b);
    ObjectInspector[] inputOIs = {
        PrimitiveObjectInspectorFactory.writableDoubleObjectInspector,
        PrimitiveObjectInspectorFactory.writableDoubleObjectInspector
    };
    DeferredObject[] args = {
        new DeferredJavaObject(left),
        new DeferredJavaObject(right),
    };

    // initialize() must be called before evaluate(); the returned
    // inspector is not otherwise needed in this benchmark.
    PrimitiveObjectInspector oi = (PrimitiveObjectInspector)
        udf.initialize(inputOIs);

    // Declared locally here (it was a field in the original test class)
    // so the snippet is self-contained.
    DoubleWritable doubleResult = null;
    long start = System.currentTimeMillis();
    for (int idx = 0; idx < iterations; ++idx) {
      doubleResult = (DoubleWritable) udf.evaluate(args);
    }
    long end = System.currentTimeMillis();
    return end - start;
  }
{code}

> Move arithmatic UDFs to generic UDF implementations
> ---
>
> Key: HIVE-5356
> URL: https://issues.apache.org/jira/browse/HIVE-5356
> Project: Hive
>  Issue Type: Task
>  Components: UDF
>Affects Versions: 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, 
> HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, 
> HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, 
> HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are 
> implemented as old-style UDFs and java reflection is used to determine the 
> return type TypeInfos/ObjectInspectors, based on the return type of the 
> evaluate() method chosen for the expression. This works fine for types that 
> don't have type params.
> Hive decimal type participates in these operations just like int or double. 
> Different from double or int, however, decimal has precision and scale, which 
> cannot be determined by just looking at the return type (decimal) of the UDF 
> evaluate() method, even though the operands have certain precision/scale. 
> With the default of "decimal" without precision/scale, (10, 0) will be used as 
> the type params, which is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be 
> implemented as GenericUDFs, which allow returning an ObjectInspector during the 
> initialize() method. The object inspectors returned can carry type params, 
> from which the "exact" return type can be determined.
> It's worth mentioning that, for a user UDF implemented in the non-generic way, if 
> the return type of the chosen evaluate() method is decimal, the return type 
> actually has (10, 0) as its precision/scale, which might not be desirable. This 
> needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit 
> the scope of review. The remaining ones will be covered under HIVE-5706.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841998#comment-13841998
 ] 

Ashutosh Chauhan commented on HIVE-5951:


In explain plan or hooks ?

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are small 
> things like, in the !ifNotExists case, DDLSemanticAnalyzer getting the full 
> partition object for every spec (which is a network call to the metastore) and 
> then discarding it instantly; there's also the general problem that too much 
> processing is done on the client side. DDLSA should analyze the query and make 
> one call to the metastore (or maybe a set of batched calls if there are too many 
> partitions in the command); the metastore should then do the remaining work and 
> insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841994#comment-13841994
 ] 

Sergey Shelukhin commented on HIVE-5951:


The partitions gone from the output are the ones that were not added (they 
already exist). Should they also be present in the output?

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are small 
> things like, in the !ifNotExists case, DDLSemanticAnalyzer getting the full 
> partition object for every spec (which is a network call to the metastore) and 
> then discarding it instantly; there's also the general problem that too much 
> processing is done on the client side. DDLSA should analyze the query and make 
> one call to the metastore (or maybe a set of batched calls if there are too many 
> partitions in the command); the metastore should then do the remaining work and 
> insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5756) Implement vectorization support for IF conditional expression for long, double, timestamp, boolean and string inputs

2013-12-06 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841996#comment-13841996
 ] 

Jitendra Nath Pandey commented on HIVE-5756:


In IfExprStringColumnStringColumn.java :
{code}
   outputColVector.setVal(i, arg2ColVector.vector[i], arg2ColVector.start[i], 
arg2ColVector.length[i]);
{code}

This could run into trouble if the source byte array is null.

A similar problem exists in the other string templates.
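
A minimal, self-contained illustration of the suggested guard (plain-Java stand-ins, not Hive's actual BytesColumnVector API):

```java
import java.util.Arrays;

public class NullSafeSetVal {
    // Parallel arrays standing in for a column vector's values/null flags.
    static void setValSafe(byte[][] vector, boolean[] isNull, int row, byte[] src) {
        if (src == null) {
            // Without this branch, copying from a null source array
            // would throw a NullPointerException.
            isNull[row] = true;
            vector[row] = null;
        } else {
            vector[row] = Arrays.copyOf(src, src.length);
            isNull[row] = false;
        }
    }

    public static void main(String[] args) {
        byte[][] vector = new byte[2][];
        boolean[] isNull = new boolean[2];
        setValSafe(vector, isNull, 0, "abc".getBytes());
        setValSafe(vector, isNull, 1, null); // safe: row is marked null
        System.out.println(isNull[0] + " " + isNull[1]); // prints "false true"
    }
}
```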

> Implement vectorization support for IF conditional expression for long, 
> double, timestamp, boolean and string inputs
> 
>
> Key: HIVE-5756
> URL: https://issues.apache.org/jira/browse/HIVE-5756
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Eric Hanson
>Assignee: Eric Hanson
> Attachments: HIVE-5756.1.patch, HIVE-5756.2.patch, HIVE-5756.3.patch, 
> HIVE-5756.4.patch, HIVE-5756.5.patch, HIVE-5756.6.patch.txt, HIVE-5756.7.patch
>
>
> Implement full, end-to-end support for IF in vectorized mode, including new 
> VectorExpression class(es), VectorizationContext translation to a 
> VectorExpression, and unit tests for these, as well as end-to-end ad hoc 
> testing. An end-to-end .q test is recommended but optional.
> This is high priority because IF is the most popular conditional expression.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841992#comment-13841992
 ] 

Ashutosh Chauhan commented on HIVE-5951:


It will be good to show the partition spec in the explain plan (unless it's too 
complicated) instead of . Also, for the input / output 
hooks, that stuff is used for authorization, so it will be good not to lose 
that.

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are small 
> things like, in the !ifNotExists case, DDLSemanticAnalyzer getting the full 
> partition object for every spec (which is a network call to the metastore) and 
> then discarding it instantly; there's also the general problem that too much 
> processing is done on the client side. DDLSA should analyze the query and make 
> one call to the metastore (or maybe a set of batched calls if there are too many 
> partitions in the command); the metastore should then do the remaining work and 
> insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5817) column name to index mapping in VectorizationContext is broken

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841981#comment-13841981
 ] 

Sergey Shelukhin commented on HIVE-5817:


It seems I cannot make a repro for select... even though there are name 
collisions, the mappings are correct. Perhaps when someone finds a bug we can 
solve it :)

> column name to index mapping in VectorizationContext is broken
> --
>
> Key: HIVE-5817
> URL: https://issues.apache.org/jira/browse/HIVE-5817
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 0.13.0
>Reporter: Sergey Shelukhin
>Assignee: Remus Rusanu
>Priority: Critical
> Fix For: 0.13.0
>
> Attachments: HIVE-5817-uniquecols.broken.patch, 
> HIVE-5817.00-broken.patch, HIVE-5817.4.patch, HIVE-5817.5.patch, 
> HIVE-5817.6.patch
>
>
> Columns coming from different operators may have the same internal names 
> ("_colNN"). There exists a query in the form {{select b.cb, a.ca from a JOIN 
> b ON ... JOIN x ON ...;}}  (distilled from a more complex query), which runs 
> ok w/o vectorization. With vectorization, it will run ok for most ca, but for 
> some ca it will fail (or can probably return incorrect results). That is 
> because when building column-to-VRG-index map in VectorizationContext, 
> internal column name for ca that the first map join operator adds to the 
> mapping may be the same as internal name for cb that the 2nd one tries to 
> add. 2nd VMJ doesn't add it (see code in ctor), and when it's time for it to 
> output stuff, it retrieves wrong index from the map by name, and then wrong 
> vector from VRG.
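
The failure mode described — two operators emitting the same internal column name into one name-to-index map, with the second mapping silently dropped — can be illustrated with a minimal sketch. This uses a plain `HashMap`, not the actual VectorizationContext code:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal illustration of the bug class described above: two operators both
// produce an internal column "_col0", but at different vector (VRG) indices.
// If the second mapping is skipped because the key already exists, a later
// lookup by name returns the first operator's index -- i.e. the wrong vector.
public class ColumnIndexCollision {
    public static void main(String[] args) {
        Map<String, Integer> nameToVrgIndex = new HashMap<>();

        nameToVrgIndex.put("_col0", 2);       // first map-join output column

        // Second operator also names its output "_col0", at index 5; the put
        // is dropped (mirrors "2nd VMJ doesn't add it" from the description).
        nameToVrgIndex.putIfAbsent("_col0", 5);

        int resolved = nameToVrgIndex.get("_col0");
        System.out.println(resolved);         // 2 -- stale index, wrong vector
    }
}
```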



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5951:
---

Attachment: HIVE-5951.02.patch

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to metastore is currently very inefficient. There are small 
> things like, for !ifNotExists case, DDLSemanticAnalyzer gets the full 
> partition object for every spec (which is a network call to metastore), and 
> then discards it instantly; there's also general problem that too much 
> processing is done on client side. DDLSA should analyze the query and make 
> one call to metastore (or maybe a set of batched  calls if there are too many 
> partitions in the command), metastore should then figure out stuff and insert 
> in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5951:
---

Attachment: HIVE-5951.nogen.patch

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to metastore is currently very inefficient. There are small 
> things like, for !ifNotExists case, DDLSemanticAnalyzer gets the full 
> partition object for every spec (which is a network call to metastore), and 
> then discards it instantly; there's also general problem that too much 
> processing is done on client side. DDLSA should analyze the query and make 
> one call to metastore (or maybe a set of batched  calls if there are too many 
> partitions in the command), metastore should then figure out stuff and insert 
> in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5951:
---

Status: Patch Available  (was: Open)

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to metastore is currently very inefficient. There are small 
> things like, for !ifNotExists case, DDLSemanticAnalyzer gets the full 
> partition object for every spec (which is a network call to metastore), and 
> then discards it instantly; there's also general problem that too much 
> processing is done on client side. DDLSA should analyze the query and make 
> one call to metastore (or maybe a set of batched  calls if there are too many 
> partitions in the command), metastore should then figure out stuff and insert 
> in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5951:
---

Status: Open  (was: Patch Available)

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.02.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.patch
>
>
> Adding partitions to metastore is currently very inefficient. There are small 
> things like, for !ifNotExists case, DDLSemanticAnalyzer gets the full 
> partition object for every spec (which is a network call to metastore), and 
> then discards it instantly; there's also general problem that too much 
> processing is done on client side. DDLSA should analyze the query and make 
> one call to metastore (or maybe a set of batched  calls if there are too many 
> partitions in the command), metastore should then figure out stuff and insert 
> in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 16074: HIVE-5951 improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16074/
---

(Updated Dec. 7, 2013, 1:18 a.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

See JIRA. RB does not include generated code.


Diffs (updated)
-

  metastore/if/hive_metastore.thrift 43b3907 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
01c2626 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
65406d9 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
cacfa07 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 04d399f 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 27ae3c4 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 57f1e67 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 c0e720f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d32be59 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 947b65c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java f4476a9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 7443ea4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
e97d948 
  ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java ff60e44 
  ql/src/test/results/clientnegative/addpart1.q.out b4be19c 
  ql/src/test/results/clientnegative/alter_view_failure4.q.out b218c19 
  ql/src/test/results/clientpositive/add_part_exist.q.out 559cb26 
  ql/src/test/results/clientpositive/add_part_multiple.q.out b2525cf 
  ql/src/test/results/clientpositive/create_view_partitioned.q.out e90ffc7 
  ql/src/test/results/clientpositive/partitions_json.q.out deb7a1f 

Diff: https://reviews.apache.org/r/16074/diff/


Testing
---

running clidriver, some more query results will change


Thanks,

Sergey Shelukhin



[jira] [Commented] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

2013-12-06 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841965#comment-13841965
 ] 

Szehon Ho commented on HIVE-3245:
-

I created the table as described in the JIRA and ran select * both from beeline 
and my own java program embedding the JDBC driver.  In both instances, the 
Japanese characters displayed correctly:

0: jdbc:hive2://localhost:1> select * from japan_j;
+---++--+
| rnum  |   c1   | ord  |
+---++--+
| 11| (1)インデックス | 36   |
| 12| <5>Switches| 37   |
| 10| 400ranku   | 39   |
| 9 | 666Sink| 40   |
| 14| P-Cabels   | 35   |
| 13| R-Bench| 38   |
| 27| エコー| 34   |
| 26| エチャント  | 24   |
| 25| ガード| 4|
| 28| コート| 3|
| 29| ゴム | 1|
| 41| ざぶと| 2|
| 40| さんしょう  | 6|
| 31| ズボン| 5|
| 30| スワップ   | 41   |
| 37| せっけい   | 42   |
| 36| せんたくざい | 46   |
| 32| ダイエル   | 45   |
| 39| はっぽ| 43   |
| 38| はつ剤| 44   |
| 34| ファイル   | 48   |
| 33| フィルター  | 50   |
| 35| フッコク   | 49   |
| 8 | 「2」計画  | 47   |
| 46| 暗視 | 9|
| 45| 音楽 | 8|
| 47| 音声認識   | 7|
| 44| 記載 | 10   |
| 43| 記録機| 11   |
| 42| 高機能| 15   |
| 50| 国家利益   | 14   |
| 48| 国立公園   | 18   |
| 49| 国立大学   | 22   |
| 7 | ⑤号線路   | 21   |
| 5 | (Ⅰ)番号列 | 23   |
| 1 | 356CAL | 17   |
| 2 | 980Series  | 16   |
| 6 | <ⅸ>Pattern | 20   |
| 3 | PVDF   | 19   |
| 4 | ROMAN-8| 13   |
| 15| アンカー   | 12   |
| 16| エンジン  | 30   |
| 19| カットマシン | 29   |
| 20| カード   | 28   |
| 18| コーラ| 26   |
| 17| ゴールド | 25   |
| 24| サイフ| 27   |
| 21| ツーウィング| 32   |
| 23| フォルダー | 33   |
| 22| マンボ   | 31   |
+---++--+


I tested with the new JDBCDriver (org.apache.hive.jdbc.HiveDriver) against 
HiveServer2.  

The platform running Beeline should be set to utf8 ("echo $LANG"), or any other 
java application using the JDBC driver should be started with utf-8 JVM args 
("java -Dfile.encoding=UTF-8"). That should already be a requirement for 
clients wishing to display utf-8 characters.

The code that Mark Grover mentioned does not apply anymore, as the new JDBCDriver 
gets results from HiveServer directly via a Thrift string field, and does not do 
another round of serialization/deserialization on the client side, where the 
error is said to have occurred. So in my opinion, the issue can be closed for 
the Hive driver.
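
The encoding requirement above boils down to both sides agreeing on UTF-8 end to end; a minimal round-trip sketch using a katakana value like those in the result set:

```java
import java.nio.charset.StandardCharsets;

// Sketch of the UTF-8 round trip the comment above relies on: as long as
// both sides use UTF-8 (e.g. -Dfile.encoding=UTF-8 on the client JVM),
// Japanese text survives an encode/decode cycle losslessly.
public class Utf8RoundTrip {
    public static void main(String[] args) {
        String original = "インデックス";                        // sample value
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);  // bytes on the wire
        String decoded = new String(wire, StandardCharsets.UTF_8);

        System.out.println(decoded.equals(original)); // true: lossless round trip
        System.out.println(wire.length);              // 18: 3 bytes per katakana char
    }
}
```

Decoding the same bytes with a non-UTF-8 charset on the client side is exactly where mojibake would appear — which is why the JVM default encoding matters.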

> UTF encoded data not displayed correctly by Hive driver
> ---
>
> Key: HIVE-3245
> URL: https://issues.apache.org/jira/browse/HIVE-3245
> Proje

[jira] [Commented] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types

2013-12-06 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841961#comment-13841961
 ] 

Eric Hanson commented on HIVE-5878:
---

Does anybody else want to voice an opinion here?

> Hive standard avg UDAF returns double as the return type for some exact input 
> types
> ---
>
> Key: HIVE-5878
> URL: https://issues.apache.org/jira/browse/HIVE-5878
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5878.1.patch, HIVE-5878.patch
>
>
> For standard, no-partial avg result, hive currently returns double as the 
> result type.
> {code}
> hive> desc test;
> OK
> d int None
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;  
> ...
>   Reduce Operator Tree:
> Group By Operator
>   aggregations:
> expr: avg(VALUE._col0)
>   bucketGroup: false
>   mode: mergepartial
>   outputColumnNames: _col0
>   Select Operator
> expressions:
>   expr: _col0
>   type: double
> {code}
> However, exact types including integers and decimal should yield exact type. 
> Here is what MySQL does:
> {code}
> mysql> desc test;
> +---+--+--+-+-+---+
> | Field | Type | Null | Key | Default | Extra |
> +---+--+--+-+-+---+
> | i | int(11)  | YES  | | NULL|   |
> | b | tinyint(1)   | YES  | | NULL|   |
> | d | double   | YES  | | NULL|   |
> | s | varchar(5)   | YES  | | NULL|   |
> | dd| decimal(5,2) | YES  | | NULL|   |
> +---+--+--+-+-+---+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +---+---+--+-+-+---+
> | Field | Type  | Null | Key | Default | Extra |
> +---+---+--+-+-+---+
> | avg(i) | decimal(14,4) | YES  | | NULL|   |
> +---+---+--+-+-+---+
> 1 row in set (0.00 sec)
> {code}
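
The exact-type behavior argued for above (and shown in the MySQL `decimal(14,4)` result) can be sketched with `BigDecimal`, which is roughly what an exact avg result type implies, versus `double`. This is an illustration of the type distinction, not Hive's UDAF code:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch of why an exact result type matters for avg over exact inputs:
// BigDecimal keeps a declared scale (like MySQL's decimal(14,4) result),
// while double is an approximate binary type.
public class ExactAvg {
    public static void main(String[] args) {
        long[] values = {1, 2, 2};

        BigDecimal sum = BigDecimal.ZERO;
        for (long v : values) {
            sum = sum.add(BigDecimal.valueOf(v));
        }
        // Scale of 4 mirrors the decimal(14,4) result type in the MySQL example.
        BigDecimal avg = sum.divide(
            BigDecimal.valueOf(values.length), 4, RoundingMode.HALF_UP);

        System.out.println(avg);     // 1.6667 -- exact type with a fixed scale
        System.out.println(5.0 / 3); // approximate double representation
    }
}
```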



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Hive-trunk-h0.21 - Build # 2489 - Still Failing

2013-12-06 Thread Apache Jenkins Server
Changes for Build #2453
[hashutosh] HIVE-5685 : partition column type validation doesn't work in some 
cases (Vikram Dixit via Ashutosh Chauhan)

[hashutosh] HIVE-5788 : select * fails for table after adding new columns using 
rcfile storage format (Szehon Ho via Ashutosh Chauhan)


Changes for Build #2454
[brock] HIVE-5782 - PTest2 should be able to ride out price spikes

[brock] HIVE-5729 - Beeline displays version as  after mavenization (Navis 
via Brock Noland)

[brock] HIVE-5732 - HiveServer2: Duplicated new OperationManager in 
SessionManager (Navis via Brock Noland)

[brock] HIVE-5717 - Generate javadoc and source jars (Szehon Ho via Brock 
Noland)

[hashutosh] HIVE-5791 : TestUseDatabase in hcategory failed to pass when 
illegal filename in /tmp (Jin Jie via Ashutosh Chauhan)


Changes for Build #2455
[hashutosh] HIVE-5813 : Multi-way Left outer join fails in vectorized mode 
(Ashutosh Chauhan via Thejas Nair, Eric Hanson & Remus Rusanu)


Changes for Build #2456
[xuefu] HIVE-5825: Case statement type checking too restrictive for 
parameterized types (Jason via Xuefu)

[xuefu] HIVE-5564: Need to accomodate table decimal columns that were defined 
prior to HIVE-3976 (Reviewed by Brock)


Changes for Build #2457

Changes for Build #2458
[rhbutani] HIVE-5369 Annotate hive operator tree with statistics from metastore 
(Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5809 : incorrect stats in some cases with 
hive.stats.autogather=true (Ashutosh Chauhan via Navis)

[brock] HIVE-5741: Fix binary packaging build eg include hcatalog, resolve pom 
issues (Brock Noland reviewed by Xuefu Zhang)


Changes for Build #2459
[hashutosh] HIVE-5844 : dynamic_partition_skip_default.q test fails on trunk 
(Prasanth J via Ashutosh Chauhan)


Changes for Build #2460
[hashutosh] HIVE-5846 : Analyze command fails with vectorization on (Remus 
Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-2055 : Hive should add HBase classpath dependencies when 
available (Nick Dimiduk via Ashutosh Chauhan)

[hashutosh] HIVE-4632 : Use hadoop counter as a stat publisher (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3107 : Improve semantic analyzer to better handle column name 
references in group by/sort by clauses (Harish Butani via Ashutosh Chauhan)


Changes for Build #2461
[xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 
(reviewed by Brock)

[brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland 
reviewed by Prasad Mujumdar)

[xuefu] HIVE-5356: Move arithmatic UDFs to generic UDF implementations 
(reviewed by Brock)


Changes for Build #2462
[hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable 
(Remus Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via 
Ashutosh Chauhan)

[thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene 
Koifman via Thejas Nair)

[hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct & 
wrapped ByteBuffers (Gopal V via Owen Omalley)


Changes for Build #2463

Changes for Build #2464
[thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a 
secure cluster (Prasad Mujumdar via Thejas Nair)


Changes for Build #2465

Changes for Build #2466
[thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled 
(Thejas Nair reviewed by Navis)


Changes for Build #2467

Changes for Build #2468
[hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having 
clause (Harish Butani via Ashutosh Chauhan)


Changes for Build #2469
[xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced 
parenthesises (reviewed by Ashutosh)


Changes for Build #2470

Changes for Build #2471
[rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the 
absence of any column statistics (Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis 
via Ashutosh Chauhan)


Changes for Build #2472
[navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)

[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu 
Zhang via Navis)

[navis] HIVE-4518 : Missing file (HiveFatalException)

[navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and 
Jason Dere via Navis)


Changes for Build #2473
[brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad 
Mujumdar, Navis via Brock Noland)


Changes for Build #2474
[navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K 
and Szehon Ho via Navis)

[thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed 
by Ashutosh Chauhan)

[brock] HIVE-5704 - A couple of generic UDFs are not in the right 
folder/package (Xuefu Zhang via Brock Noland)

[brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu 
Zhang via Brock Noland)

[hashutosh] HIVE-5817 : column na

Hive-trunk-hadoop2 - Build # 587 - Still Failing

2013-12-06 Thread Apache Jenkins Server
Changes for Build #553
[hashutosh] HIVE-5685 : partition column type validation doesn't work in some 
cases (Vikram Dixit via Ashutosh Chauhan)

[hashutosh] HIVE-5788 : select * fails for table after adding new columns using 
rcfile storage format (Szehon Ho via Ashutosh Chauhan)


Changes for Build #554
[brock] HIVE-5782 - PTest2 should be able to ride out price spikes

[brock] HIVE-5729 - Beeline displays version as  after mavenization (Navis 
via Brock Noland)

[brock] HIVE-5732 - HiveServer2: Duplicated new OperationManager in 
SessionManager (Navis via Brock Noland)

[brock] HIVE-5717 - Generate javadoc and source jars (Szehon Ho via Brock 
Noland)

[hashutosh] HIVE-5791 : TestUseDatabase in hcategory failed to pass when 
illegal filename in /tmp (Jin Jie via Ashutosh Chauhan)


Changes for Build #555
[hashutosh] HIVE-5813 : Multi-way Left outer join fails in vectorized mode 
(Ashutosh Chauhan via Thejas Nair, Eric Hanson & Remus Rusanu)


Changes for Build #556
[xuefu] HIVE-5825: Case statement type checking too restrictive for 
parameterized types (Jason via Xuefu)

[xuefu] HIVE-5564: Need to accomodate table decimal columns that were defined 
prior to HIVE-3976 (Reviewed by Brock)


Changes for Build #557

Changes for Build #558
[rhbutani] HIVE-5369 Annotate hive operator tree with statistics from metastore 
(Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5809 : incorrect stats in some cases with 
hive.stats.autogather=true (Ashutosh Chauhan via Navis)

[brock] HIVE-5741: Fix binary packaging build eg include hcatalog, resolve pom 
issues (Brock Noland reviewed by Xuefu Zhang)


Changes for Build #559
[hashutosh] HIVE-3107 : Improve semantic analyzer to better handle column name 
references in group by/sort by clauses (Harish Butani via Ashutosh Chauhan)

[hashutosh] HIVE-5844 : dynamic_partition_skip_default.q test fails on trunk 
(Prasanth J via Ashutosh Chauhan)


Changes for Build #560
[xuefu] HIVE-5356: Move arithmatic UDFs to generic UDF implementations 
(reviewed by Brock)

[hashutosh] HIVE-5846 : Analyze command fails with vectorization on (Remus 
Rusanu via Ashutosh Chauhan)

[hashutosh] HIVE-2055 : Hive should add HBase classpath dependencies when 
available (Nick Dimiduk via Ashutosh Chauhan)

[hashutosh] HIVE-4632 : Use hadoop counter as a stat publisher (Navis via 
Ashutosh Chauhan)


Changes for Build #561
[hashutosh] HIVE-5845 : CTAS failed on vectorized code path (Remus Rusanu via 
Ashutosh Chauhan)

[thejas] HIVE-5635 : WebHCatJTShim23 ignores security/user context (Eugene 
Koifman via Thejas Nair)

[hashutosh] HIVE-5663 : Refactor ORC RecordReader to operate on direct & 
wrapped ByteBuffers (Gopal V via Owen Omalley)

[xuefu] HIVE-5565: Limit Hive decimal type maximum precision and scale to 38 
(reviewed by Brock)

[brock] HIVE-5842 - Fix issues with new paths to jar in hcatalog (Brock Noland 
reviewed by Prasad Mujumdar)


Changes for Build #562
[hashutosh] HIVE-5692 : Make VectorGroupByOperator parameters configurable 
(Remus Rusanu via Ashutosh Chauhan)


Changes for Build #563
[thejas] HIVE-5618 : Hive local task fails to run when run from oozie in a 
secure cluster (Prasad Mujumdar via Thejas Nair)


Changes for Build #564

Changes for Build #565
[thejas] HIVE-3815 : hive table rename fails if filesystem cache is disabled 
(Thejas Nair reviewed by Navis)


Changes for Build #566

Changes for Build #567
[hashutosh] HIVE-5614 : Subquery support: allow subquery expressions in having 
clause (Harish Butani via Ashutosh Chauhan)


Changes for Build #568
[xuefu] HIVE-5763: ExprNodeGenericFuncDesc.toString() generating unbalanced 
parenthesises (reviewed by Ashutosh)


Changes for Build #569

Changes for Build #570
[rhbutani] HIVE-5849 Improve the stats of operators based on heuristics in the 
absence of any column statistics (Prasanth Jayachandran via Harish Butani)

[hashutosh] HIVE-5793 : Update hive-default.xml.template for HIVE4002 (Navis 
via Ashutosh Chauhan)


Changes for Build #571
[navis] HIVE-4518 : Should be removed files (OptrStatsGroupByHook, etc.)

[navis] HIVE-5839 : BytesRefArrayWritable compareTo violates contract (Xuefu 
Zhang via Navis)

[navis] HIVE-4518 : Missing file (HiveFatalException)

[navis] HIVE-4518 : Counter Strike: Operation Operator (Gunther Hagleitner and 
Jason Dere via Navis)


Changes for Build #572
[brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad 
Mujumdar, Navis via Brock Noland)


Changes for Build #573
[navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K 
and Szehon Ho via Navis)

[thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed 
by Ashutosh Chauhan)

[brock] HIVE-5704 - A couple of generic UDFs are not in the right 
folder/package (Xuefu Zhang via Brock Noland)

[brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu 
Zhang via Brock Noland)

[hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is

[jira] [Commented] (HIVE-5978) Rollups not supported in vector mode.

2013-12-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841939#comment-13841939
 ] 

Hive QA commented on HIVE-5978:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617500/HIVE-5978.1.patch

{color:green}SUCCESS:{color} +1 4460 tests passed

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/558/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/558/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12617500

> Rollups not supported in vector mode.
> -
>
> Key: HIVE-5978
> URL: https://issues.apache.org/jira/browse/HIVE-5978
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5978.1.patch
>
>
> Rollups are not supported in vector mode, the query should fail to vectorize. 
> A separate jira will be filed to implement rollups in vector mode.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 16063: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Prasad Mujumdar


> On Dec. 6, 2013, 6:45 p.m., Carl Steinbach wrote:
> > service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java,
> >  line 197
> > 
> >
> > Since this EnumSet is constant, I think we should make it a static final 
> > class variable.
> > 
> > Also, formatting.

right.
Moved it to the Operations class and added another wrapper API to verify against 
the default fetch orientation.


> On Dec. 6, 2013, 6:45 p.m., Carl Steinbach wrote:
> > service/src/java/org/apache/hive/service/cli/operation/Operation.java, line 
> > 138
> > 
> >
> > Use supportedOrientations.contains(orientation) instead of manually 
> > iterating through the set. Also, should this method be protected instead of 
> > public?

Updated.
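
The reviewer's suggestion — a constant `EnumSet` held as a static final field and checked with `contains()` — looks roughly like the sketch below. `FetchOrientation` here is a local stand-in for Hive's Thrift-generated enum:

```java
import java.util.EnumSet;

// Sketch of the review suggestion above: keep the supported orientations in
// a static final EnumSet and test membership with contains() instead of
// iterating through the set manually.
public class OrientationCheck {
    enum FetchOrientation { FETCH_NEXT, FETCH_FIRST, FETCH_PRIOR }

    private static final EnumSet<FetchOrientation> SUPPORTED =
        EnumSet.of(FetchOrientation.FETCH_NEXT, FetchOrientation.FETCH_FIRST);

    // Protected, per the review comment, since only operation subclasses need it.
    protected static boolean isSupported(FetchOrientation orientation) {
        return SUPPORTED.contains(orientation);
    }

    public static void main(String[] args) {
        System.out.println(isSupported(FetchOrientation.FETCH_FIRST)); // true
        System.out.println(isSupported(FetchOrientation.FETCH_PRIOR)); // false
    }
}
```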


- Prasad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16063/#review29893
---


On Dec. 6, 2013, 5:37 a.m., Prasad Mujumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16063/
> ---
> 
> (Updated Dec. 6, 2013, 5:37 a.m.)
> 
> 
> Review request for hive, Brock Noland and Thejas Nair.
> 
> 
> Bugs: HIVE-4395
> https://issues.apache.org/jira/browse/HIVE-4395
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support fetch-from-start for hiveserver2 fetch operations. 
>  - Handle new fetch orientation for various HS2 operations.
>  - Added support to reset the read position in Hive driver
>  - Enabled scroll cursors with support for positioning cursor to start of 
> resultset
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> 7b1c9da 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java ef39573 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 812ee56 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java fce19bf 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ed502a7 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 86db406 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 343f760 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  581e69c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  af87a90 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  0fe01c0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  bafe40c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  2be018e 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  7e8a06b 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  2daa9cd 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  a1ac55b 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 6f4b8dc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 301187d 
> 
> Diff: https://reviews.apache.org/r/16063/diff/
> 
> 
> Testing
> ---
> 
> Added new testcases to TestJdbcDriver2
> 
> 
> Thanks,
> 
> Prasad Mujumdar
> 
>



Number of tests run via Hive QA has decreased

2013-12-06 Thread Ashutosh Chauhan
It seems like the number of tests run via Hive QA has gone down from ~4650
tests to ~4450 tests over the last 2 weeks.
Brock, do you know what could be the reason for it? Also, is there a quick
way to know how many tests are supposed to run?

Ashutosh


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841910#comment-13841910
 ] 

Sergey Shelukhin commented on HIVE-5951:


no, this is only for static partitions for now

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.patch
>
>
> Adding partitions to metastore is currently very inefficient. There are small 
> things like, for !ifNotExists case, DDLSemanticAnalyzer gets the full 
> partition object for every spec (which is a network call to metastore), and 
> then discards it instantly; there's also general problem that too much 
> processing is done on client side. DDLSA should analyze the query and make 
> one call to metastore (or maybe a set of batched  calls if there are too many 
> partitions in the command), metastore should then figure out stuff and insert 
> in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841912#comment-13841912
 ] 

Sergey Shelukhin commented on HIVE-5951:


I am looking at the negative tests; I forgot to run those.

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.patch
>
>
> Adding partitions to metastore is currently very inefficient. There are small 
> things like, for !ifNotExists case, DDLSemanticAnalyzer gets the full 
> partition object for every spec (which is a network call to metastore), and 
> then discards it instantly; there's also general problem that too much 
> processing is done on client side. DDLSA should analyze the query and make 
> one call to metastore (or maybe a set of batched  calls if there are too many 
> partitions in the command), metastore should then figure out stuff and insert 
> in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5824) Support generation of html test reports in maven

2013-12-06 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841891#comment-13841891
 ] 

Prasanth J commented on HIVE-5824:
--

Ping!

> Support generation of html test reports in maven
> 
>
> Key: HIVE-5824
> URL: https://issues.apache.org/jira/browse/HIVE-5824
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.13.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Minor
>  Labels: build, maven, test
> Attachments: HIVE-5824.2.patch.txt, HIVE-5824.patch.txt
>
>
> {code}ant testreport{code} generates test results in HTML format. It would be
> useful to support the same in maven. The default test report generated by
> maven is in XML format, which is hard to read.
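For reference, one common way to get HTML test reports from maven is the standard maven-surefire-report-plugin; the plugin version shown is illustrative for this era:

```xml
<!-- pom.xml fragment: generate target/site/surefire-report.html from the
     existing surefire XML results; run with `mvn surefire-report:report-only` -->
<reporting>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-report-plugin</artifactId>
      <version>2.16</version>
    </plugin>
  </plugins>
</reporting>
```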





[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841888#comment-13841888
 ] 

Ashutosh Chauhan commented on HIVE-5951:


Does this handle the case of an insert query that dynamically creates partitions
and loads them into the metastore at the end of the MR job? That code path is in
MoveTask::execute() -> Hive.loadDynamicPartitions(). That path is likely to be
more relevant for this problem than a DDL command where someone tries to add
100s of partitions.

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are
> small things: for the !ifNotExists case, DDLSemanticAnalyzer fetches the full
> partition object for every spec (a network call to the metastore) and then
> discards it instantly; there is also the general problem that too much
> processing is done on the client side. DDLSA should analyze the query and make
> one call to the metastore (or a set of batched calls if there are too many
> partitions in the command); the metastore should then do the processing and
> insert in batch.





[jira] [Commented] (HIVE-5941) SQL std auth - support 'show all roles'

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841880#comment-13841880
 ] 

Thejas M Nair commented on HIVE-5941:
-

[~navis] I believe your internal patch "NHIVE-22 implement show roles" must be 
doing the same as what is described in this jira. Will you be able to 
contribute that?



> SQL std auth - support 'show all roles'
> ---
>
> Key: HIVE-5941
> URL: https://issues.apache.org/jira/browse/HIVE-5941
> Project: Hive
>  Issue Type: Sub-task
>  Components: Authorization
>Reporter: Thejas M Nair
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> SHOW ALL ROLES - This will list all
> currently existing roles. This will be available only to the superuser.





[jira] [Commented] (HIVE-5978) Rollups not supported in vector mode.

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841881#comment-13841881
 ] 

Ashutosh Chauhan commented on HIVE-5978:


+1

> Rollups not supported in vector mode.
> -
>
> Key: HIVE-5978
> URL: https://issues.apache.org/jira/browse/HIVE-5978
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5978.1.patch
>
>
> Rollups are not supported in vector mode, the query should fail to vectorize. 
> A separate jira will be filed to implement rollups in vector mode.
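The intended guard, failing vectorization when grouping sets are present, can be sketched as follows; `GroupByInfo` and `canVectorize` are hypothetical names, not the actual Vectorizer code:

```java
// Sketch of a vectorization validity check: reject group-by operators that
// use grouping sets (ROLLUP/CUBE). GroupByInfo is a stand-in, not a Hive class.
public class VectorizationCheck {

    public static class GroupByInfo {
        final boolean groupingSetsPresent;
        GroupByInfo(boolean groupingSetsPresent) {
            this.groupingSetsPresent = groupingSetsPresent;
        }
    }

    // Returning false makes the planner fall back to the row-mode operators.
    public static boolean canVectorize(GroupByInfo gb) {
        return !gb.groupingSetsPresent;
    }

    public static void main(String[] args) {
        System.out.println(canVectorize(new GroupByInfo(true)));  // rollup query: false
        System.out.println(canVectorize(new GroupByInfo(false))); // plain group-by: true
    }
}
```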





[jira] [Comment Edited] (HIVE-5954) SQL std auth - get_privilege_set should check role hierarchy

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841879#comment-13841879
 ] 

Thejas M Nair edited comment on HIVE-5954 at 12/6/13 11:24 PM:
---

[~navis] I believe your internal patch "NHIVE-26 Indirect roles are not 
reflected in authorization" must be doing the same as what is described in this 
jira. Will you be able to contribute that?



was (Author: thejas):
~navis] I believe your internal patch "NHIVE-26 Indirect roles are not 
reflected in authorization" must be doing the same as what is described in this 
jira. Will you able able to contribute that ?


> SQL std auth - get_privilege_set should check role hierarchy
> 
>
> Key: HIVE-5954
> URL: https://issues.apache.org/jira/browse/HIVE-5954
> Project: Hive
>  Issue Type: Sub-task
>  Components: Authorization
>Reporter: Thejas M Nair
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A role can belong to another role. But get_privilege_set in hive metastore 
> api checks only the privileges of the immediate roles a user belongs to.





[jira] [Commented] (HIVE-5954) SQL std auth - get_privilege_set should check role hierarchy

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841879#comment-13841879
 ] 

Thejas M Nair commented on HIVE-5954:
-

[~navis] I believe your internal patch "NHIVE-26 Indirect roles are not 
reflected in authorization" must be doing the same as what is described in this 
jira. Will you be able to contribute that?


> SQL std auth - get_privilege_set should check role hierarchy
> 
>
> Key: HIVE-5954
> URL: https://issues.apache.org/jira/browse/HIVE-5954
> Project: Hive
>  Issue Type: Sub-task
>  Components: Authorization
>Reporter: Thejas M Nair
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A role can belong to another role. But get_privilege_set in hive metastore 
> api checks only the privileges of the immediate roles a user belongs to.





[jira] [Commented] (HIVE-5830) SubQuery: Not In subqueries should check if subquery contains nulls in matching column

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841877#comment-13841877
 ] 

Ashutosh Chauhan commented on HIVE-5830:


Patch looks good. Can you add comments on what kind of query form we are 
generating through all of the new static methods?

Also, a cross join of the subquery with count(*) will be expensive. As an 
optimization we should always convert such joins, where the other side is the 
output of a function, into a map-join. This will also give us an opportunity in 
this case to pack the subsequent LOJ into the same MR task. 
Though this optimization is independent of this work and would be generally 
useful outside of this use case. Can you create a jira for that one?

> SubQuery: Not In subqueries should check if subquery contains nulls in 
> matching column
> --
>
> Key: HIVE-5830
> URL: https://issues.apache.org/jira/browse/HIVE-5830
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.13.0
>
> Attachments: HIVE-5830.2.patch
>
>
> As pointed out by [~snarayanan] in HIVE-784: for NOT IN, when there are nulls 
> in the SubQuery's matching column, the Outer Query should return no rows.
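The SQL semantics behind this can be reproduced with three-valued logic; a self-contained sketch, with a Java `Boolean` `null` standing in for SQL NULL/UNKNOWN:

```java
import java.util.Arrays;
import java.util.List;

// Demonstrates why `x NOT IN (subquery)` yields no rows once the subquery
// produces a NULL: the predicate evaluates to UNKNOWN, never TRUE.
public class NotInNulls {

    // Three-valued result of `value IN list`: TRUE, FALSE, or null for UNKNOWN.
    static Boolean in(Integer value, List<Integer> list) {
        boolean sawNull = false;
        for (Integer v : list) {
            if (v == null) {
                sawNull = true;
            } else if (v.equals(value)) {
                return Boolean.TRUE;
            }
        }
        return sawNull ? null : Boolean.FALSE; // UNKNOWN if any NULL was seen
    }

    // NOT in three-valued logic: NOT UNKNOWN is still UNKNOWN.
    static Boolean not(Boolean b) {
        return b == null ? null : !b;
    }

    public static void main(String[] args) {
        List<Integer> subquery = Arrays.asList(1, null, 3);
        // 2 matches no non-null element, but the NULL makes the IN result
        // UNKNOWN, so the WHERE clause filters the row out.
        System.out.println(not(in(2, subquery))); // prints null (UNKNOWN)
        System.out.println(not(in(1, subquery))); // prints false
    }
}
```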





[jira] [Commented] (HIVE-5931) SQL std auth - add metastore get_role_participants api - to support DESCRIBE ROLE

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841874#comment-13841874
 ] 

Thejas M Nair commented on HIVE-5931:
-

[~navis] I believe your internal patch "NHIVE-31 Add API for retrieving 
principals endowed with the specific role" must be doing the same as what is 
described in this jira. Will you be able to contribute that?


> SQL std auth - add metastore get_role_participants api - to support DESCRIBE 
> ROLE
> -
>
> Key: HIVE-5931
> URL: https://issues.apache.org/jira/browse/HIVE-5931
> Project: Hive
>  Issue Type: Sub-task
>  Components: Authorization
>Reporter: Thejas M Nair
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> This is necessary for DESCRIBE ROLE role statement. This will list
> all users and roles that participate in a role. 





Re: Review Request 15898: HIVE-5830: SubQuery: Not In subqueries should check if subquery contains nulls in matching column

2013-12-06 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15898/#review29932
---



ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java


It will be good to add a comment for all these static methods to tell what 
kind of query form each of these methods is generating, like:
select ... from T1 LOJ T2 .. 
and so on.. 


- Ashutosh Chauhan


On Dec. 2, 2013, 6:58 p.m., Harish Butani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15898/
> ---
> 
> (Updated Dec. 2, 2013, 6:58 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: hive-5830
> https://issues.apache.org/jira/browse/hive-5830
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> As pointed out by Sivaramakrishnan Narayanan in HIVE-784: for NOT IN, when 
> there are nulls in the SubQuery's matching column, the Outer Query should 
> return no rows.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/QBSubQuery.java 753df79a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java e9d9ee7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SubQueryUtils.java ddc096d 
>   ql/src/test/queries/clientpositive/subquery_notin.q d7eca3e 
>   ql/src/test/results/clientpositive/subquery_multiinsert.q.out a917a13 
>   ql/src/test/results/clientpositive/subquery_notin.q.out bf87e3b 
>   ql/src/test/results/clientpositive/subquery_notin_having.q.out f9598c2 
> 
> Diff: https://reviews.apache.org/r/15898/diff/
> 
> 
> Testing
> ---
> 
> ran all subquery tests.
> added new test for notin with null values.
> 
> 
> Thanks,
> 
> Harish Butani
> 
>



[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.

2013-12-06 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5979:
---

Description: 
Query ran:
{code}
select cast(t as timestamp), cast(si as timestamp),
   cast(i as timestamp), cast(b as timestamp),
   cast(f as string), cast(d as timestamp),
   cast(bo as timestamp), cast(b * 0 as timestamp),
   cast(ts as timestamp), cast(s as timestamp),
   cast(substr(s, 1, 1) as timestamp)
from Table1;
{code}
Running this query with hive.vectorized.execution.enabled=true fails with the 
following exception:
{noformat}
13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
at 
org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at 
org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached.
Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t   tinyint from deserializer
si  smallintfrom deserializer
i   int from deserializer
b   bigint  from deserializer
f   float   from deserializer
d   double  from deserializer
bo  boolean from deserializer
s   string  from deserializer
s2  string  from deserializer
ts  timestamp   from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}

  was:
Query ran:
{code}
select cast(t as timestamp), cast(si as timestamp),
   cast(i as timestamp), cast(b as timestamp),
   cast(f as string), cast(d as timestamp),
   cast(bo as timestamp), cast(b * 0 as timestamp),
   cast(ts as timestamp), cast(s as timestamp),
   cast(substr(s, 1, 1) as timestamp)
from vectortab10korc;
{code}
Running this query with hive.vectorized.execution.enabled=true fails with the 
following exception:
{noformat}
13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
diagnostics=[Task failed, taskId=task_13

[jira] [Created] (HIVE-5979) Failure in cast to timestamps.

2013-12-06 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-5979:
--

 Summary: Failure in cast to timestamps.
 Key: HIVE-5979
 URL: https://issues.apache.org/jira/browse/HIVE-5979
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


Query ran:
{code}
select cast(t as timestamp), cast(si as timestamp),
   cast(i as timestamp), cast(b as timestamp),
   cast(f as string), cast(d as timestamp),
   cast(bo as timestamp), cast(b * 0 as timestamp),
   cast(ts as timestamp), cast(s as timestamp),
   cast(substr(s, 1, 1) as timestamp)
from vectortab10korc;
{code}
Running this query with hive.vectorized.execution.enabled=true fails with the 
following exception:
{noformat}
13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, 
diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, 
diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: 
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error while processing row
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201)
at 
org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at 
org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at 
org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached.
Schema for the table is as follows:
{code}
hive> desc vectortab10korc;
OK
t   tinyint from deserializer
si  smallintfrom deserializer
i   int from deserializer
b   bigint  from deserializer
f   float   from deserializer
d   double  from deserializer
bo  boolean from deserializer
s   string  from deserializer
s2  string  from deserializer
ts  timestamp   from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
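The IllegalArgumentException comes from handing java.sql.Timestamp.setNanos a nanosecond component outside [0, 999999999]. A sketch of a safe split of a time-in-nanoseconds value, which is an assumed fix for illustration, not the committed patch:

```java
import java.sql.Timestamp;

// Splits a signed time-in-nanoseconds into whole milliseconds plus a
// nanos-of-second component in [0, 999999999], which is what
// Timestamp.setNanos requires. Floor division keeps pre-epoch (negative)
// times correct, where plain `%` would yield a negative remainder.
public class NanosToTimestamp {

    public static Timestamp fromNanos(long nanos) {
        long seconds = Math.floorDiv(nanos, 1_000_000_000L);
        int nanoOfSecond = (int) Math.floorMod(nanos, 1_000_000_000L);
        Timestamp ts = new Timestamp(seconds * 1000L);
        ts.setNanos(nanoOfSecond);
        return ts;
    }

    public static void main(String[] args) {
        // A naive `nanos % 1_000_000_000` on a negative value would produce a
        // negative remainder and trigger the exception in the stack trace.
        System.out.println(fromNanos(-1L).getNanos()); // prints 999999999
    }
}
```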





[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841855#comment-13841855
 ] 

Carl Steinbach commented on HIVE-5783:
--

[~jcoffey] Would you and your coworkers be willing to consider the option of 
committing the SerDe code directly to Hive instead of having Hive depend on a 
third-party JAR? I appreciate that this will make it a little less convenient 
for you to push in changes. However, I think there are two big drawbacks to the 
third-party JAR approach: 1) existing Hive contributors will be much less 
likely contribute improvements to this code since it lives in a different 
repository, and 2) Hive won't be able to benefit from parquet-serde 
improvements until they appear in a new parquet-serde release.

> Native Parquet Support in Hive
> --
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
>  Issue Type: New Feature
>Reporter: Justin Coffey
>Assignee: Justin Coffey
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
> Hive integration and would like to now contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Changes Details:
> Parquet was built with dependency management in mind and therefore only a 
> single Parquet jar will be added as a dependency.





[jira] [Comment Edited] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841854#comment-13841854
 ] 

Thejas M Nair edited comment on HIVE-2093 at 12/6/13 11:10 PM:
---

btw, thanks for these changes to improve the authorization codebase!



was (Author: thejas):
btw, thanks for helping improve the authorization codebase!


> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.D12807.1.patch, 
> HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, 
> HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE





[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841854#comment-13841854
 ] 

Thejas M Nair commented on HIVE-2093:
-

btw, thanks for helping improve the authorization codebase!


> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.D12807.1.patch, 
> HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, 
> HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE





[jira] [Updated] (HIVE-5978) Rollups not supported in vector mode.

2013-12-06 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5978:
---

Attachment: HIVE-5978.1.patch

> Rollups not supported in vector mode.
> -
>
> Key: HIVE-5978
> URL: https://issues.apache.org/jira/browse/HIVE-5978
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5978.1.patch
>
>
> Rollups are not supported in vector mode, the query should fail to vectorize. 
> A separate jira will be filed to implement rollups in vector mode.





[jira] [Updated] (HIVE-5978) Rollups not supported in vector mode.

2013-12-06 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5978:
---

Status: Patch Available  (was: Open)

> Rollups not supported in vector mode.
> -
>
> Key: HIVE-5978
> URL: https://issues.apache.org/jira/browse/HIVE-5978
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5978.1.patch
>
>
> Rollups are not supported in vector mode, the query should fail to vectorize. 
> A separate jira will be filed to implement rollups in vector mode.





[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-06 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841841#comment-13841841
 ] 

Thejas M Nair commented on HIVE-2093:
-

I have added some comments on reviewboard.
Is there a way to grant these permissions for the database? Specifically, the 
global create permission needed for creating databases. Is that follow-up 
work?


> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.D12807.1.patch, 
> HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, 
> HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE





[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission

2013-12-06 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841839#comment-13841839
 ] 

Phabricator commented on HIVE-2093:
---

thejas has commented on the revision "HIVE-2093 [jira] create/drop database 
should populate inputs/outputs and check concurrency and user permission".

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java:257 why not return 
the location uri here ?
  ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java:1233 
what qnName means becomes clear only after reading the code; can you expand the 
variable name or add a javadoc comment?
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:2290 can 
you update comment to say "SHOW LOCKS DATABASE [database] [extended]"
  ql/src/java/org/apache/hadoop/hive/ql/parse/ExportSemanticAnalyzer.java:104 
you can use - new Path(toURI)
  Its there since hadoop 0.20.2
  ql/src/java/org/apache/hadoop/hive/ql/hooks/Entity.java:83 why is this block 
of changes needed ?
  It does not seem to be used anyway. I think the separation between entity and 
privileges is a good thing.

REVISION DETAIL
  https://reviews.facebook.net/D12807

To: JIRA, navis
Cc: thejas


> create/drop database should populate inputs/outputs and check concurrency and 
> user permission
> -
>
> Key: HIVE-2093
> URL: https://issues.apache.org/jira/browse/HIVE-2093
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Locking, Metastore, Security
>Reporter: Namit Jain
>Assignee: Navis
> Attachments: D12807.3.patch, HIVE-2093.6.patch, 
> HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.D12807.1.patch, 
> HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, 
> HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch
>
>
> concurrency and authorization are needed for create/drop table. Also to make 
> concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS 
> DATABASE





[jira] [Created] (HIVE-5978) Rollups not supported in vector mode.

2013-12-06 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-5978:
--

 Summary: Rollups not supported in vector mode.
 Key: HIVE-5978
 URL: https://issues.apache.org/jira/browse/HIVE-5978
 Project: Hive
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey


Rollups are not supported in vector mode, the query should fail to vectorize. A 
separate jira will be filed to implement rollups in vector mode.





[jira] [Updated] (HIVE-5978) Rollups not supported in vector mode.

2013-12-06 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5978:
---

Issue Type: Sub-task  (was: Bug)
Parent: HIVE-4160

> Rollups not supported in vector mode.
> -
>
> Key: HIVE-5978
> URL: https://issues.apache.org/jira/browse/HIVE-5978
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
>
> Rollups are not supported in vector mode, the query should fail to vectorize. 
> A separate jira will be filed to implement rollups in vector mode.





[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5783:
--

Status: Open  (was: Patch Available)

> Native Parquet Support in Hive
> --
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
>  Issue Type: New Feature
>Reporter: Justin Coffey
>Assignee: Justin Coffey
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
> Hive integration and would like to now contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Changes Details:
> Parquet was built with dependency management in mind and therefore only a 
> single Parquet jar will be added as a dependency.





[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841822#comment-13841822
 ] 

Hive QA commented on HIVE-5783:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617485/HIVE-5783.patch

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/557/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/557/console

Messages:
{noformat}
 This message was trimmed, see log for full details 
Decision can match input such as "KW_ORDER KW_BY LPAREN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:121:5: 
Decision can match input such as "KW_CLUSTER KW_BY LPAREN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:133:5: 
Decision can match input such as "KW_PARTITION KW_BY LPAREN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:144:5: 
Decision can match input such as "KW_DISTRIBUTE KW_BY LPAREN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:155:5: 
Decision can match input such as "KW_SORT KW_BY LPAREN" using multiple 
alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:172:7: 
Decision can match input such as "STAR" using multiple alternatives: 1, 2

As a result, alternative(s) 2 were disabled for that input
warning(200): IdentifiersParser.g:185:5: 
Decision can match input such as "KW_UNIONTYPE" using multiple alternatives: 5, 
6

As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:185:5: 
Decision can match input such as "KW_STRUCT" using multiple alternatives: 4, 6

As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:185:5: 
Decision can match input such as "KW_ARRAY" using multiple alternatives: 2, 6

As a result, alternative(s) 6 were disabled for that input
warning(200): IdentifiersParser.g:267:5: 
Decision can match input such as "KW_DATE StringLiteral" using multiple 
alternatives: 2, 3

As a result, alternative(s) 3 were disabled for that input
warning(200): IdentifiersParser.g:267:5: 
Decision can match input such as "KW_NULL" using multiple alternatives: 1, 8

As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:267:5: 
Decision can match input such as "KW_FALSE" using multiple alternatives: 3, 8

As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:267:5: 
Decision can match input such as "KW_TRUE" using multiple alternatives: 3, 8

As a result, alternative(s) 8 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_ORDER 
KW_BY" using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_LATERAL 
KW_VIEW" using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT 
KW_INTO" using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_DISTRIBUTE 
KW_BY" using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_INSERT 
KW_OVERWRITE" using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_SORT KW_BY" 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_MAP LPAREN" 
using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_CLUSTER 
KW_BY" using multiple alternatives: 2, 9

As a result, alternative(s) 9 were disabled for that input
warning(200): IdentifiersParser.g:399:5: 
Decision can match input such as "{KW_LIKE, KW_REGEXP, KW_RLIKE} KW_GROUP 
KW_BY" using multiple alternatives: 2, 9

As a

[jira] [Commented] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841818#comment-13841818
 ] 

Hive QA commented on HIVE-5951:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617474/HIVE-5951.01.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 4460 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_addpart1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_failure4
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_archive5
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_default_partition_name
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join28
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_join32
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partialscan_autogether
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_partscan_norcfile
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/556/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/556/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12617474

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are 
> small problems, such as in the !ifNotExists case, where DDLSemanticAnalyzer 
> gets the full partition object for every spec (a network call to the 
> metastore) and then discards it instantly; there is also the general problem 
> that too much processing is done on the client side. DDLSA should analyze 
> the query and make one call to the metastore (or a set of batched calls if 
> there are too many partitions in the command); the metastore should then 
> figure out the details and insert in batch.
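The batching idea described in this issue can be sketched generically. This is an illustrative sketch only, not Hive's actual DDLSemanticAnalyzer or metastore code; the `splitIntoBatches` helper and the batch size are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionBatcher {
    // Split a list of partition specs into fixed-size batches so the client
    // can issue one metastore call per batch instead of one per partition.
    static <T> List<List<T>> splitIntoBatches(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            // subList's end index is exclusive and capped at the list size
            batches.add(new ArrayList<>(
                items.subList(i, Math.min(i + batchSize, items.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> specs = new ArrayList<>();
        for (int i = 0; i < 10; i++) specs.add("ds=2013-12-0" + i);
        List<List<String>> batches = splitIntoBatches(specs, 4);
        System.out.println(batches.size());        // 3 batches: 4 + 4 + 2
        System.out.println(batches.get(2).size()); // last batch has 2 specs
    }
}
```

Each batch would then be sent in a single client-to-metastore round trip, which is the core of the proposed improvement.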



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5911) Recent change to schema upgrade scripts breaks file naming conventions

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841791#comment-13841791
 ] 

Sergey Shelukhin commented on HIVE-5911:


ping? note I cannot commit :)

> Recent change to schema upgrade scripts breaks file naming conventions
> --
>
> Key: HIVE-5911
> URL: https://issues.apache.org/jira/browse/HIVE-5911
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Carl Steinbach
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5911.01.patch, HIVE-5911.patch
>
>
> The changes made in HIVE-5700 break the convention for naming schema upgrade 
> scripts.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-5783:
--

Attachment: HIVE-5783.patch

Patch HIVE-5783.patch is the same as the original but rebased against trunk. 
The patch doesn't build, pending pom file changes.

> Native Parquet Support in Hive
> --
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
>  Issue Type: New Feature
>Reporter: Justin Coffey
>Assignee: Justin Coffey
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-5783.patch, hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
> Hive integration and would like to now contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Changes Details:
> Parquet was built with dependency management in mind and therefore only a 
> single Parquet jar will be added as a dependency.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5977) Ability to selectively enable/disable WebHCat REST API components

2013-12-06 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841772#comment-13841772
 ] 

Carl Steinbach commented on HIVE-5977:
--

The RPCs in the [WebHCat REST 
API|https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference] can be 
divided into the following categories:
# General (version, status, etc)
# DDL
# MapReduce job submission
# Pig job submission
# Hive job submission
# Queue (job management, deprecated)
# Jobs (job management)

We should provide administrators with the ability to selectively disable 
categories 2-7 by setting properties in the WebHCat configuration file. By 
default all categories will be enabled.
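The proposed behavior could look like the following sketch: each API category is gated by a boolean property, enabled by default. Note that the property-name pattern `webhcat.api.<category>.enabled` is invented here for illustration and is not an actual WebHCat configuration key.

```java
import java.util.Properties;

public class ApiGate {
    // Returns whether a given API category (e.g. "ddl", "pig") is enabled.
    // The default of "true" implements "all categories enabled by default".
    static boolean isEnabled(Properties conf, String category) {
        return Boolean.parseBoolean(
            conf.getProperty("webhcat.api." + category + ".enabled", "true"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // An administrator disables Pig job submission in the config file
        conf.setProperty("webhcat.api.pig.enabled", "false");
        System.out.println(isEnabled(conf, "pig")); // false
        System.out.println(isEnabled(conf, "ddl")); // true (default)
    }
}
```

A disabled category would presumably return an error status from its REST endpoints rather than dispatching the request.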


> Ability to selectively enable/disable WebHCat REST API components
> -
>
> Key: HIVE-5977
> URL: https://issues.apache.org/jira/browse/HIVE-5977
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HIVE-5977) Ability to selectively enable/disable WebHCat REST API components

2013-12-06 Thread Carl Steinbach (JIRA)
Carl Steinbach created HIVE-5977:


 Summary: Ability to selectively enable/disable WebHCat REST API 
components
 Key: HIVE-5977
 URL: https://issues.apache.org/jira/browse/HIVE-5977
 Project: Hive
  Issue Type: Bug
  Components: WebHCat
Reporter: Carl Steinbach
Assignee: Carl Steinbach






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-06 Thread Aleksei (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841743#comment-13841743
 ] 

Aleksei commented on HIVE-5970:
---

My findings show that there is a problem in the run-length encoding.
You can reproduce the problem by doing the following steps:
1. Create the table:
{code:sql}
CREATE TABLE test_orc_format(
  site STRING,
  a DOUBLE,
  b BIGINT,
  c BIGINT,
  d BIGINT,
  e DOUBLE,
  f DOUBLE,
  g DOUBLE,
  h DOUBLE,
  i DOUBLE,
  j DOUBLE,
  k BIGINT,
  l BIGINT,
  m BIGINT,
  n BIGINT,
  o BIGINT,
  p BIGINT,
  q ARRAY,
  r ARRAY
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
STORED AS ORC
;
{code}
2. Load the data from the attached file.
{code:sql}
load data local inpath 'test_data' overwrite into table test_orc_format;
{code}
3. Use one of the following queries:
{code:sql}
select * from test_orc_format;
select o from test_orc_format;
{code}

Note that the attached file was created by Hive during a job execution and not 
crafted by hand, so it might be wrongly encoded as well. Also note that the 
calculation that produces column "o" cannot give negative results.
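For context, the failure reported below sits in a run-length decoder. The sketch here shows only the general shape of the encoding family: runs stored as (count, value) pairs, where a corrupt count can overrun the output array. This is NOT Hive's RunLengthIntegerReaderV2 (its patched-base variant is considerably more complex); it is a minimal illustration.

```java
public class RleSketch {
    // Decode (count, value) runs into a flat array of expectedLength values.
    static long[] decode(int[] counts, long[] values, int expectedLength) {
        long[] out = new long[expectedLength];
        int pos = 0;
        for (int run = 0; run < counts.length; run++) {
            for (int i = 0; i < counts[run]; i++) {
                if (pos >= out.length) {
                    // A malformed stream surfaces here as an overrun,
                    // analogous to the ArrayIndexOutOfBoundsException below.
                    throw new IllegalStateException(
                        "run lengths exceed expected length");
                }
                out[pos++] = values[run];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] decoded = decode(new int[]{3, 2}, new long[]{7L, -1L}, 5);
        System.out.println(java.util.Arrays.toString(decoded)); // [7, 7, 7, -1, -1]
    }
}
```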

> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
> Attachments: test_data
>
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(Recor

[jira] [Updated] (HIVE-5970) ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java

2013-12-06 Thread Aleksei (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksei updated HIVE-5970:
--

Attachment: test_data

> ArrayIndexOutOfBoundsException in RunLengthIntegerReaderV2.java
> ---
>
> Key: HIVE-5970
> URL: https://issues.apache.org/jira/browse/HIVE-5970
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.12.0
>Reporter: Eric Chu
>Priority: Critical
>  Labels: orcfile
> Attachments: test_data
>
>
> A workload involving ORC tables starts getting the following 
> ArrayIndexOutOfBoundsException AFTER the upgrade to Hive 0.12. The file is 
> added as part of HIVE-4123. 
> 2013-12-04 14:42:08,537 ERROR 
> cause:java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> 2013-12-04 14:42:08,537 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> java.io.IOException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:215)
> at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:200)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
> ... 11 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readPatchedBaseValues(RunLengthIntegerReaderV2.java:171)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:54)
> at 
> org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:287)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$LongTreeReader.next(RecordReaderImpl.java:473)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl$StructTreeReader.next(RecordReaderImpl.java:1157)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.next(RecordReaderImpl.java:2196)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:129)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:80)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:274)
> ... 15 more



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Assigned] (HIVE-3245) UTF encoded data not displayed correctly by Hive driver

2013-12-06 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho reassigned HIVE-3245:
---

Assignee: Szehon Ho

> UTF encoded data not displayed correctly by Hive driver
> ---
>
> Key: HIVE-3245
> URL: https://issues.apache.org/jira/browse/HIVE-3245
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.8.0
>Reporter: N Campbell
>Assignee: Szehon Ho
> Attachments: ASF.LICENSE.NOT.GRANTED--screenshot-1.jpg, CERT.TLJA.txt
>
>
> Various foreign-language data (e.g. Japanese, Thai) is loaded into string 
> columns via tab-delimited text files. A simple projection of the columns in 
> the table does not display the correct data. Exporting the data from Hive 
> and looking at the files implies the data is loaded properly. It appears to 
> be an encoding issue in the driver, but we are unaware of any URL connection 
> properties regarding encoding that Hive JDBC requires.
> create table if not exists CERT.TLJA_JP_E ( RNUM int , C1 string, ORD int)
> row format delimited
> fields terminated by '\t'
> stored as textfile;
> create table if not exists CERT.TLJA_JP ( RNUM int , C1 string, ORD int)
> stored as sequencefile;
> load data local inpath '/home/hadoopadmin/jdbc-cert/CERT/CERT.TLJA_JP.txt'
> overwrite into table CERT.TLJA_JP_E;
> insert overwrite table CERT.TLJA_JP  select * from CERT.TLJA_JP_E;



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HIVE-5238) "ant testreport" does not include any storage-handlers/hbase unit test results

2013-12-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-5238.
--

Resolution: Not A Problem

> "ant testreport" does not include any storage-handlers/hbase unit test results
> --
>
> Key: HIVE-5238
> URL: https://issues.apache.org/jira/browse/HIVE-5238
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> hcatalog/build-support/ant/test.xml defines _junit macro which copies 
> TEST-*.xml to hive/build so that "ant testreport" at hive root includes test 
> results in the html page.  All hcatalog modules except storage-handlers use 
> this.
> Need to fix this so that all test results are clearly visible.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-3181) getDatabaseMajor/Minor version does not return values

2013-12-06 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841718#comment-13841718
 ] 

Szehon Ho commented on HIVE-3181:
-

Thanks for the commit!

> getDatabaseMajor/Minor version does not return values
> -
>
> Key: HIVE-3181
> URL: https://issues.apache.org/jira/browse/HIVE-3181
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.8.1
>Reporter: N Campbell
>Assignee: Szehon Ho
> Fix For: 0.13.0
>
> Attachments: HIVE-3181.2.patch, HIVE-3181.patch
>
>
> This is really a sub-issue of HIVE-3174 (which is a lot of properties) but 
> given that the driver will return databaseProductVersion it makes no sense to 
> not have implemented these as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5951:
---

Status: Patch Available  (was: Open)

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are 
> small problems, such as in the !ifNotExists case, where DDLSemanticAnalyzer 
> gets the full partition object for every spec (a network call to the 
> metastore) and then discards it instantly; there is also the general problem 
> that too much processing is done on the client side. DDLSA should analyze 
> the query and make one call to the metastore (or a set of batched calls if 
> there are too many partitions in the command); the metastore should then 
> figure out the details and insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5951) improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-5951:
---

Attachment: HIVE-5951.01.patch
HIVE-5951.nogen.patch

Updated patch. I think the tests should pass, although for some tests I got 
different results on my machine than on another machine.

> improve performance of adding partitions from client
> 
>
> Key: HIVE-5951
> URL: https://issues.apache.org/jira/browse/HIVE-5951
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-5951.01.patch, HIVE-5951.nogen.patch, 
> HIVE-5951.nogen.patch, HIVE-5951.patch
>
>
> Adding partitions to the metastore is currently very inefficient. There are 
> small problems, such as in the !ifNotExists case, where DDLSemanticAnalyzer 
> gets the full partition object for every spec (a network call to the 
> metastore) and then discards it instantly; there is also the general problem 
> that too much processing is done on the client side. DDLSA should analyze 
> the query and make one call to the metastore (or a set of batched calls if 
> there are too many partitions in the command); the metastore should then 
> figure out the details and insert in batch.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 16074: HIVE-5951 improve performance of adding partitions from client

2013-12-06 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16074/
---

(Updated Dec. 6, 2013, 9:16 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
---

See JIRA. RB does not include generated code.


Diffs (updated)
-

  metastore/if/hive_metastore.thrift 43b3907 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 
01c2626 
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 
65406d9 
  metastore/src/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 
cacfa07 
  metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 04d399f 
  metastore/src/java/org/apache/hadoop/hive/metastore/RawStore.java 27ae3c4 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 57f1e67 
  
metastore/src/test/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 c0e720f 
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java d32be59 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 947b65c 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Partition.java f4476a9 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 7443ea4 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ImportSemanticAnalyzer.java 
e97d948 
  ql/src/java/org/apache/hadoop/hive/ql/plan/AddPartitionDesc.java ff60e44 
  ql/src/test/results/clientpositive/add_part_exist.q.out 559cb26 
  ql/src/test/results/clientpositive/add_part_multiple.q.out b2525cf 
  ql/src/test/results/clientpositive/create_view_partitioned.q.out e90ffc7 
  ql/src/test/results/clientpositive/partitions_json.q.out deb7a1f 

Diff: https://reviews.apache.org/r/16074/diff/


Testing
---

Running CliDriver; some more query results will change.


Thanks,

Sergey Shelukhin



[jira] [Commented] (HIVE-3181) getDatabaseMajor/Minor version does not return values

2013-12-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841707#comment-13841707
 ] 

Xuefu Zhang commented on HIVE-3181:
---

Patch committed to trunk. Thanks to Szehon for the contribution and to Navis 
for the review.

> getDatabaseMajor/Minor version does not return values
> -
>
> Key: HIVE-3181
> URL: https://issues.apache.org/jira/browse/HIVE-3181
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.8.1
>Reporter: N Campbell
>Assignee: Szehon Ho
> Fix For: 0.13.0
>
> Attachments: HIVE-3181.2.patch, HIVE-3181.patch
>
>
> This is really a sub-issue of HIVE-3174 (which is a lot of properties) but 
> given that the driver will return databaseProductVersion it makes no sense to 
> not have implemented these as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-3181) getDatabaseMajor/Minor version does not return values

2013-12-06 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-3181:
--

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

> getDatabaseMajor/Minor version does not return values
> -
>
> Key: HIVE-3181
> URL: https://issues.apache.org/jira/browse/HIVE-3181
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.8.1
>Reporter: N Campbell
>Assignee: Szehon Ho
> Fix For: 0.13.0
>
> Attachments: HIVE-3181.2.patch, HIVE-3181.patch
>
>
> This is really a sub-issue of HIVE-3174 (which is a lot of properties) but 
> given that the driver will return databaseProductVersion it makes no sense to 
> not have implemented these as well.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841693#comment-13841693
 ] 

Edward Capriolo edited comment on HIVE-5783 at 12/6/13 8:55 PM:


{quote}
I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.
{quote}
Right, I am not demanding that we do it one way or the other, just pointing out 
that we should not build up tech debt. Hive does not have a dedicated cleanup 
crew to handle all the non-sexy features :)


was (Author: appodictic):
{quote}
I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.
{quote}
Right, I am not demanding that we do it one way or the other, just pointing out 
that we should not build up tech debt.

> Native Parquet Support in Hive
> --
>
> Key: HIVE-5783
> URL: https://issues.apache.org/jira/browse/HIVE-5783
> Project: Hive
>  Issue Type: New Feature
>Reporter: Justin Coffey
>Assignee: Justin Coffey
>Priority: Minor
> Fix For: 0.11.0
>
> Attachments: hive-0.11-parquet.patch
>
>
> Problem Statement:
> Hive would be easier to use if it had native Parquet support. Our 
> organization, Criteo, uses Hive extensively. Therefore we built the Parquet 
> Hive integration and would like to now contribute that integration to Hive.
> About Parquet:
> Parquet is a columnar storage format for Hadoop and integrates with many 
> Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, 
> Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native 
> Parquet integration.
> Changes Details:
> Parquet was built with dependency management in mind and therefore only a 
> single Parquet jar will be added as a dependency.





[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841693#comment-13841693
 ] 

Edward Capriolo commented on HIVE-5783:
---

{quote}
I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.
{quote}
Right, I am not demanding that we do it one way or the other, just pointing out 
that we should not build up tech debt.



[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841692#comment-13841692
 ] 

Xuefu Zhang commented on HIVE-5783:
---

[~jcoffey] To rebase, we need to specify the external dependency in the Hive 
0.13 POM file. What external lib does your patch need, i.e. the repo, groupId, 
artifactId, and version?



[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841681#comment-13841681
 ] 

Justin Coffey commented on HIVE-5783:
-

{quote}
I think that was done before Maven. I am sure there is a reason why RCFILE, 
ORCFILE, and this add their own syntax, but this is something we might not want 
to repeat by copy-and-paste just because the last person did it that way.
{quote}

I would normally agree with this, but I suppose I was trying to make as minor a 
change as possible.



[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841674#comment-13841674
 ] 

Edward Capriolo commented on HIVE-5783:
---

{quote}
regarding the support being built into the semantic analyzer, I mimicked what 
was done for ORC support{quote}
I think that was done before Maven. I am sure there is a reason why RCFILE, 
ORCFILE, and this add their own syntax, but this is something we might not want 
to repeat by copy-and-paste just because the last person did it that way. 




[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Justin Coffey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841666#comment-13841666
 ] 

Justin Coffey commented on HIVE-5783:
-

[~appodictic], regarding the support being built into the semantic analyzer, I 
mimicked what was done for ORC support. I agree that a hard-coded switch 
statement is not the best approach, but thought a larger refactoring was out of 
scope for this request, and definitely not something to be done against the 
0.11 branch :). Now, with trunk support for parquet-hive, I suppose we could 
tackle this in a more generic/robust way.

[~xuefuz], do you mean the actual parquet input/output formats and serde?  If 
so, these are in the parquet-hive project 
(https://github.com/Parquet/parquet-mr/tree/master/parquet-hive).



Re: Review Request 16063: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Thejas Nair

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16063/#review29900
---


Thanks for the comprehensive tests!


itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


comment not applicable ?



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


why not re-use execFetchFirst here ?



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


comment not applicable ?



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


why not re-use execFetchFirst here ?



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


you can use fail("..") instead of assertTrue here.



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


I think the error message would need to be updated in the test case.



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


I think the error message would need to be updated in the test case.



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


nit - a trailing white space



itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java


Thanks for these comments!
can you extend the comment to - "@param oneRowOnly - read only one row from 
result"



- Thejas Nair


On Dec. 6, 2013, 5:37 a.m., Prasad Mujumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16063/
> ---
> 
> (Updated Dec. 6, 2013, 5:37 a.m.)
> 
> 
> Review request for hive, Brock Noland and Thejas Nair.
> 
> 
> Bugs: HIVE-4395
> https://issues.apache.org/jira/browse/HIVE-4395
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support fetch-from-start for hiveserver2 fetch operations. 
>  - Handle new fetch orientation for various HS2 operations.
>  - Added support to reset the read position in Hive driver
>  - Enabled scroll cursors with support for positioning cursor to start of 
> resultset
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> 7b1c9da 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java ef39573 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 812ee56 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java fce19bf 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ed502a7 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 86db406 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 343f760 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  581e69c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  af87a90 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  0fe01c0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  bafe40c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  2be018e 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  7e8a06b 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  2daa9cd 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  a1ac55b 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 6f4b8dc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 301187d 
> 
> Diff: https://reviews.apache.org/r/16063/diff/
> 
> 
> Testing
> ---
> 
> Added new testcases to TestJdbcDriver2
> 
> 
> Thanks,
> 
> Prasad Mujumdar
> 
>
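
The reset-read-position design described in the review above can be sketched as 
a rewindable row offset. The class below is a hypothetical stand-in (assumed 
names, not the actual HiveServer2 Operation API), assuming FETCH_FIRST simply 
resets a tracked position before reading:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hive's actual code: an operation that tracks a
// read offset and rewinds it when asked to fetch from the start.
class RowFetcher {
    enum FetchOrientation { FETCH_NEXT, FETCH_FIRST }

    private final List<String> rows;
    private int position = 0;

    RowFetcher(List<String> rows) { this.rows = rows; }

    List<String> fetch(FetchOrientation orientation, int maxRows) {
        if (orientation == FetchOrientation.FETCH_FIRST) {
            position = 0;  // rewind, analogous to resetting the driver's read position
        }
        int end = Math.min(position + maxRows, rows.size());
        List<String> batch = rows.subList(position, end);
        position = end;
        return batch;
    }
}

public class FetchFirstDemo {
    public static void main(String[] args) {
        RowFetcher f = new RowFetcher(Arrays.asList("a", "b", "c", "d"));
        System.out.println(f.fetch(RowFetcher.FetchOrientation.FETCH_NEXT, 2));  // [a, b]
        System.out.println(f.fetch(RowFetcher.FetchOrientation.FETCH_FIRST, 3)); // [a, b, c]
        System.out.println(f.fetch(RowFetcher.FetchOrientation.FETCH_NEXT, 3));  // [d]
    }
}
```

A client that has partially consumed a resultset can thus ask for FETCH_FIRST 
and re-read from the beginning, which is the behavior HIVE-4395 adds.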



[jira] [Commented] (HIVE-3181) getDatabaseMajor/Minor version does not return values

2013-12-06 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841629#comment-13841629
 ] 

Szehon Ho commented on HIVE-3181:
-

Thanks for the review. The patch still seems to apply cleanly; can it be 
committed?

> getDatabaseMajor/Minor version does not return values
> -
>
> Key: HIVE-3181
> URL: https://issues.apache.org/jira/browse/HIVE-3181
> Project: Hive
>  Issue Type: Improvement
>  Components: JDBC
>Affects Versions: 0.8.1
>Reporter: N Campbell
>Assignee: Szehon Ho
> Attachments: HIVE-3181.2.patch, HIVE-3181.patch
>
>
> This is really a sub-issue of HIVE-3174 (which covers a lot of properties), 
> but given that the driver will return databaseProductVersion it makes no 
> sense not to have implemented these as well.





[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841620#comment-13841620
 ] 

Brock Noland commented on HIVE-5783:


bq. I don't see any new files as expected.

It looks complete to me.



[jira] [Created] (HIVE-5976) Decouple input formats from STORED as keywords

2013-12-06 Thread Brock Noland (JIRA)
Brock Noland created HIVE-5976:
--

 Summary: Decouple input formats from STORED as keywords
 Key: HIVE-5976
 URL: https://issues.apache.org/jira/browse/HIVE-5976
 Project: Hive
  Issue Type: Task
Reporter: Brock Noland


As noted in HIVE-5783, we hard-code the mapping of input formats to STORED AS 
keywords. It'd be nice if there were a registration system so we didn't need to 
do that.
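
A registration system along these lines could replace the hard-coded switch; a 
minimal sketch (the format class names here are illustrative placeholders, not 
Hive's actual classes):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the registration idea: map STORED AS keywords to
// their input/output format class names instead of hard-coding them in the
// semantic analyzer. Class names below are illustrative only.
public class StorageFormatRegistry {
    private static final Map<String, String[]> FORMATS = new HashMap<>();

    public static void register(String keyword, String inputFormat, String outputFormat) {
        FORMATS.put(keyword.toUpperCase(), new String[] { inputFormat, outputFormat });
    }

    public static String[] lookup(String keyword) {
        return FORMATS.get(keyword.toUpperCase());
    }

    public static void main(String[] args) {
        // A new format registers itself; the analyzer only does a lookup.
        register("ORC", "OrcInputFormat", "OrcOutputFormat");
        register("PARQUET", "ParquetInputFormat", "ParquetOutputFormat");
        System.out.println(lookup("parquet")[0]);  // ParquetInputFormat
    }
}
```

With this shape, adding a new STORED AS keyword is a registration call rather 
than a change to the analyzer's switch statement.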





[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841618#comment-13841618
 ] 

Brock Noland commented on HIVE-5783:


bq. Why does support need to be built directly into the semantic analyzer?

At present this is required to get STORED AS. 

bq. I think input formats/SerDes should be decoupled from the Hive code as much 
as possible. Hard-coding like this makes it hard to evolve support.

Yes, I agree. We should have some kind of registration system. I have created a 
JIRA for that, HIVE-5976, but I don't see it as a blocker.



[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841606#comment-13841606
 ] 

Xuefu Zhang commented on HIVE-5783:
---

[~jcoffey] Thanks for your contribution. I can help rebase against the latest 
trunk. However, are you sure your patch is complete? I don't see the new files 
I expected.



[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841601#comment-13841601
 ] 

Edward Capriolo commented on HIVE-5783:
---

Why does support need to be built directly into the semantic analyzer? I think 
input formats/SerDes should be decoupled from the Hive code as much as 
possible. Hard-coding like this makes it hard to evolve support. I *think* you 
should only be adding the libs as a dependency to the POM files and building 
some tests. 



[jira] [Commented] (HIVE-5926) Load Data OverWrite Into Table Throw org.apache.hadoop.hive.ql.metadata.HiveException

2013-12-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841570#comment-13841570
 ] 

Xuefu Zhang commented on HIVE-5926:
---

[~tianyi] Thanks for your contribution. It would be nice if the following could 
be provided:

1. Test cases in your patch.
2. A Review Board item so that reviewers can post comments.


> Load Data OverWrite Into Table Throw 
> org.apache.hadoop.hive.ql.metadata.HiveException
> -
>
> Key: HIVE-5926
> URL: https://issues.apache.org/jira/browse/HIVE-5926
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.12.0
> Environment: OS: Red Hat Enterprise Linux Server release 6.2
> HDFS: CDH-4.2.1
> MAPRED: CDH-4.2.1-mr1
>Reporter: Yi Tian
> Attachments: HIVE-5926.patch
>
>
> step1: create table 
> step2: load data 
> load data inpath '/tianyi/usys_etl_map_total.del' overwrite into table 
> tianyi_test3
> step3: copy file back
> hadoop fs -cp /user/hive/warehouse/tianyi_test3/usys_etl_map_total.del /tianyi
> step4: load data again
> load data inpath '/tianyi/usys_etl_map_total.del' overwrite into table 
> tianyi_test3
> here we can see the error in console:
> Failed with exception Error moving: 
> hdfs://ocdccluster/tianyi/usys_etl_map_total.del into: 
> /user/hive/warehouse/tianyi_test3/usys_etl_map_total.del
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> we can find error detail in hive.log:
> 2013-12-03 17:26:41,717 ERROR exec.Task (SessionState.java:printError(419)) - 
> Failed with exception Error moving: 
> hdfs://ocdccluster/tianyi/usys_etl_map_total.del into: 
> /user/hive/warehouse/tianyi_test3/usys_etl_map_total.del
> org.apache.hadoop.hive.ql.metadata.HiveException: Error moving: 
> hdfs://ocdccluster/tianyi/usys_etl_map_total.del into: 
> /user/hive/warehouse/tianyi_test3/usys_etl_map_total.del
>   at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2323)
>   at org.apache.hadoop.hive.ql.metadata.Table.replaceFiles(Table.java:639)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1441)
>   at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:283)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1414)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1192)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1020)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:888)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Caused by: java.io.IOException: Error moving: 
> hdfs://ocdccluster/tianyi/usys_etl_map_total.del into: 
> /user/hive/warehouse/tianyi_test3/usys_etl_map_total.del
>   at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2317)
>   ... 20 more
> 2013-12-03 17:26:41,718 ERROR ql.Driver (SessionState.java:printError(419)) - 
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
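
As an analogy for the failing move above (not Hive's actual code path, which 
goes through Hive.replaceFiles and HDFS), a plain java.nio move onto an 
existing destination fails the same way unless an explicit replace option is 
given:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Analogy only, not Hive's code: moving a file onto an existing destination
// fails when no replace option is given, similar to the second
// "load data ... overwrite" hitting the leftover warehouse file.
public class MoveCollisionDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("warehouse");
        Path src = Files.createFile(dir.resolve("staging.del"));
        Path dst = Files.createFile(dir.resolve("usys_etl_map_total.del"));
        try {
            Files.move(src, dst);  // no REPLACE_EXISTING option -> throws
        } catch (IOException e) {
            System.out.println("move failed: " + e.getClass().getSimpleName());
        }
    }
}
```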





[jira] [Commented] (HIVE-5926) Load Data OverWrite Into Table Throw org.apache.hadoop.hive.ql.metadata.HiveException

2013-12-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841571#comment-13841571
 ] 

Xuefu Zhang commented on HIVE-5926:
---

3. Please use --no-prefix when generating the patch with git.



[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Fix Version/s: 0.11.0
 Release Note: adds stored as parquet and setting parquet as the default 
storage engine.
   Status: Patch Available  (was: Open)

built and tested against hive 0.11--a rebase will be necessary to work against 
the trunk



[jira] [Updated] (HIVE-5783) Native Parquet Support in Hive

2013-12-06 Thread Justin Coffey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Coffey updated HIVE-5783:


Attachment: hive-0.11-parquet.patch



[jira] [Commented] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types

2013-12-06 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841534#comment-13841534
 ] 

Eric Hanson commented on HIVE-5878:
---

To me, the issue is really about the importance of obeying the SQL standard 
(which would argue for using exact types for results in the situations the 
standard describes) vs. maintaining backward compatibility.

I don't think that accuracy is the issue. E.g. avg(int) yields int in SQL 
Server, which is less accurate than yielding double, but it does meet the SQL 
standard's requirement of an exact type. I would not favor making Hive yield 
int for avg(int) to meet the SQL standard, because that loses information 
compared to the previous Hive behavior (yielding double) and would be perceived 
as an even bigger breaking change for existing applications than producing 
decimal.

I agree that vectorization is an implementation detail. Vectorization can be 
extended to handle decimal.

> Hive standard avg UDAF returns double as the return type for some exact input 
> types
> ---
>
> Key: HIVE-5878
> URL: https://issues.apache.org/jira/browse/HIVE-5878
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5878.1.patch, HIVE-5878.patch
>
>
> For standard, no-partial avg result, hive currently returns double as the 
> result type.
> {code}
> hive> desc test;
> OK
> d int None
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;  
> ...
>   Reduce Operator Tree:
> Group By Operator
>   aggregations:
> expr: avg(VALUE._col0)
>   bucketGroup: false
>   mode: mergepartial
>   outputColumnNames: _col0
>   Select Operator
> expressions:
>   expr: _col0
>   type: double
> {code}
> However, exact types including integers and decimal should yield exact type. 
> Here is what MySQL does:
> {code}
> mysql> desc test;
> +-------+--------------+------+-----+---------+-------+
> | Field | Type         | Null | Key | Default | Extra |
> +-------+--------------+------+-----+---------+-------+
> | i     | int(11)      | YES  |     | NULL    |       |
> | b     | tinyint(1)   | YES  |     | NULL    |       |
> | d     | double       | YES  |     | NULL    |       |
> | s     | varchar(5)   | YES  |     | NULL    |       |
> | dd    | decimal(5,2) | YES  |     | NULL    |       |
> +-------+--------------+------+-----+---------+-------+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +--------+---------------+------+-----+---------+-------+
> | Field  | Type          | Null | Key | Default | Extra |
> +--------+---------------+------+-----+---------+-------+
> | avg(i) | decimal(14,4) | YES  |     | NULL    |       |
> +--------+---------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-4395) Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-4395:
-

Status: Open  (was: Patch Available)

I added a couple comments on RB. Thanks.

> Support TFetchOrientation.FIRST for HiveServer2 FetchResults
> 
>
> Key: HIVE-4395
> URL: https://issues.apache.org/jira/browse/HIVE-4395
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC
>Affects Versions: 0.11.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-4395-1.patch, HIVE-4395.1.patch, HIVE-4395.2.patch, 
> HIVE-4395.3.patch, HIVE-4395.4.patch
>
>
> Currently HiveServer2 only supports fetching the next row 
> (TFetchOrientation.NEXT). This ticket is to implement support for 
> TFetchOrientation.FIRST, which resets the fetch position to the beginning of 
> the resultset. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-1033) change default value of hive.exec.parallel to true

2013-12-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1033:
---

Status: Open  (was: Patch Available)

Looks like some valid failures. Needs investigation.

> change default value of hive.exec.parallel to true
> --
>
> Key: HIVE-1033
> URL: https://issues.apache.org/jira/browse/HIVE-1033
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-1033.2.patch, hive.1033.1.patch
>
>
> There is no harm in changing it to true. 
> Inside facebook, we have been testing it and it seems to be stable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


Re: Review Request 16063: HIVE-4395: Support TFetchOrientation.FIRST for HiveServer2 FetchResults

2013-12-06 Thread Carl Steinbach

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16063/#review29893
---



ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java


Formatting.



service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java


Since this EnumSet is constant, I think we should make it a static final 
class variable.

Also, formatting.



service/src/java/org/apache/hive/service/cli/operation/Operation.java


Use supportedOrientations.contains(orientation) instead of manually 
iterating through the set. Also, should this method be protected instead of 
public?
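A minimal sketch of the pattern these comments suggest (class, enum, and method names below are illustrative, not the actual Hive code):

```java
import java.util.EnumSet;

public class OrientationCheck {
    enum FetchOrientation { FETCH_NEXT, FETCH_FIRST, FETCH_PRIOR }

    // Constant set of orientations this operation supports, held once per
    // class rather than rebuilt per instance or per call.
    private static final EnumSet<FetchOrientation> SUPPORTED_ORIENTATIONS =
        EnumSet.of(FetchOrientation.FETCH_NEXT, FetchOrientation.FETCH_FIRST);

    // contains() replaces a hand-written loop over the set.
    protected static boolean isFetchOrientationSupported(FetchOrientation o) {
        return SUPPORTED_ORIENTATIONS.contains(o);
    }

    public static void main(String[] args) {
        System.out.println(isFetchOrientationSupported(FetchOrientation.FETCH_FIRST));
        System.out.println(isFetchOrientationSupported(FetchOrientation.FETCH_PRIOR));
    }
}
```

EnumSet.contains is a constant-time bit test, so the readability win of contains() over manual iteration costs nothing; the shared static final instance is fine as long as it is never mutated.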


- Carl Steinbach


On Dec. 6, 2013, 5:37 a.m., Prasad Mujumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16063/
> ---
> 
> (Updated Dec. 6, 2013, 5:37 a.m.)
> 
> 
> Review request for hive, Brock Noland and Thejas Nair.
> 
> 
> Bugs: HIVE-4395
> https://issues.apache.org/jira/browse/HIVE-4395
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Support fetch-from-start for hiveserver2 fetch operations. 
>  - Handle new fetch orientation for various HS2 operations.
>  - Added support to reset the read position in Hive driver
>  - Enabled scroll cursors with support for positioning cursor to start of 
> resultset
> 
> 
> Diffs
> -
> 
>   itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 
> 7b1c9da 
>   jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java ef39573 
>   jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java 812ee56 
>   jdbc/src/java/org/apache/hive/jdbc/HiveStatement.java fce19bf 
>   ql/src/java/org/apache/hadoop/hive/ql/Context.java ed502a7 
>   ql/src/java/org/apache/hadoop/hive/ql/Driver.java 86db406 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 343f760 
>   ql/src/java/org/apache/hadoop/hive/ql/processors/DfsProcessor.java ce54e0c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
>  581e69c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
>  af87a90 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
>  0fe01c0 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
>  bafe40c 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
>  2be018e 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
>  7e8a06b 
>   
> service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
>  2daa9cd 
>   
> service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
>  a1ac55b 
>   service/src/java/org/apache/hive/service/cli/operation/Operation.java 
> 6f4b8dc 
>   service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java 
> 301187d 
> 
> Diff: https://reviews.apache.org/r/16063/diff/
> 
> 
> Testing
> ---
> 
> Added new testcases to TestJdbcDriver2
> 
> 
> Thanks,
> 
> Prasad Mujumdar
> 
>



[jira] [Commented] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types

2013-12-06 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841516#comment-13841516
 ] 

Xuefu Zhang commented on HIVE-5878:
---

[~ehans] Thank you for your concern. However, I respectfully disagree that the 
behavior WAS and IS reasonable, for several reasons. First, AVG was probably 
introduced before decimal, so there was no better choice than double. Hive has 
the concept of exact types (int, long, decimal, etc.) vs. approximate types 
(double, float, etc.), and arithmetic operations (plus, divide, etc.) on exact 
types generate an exact type for accuracy. If average is defined mathematically 
as sum/count, then sum(int)/count should result in an exact type. Otherwise, 
avg() and sum()/count give different results. Another inconsistency arises when 
avg(decimal) results in a decimal. All of this creates inconsistency in Hive's 
mathematical concepts and function behavior, and can confuse users as well.

I understand that the current vectorized implementation chooses double for the 
sum and uses sum/count to get another double for the average. While this 
extends the scope of the changes, vectorization is just an implementation 
detail, which should not dictate high-level concepts and consistency.
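The exact-vs-approximate distinction can be illustrated outside Hive. The sketch below (plain Java, not Hive code) contrasts a double-based average with one computed as an exact decimal sum divided by the count:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class ExactAvg {
    // Approximate path: binary floating point accumulates rounding error.
    public static double doubleAvg(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Exact path: avg defined mathematically as sum/count over an exact type;
    // decimal division needs an explicit result scale and rounding mode.
    public static BigDecimal decimalAvg(BigDecimal[] xs) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal x : xs) sum = sum.add(x);
        return sum.divide(BigDecimal.valueOf(xs.length), 4, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        double[] d = new double[3];
        BigDecimal[] b = new BigDecimal[3];
        for (int i = 0; i < 3; i++) { d[i] = 0.1; b[i] = new BigDecimal("0.1"); }
        System.out.println(doubleAvg(d));   // not exactly 0.1
        System.out.println(decimalAvg(b));  // exactly 0.1000
    }
}
```

The decimal result is exactly 0.1000, while the double result drifts off 0.1 because 0.1 has no finite binary representation; this is the sense in which avg over exact inputs loses information when forced through double.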

> Hive standard avg UDAF returns double as the return type for some exact input 
> types
> ---
>
> Key: HIVE-5878
> URL: https://issues.apache.org/jira/browse/HIVE-5878
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5878.1.patch, HIVE-5878.patch
>
>
> For standard, no-partial avg result, hive currently returns double as the 
> result type.
> {code}
> hive> desc test;
> OK
> d int None
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;  
> ...
>   Reduce Operator Tree:
> Group By Operator
>   aggregations:
> expr: avg(VALUE._col0)
>   bucketGroup: false
>   mode: mergepartial
>   outputColumnNames: _col0
>   Select Operator
> expressions:
>   expr: _col0
>   type: double
> {code}
> However, exact types including integers and decimal should yield exact type. 
> Here is what MySQL does:
> {code}
> mysql> desc test;
> +-------+--------------+------+-----+---------+-------+
> | Field | Type         | Null | Key | Default | Extra |
> +-------+--------------+------+-----+---------+-------+
> | i     | int(11)      | YES  |     | NULL    |       |
> | b     | tinyint(1)   | YES  |     | NULL    |       |
> | d     | double       | YES  |     | NULL    |       |
> | s     | varchar(5)   | YES  |     | NULL    |       |
> | dd    | decimal(5,2) | YES  |     | NULL    |       |
> +-------+--------------+------+-----+---------+-------+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +--------+---------------+------+-----+---------+-------+
> | Field  | Type          | Null | Key | Default | Extra |
> +--------+---------------+------+-----+---------+-------+
> | avg(i) | decimal(14,4) | YES  |     | NULL    |       |
> +--------+---------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-1033) change default value of hive.exec.parallel to true

2013-12-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841496#comment-13841496
 ] 

Hive QA commented on HIVE-1033:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12617417/HIVE-1033.2.patch

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 4460 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_metadata_only_queries
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_skewjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_smb_mapjoin_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_only_null
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/554/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/554/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12617417

> change default value of hive.exec.parallel to true
> --
>
> Key: HIVE-1033
> URL: https://issues.apache.org/jira/browse/HIVE-1033
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-1033.2.patch, hive.1033.1.patch
>
>
> There is no harm in changing it to true. 
> Inside facebook, we have been testing it and it seems to be stable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types

2013-12-06 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841449#comment-13841449
 ] 

Eric Hanson commented on HIVE-5878:
---

To be clear, I think the existing Hive behavior, where the result of 
avg() is a double, is reasonable and should not be 
changed.

> Hive standard avg UDAF returns double as the return type for some exact input 
> types
> ---
>
> Key: HIVE-5878
> URL: https://issues.apache.org/jira/browse/HIVE-5878
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5878.1.patch, HIVE-5878.patch
>
>
> For standard, no-partial avg result, hive currently returns double as the 
> result type.
> {code}
> hive> desc test;
> OK
> d int None
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;  
> ...
>   Reduce Operator Tree:
> Group By Operator
>   aggregations:
> expr: avg(VALUE._col0)
>   bucketGroup: false
>   mode: mergepartial
>   outputColumnNames: _col0
>   Select Operator
> expressions:
>   expr: _col0
>   type: double
> {code}
> However, exact types including integers and decimal should yield exact type. 
> Here is what MySQL does:
> {code}
> mysql> desc test;
> +-------+--------------+------+-----+---------+-------+
> | Field | Type         | Null | Key | Default | Extra |
> +-------+--------------+------+-----+---------+-------+
> | i     | int(11)      | YES  |     | NULL    |       |
> | b     | tinyint(1)   | YES  |     | NULL    |       |
> | d     | double       | YES  |     | NULL    |       |
> | s     | varchar(5)   | YES  |     | NULL    |       |
> | dd    | decimal(5,2) | YES  |     | NULL    |       |
> +-------+--------------+------+-----+---------+-------+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +--------+---------------+------+-----+---------+-------+
> | Field  | Type          | Null | Key | Default | Extra |
> +--------+---------------+------+-----+---------+-------+
> | avg(i) | decimal(14,4) | YES  |     | NULL    |       |
> +--------+---------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types

2013-12-06 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841442#comment-13841442
 ] 

Eric Hanson commented on HIVE-5878:
---

I'm not comfortable with this change. If the data types of expression and 
aggregate results in Hive were reasonable before, I think it would be best to 
leave the types the same as they were, for backward compatibility, so we don't 
break people's applications. 

Also, the vectorized implementation of aggregates was built to return the same 
data types as the row-at-a-time implementation, for compatibility. It is 
important that any changes in semantics or types be implemented in both the 
row-at-a-time, and vectorized execution paths.

Different database systems make different choices about expression and 
aggregate result types. For example, in SQL Server, avg applied to an int is an 
int:

{code}
create table test(i int);
select avg(i) avg_i into res2 from test;
{code}

gives table res2 with this schema:

{code}
CREATE TABLE [dbo].[res2](
[avg_i] [int] NULL
) 
{code}



> Hive standard avg UDAF returns double as the return type for some exact input 
> types
> ---
>
> Key: HIVE-5878
> URL: https://issues.apache.org/jira/browse/HIVE-5878
> Project: Hive
>  Issue Type: Bug
>  Components: Types, UDF
>Affects Versions: 0.12.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-5878.1.patch, HIVE-5878.patch
>
>
> For standard, no-partial avg result, hive currently returns double as the 
> result type.
> {code}
> hive> desc test;
> OK
> d int None
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;  
> ...
>   Reduce Operator Tree:
> Group By Operator
>   aggregations:
> expr: avg(VALUE._col0)
>   bucketGroup: false
>   mode: mergepartial
>   outputColumnNames: _col0
>   Select Operator
> expressions:
>   expr: _col0
>   type: double
> {code}
> However, exact types including integers and decimal should yield exact type. 
> Here is what MySQL does:
> {code}
> mysql> desc test;
> +-------+--------------+------+-----+---------+-------+
> | Field | Type         | Null | Key | Default | Extra |
> +-------+--------------+------+-----+---------+-------+
> | i     | int(11)      | YES  |     | NULL    |       |
> | b     | tinyint(1)   | YES  |     | NULL    |       |
> | d     | double       | YES  |     | NULL    |       |
> | s     | varchar(5)   | YES  |     | NULL    |       |
> | dd    | decimal(5,2) | YES  |     | NULL    |       |
> +-------+--------------+------+-----+---------+-------+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +--------+---------------+------+-----+---------+-------+
> | Field  | Type          | Null | Key | Default | Extra |
> +--------+---------------+------+-----+---------+-------+
> | avg(i) | decimal(14,4) | YES  |     | NULL    |       |
> +--------+---------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-1033) change default value of hive.exec.parallel to true

2013-12-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1033:
---

Attachment: HIVE-1033.2.patch

> change default value of hive.exec.parallel to true
> --
>
> Key: HIVE-1033
> URL: https://issues.apache.org/jira/browse/HIVE-1033
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: HIVE-1033.2.patch, hive.1033.1.patch
>
>
> There is no harm in changing it to true. 
> Inside facebook, we have been testing it and it seems to be stable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-1033) change default value of hive.exec.parallel to true

2013-12-06 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-1033:
---

Assignee: Ashutosh Chauhan  (was: Namit Jain)
  Status: Patch Available  (was: Open)

> change default value of hive.exec.parallel to true
> --
>
> Key: HIVE-1033
> URL: https://issues.apache.org/jira/browse/HIVE-1033
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-1033.2.patch, hive.1033.1.patch
>
>
> There is no harm in changing it to true. 
> Inside facebook, we have been testing it and it seems to be stable.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-549) Parallel Execution Mechanism

2013-12-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841412#comment-13841412
 ] 

Ashutosh Chauhan commented on HIVE-549:
---

Never mind, there is already HIVE-1033. I will update the patch there soon.

> Parallel Execution Mechanism
> 
>
> Key: HIVE-549
> URL: https://issues.apache.org/jira/browse/HIVE-549
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Adam Kramer
>Assignee: Chaitanya Mishra
>  Labels: hive-appu
> Fix For: 0.5.0
>
> Attachments: HIVE549-v7.patch
>
>
> In a massively parallel database system, it would be awesome to also 
> parallelize some of the mapreduce phases that our data needs to go through.
> One example that just occurred to me is UNION ALL: when you union two SELECT 
> statements, effectively you could run those statements in parallel. There's 
> no situation (that I can think of, but I don't have a formal proof) in which 
> the left statement would rely on the right statement, or vice versa. So, they 
> could be run at the same time...and perhaps they should be. Or, perhaps there 
> should be a way to make this happen...PARALLEL UNION ALL? PUNION ALL?
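The independence argument above can be sketched outside of Hive as two tasks submitted to a thread pool whose results are concatenated once both complete (a toy illustration, not Hive's actual execution code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelUnionAll {
    // The two SELECT branches of a UNION ALL share no dependencies, so they
    // can run as independent tasks; the output is simply their concatenation.
    public static List<Integer> unionAll(Callable<List<Integer>> left,
                                         Callable<List<Integer>> right) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<Integer>> l = pool.submit(left);   // left branch
            Future<List<Integer>> r = pool.submit(right);  // right branch, concurrent
            List<Integer> out = new ArrayList<>(l.get());  // wait for both, keep order
            out.addAll(r.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = unionAll(
            () -> Arrays.asList(1, 2),
            () -> Arrays.asList(3, 4));
        System.out.println(rows); // [1, 2, 3, 4]
    }
}
```

Waiting on both futures before concatenating preserves UNION ALL's left-then-right output order even though the branches execute concurrently.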



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HIVE-5641) BeeLineOpts ignores Throwable

2013-12-06 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5641:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Thank you for the review! I committed this to trunk.

> BeeLineOpts ignores Throwable
> -
>
> Key: HIVE-5641
> URL: https://issues.apache.org/jira/browse/HIVE-5641
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-5641.patch
>
>
> BeelineOpts has the following code:
> {noformat}
> } catch (Throwable t) {
>return -1;
>  }
> {noformat}
> which is bad. I believe it is there because the code being called may be buggy 
> and throw such things as ArrayIndex or NoSuchElement exceptions.
> I propose we:
> 1) catch Exception, not Throwable, so usage remains the same without having a 
> giant black hole
> 2) log the exception so we can figure out what is wrong with the underlying 
> code
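A hypothetical sketch of the proposed shape (the class and method names below are invented for illustration; they are not Beeline's actual code):

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TerminalWidth {
    private static final Logger LOG = Logger.getLogger(TerminalWidth.class.getName());

    // Catch Exception rather than Throwable, keep the same -1 fallback,
    // and log what went wrong instead of silently swallowing it.
    public static int widthOrDefault(Callable<Integer> probe) {
        try {
            return probe.call();
        } catch (Exception e) {  // Exception, not Throwable
            LOG.log(Level.WARNING, "failed to detect terminal width", e);
            return -1;           // same fallback behavior as before
        }
    }

    public static void main(String[] args) {
        System.out.println(widthOrDefault(() -> 80));                                      // 80
        System.out.println(widthOrDefault(() -> { throw new RuntimeException("boom"); })); // -1
    }
}
```

Narrowing to Exception lets genuinely fatal Errors (OutOfMemoryError, etc.) propagate instead of being converted into a misleading -1, while the log line preserves the evidence needed to fix the underlying bug.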



--
This message was sent by Atlassian JIRA
(v6.1#6144)

