[jira] [Updated] (HIVE-6756) alter table set fileformat should set serde too

2014-05-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6756:
---

Status: Open  (was: Patch Available)

While some of the failures in the Hive QA run also occur on trunk, some of the 
other failures look relevant to this patch.

> alter table set fileformat should set serde too
> ---
>
> Key: HIVE-6756
> URL: https://issues.apache.org/jira/browse/HIVE-6756
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Owen O'Malley
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-6756.1.patch, HIVE-6756.patch
>
>
> Currently, alter table set fileformat doesn't change the serde. This is 
> unexpected by customers because serdes are largely file-format specific.
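
A minimal HiveQL sketch of the behavior under discussion, assuming a hypothetical 
table t currently stored as TEXTFILE (the OrcSerde class name below is the stock 
Hive ORC serde):

{code:sql}
-- per the description above, this changes the input/output formats but
-- currently leaves the old serde in place
ALTER TABLE t SET FILEFORMAT ORC;
-- until that changes, the serde has to be updated explicitly as a workaround
ALTER TABLE t SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';
{code}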



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7087) Remove lineage information after query completion

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001430#comment-14001430
 ] 

Ashutosh Chauhan commented on HIVE-7087:


In DataContainer.toString() you may want to add a delimiter between the table name 
and the partition values, e.g. part.getDbName() + "." + part.getTableName() + "@" + 
part.getValues() (or any other delimiter of your choice).

Other than that, looks good to me.

> Remove lineage information after query completion
> -
>
> Key: HIVE-7087
> URL: https://issues.apache.org/jira/browse/HIVE-7087
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-7087.1.patch.txt
>
>
> Lineage information is stacked in the session and is not cleared before the 
> session is closed. That also produces redundant lineage logs in q.out files for 
> all of the queries that follow any insert, even though lineage should be logged 
> only for insert queries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6999) Add streaming mode to PTFs

2014-05-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-6999:
---

   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Test failures are unrelated to this patch and are tracked elsewhere. Committed 
to trunk. Thanks, Harish!

> Add streaming mode to PTFs
> --
>
> Key: HIVE-6999
> URL: https://issues.apache.org/jira/browse/HIVE-6999
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.11.0, 0.12.0, 0.13.0
>Reporter: Harish Butani
>Assignee: Harish Butani
> Fix For: 0.14.0
>
> Attachments: HIVE-6999.1.patch, HIVE-6999.2.patch, HIVE-6999.3.patch, 
> HIVE-6999.4.patch, HIVE-6999.4.patch
>
>
> There is a set of use cases where a Table Function can operate on a Partition 
> row by row, or on a subset (window) of rows, as the Partition is being streamed 
> to it.
> - Windowing has a couple of use cases of this: processing of Rank functions and 
> processing of Window Aggregations.
> - But this is a generic concept: any analysis that operates on an Ordered 
> partition may be able to operate in Streaming mode.
> This patch introduces streaming mode in PTFs and provides the mechanics to 
> handle PTF chains that contain both modes of PTFs.
> Subsequent patches will introduce Streaming mode for Windowing.
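
As a concrete illustration of the streaming-friendly case described above, a rank 
over an ordered partition only needs to see rows in order, so it can be computed as 
the partition is streamed to the function. The emp table below is hypothetical:

{code:sql}
-- RANK depends only on the ordering within each partition, so it can be
-- evaluated row by row instead of buffering the whole partition in memory
SELECT deptno, empno,
       RANK() OVER (PARTITION BY deptno ORDER BY salary DESC) AS rnk
FROM emp;
{code}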



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7087) Remove lineage information after query completion

2014-05-18 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-7087:


Attachment: HIVE-7087.1.patch.txt

Attaching the source changes only for now; the complete diff makes for a huge file.

> Remove lineage information after query completion
> -
>
> Key: HIVE-7087
> URL: https://issues.apache.org/jira/browse/HIVE-7087
> Project: Hive
>  Issue Type: Bug
>  Components: Logging
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-7087.1.patch.txt
>
>
> Lineage information is stacked in the session and is not cleared before the 
> session is closed. That also produces redundant lineage logs in q.out files for 
> all of the queries that follow any insert, even though lineage should be logged 
> only for insert queries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7087) Remove lineage information after query completion

2014-05-18 Thread Navis (JIRA)
Navis created HIVE-7087:
---

 Summary: Remove lineage information after query completion
 Key: HIVE-7087
 URL: https://issues.apache.org/jira/browse/HIVE-7087
 Project: Hive
  Issue Type: Bug
  Components: Logging
Reporter: Navis
Assignee: Navis
Priority: Minor


Lineage information is stacked in the session and is not cleared before the session 
is closed. That also produces redundant lineage logs in q.out files for all of the 
queries that follow any insert, even though lineage should be logged only for 
insert queries.
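
A small, hypothetical q-file fragment showing the symptom: the lineage entries (the 
POSTHOOK: Lineage: lines in the corresponding .q.out) belong to the INSERT, but 
because they are not cleared from the session they also show up for the later query:

{code:sql}
-- lineage should be recorded for this statement only
INSERT OVERWRITE TABLE dst SELECT key, value FROM src;
-- ...yet without cleanup the stale lineage is logged again for this one
SELECT count(*) FROM dst;
{code}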



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7083) Fix test failures on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7083:
---

Attachment: HIVE-7083.1.patch

> Fix test failures on trunk
> --
>
> Key: HIVE-7083
> URL: https://issues.apache.org/jira/browse/HIVE-7083
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7083.1.patch
>
>
> After the move to jdk7 we need to update the .q.out files for a few tests. Also, 
> a few .q.out updates were missed in HIVE-6901.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4867) Deduplicate columns appearing in both the key list and value list of ReduceSinkOperator

2014-05-18 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001416#comment-14001416
 ] 

Navis commented on HIVE-4867:
-

I think the patch is almost ready, but the diff file cannot be attached 
here (it's bigger than 10MB). Most of the change comes from removing duplicated 
lineage information, so I'm thinking of fixing that first.

> Deduplicate columns appearing in both the key list and value list of 
> ReduceSinkOperator
> ---
>
> Key: HIVE-4867
> URL: https://issues.apache.org/jira/browse/HIVE-4867
> Project: Hive
>  Issue Type: Improvement
>Reporter: Yin Huai
>Assignee: Navis
> Attachments: HIVE-4867.1.patch.txt, source_only.txt
>
>
> A ReduceSinkOperator emits data in the format of keys and values. Right now, 
> a column may appear in both the key list and the value list, which results in 
> unnecessary overhead during shuffling.
> Example:
> We have a query shown below ...
> {code:sql}
> explain select ss_ticket_number from store_sales cluster by ss_ticket_number;
> {code}
> The plan is ...
> {code}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-1
> Map Reduce
>   Alias -> Map Operator Tree:
> store_sales 
>   TableScan
> alias: store_sales
> Select Operator
>   expressions:
> expr: ss_ticket_number
> type: int
>   outputColumnNames: _col0
>   Reduce Output Operator
> key expressions:
>   expr: _col0
>   type: int
> sort order: +
> Map-reduce partition columns:
>   expr: _col0
>   type: int
> tag: -1
> value expressions:
>   expr: _col0
>   type: int
>   Reduce Operator Tree:
> Extract
>   File Output Operator
> compressed: false
> GlobalTableId: 0
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
> {code}
> The column 'ss_ticket_number' is in both the key list and value list of the 
> ReduceSinkOperator. The type of ss_ticket_number is int. For this case, 
> BinarySortableSerDe will introduce 1 extra byte for every int in the key. 
> LazyBinarySerDe will also introduce overhead when recording the length of an 
> int. For every int, 10 bytes is a rough estimate of the size of the data 
> emitted from the Map phase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7083) Fix test failures on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7083:
---

Status: Patch Available  (was: Open)

> Fix test failures on trunk
> --
>
> Key: HIVE-7083
> URL: https://issues.apache.org/jira/browse/HIVE-7083
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-7083.1.patch
>
>
> After the move to jdk7 we need to update the .q.out files for a few tests. Also, 
> a few .q.out updates were missed in HIVE-6901.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Review Request 21619: Fix test fails on jdk7

2014-05-18 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21619/
---

Review request for hive.


Bugs: HIVE-7083
https://issues.apache.org/jira/browse/HIVE-7083


Repository: hive-git


Description
---

Fix test fails on jdk7


Diffs
-

  ql/src/test/queries/clientpositive/udf_java_method.q 51280b2 
  ql/src/test/queries/clientpositive/udf_reflect.q cef1e4a 
  ql/src/test/queries/clientpositive/vector_decimal_math_funcs.q 6e2c0b1 
  ql/src/test/results/clientpositive/auto_sortmerge_join_11.q.out ec4532e 
  ql/src/test/results/clientpositive/tez/script_pipe.q.out 336276e 
  ql/src/test/results/clientpositive/tez/transform1.q.out 755d7d9 
  ql/src/test/results/clientpositive/tez/transform_ppr1.q.out be7e979 
  ql/src/test/results/clientpositive/tez/transform_ppr2.q.out 3cac552 
  ql/src/test/results/clientpositive/udf_java_method.q.out 97efa6e 
  ql/src/test/results/clientpositive/udf_reflect.q.out 44e10ec 
  ql/src/test/results/clientpositive/vector_decimal_math_funcs.q.out 952a7a4 
  ql/src/test/results/compiler/plan/input20.q.xml 6cc5c81 
  ql/src/test/results/compiler/plan/input4.q.xml 0626e64 
  ql/src/test/results/compiler/plan/input5.q.xml 036834e 

Diff: https://reviews.apache.org/r/21619/diff/


Testing
---

Ran tests on jdk-7


Thanks,

Ashutosh Chauhan



[jira] [Created] (HIVE-7086) TestHiveServer2.testConnection is failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-7086:
--

 Summary: TestHiveServer2.testConnection is failing on trunk
 Key: HIVE-7086
 URL: https://issues.apache.org/jira/browse/HIVE-7086
 Project: Hive
  Issue Type: Test
  Components: HiveServer2, JDBC
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan


Able to repro locally on a fresh checkout.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7086) TestHiveServer2.testConnection is failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001410#comment-14001410
 ] 

Ashutosh Chauhan commented on HIVE-7086:


Test log:
{code}
Running org.apache.hive.jdbc.miniHS2.TestHiveServer2
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 7.914 sec <<< 
FAILURE! - in org.apache.hive.jdbc.miniHS2.TestHiveServer2
testConnection(org.apache.hive.jdbc.miniHS2.TestHiveServer2)  Time elapsed: 
1.995 sec  <<< ERROR!
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: 
FAILED: SemanticException [Error 10001]: Table not found tab
at 
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.checkStatus(ThriftCLIServiceClient.java:52)
at 
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementInternal(ThriftCLIServiceClient.java:151)
at 
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatement(ThriftCLIServiceClient.java:129)
at 
org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection(TestHiveServer2.java:66)


Results :

Tests in error: 
  TestHiveServer2.testConnection:66 » HiveSQL Error while compiling statement: 
F...

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0

{code}

> TestHiveServer2.testConnection is failing on trunk
> --
>
> Key: HIVE-7086
> URL: https://issues.apache.org/jira/browse/HIVE-7086
> Project: Hive
>  Issue Type: Test
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>
> Able to repro locally on a fresh checkout.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name

2014-05-18 Thread Bing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001403#comment-14001403
 ] 

Bing Li commented on HIVE-4118:
---

Generated HIVE-4118.1.patch against the trunk

> ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully 
> qualified table name
> 
>
> Key: HIVE-4118
> URL: https://issues.apache.org/jira/browse/HIVE-4118
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.10.0
>Reporter: Lenni Kuff
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-4118.1.patch
>
>
> Computing column stats fails when using a fully qualified table name. Issuing a 
> "USE db" and using only the table name succeeds.
> {code}
> hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS 
> int_col"
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Table somedb.some_table for which stats is 
> gathered doesn't exist.)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
>   at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy9.updateTableColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>   at $Proxy10.update_table_column_statistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
>   at $Proxy11.updateTableColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
>   ... 18 more
> {code}
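
A hedged sketch of the workaround implied by the description (the database, table, 
and column names are just the placeholders used in the report):

{code:sql}
-- fails at the time of this report: fully qualified table name
-- ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS int_col;

-- works: switch the current database first, then use the bare table name
USE somedb;
ANALYZE TABLE some_table COMPUTE STATISTICS FOR COLUMNS int_col;
{code}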



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name

2014-05-18 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4118:
--

Status: Patch Available  (was: Reopened)

> ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully 
> qualified table name
> 
>
> Key: HIVE-4118
> URL: https://issues.apache.org/jira/browse/HIVE-4118
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.10.0
>Reporter: Lenni Kuff
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-4118.1.patch
>
>
> Computing column stats fails when using a fully qualified table name. Issuing a 
> "USE db" and using only the table name succeeds.
> {code}
> hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS 
> int_col"
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Table somedb.some_table for which stats is 
> gathered doesn't exist.)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
>   at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy9.updateTableColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>   at $Proxy10.update_table_column_statistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
>   at $Proxy11.updateTableColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
>   ... 18 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7085) TestOrcHCatPigStorer.testWriteDecimal tests are failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-7085:
--

 Summary: TestOrcHCatPigStorer.testWriteDecimal tests are failing 
on trunk
 Key: HIVE-7085
 URL: https://issues.apache.org/jira/browse/HIVE-7085
 Project: Hive
  Issue Type: Test
  Components: HCatalog
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan


TestOrcHCatPigStorer.testWriteDecimal, TestOrcHCatPigStorer.testWriteDecimalX, and 
TestOrcHCatPigStorer.testWriteDecimalXY are failing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-4118) ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully qualified table name

2014-05-18 Thread Bing Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Li updated HIVE-4118:
--

Attachment: HIVE-4118.1.patch

> ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS fails when using fully 
> qualified table name
> 
>
> Key: HIVE-4118
> URL: https://issues.apache.org/jira/browse/HIVE-4118
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics
>Affects Versions: 0.10.0
>Reporter: Lenni Kuff
>Assignee: Shreepadma Venugopalan
> Attachments: HIVE-4118.1.patch
>
>
> Computing column stats fails when using a fully qualified table name. Issuing a 
> "USE db" and using only the table name succeeds.
> {code}
> hive -e "ANALYZE TABLE somedb.some_table COMPUTE STATISTICS FOR COLUMNS 
> int_col"
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> NoSuchObjectException(message:Table somedb.some_table for which stats is 
> gathered doesn't exist.)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2201)
>   at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.persistTableStats(ColumnStatsTask.java:325)
>   at 
> org.apache.hadoop.hive.ql.exec.ColumnStatsTask.execute(ColumnStatsTask.java:336)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1352)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1138)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
>   at $Proxy9.updateTableColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.update_table_column_statistics(HiveMetaStore.java:3171)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
>   at $Proxy10.update_table_column_statistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.updateTableColumnStatistics(HiveMetaStoreClient.java:973)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
>   at $Proxy11.updateTableColumnStatistics(Unknown Source)
>   at 
> org.apache.hadoop.hive.ql.metadata.Hive.updateTableColumnStatistics(Hive.java:2198)
>   ... 18 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7085) TestOrcHCatPigStorer.testWriteDecimal tests are failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001402#comment-14001402
 ] 

Ashutosh Chauhan commented on HIVE-7085:


Test log:
{code}
Running org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer
Tests run: 27, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 94.214 sec <<< 
FAILURE! - in org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer
testWriteDecimalXY(org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer)  Time 
elapsed: 3.801 sec  <<< ERROR!
org.apache.pig.impl.logicalLayer.FrontendException: Unable to open iterator for 
alias B
at org.apache.pig.PigServer.openIterator(PigServer.java:880)
at 
org.apache.hive.hcatalog.pig.TestHCatStorer.pigValueRangeTest(TestHCatStorer.java:290)
at 
org.apache.hive.hcatalog.pig.TestHCatStorer.pigValueRangeTestOverflow(TestHCatStorer.java:207)
at 
org.apache.hive.hcatalog.pig.TestHCatStorer.testWriteDecimalXY(TestHCatStorer.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Caused by: org.apache.pig.PigException: Unable to store alias B
at org.apache.pig.PigServer.storeEx(PigServer.java:982)
at org.apache.pig.PigServer.store(PigServer.java:942)
at org.apache.pig.PigServer.openIterator(PigServer.java:855)
at 
org.apache.hive.hcatalog.pig.TestHCatStorer.pigValueRangeTest(TestHCatStorer.java:290)
at 
org.apache.hive.hcatalog.pig.TestHCatStorer.pigValueRangeTestOverflow(TestHCatStorer.java:207)
at 
org.apache.hive.hcatalog.pig.TestHCatStorer.testWriteDecimalXY(TestHCatStorer.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.jun

[jira] [Comment Edited] (HIVE-7084) TestWebHCatE2e is failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001401#comment-14001401
 ] 

Ashutosh Chauhan edited comment on HIVE-7084 at 5/19/14 5:48 AM:
-

Test log:
{code}
Running org.apache.hive.hcatalog.templeton.TestWebHCatE2e
Tests run: 12, Failures: 5, Errors: 0, Skipped: 7, Time elapsed: 1.854 sec <<< 
FAILURE! - in org.apache.hive.hcatalog.templeton.TestWebHCatE2e
getStatus(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time elapsed: 
0.204 sec  <<< FAILURE!
junit.framework.AssertionFailedError: GET 
http://localhost:52286/templeton/v1/status?user.name=johndoe 


Error 503 java.lang.RuntimeException: Could not load wadl generators 
from wadlGeneratorDescriptions.


HTTP ERROR: 503
Problem accessing /templeton/v1/status. Reason:
java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions.
Powered by Jetty://
 expected:<200> but was:<503>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at 
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus(TestWebHCatE2e.java:96)

invalidPath(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time elapsed: 
0.013 sec  <<< FAILURE!
junit.framework.AssertionFailedError: GET 
http://localhost:52286/templeton/v1/no_such_mapping/database?user.name=johndoe 



Error 503 java.lang.RuntimeException: Could not load wadl generators 
from wadlGeneratorDescriptions.


HTTP ERROR: 503
Problem accessing /templeton/v1/no_such_mapping/database. Reason:
java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions.
Powered by Jetty://
 expected:<404> but was:<503>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at 
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath(TestWebHCatE2e.java:116)

getHadoopVersion(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time 
elapsed: 0.012 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<200> but was:<503>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at 
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion(TestWebHCatE2e.java:205)

getHiveVersion(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time 
elapsed: 0.012 sec  <<< FAILURE!
junit.framework.AssertionFailedError: e

[jira] [Commented] (HIVE-7084) TestWebHCatE2e is failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001401#comment-14001401
 ] 

Ashutosh Chauhan commented on HIVE-7084:


Test run:
{code}
Running org.apache.hive.hcatalog.templeton.TestWebHCatE2e
Tests run: 12, Failures: 5, Errors: 0, Skipped: 7, Time elapsed: 1.854 sec <<< 
FAILURE! - in org.apache.hive.hcatalog.templeton.TestWebHCatE2e
getStatus(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time elapsed: 
0.204 sec  <<< FAILURE!
junit.framework.AssertionFailedError: GET 
http://localhost:52286/templeton/v1/status?user.name=johndoe 


Error 503 java.lang.RuntimeException: Could not load wadl generators 
from wadlGeneratorDescriptions.


HTTP ERROR: 503
Problem accessing /templeton/v1/status. Reason:
java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions.
Powered by Jetty://
 expected:<200> but was:<503>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at 
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus(TestWebHCatE2e.java:96)

invalidPath(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time elapsed: 
0.013 sec  <<< FAILURE!
junit.framework.AssertionFailedError: GET 
http://localhost:52286/templeton/v1/no_such_mapping/database?user.name=johndoe 



Error 503 java.lang.RuntimeException: Could not load wadl generators 
from wadlGeneratorDescriptions.


HTTP ERROR: 503
Problem accessing /templeton/v1/no_such_mapping/database. Reason:
java.lang.RuntimeException: Could not load wadl generators from 
wadlGeneratorDescriptions.
Powered by Jetty://
 expected:<404> but was:<503>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at 
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath(TestWebHCatE2e.java:116)

getHadoopVersion(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time 
elapsed: 0.012 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<200> but was:<503>
at junit.framework.Assert.fail(Assert.java:50)
at junit.framework.Assert.failNotEquals(Assert.java:287)
at junit.framework.Assert.assertEquals(Assert.java:67)
at junit.framework.Assert.assertEquals(Assert.java:199)
at junit.framework.Assert.assertEquals(Assert.java:205)
at 
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion(TestWebHCatE2e.java:205)

getHiveVersion(org.apache.hive.hcatalog.templeton.TestWebHCatE2e)  Time 
elapsed: 0.012 sec  <<< FAILURE!
junit.framework.AssertionFailedError: expected:<200> but was:<503>
at junit.fram

[jira] [Updated] (HIVE-7084) TestWebHCatE2e is failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7084:
---

Description: I am able to repro it consistently on a fresh checkout.

> TestWebHCatE2e is failing on trunk
> --
>
> Key: HIVE-7084
> URL: https://issues.apache.org/jira/browse/HIVE-7084
> Project: Hive
>  Issue Type: Test
>  Components: WebHCat
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>
> I am able to repro it consistently on a fresh checkout.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7084) TestWebHCatE2e is failing on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-7084:
--

 Summary: TestWebHCatE2e is failing on trunk
 Key: HIVE-7084
 URL: https://issues.apache.org/jira/browse/HIVE-7084
 Project: Hive
  Issue Type: Test
  Components: WebHCat
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7082) Vectorized parquet reader should create assigners only for the columns it assigns, not for scratch columns

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001399#comment-14001399
 ] 

Ashutosh Chauhan commented on HIVE-7082:


LGTM +1 [~jnp] Would you like to take a look too?

> Vectorized parquet reader should create assigners only for the columns it 
> assigns, not for scratch columns
> --
>
> Key: HIVE-7082
> URL: https://issues.apache.org/jira/browse/HIVE-7082
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: parquet, vectorization
> Attachments: HIVE-7082.1.patch
>
>
> The code in VectorizedParquetRecordReader.next and in 
> VectorColumnAssignFactory.buildAssigners iterates over all columns in the 
> vectorization context, trying to build an assigner for each. Scratch columns do 
> not require assigners. The fix is to simply use writables.length instead of 
> outputBatch.numCols.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001397#comment-14001397
 ] 

Ashutosh Chauhan commented on HIVE-5771:


Thanks [~rusanu] for helping out on this one!

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
> HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, 
> HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, 
> HIVE-5771.patch, HIVE-5771.patch.javaonly
>
>
> Currently there is no constant folding/propagation optimizer; all expressions 
> are evaluated at runtime.
> HIVE-2470 did a great job of evaluating constants in the UDF initialization 
> phase, but it is still a runtime evaluation and it doesn't propagate constants 
> from a subquery to the outside.
> Introducing such an optimizer may reduce I/O and accelerate processing.
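
A hedged example of what such an optimizer would buy (t is a hypothetical table): 
per the description, the arithmetic below is still evaluated at runtime today, 
whereas a constant-folding pass could rewrite the predicate at compile time:

{code:sql}
-- with constant folding this becomes WHERE key = 3 before execution
SELECT * FROM t WHERE key = 1 + 2;
{code}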



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7083) Fix test failures on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-7083:
---

Description: 
After the move to jdk7 we need to update the .q.out files for a few tests. Also, a 
few .q.out updates were missed in HIVE-6901.


> Fix test failures on trunk
> --
>
> Key: HIVE-7083
> URL: https://issues.apache.org/jira/browse/HIVE-7083
> Project: Hive
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 0.14.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
>
> After the move to jdk7 we need to update the .q.out files for a few tests. Also, 
> a few .q.out updates were missed in HIVE-6901.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7083) Fix test failures on trunk

2014-05-18 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-7083:
--

 Summary: Fix test failures on trunk
 Key: HIVE-7083
 URL: https://issues.apache.org/jira/browse/HIVE-7083
 Project: Hive
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.14.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001395#comment-14001395
 ] 

Remus Rusanu commented on HIVE-5771:


I have opened a separate issue, HIVE-7082, to track the vectorization_parquet 
failure, and I uploaded a patch there. I think the changes in HIVE-5771 just 
exposed the problem, so I thought it warrants its own tracking. I've tested 
HIVE-7082.1.patch both on trunk and on top of HIVE-5771.10.patch, and it fixes the 
problem.

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
> HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, 
> HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, 
> HIVE-5771.patch, HIVE-5771.patch.javaonly
>
>
> Currently there is no constant folding/propagation optimizer; all expressions 
> are evaluated at runtime.
> HIVE-2470 did a great job of evaluating constants in the UDF initialization 
> phase, but it is still a runtime evaluation and it doesn't propagate constants 
> from a subquery to the outside.
> Introducing such an optimizer may reduce I/O and accelerate processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7082) Vectorized parquet reader should create assigners only for the columns it assigns, not for scratch columns

2014-05-18 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-7082:
---

Attachment: HIVE-7082.1.patch

> Vectorized parquet reader should create assigners only for the columns it 
> assigns, not for scratch columns
> --
>
> Key: HIVE-7082
> URL: https://issues.apache.org/jira/browse/HIVE-7082
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: parquet, vectorization
> Attachments: HIVE-7082.1.patch
>
>
> The code in VectorizedParquetRecordReader.next and in 
> VectorColumnAssignFactory.buildAssigners iterates over all columns in the 
> vectorization context, trying to build an assigner for each. Scratch columns do 
> not require assigners. The fix is to simply use writables.length instead of 
> outputBatch.numCols.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7082) Vectorized parquet reader should create assigners only for the columns it assigns, not for scratch columns

2014-05-18 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-7082:
---

Status: Patch Available  (was: Open)

> Vectorized parquet reader should create assigners only for the columns it 
> assigns, not for scratch columns
> --
>
> Key: HIVE-7082
> URL: https://issues.apache.org/jira/browse/HIVE-7082
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.13.0, 0.14.0
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>Priority: Minor
>  Labels: parquet, vectorization
> Attachments: HIVE-7082.1.patch
>
>
> The code in VectorizedParquetRecordReader.next and in 
> VectorColumnAssignFactory.buildAssigners iterates over all columns in the 
> vectorization context, trying to build an assigner for each. Scratch columns do 
> not require assigners. The fix is to simply use writables.length instead of 
> outputBatch.numCols.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-7082) Vectorized parquet reader should create assigners only for the columns it assigns, not for scratch columns

2014-05-18 Thread Remus Rusanu (JIRA)
Remus Rusanu created HIVE-7082:
--

 Summary: Vectorized parquet reader should create assigners only 
for the columns it assigns, not for scratch columns
 Key: HIVE-7082
 URL: https://issues.apache.org/jira/browse/HIVE-7082
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.13.0, 0.14.0
Reporter: Remus Rusanu
Assignee: Remus Rusanu
Priority: Minor


The code in VectorizedParquetRecordReader.next and in 
VectorColumnAssignFactory.buildAssigners iterates over all columns in the 
vectorization context, trying to build an assigner for each. Scratch columns do 
not require assigners. The fix is to simply use writables.length instead of 
outputBatch.numCols.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive

2014-05-18 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001349#comment-14001349
 ] 

Remus Rusanu commented on HIVE-5771:


I'm looking at the Parquet failure. It looks like the first row returned by the 
Parquet reader has a different number of fields than expected. The truth is that 
building the vectorized batch based on the first object returned by Parquet was a 
hack to work around HIVE-6414. Now that that is fixed, perhaps I should also fix 
the hack and properly build the vectorized batch out of the object inspectors, not 
out of the first row.

> Constant propagation optimizer for Hive
> ---
>
> Key: HIVE-5771
> URL: https://issues.apache.org/jira/browse/HIVE-5771
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Ted Xu
>Assignee: Ted Xu
> Attachments: HIVE-5771.1.patch, HIVE-5771.10.patch, 
> HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.5.patch, 
> HIVE-5771.6.patch, HIVE-5771.7.patch, HIVE-5771.8.patch, HIVE-5771.9.patch, 
> HIVE-5771.patch, HIVE-5771.patch.javaonly
>
>
> Currently there is no constant folding/propagation optimizer; all expressions 
> are evaluated at runtime.
> HIVE-2470 did a great job of evaluating constants in the UDF initialization 
> phase, but it is still a runtime evaluation and it doesn't propagate constants 
> from a subquery to the outside.
> Introducing such an optimizer may reduce I/O and accelerate processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001280#comment-14001280
 ] 

Lefty Leverenz commented on HIVE-4440:
--

Here's the comment I added to HIVE-6586:

* [comment about hive.mapjoin.bucket.cache.size and hive.smbjoin.cache.rows | 
https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001274#comment-14001274]

> SMB Operator spills to disk like it's 1999
> --
>
> Key: HIVE-4440
> URL: https://issues.apache.org/jira/browse/HIVE-4440
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.12.0
>
> Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch
>
>
> I was recently looking into a performance issue with a query that used SMB join 
> and was running really slowly. It turns out that the SMB join by default 
> caches only 100 values per key before spilling to disk. That seems overly 
> conservative to me. Changing the parameter resulted in a ~5x speedup - quite 
> significant.
> The parameter is hive.mapjoin.bucket.cache.size, which right now is only used by 
> the SMB operator as far as I can tell.
> The parameter was introduced originally (3 yrs ago) for the map join operator 
> (looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in 
> a different context, though, where you had to avoid running out of memory with 
> the cached hash table in the same process, I think.
> Two things I'd like to propose:
> a) Rename it to what it does: hive.smbjoin.cache.rows
> b) Set it to something less restrictive: 10000
> If you string together a 5-table SMB join with a map join and a map-side 
> group-by aggregation you might still run out of memory, but the renamed 
> parameter should be easier to find and reduce. For most queries, I would 
> think that 10000 is still a reasonable number to cache (on the reduce side we 
> use 25000 for shuffle joins).
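
A hedged illustration of the proposal above (the parameter names come from this 
issue; the exact value is site-specific, so treat 10000 as an example only):

{code:sql}
-- raise the per-key cache for SMB joins in the current session
SET hive.smbjoin.cache.rows=10000;
-- on releases that still use the original name, the equivalent would be
SET hive.mapjoin.bucket.cache.size=10000;
{code}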



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001274#comment-14001274
 ] 

Lefty Leverenz commented on HIVE-6586:
--

According to a comment in HiveConf.java (from HIVE-4440), 
*hive.mapjoin.bucket.cache.size* should be removed: 

{quote}
+ // hive.mapjoin.bucket.cache.size has been replaced by hive.smbjoin.cache.row,
+ // need to remove by hive .13. Also, do not change default (see SMB operator)
{quote}

Also, that comment has a typo in the name of the parameter replacing 
*hive.mapjoin.bucket.cache.size* and the typo is replicated in the HIVE-6037 
patch -- the new parameter is *hive.smbjoin.cache.rows*, not 
hive.smbjoin.cache.row.

> Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
> ---
>
> Key: HIVE-6586
> URL: https://issues.apache.org/jira/browse/HIVE-6586
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Lefty Leverenz
>
> HIVE-6037 puts the definitions of configuration parameters into the 
> HiveConf.java file, but several recent jiras for release 0.13.0 introduce new 
> parameters that aren't in HiveConf.java yet and some parameter definitions 
> need to be altered for 0.13.0.  This jira will patch HiveConf.java after 
> HIVE-6037 gets committed.
> Also, four typos patched in HIVE-6582 need to be fixed in the new 
> HiveConf.java.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4440) SMB Operator spills to disk like it's 1999

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001270#comment-14001270
 ] 

Lefty Leverenz commented on HIVE-4440:
--

Hive 0.13.0 did not remove *hive.mapjoin.bucket.cache.size*.  Also, the comment 
that says it should be removed has a typo in the name of the new parameter -- 
it should be *hive.smbjoin.cache.rows*, not hive.smbjoin.cache.row:

{quote}
+// hive.mapjoin.bucket.cache.size has been replaced by 
hive.smbjoin.cache.row,
+// need to remove by hive .13. Also, do not change default (see SMB 
operator)
{quote}

Instead of creating a new jira for this, I'll add a comment on HIVE-6586 (for 
HIVE-6037).

> SMB Operator spills to disk like it's 1999
> --
>
> Key: HIVE-4440
> URL: https://issues.apache.org/jira/browse/HIVE-4440
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Fix For: 0.12.0
>
> Attachments: HIVE-4440.1.patch, HIVE-4440.2.patch
>
>
> I was recently looking into a performance issue with a query that used SMB join 
> and was running really slowly. It turns out that the SMB join by default 
> caches only 100 values per key before spilling to disk. That seems overly 
> conservative to me. Changing the parameter resulted in a ~5x speedup - quite 
> significant.
> The parameter is hive.mapjoin.bucket.cache.size, which right now is only used by 
> the SMB operator as far as I can tell.
> The parameter was introduced originally (3 yrs ago) for the map join operator 
> (looks like pre-SMB) and set to 100 to avoid OOM. That seems to have been in 
> a different context, though, where you had to avoid running out of memory with 
> the cached hash table in the same process, I think.
> Two things I'd like to propose:
> a) Rename it to what it does: hive.smbjoin.cache.rows
> b) Set it to something less restrictive: 10000
> If you string together a 5-table SMB join with a map join and a map-side 
> group-by aggregation you might still run out of memory, but the renamed 
> parameter should be easier to find and reduce. For most queries, I would 
> think that 10000 is still a reasonable number to cache (on the reduce side we 
> use 25000 for shuffle joins).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7054) Support ELT UDF in vectorized mode

2014-05-18 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7054:
-

Status: Patch Available  (was: Open)

> Support ELT UDF in vectorized mode
> --
>
> Key: HIVE-7054
> URL: https://issues.apache.org/jira/browse/HIVE-7054
> Project: Hive
>  Issue Type: New Feature
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 0.14.0
>
> Attachments: HIVE-7054.2.patch, HIVE-7054.3.patch, HIVE-7054.patch
>
>
> Implement support for ELT udf in vectorized execution mode.
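
For context, ELT(n, str1, str2, ...) returns its n-th string argument, so 
vectorizing it means producing that selection for a whole batch of rows at once. A 
tiny illustrative query against a hypothetical table t:

{code:sql}
-- returns 'beta' (the 2nd listed string) for the selected row
SELECT ELT(2, 'alpha', 'beta', 'gamma') FROM t LIMIT 1;
{code}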



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7054) Support ELT UDF in vectorized mode

2014-05-18 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7054:
-

Attachment: HIVE-7054.3.patch

Attaching a patch that fixes the vectorized decimal cast and math regression.

> Support ELT UDF in vectorized mode
> --
>
> Key: HIVE-7054
> URL: https://issues.apache.org/jira/browse/HIVE-7054
> Project: Hive
>  Issue Type: New Feature
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 0.14.0
>
> Attachments: HIVE-7054.2.patch, HIVE-7054.3.patch, HIVE-7054.patch
>
>
> Implement support for ELT udf in vectorized execution mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-7054) Support ELT UDF in vectorized mode

2014-05-18 Thread Deepesh Khandelwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepesh Khandelwal updated HIVE-7054:
-

Status: Open  (was: Patch Available)

> Support ELT UDF in vectorized mode
> --
>
> Key: HIVE-7054
> URL: https://issues.apache.org/jira/browse/HIVE-7054
> Project: Hive
>  Issue Type: New Feature
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 0.14.0
>
> Attachments: HIVE-7054.2.patch, HIVE-7054.patch
>
>
> Implement support for ELT udf in vectorized execution mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6586) Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001253#comment-14001253
 ] 

Lefty Leverenz commented on HIVE-6586:
--

The old hive-default.xml.template file includes two entries for 
*hive.metastore.integral.jdo.pushdown* (from HIVE-6070 and HIVE-6188).

That will be fixed by HIVE-6037, but we should add the name of another 
parameter to the description as done in the HIVE-6188 patch and the wiki:

* [HIVE-6188 patch | 
https://issues.apache.org/jira/secure/attachment/12637478/HIVE-6188.patch]
* [wikidoc for hive.metastore.integral.jdo.pushdown | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.integral.jdo.pushdown]

For details see:

* [HIVE-6188 comment about duplicate parameter | 
https://issues.apache.org/jira/browse/HIVE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001236#comment-14001236]

> Add new parameters to HiveConf.java after commit HIVE-6037 (also fix typos)
> ---
>
> Key: HIVE-6586
> URL: https://issues.apache.org/jira/browse/HIVE-6586
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Lefty Leverenz
>
> HIVE-6037 puts the definitions of configuration parameters into the 
> HiveConf.java file, but several recent jiras for release 0.13.0 introduce new 
> parameters that aren't in HiveConf.java yet and some parameter definitions 
> need to be altered for 0.13.0.  This jira will patch HiveConf.java after 
> HIVE-6037 gets committed.
> Also, four typos patched in HIVE-6582 need to be fixed in the new 
> HiveConf.java.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6052) metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001247#comment-14001247
 ] 

Lefty Leverenz commented on HIVE-6052:
--

*hive.metastore.integral.jdo.pushdown* is now documented in the wiki:

* [hive.metastore.integral.jdo.pushdown | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.integral.jdo.pushdown]

> metastore JDO filter pushdown for integers may produce unexpected results 
> with non-normalized integer columns
> -
>
> Key: HIVE-6052
> URL: https://issues.apache.org/jira/browse/HIVE-6052
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, 0.13.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.13.0
>
> Attachments: HIVE-6052.01.patch, HIVE-6052.02.patch, HIVE-6052.patch
>
>
> If integer partition columns have values stored in non-canonical form, for 
> example with leading zeroes, the integer filter doesn't work. That is because 
> JDO pushdown uses substrings to compare for equality, and SQL pushdown is 
> intentionally crippled to do the same to produce the same results.
> Probably, since both SQL pushdown and integer pushdown are just perf 
> optimizations, we can remove it for JDO (or make it configurable and disabled 
> by default), and uncripple SQL.
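For illustration, a standalone Java sketch (hypothetical, not Hive code) of why a lexical, substring-style comparison disagrees with a numeric one when a partition value is stored with a leading zero:

{code:java}
// Hypothetical illustration only -- not Hive code. A partition value stored
// as "012" is compared against the literal 12, first the way a string-based
// pushdown would compare it, then numerically as a user would expect.
public class NonCanonicalIntExample {
    public static void main(String[] args) {
        String storedPartitionValue = "012";  // non-normalized: leading zero
        int filterLiteral = 12;

        // String comparison, analogous to the substring-based JDO pushdown
        boolean lexicalMatch =
            storedPartitionValue.equals(String.valueOf(filterLiteral));

        // Numeric comparison, the behavior the user actually expects
        boolean numericMatch =
            Integer.parseInt(storedPartitionValue) == filterLiteral;

        System.out.println("lexical match: " + lexicalMatch); // false
        System.out.println("numeric match: " + numericMatch); // true
    }
}
{code}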



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001215#comment-14001215
 ] 

Lefty Leverenz commented on HIVE-5341:
--

The Getting Started wiki has been revised, please review:

* [Getting Started - MovieLens User Ratings | 
https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-MovieLensUserRatings]

> Link doesn't work. Needs to be updated as mentioned in the Description
> --
>
> Key: HIVE-5341
> URL: https://issues.apache.org/jira/browse/HIVE-5341
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Rakesh Chouhan
>Assignee: Lefty Leverenz
>  Labels: documentation
>
> Go to the Apache HIVE Getting Started Documentation:
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
> Under Section ...
> Simple Example Use Cases
> MovieLens User Ratings
> wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
> The link mentioned in the document does not work. It needs to be updated 
> to the URL below.
> http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz
> I am setting this defect's priority as a Blocker because users will not be 
> able to continue their hands-on exercises unless they find the correct URL 
> to download the mentioned file.
> Referenced from:
> http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6070) document HIVE-6052

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001243#comment-14001243
 ] 

Lefty Leverenz commented on HIVE-6070:
--

The config parameters are now documented in the wiki and in 
hive-default.xml.template.

* [hive.metastore.integral.jdo.pushdown | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.integral.jdo.pushdown]
* [hive.metastore.try.direct.sql | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.try.direct.sql]

> document HIVE-6052
> --
>
> Key: HIVE-6070
> URL: https://issues.apache.org/jira/browse/HIVE-6070
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: HIVE-6070.patch
>
>
> See comments in HIVE-6052 - this is the followup jira



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6188) Document hive.metastore.try.direct.sql & hive.metastore.try.direct.sql.ddl

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001239#comment-14001239
 ] 

Lefty Leverenz commented on HIVE-6188:
--

HIVE-6070 provided the first description of 
*hive.metastore.integral.jdo.pushdown*.

> Document hive.metastore.try.direct.sql & hive.metastore.try.direct.sql.ddl
> --
>
> Key: HIVE-6188
> URL: https://issues.apache.org/jira/browse/HIVE-6188
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lefty Leverenz
>Assignee: Sergey Shelukhin
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-6188.patch
>
>
> The hive.metastore.try.direct.sql and hive.metastore.try.direct.sql.ddl 
> configuration properties need to be documented in hive-default.xml.template 
> and the wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6188) Document hive.metastore.try.direct.sql & hive.metastore.try.direct.sql.ddl

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001236#comment-14001236
 ] 

Lefty Leverenz commented on HIVE-6188:
--

Also documented:

* [hive.metastore.integral.jdo.pushdown | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.integral.jdo.pushdown]

But this patch added *hive.metastore.integral.jdo.pushdown* to 
hive-default.xml.template a second time.  That will get fixed when HIVE-6037 is 
committed.  HIVE-6037 uses the other description:  "Allow JDO query pushdown 
for integral partition columns in metastore. Off by default. This improves 
metastore perf for integral columns, especially if there's a large number of 
partitions. However, it doesn't work correctly with integral values that are 
not normalized (e.g. have leading zeroes, like 0012). If metastore direct SQL 
is enabled and works, this optimization is also irrelevant."

In the wiki I added the parameter name for metastore direct SQL, which is the 
main difference between the two descriptions.

Here's the description in this patch:  "Whether to enable JDO pushdown for 
integral types. Off by default. Irrelevant if hive.metastore.try.direct.sql is 
enabled. Otherwise, filter pushdown in metastore can improve performance, but 
for partition columns storing integers in non-canonical form, (e.g. '012'), it 
can produce incorrect results."
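For reference, a minimal sketch of how the two properties interact, assuming they are set by their string names through Hadoop's generic Configuration API (illustrative only, not taken from either patch or from HiveConf):

{code:java}
// Illustrative sketch only -- property names come from the descriptions above;
// the generic Configuration API stands in for however a deployment sets them.
import org.apache.hadoop.conf.Configuration;

public class MetastorePushdownSettings {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // JDO pushdown for integral partition columns: off by default, and
        // irrelevant whenever metastore direct SQL is enabled and working.
        conf.setBoolean("hive.metastore.integral.jdo.pushdown", false);
        conf.setBoolean("hive.metastore.try.direct.sql", true);

        System.out.println("integral jdo pushdown = "
            + conf.getBoolean("hive.metastore.integral.jdo.pushdown", false));
        System.out.println("try direct sql        = "
            + conf.getBoolean("hive.metastore.try.direct.sql", true));
    }
}
{code}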

> Document hive.metastore.try.direct.sql & hive.metastore.try.direct.sql.ddl
> --
>
> Key: HIVE-6188
> URL: https://issues.apache.org/jira/browse/HIVE-6188
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lefty Leverenz
>Assignee: Sergey Shelukhin
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-6188.patch
>
>
> The hive.metastore.try.direct.sql and hive.metastore.try.direct.sql.ddl 
> configuration properties need to be documented in hive-default.xml.template 
> and the wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6188) Document hive.metastore.try.direct.sql & hive.metastore.try.direct.sql.ddl

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001229#comment-14001229
 ] 

Lefty Leverenz commented on HIVE-6188:
--

Now documented in the wiki:

* [hive.metastore.try.direct.sql | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.try.direct.sql]
* [hive.metastore.try.direct.sql.ddl | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.metastore.try.direct.sql.ddl]

> Document hive.metastore.try.direct.sql & hive.metastore.try.direct.sql.ddl
> --
>
> Key: HIVE-6188
> URL: https://issues.apache.org/jira/browse/HIVE-6188
> Project: Hive
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Lefty Leverenz
>Assignee: Sergey Shelukhin
>Priority: Minor
> Fix For: 0.13.0
>
> Attachments: HIVE-6188.patch
>
>
> The hive.metastore.try.direct.sql and hive.metastore.try.direct.sql.ddl 
> configuration properties need to be documented in hive-default.xml.template 
> and the wiki.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description

2014-05-18 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001196#comment-14001196
 ] 

Lefty Leverenz commented on HIVE-5341:
--

Glad to help.  I'll post a link to this jira in the wiki, asking people to 
comment here if the link breaks again.

> Link doesn't work. Needs to be updated as mentioned in the Description
> --
>
> Key: HIVE-5341
> URL: https://issues.apache.org/jira/browse/HIVE-5341
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Rakesh Chouhan
>Assignee: Lefty Leverenz
>  Labels: documentation
>
> Go to the Apache HIVE Getting Started Documentation:
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
> Under Section ...
> Simple Example Use Cases
> MovieLens User Ratings
> wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
> The link mentioned in the document does not work. It needs to be updated 
> to the URL below.
> http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz
> I am setting this defect's priority as a Blocker because users will not be 
> able to continue their hands-on exercises unless they find the correct URL 
> to download the mentioned file.
> Referenced from:
> http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-5341) Link doesn't work. Needs to be updated as mentioned in the Description

2014-05-18 Thread Leandro dos Santos Coutinho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001091#comment-14001091
 ] 

Leandro dos Santos Coutinho commented on HIVE-5341:
---

Thank you Lefty!

> Link doesn't work. Needs to be updated as mentioned in the Description
> --
>
> Key: HIVE-5341
> URL: https://issues.apache.org/jira/browse/HIVE-5341
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Rakesh Chouhan
>Assignee: Lefty Leverenz
>  Labels: documentation
>
> Go to the Apache HIVE Getting Started Documentation:
> https://cwiki.apache.org/confluence/display/Hive/GettingStarted
> Under Section ...
> Simple Example Use Cases
> MovieLens User Ratings
> wget http://www.grouplens.org/system/files/ml-data.tar+0.gz
> The link mentioned in the document does not work. It needs to be updated 
> to the URL below.
> http://www.grouplens.org/sites/www.grouplens.org/external_files/data/ml-data.tar.gz
> I am setting this defect's priority as a Blocker because users will not be 
> able to continue their hands-on exercises unless they find the correct URL 
> to download the mentioned file.
> Referenced from:
> http://mail-archives.apache.org/mod_mbox/hive-user/201302.mbox/%3c8a0c145b-4db9-4d26-8613-8ca1bd741...@daum.net%3E.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Apache Hive 0.13.1 Release Candidate 1

2014-05-18 Thread Edward Capriolo
"Voting will conclude in 72 hours."

This statement is not correct. The vote stays open as long as it needs to.
The 72-hour window is the window for someone to register a -1 vote; the
vote stays open until it passes. So if there are two +1 votes in 24 hours
and someone registers the third +1 six days later, the vote passes.

Again, the 72 hours is the minimum amount of time for someone to get in a
-1. If someone does not have time to put in a -1, they missed their boat.

Anyway, +1 on Hive 0.13.1



On Sun, May 18, 2014 at 12:15 AM, Lefty Leverenz wrote:

> Hive bylaws
> <https://cwiki.apache.org/confluence/display/Hive/Bylaws#Bylaws-Voting> say
> the mailing list is used for voting, but as I recall bylaws have some
> wiggle room.
>
> > Decisions regarding the project are made by votes on the primary project
> > development mailing list (u...@hive.apache.org ). Where necessary, PMC
> > voting may take place on the private Hive PMC mailing list. Votes are
> > clearly indicated by subject line starting with [VOTE]. Votes may contain
> > multiple items for approval and these should be clearly separated. Voting
> > is carried out by replying to the vote mail.
>
>
> (Hm, the text says "primary project development mailing list" but then
> user@hive is shown in parentheses -- is that a typo in the bylaws?)
>
> Would people be willing to vote simultaneously by mail and on a jira?  It's
> inconvenient but shouldn't be necessary after this release.
>
> -- Lefty
>
>
> On Sat, May 17, 2014 at 7:30 PM, Sushanth Sowmyan wrote:
>
> > There is a technical issue as well now, as raised by Prasanth. But
> > there is also the issue that people aren't reliably able to
> > respond/object/approve, and don't know if/when their mail will go through.
> >
> > I think I like Lefty's jira proposal - we could open a jira for it
> > and address votes there; I think I'll do that for RC2.
> >
> > On Fri, May 16, 2014 at 2:53 PM, Alan Gates wrote:
> > > So this isn’t a technical issue, just concern about the delays in the
> > > mailing list?  Why not just extend the voting period then, until say
> > > Monday?
> > >
> > > Alan.
> > >
> > > On May 15, 2014, at 3:17 PM, Sushanth Sowmyan wrote:
> > >
> > >> Hi Folks,
> > >>
> > >> I'm canceling this vote and withdrawing the RC1 candidate for the
> > >> following reasons:
> > >>
> > >> a) I've talked to a couple of other people who haven't seen my mail
> > >> updates to this thread, and saw my initial vote mail a bit late too.
> > >> b) There's at least one other person that has attempted to reply to
> > >> this thread, and I don't see the replies yet.
> > >>
> > >> Thus, when the mailing list channel isn't reliably working, the
> > >> ability for people to +1 or -1 is taken away, and this does not work.
> > >> (We don't want a situation where 3 people go ahead and +1, and that
> > >> arrives before today evening, thus making the release releasable,
> > >> while someone else discovers a breaking issue that should stop it, but
> > >> is not able to have their objection or -1 appear in time.)
> > >>
> > >> I'm open to suggestions on how to proceed with the voting process. We
> > >> could wait out this week and hope the ASF mailing list issues are
> > >> resolved, but if it takes too much longer than that, we also have the
> > >> issue of delaying an important bugfix release.
> > >>
> > >> Thoughts?
> > >>
> > >> -Sushanth
> > >> (3:15PM PDT, May 15 2014)
> > >>
> > >>
> > >>
> > >> On Thu, May 15, 2014 at 11:46 AM, Sushanth Sowmyan <khorg...@gmail.com> wrote:
> > >>> The apache dev list seems to still be a little wonky, Prasanth mailed
> > >>> me saying he'd replied to this thread with the following content, which
> > >>> I don't see in this thread:
> > >>>
> > >>> "Hi Sushanth
> > >>>
> > >>> https://issues.apache.org/jira/browse/HIVE-7067
> > >>> This bug is critical as it returns wrong results for min(), max(), and
> > >>> join queries that use date/timestamp columns from an ORC table.
> > >>> The reason for this issue is that for these datatypes ORC returns java
> > >>> objects, whereas for all other types ORC returns writables.
> > >>> When get() is performed on their corresponding object inspectors,
> > >>> writables return a new object whereas java objects return a reference.
> > >>> This will cause issues when any operator performs comparisons on
> > >>> date/timestamp values (references will be overwritten with the next
> > >>> values).
> > >>> More information is provided in the description of the jira.
> > >>>
> > >>> I think the severity of this bug is critical and should be included as
> > >>> part of 0.13.1. Can you please include this patch in RC2?”
> > >>>
> > >>> I think this meets the bar for criticality (actual bug in core feature,
> > >>> no workaround) and severity (incorrect results, effectively data
> > >>> corruption when used as a source for other data), and I'm willing to
> > >>> spin an RC2 for this, but I would still like to follow the process I
> > >>> set up for jira inclusion though, 
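For reference, a standalone Java sketch of the reference-vs-copy hazard Prasanth describes above (illustrative only; the reused Timestamp and the aggregation loop are hypothetical, not ORC or Hive code):

{code:java}
// Hypothetical illustration only -- not ORC/Hive code. A reader reuses one
// mutable Timestamp per row; a consumer that keeps the returned reference
// instead of a copy ends up with whatever the last row held.
import java.sql.Timestamp;

public class ReferenceReuseExample {
    public static void main(String[] args) {
        Timestamp reused = new Timestamp(0L);        // one object reused per row
        long[] rowMillis = {3_000L, 1_000L, 2_000L};

        Timestamp maxByReference = null;             // keeps the reused reference
        Timestamp maxByCopy = null;                  // keeps a defensive copy

        for (long millis : rowMillis) {
            reused.setTime(millis);                  // "next row" mutates the same object
            if (maxByReference == null || reused.compareTo(maxByReference) > 0) {
                maxByReference = reused;             // later rows silently overwrite this
            }
            if (maxByCopy == null || reused.compareTo(maxByCopy) > 0) {
                maxByCopy = new Timestamp(reused.getTime());
            }
        }

        System.out.println("max by reference: " + maxByReference); // last row's value
        System.out.println("max by copy:      " + maxByCopy);      // true maximum
    }
}
{code}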

[jira] [Commented] (HIVE-7054) Support ELT UDF in vectorized mode

2014-05-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001051#comment-14001051
 ] 

Hive QA commented on HIVE-7054:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12645362/HIVE-7054.2.patch

{color:red}ERROR:{color} -1 due to 34 failed/errored test(s), 5526 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketsortoptimize_insert_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorization_short_regress
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_casts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_date_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_math_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_string_funcs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_timestamp_funcs
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_script_pipe
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_transform_ppr2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testArithmeticExpressionVectorization
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testIfConditionalExprs
org.apache.hadoop.hive.ql.exec.vector.TestVectorizationContext.testMathFunctions
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/228/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/228/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 34 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12645362

> Support ELT UDF in vectorized mode
> --
>
> Key: HIVE-7054
> URL: https://issues.apache.org/jira/browse/HIVE-7054
> Project: Hive
>  Issue Type: New Feature
>  Components: Vectorization
>Affects Versions: 0.14.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 0.14.0
>
> Attachments: HIVE-7054.2.patch, HIVE-7054.patch
>
>
> Implement support for ELT udf in vectorized execution mode.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-7050) Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE

2014-05-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001039#comment-14001039
 ] 

Hive QA commented on HIVE-7050:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12645388/HIVE-7050.5.patch

{color:red}ERROR:{color} -1 due to 19 failed/errored test(s), 5451 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_syntax
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_java_method
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_reflect
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vector_decimal_math_funcs
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_stats_counter
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input20
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input4
org.apache.hadoop.hive.ql.parse.TestParse.testParse_input5
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimal
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalX
org.apache.hive.hcatalog.pig.TestOrcHCatPigStorer.testWriteDecimalXY
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHadoopVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getHiveVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getPigVersion
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.getStatus
org.apache.hive.hcatalog.templeton.TestWebHCatE2e.invalidPath
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/226/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-Build/226/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 19 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12645388

> Display table level column stats in DESCRIBE EXTENDED/FORMATTED TABLE
> -
>
> Key: HIVE-7050
> URL: https://issues.apache.org/jira/browse/HIVE-7050
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth J
>Assignee: Prasanth J
> Attachments: HIVE-7050.1.patch, HIVE-7050.2.patch, HIVE-7050.3.patch, 
> HIVE-7050.4.patch, HIVE-7050.5.patch
>
>
> There is currently no way to display column-level stats from the Hive CLI. It 
> will be good to show them in DESCRIBE EXTENDED/FORMATTED TABLE.



--
This message was sent by Atlassian JIRA
(v6.2#6252)