[jira] [Created] (HIVE-5318) Import Throws Error when Importing from a table export Hive 0.9 to Hive 0.10

2013-09-18 Thread Brad Ruderman (JIRA)
Brad Ruderman created HIVE-5318:
---

 Summary: Import Throws Error when Importing from a table export 
Hive 0.9 to Hive 0.10
 Key: HIVE-5318
 URL: https://issues.apache.org/jira/browse/HIVE-5318
 Project: Hive
  Issue Type: Bug
  Components: Import/Export
Affects Versions: 0.10.0, 0.9.0
Reporter: Brad Ruderman
Priority: Critical


When exporting Hive tables in Hive 0.9 using "EXPORT TABLE <table> TO 
'hdfs_path'" and then importing into another Hive 0.10 instance using "IMPORT 
FROM 'hdfs_path'", Hive throws this error:


13/09/18 13:14:02 ERROR ql.Driver: FAILED: SemanticException Exception while 
processing
org.apache.hadoop.hive.ql.parse.SemanticException: Exception while processing
at 
org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:277)
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:459)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:349)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:938)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:347)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:706)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
at java.util.ArrayList.<init>(ArrayList.java:131)
at 
org.apache.hadoop.hive.ql.plan.CreateTableDesc.<init>(CreateTableDesc.java:128)
at 
org.apache.hadoop.hive.ql.parse.ImportSemanticAnalyzer.analyzeInternal(ImportSemanticAnalyzer.java:99)
... 16 more



This is probably a critical blocker for people who are trying to test Hive 0.10 
in their staging environments prior to upgrading from 0.9.
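
For reference, a minimal repro of the statement pair described above (the table 
name and HDFS path are placeholders):

{code}
-- On the Hive 0.9 instance:
EXPORT TABLE my_table TO '/tmp/my_table_export';

-- On the Hive 0.10 instance, pointed at the same export directory:
IMPORT FROM '/tmp/my_table_export';
{code}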

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5310) commit futurama_episodes

2013-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771628#comment-13771628
 ] 

Hudson commented on HIVE-5310:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2340 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2340/])
HIVE-5310 futurama-episodes (ecapriolo: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524448)
* /hive/trunk/data/files/futurama_episodes.avro


> commit futurama_episodes
> ---
>
> Key: HIVE-5310
> URL: https://issues.apache.org/jira/browse/HIVE-5310
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Edward Capriolo
>Assignee: Edward Capriolo
>
> This is a small binary file that will be used for Trevni. We can run the 
> pre-commit build once this is committed.



[jira] [Commented] (HIVE-5166) TestWebHCatE2e is failing intermittently on trunk

2013-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771627#comment-13771627
 ] 

Hudson commented on HIVE-5166:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2340 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2340/])
HIVE-5166 : TestWebHCatE2e is failing intermittently on trunk (Eugene Koifman 
via Ashutosh Chauhan) (hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524441)
* 
/hive/trunk/hcatalog/webhcat/svr/src/test/java/org/apache/hive/hcatalog/templeton/TestWebHCatE2e.java


> TestWebHCatE2e is failing intermittently on trunk
> -
>
> Key: HIVE-5166
> URL: https://issues.apache.org/jira/browse/HIVE-5166
> Project: Hive
>  Issue Type: Bug
>  Components: Tests, WebHCat
>Affects Versions: 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Eugene Koifman
> Fix For: 0.13.0
>
> Attachments: HIVE-5166.patch
>
>
> I observed these while running full test suite last couple of times.



[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION by better way.

2013-09-18 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5315:
-

Status: Open  (was: Patch Available)

> bin/hive should retrieve HADOOP_VERSION by better way.
> --
>
> Key: HIVE-5315
> URL: https://issues.apache.org/jira/browse/HIVE-5315
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Kousuke Saruta
> Fix For: 0.11.1
>
> Attachments: HIVE-5315.patch
>
>
> In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:
> {code}
> HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
> {code}
> However, "hadoop version" does not always print the version information on 
> the first line.
> If HADOOP_VERSION is not retrieved correctly, Hive and related processes will 
> fail to start.
> I faced this situation when trying to debug HiveServer2 with debug options 
> such as:
> {code}
> -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
> {code}
> In that case, "hadoop version" prints the -Xdebug... line first.



[jira] [Commented] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION by better way.

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771625#comment-13771625
 ] 

Hive QA commented on HIVE-5315:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603915/HIVE-5315.patch

{color:red}ERROR:{color} -1 due to 91 failed/errored test(s), 129 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hive.beeline.src.test.TestBeeLineWithArgs.testPositiveScriptFile
org.apache.hive.jdbc.TestJdbcDriver2.testBadURL
org.apache.hive.jdbc.TestJdbcDriver2.testBuiltInUDFCol
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes
org.apache.hive.jdbc.TestJdbcDriver2.testDataTypes2
org.apache.hive.jdbc.TestJdbcDriver2.testDatabaseMetaData
org.apache.hive.jdbc.TestJdbcDriver2.testDescribeTable
org.apache.hive.jdbc.TestJdbcDriver2.testDriverProperties
org.apache.hive.jdbc.TestJdbcDriver2.testDuplicateColumnNameOrder
org.apache.hive.jdbc.TestJdbcDriver2.testErrorDiag
org.apache.hive.jdbc.TestJdbcDriver2.testErrorMessages
org.apache.hive.jdbc.TestJdbcDriver2.testExecutePreparedStatement
org.apache.hive.jdbc.TestJdbcDriver2.testExecuteQueryException
org.apache.hive.jdbc.TestJdbcDriver2.testExplainStmt
org.apache.hive.jdbc.TestJdbcDriver2.testExprCol
org.apache.hive.jdbc.TestJdbcDriver2.testImportedKeys

[jira] [Commented] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-18 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771598#comment-13771598
 ] 

Carl Steinbach commented on HIVE-5317:
--

Will these features place any limitations on which storage formats you can use? 
Also, I don't think it's possible to support ACID guarantees and HCatalog (i.e. 
file permission based authorization) simultaneously on top of the same Hive 
warehouse. Is there a plan in place for fixing that?

> Implement insert, update, and delete in Hive with full ACID support
> ---
>
> Key: HIVE-5317
> URL: https://issues.apache.org/jira/browse/HIVE-5317
> Project: Hive
>  Issue Type: New Feature
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> Many customers want to be able to insert, update, and delete rows from Hive 
> tables with full ACID support. The use cases are varied, but the forms of the 
> queries that should be supported are:
> * INSERT INTO tbl SELECT …
> * INSERT INTO tbl VALUES ...
> * UPDATE tbl SET … WHERE …
> * DELETE FROM tbl WHERE …
> * MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN 
> ...
> * SET TRANSACTION LEVEL …
> * BEGIN/END TRANSACTION
> Use Cases
> * Once an hour, a set of inserts and updates (up to 500k rows) for various 
> dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
> dimension tables have primary keys and are typically bucketed and sorted on 
> those keys.
> * Once a day, a small set (up to 100k rows) of records needs to be deleted for 
> regulatory compliance.
> * Once an hour, a log of transactions is exported from an RDBMS, and the fact 
> tables need to be updated (up to 1m rows) to reflect the new data. The 
> transactions are a combination of inserts, updates, and deletes. The table is 
> partitioned and bucketed.



[jira] [Commented] (HIVE-5283) Merge vectorization branch to trunk

2013-09-18 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771592#comment-13771592
 ] 

Carl Steinbach commented on HIVE-5283:
--

[~anthony.murphy] Thanks for the very detailed explanation. Clearly a lot of 
thought has gone into writing these tests. Now we just need to find a way of 
conveying this information to people down the road.

I recommend doing the following: concatenate all of these tests into a single 
qfile named vectorization_short_regress.q and include the information from 
above in a comment at the top of the file. It would also be great if you could 
include a short comment per query so folks have an easy way of telling them 
apart.

> Merge vectorization branch to trunk
> ---
>
> Key: HIVE-5283
> URL: https://issues.apache.org/jira/browse/HIVE-5283
> Project: Hive
>  Issue Type: Bug
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5283.1.patch, HIVE-5283.2.patch
>
>
> The purpose of this jira is to upload vectorization patch, run tests etc. The 
> actual work will continue under HIVE-4160 umbrella jira.



Re: Review Request 14130: Merge vectorization branch to trunk

2013-09-18 Thread Carl Steinbach


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> >
> 
> Jitendra Pandey wrote:
> I have filed jiras HIVE-5308, HIVE-5309, HIVE-5314 to address some of the 
> review comments. I will upload an updated patch as soon as those jiras are 
> committed. The individual comments are answered inline.

Thanks for addressing these issues.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt, 
> > line 1
> > 
> >
> > We currently use Apache Velocity to generate test code at compile-time 
> > (e.g. TestCliDriver, ...). I realize that the templating code in CodeGen 
> > and TestCodeGen is pretty simple, but was wondering if it might be better 
> > from a build and maintenance standpoint to use Velocity instead.
> > 
> > Also, is it possible to select a less generic file suffix for the 
> > template files, e.g. *.t or *.tmpl?
> 
> Jitendra Pandey wrote:
>   I have added a patch on HIVE-5308 that removes the generated code for 
> both expressions and tests; instead, the code is generated by an ant task 
> during the build. 
>   The patch doesn't use Velocity, however. Do you think it's OK to move to 
> Velocity as a follow-up task post-merge? I will take care of the template file 
> suffixes as part of the move to Velocity, because I believe Velocity requires 
> its own suffix.

I haven't looked at it closely enough to be able to say that using Velocity is 
a superior approach. I think it's something worth investigating in the future, 
but agree with you that there's no need to do it right away.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt, 
> > line 27
> > 
> >
> > In addition to the name (and preferably path) of the template I think 
> > this comment should also include the name and path of the code generator, 
> > and a warning that it should not be modified by hand.
> 
> Jitendra Pandey wrote:
> The generated code will not be committed anymore. The code will be 
> generated in the build directory and a clean will remove it.

Awesome!


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/gen/vectorization/org/apache/hadoop/hive/ql/exec/vector/gen/TestCodeGen.java,
> >  line 1
> > 
> >
> > Maybe this should go in ql/src/test/gen. Thoughts?
> 
> Jitendra Pandey wrote:
> I have renamed this file to GenVectorTestCode and moved it to 
> ant/src/org/apache/hadoop/hive/ant/ so that it runs along with other 
> ant-tasks. Is that ok? The jira with this change is HIVE-5308.

Sounds good.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/vectorization_0.q, line 20
> > 
> >
> > What is the expected behavior when vectorized.execution=enabled and the 
> > source table is not reading ORC formatted data? I think it's worth adding 
> > some additional tests (either positive or negative) to lock down this 
> > behavior.
> 
> Jitendra Pandey wrote:
> Any query that vectorization cannot execute must fall back to non-vector 
> mode. If the input format doesn't provide vectorized input, the query must 
> still succeed. I have added a test in my patch on HIVE-5309 that runs a query 
> on an RCFile table with vectorization turned on. The query executes 
> successfully.

Sounds good.


- Carl


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14130/#review26119
---


On Sept. 13, 2013, 5:51 p.m., Jitendra Pandey wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14130/
> ---
> 
> (Updated Sept. 13, 2013, 5:51 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-5283
> https://issues.apache.org/jira/browse/HIVE-5283
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Merge vectorization branch to trunk.
> 
> 
> Diffs
> -
> 
>   .gitignore c0e9b3c 
>   build-common.xml ee219a9 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c5a8ff3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 15a2a81 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticScalar.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnCompareScalar.txt 
> PRE-CREATION 
>   ql/src/gen/vectori

[jira] [Created] (HIVE-5317) Implement insert, update, and delete in Hive with full ACID support

2013-09-18 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-5317:
---

 Summary: Implement insert, update, and delete in Hive with full 
ACID support
 Key: HIVE-5317
 URL: https://issues.apache.org/jira/browse/HIVE-5317
 Project: Hive
  Issue Type: New Feature
Reporter: Owen O'Malley
Assignee: Owen O'Malley


Many customers want to be able to insert, update, and delete rows from Hive 
tables with full ACID support. The use cases are varied, but the forms of the 
queries that should be supported are:

* INSERT INTO tbl SELECT …
* INSERT INTO tbl VALUES ...
* UPDATE tbl SET … WHERE …
* DELETE FROM tbl WHERE …
* MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...
* SET TRANSACTION LEVEL …
* BEGIN/END TRANSACTION

Use Cases

* Once an hour, a set of inserts and updates (up to 500k rows) for various 
dimension tables (e.g. customer, inventory, stores) needs to be processed. The 
dimension tables have primary keys and are typically bucketed and sorted on 
those keys.
* Once a day, a small set (up to 100k rows) of records needs to be deleted for 
regulatory compliance.
* Once an hour, a log of transactions is exported from an RDBMS, and the fact 
tables need to be updated (up to 1m rows) to reflect the new data. The 
transactions are a combination of inserts, updates, and deletes. The table is 
partitioned and bucketed.
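
For illustration, a hedged sketch of what the MERGE form listed above could 
look like against a hypothetical dimension table (table and column names are 
made up, not part of this proposal):

{code}
MERGE INTO customer_dim AS tgt
USING customer_updates AS src
ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET email = src.email
WHEN NOT MATCHED THEN INSERT VALUES (src.customer_id, src.email);
{code}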




[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION by better way.

2013-09-18 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5315:
-

Summary: bin/hive should retrieve HADOOP_VERSION by better way.  (was: 
bin/hive should retrieve HADOOP_VERSION better way.)

> bin/hive should retrieve HADOOP_VERSION by better way.
> --
>
> Key: HIVE-5315
> URL: https://issues.apache.org/jira/browse/HIVE-5315
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Kousuke Saruta
> Fix For: 0.11.1
>
> Attachments: HIVE-5315.patch
>
>
> In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:
> {code}
> HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
> {code}
> However, "hadoop version" does not always print the version information on 
> the first line.
> If HADOOP_VERSION is not retrieved correctly, Hive and related processes will 
> fail to start.
> I faced this situation when trying to debug HiveServer2 with debug options 
> such as:
> {code}
> -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
> {code}
> In that case, "hadoop version" prints the -Xdebug... line first.



[jira] [Commented] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771575#comment-13771575
 ] 

Hive QA commented on HIVE-5309:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603954/HIVE-5309.1-vectorization.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 3955 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_plan_json
org.apache.hcatalog.cli.TestPermsGrp.testCustomPerms
org.apache.hive.hcatalog.mapreduce.TestHCatExternalPartitioned.testHCatPartitionedTable
org.apache.hive.hcatalog.mapreduce.TestHCatHiveCompatibility.testPartedRead
org.apache.hive.hcatalog.mapreduce.TestHCatHiveCompatibility.testUnpartedReadWrite
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/815/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/815/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

> Update hive-default.xml.template for vectorization flag; remove unused 
> imports from MetaStoreUtils.java
> ---
>
> Key: HIVE-5309
> URL: https://issues.apache.org/jira/browse/HIVE-5309
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5309.1-vectorization.patch, 
> HIVE-5309.1.vectorization.patch
>
>
> This jira provides fixes for some of the review comments on HIVE-5283.
> 1) Update hive-default.xml.template for vectorization flag.
> 2) remove unused imports from MetaStoreUtils.
> 3) Add a test to run vectorization with non-orc format. The test must still 
> pass because vectorization optimization should fall back to non-vector mode.
> 4) Hardcode the table name in QTestUtil.java.



[jira] [Commented] (HIVE-5209) JDBC support for varchar

2013-09-18 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771566#comment-13771566
 ] 

Phabricator commented on HIVE-5209:
---

jdere has commented on the revision "HIVE-5209 [jira] JDBC support for varchar".

INLINE COMMENTS
  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java:57 Yeah, 
removed this from patch v4.
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:32 Sure, I can 
remove this import as it appears that it is not being used here. However, 
TableSchema calls ColumnDescriptor, which calls TypeDescriptor, and those also 
use quite a few serde classes (as well as TypeQualifiers). So there still might 
be an issue here; perhaps I can pull the serde-specific code out of 
TypeDescriptor/TypeQualifiers and move it to some utility class in the 
service module.

  It looks like I would have to move the following methods into a separate 
utility class:
  - TypeQualifiers.fromBaseTypeParams(BaseTypeParams)
  - TypeDescriptor.TypeDescriptor(String)
  - ColumnDescriptor.ColumnDescriptor(FieldSchema, int)
  - TableSchema.TableSchema(List)
  - TableSchema(Schema)
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:157 Yes, will add 
the column attributes here.
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:240 sure, will 
change in next patch
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java:36 Ok, will have 
ColumnAttributes as its own class.
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java:181 will change in next 
patch
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java:187 will change in next 
patch
  service/src/java/org/apache/hive/service/cli/ColumnDescriptor.java:53 ok, 
will change in next patch
  service/src/java/org/apache/hive/service/cli/TypeQualifiers.java:24 see 
previous reply about usage of serde classes
  service/src/java/org/apache/hive/service/cli/TypeQualifiers.java:29 will add 
to next patch
  service/if/TCLIService.thrift:44 added this change in patch v4

REVISION DETAIL
  https://reviews.facebook.net/D12999

To: JIRA, jdere
Cc: cwsteinbach, thejas


> JDBC support for varchar
> 
>
> Key: HIVE-5209
> URL: https://issues.apache.org/jira/browse/HIVE-5209
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC, Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
> HIVE-5209.4.patch, HIVE-5209.D12705.1.patch
>
>
> Support returning varchar length in result set metadata



Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Owen O'Malley
On Wed, Sep 18, 2013 at 5:36 PM, Edward Capriolo wrote:

> "Unless you are signing up to test every release on 0.20.2 we don't have
> anyone doing the relevant testing"
>
> We spend much more time dealing with windows issues. Unless someone signs
> up to test and certify windows should we drop that as well?
>

Actually, Hortonworks does test the Hive releases on Windows with both unit
and system tests. There is also a big difference between supporting an
operating system and supporting all versions of a library.

Just to be clear, we made the Hadoop 0.20 release back in April 2009 more
than 4 years ago, which is more than 50% of Hadoop's entire lifetime. In
fact, it was before Hive 0.3 was released. Hadoop 0.20 is very old and the
discussion over how long to support it is well overdue.

-- Owen


[jira] [Updated] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5198:


Fix Version/s: (was: 0.13.0)

> WebHCat returns exitcode 143 (w/o an explanation)
> -
>
> Key: HIVE-5198
> URL: https://issues.apache.org/jira/browse/HIVE-5198
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.11.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 0.12.0
>
> Attachments: HIVE-5198.patch
>
>
> The message might look like this:
> {"statement":"use default; show table extended like xyz;","error":"unable to 
> show table: xyz","exec":{"stdout":"","stderr":"","exitcode":143}} 
> WebHCat has a templeton.exec.timeout property which kills an HCat request 
> (i.e. something like a DDL statement that gets routed to the HCat CLI) if it 
> takes longer than this timeout.
> Since WebHCat does a fork/exec of the 'hcat' script, the timeout is 
> implemented as a SIGTERM sent to the subprocess. SIGTERM's value is 15, so it 
> is reported as 128 + 15 = 143.
> Error logging/reporting should be improved in this case.
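
As a side note, the 128 + signal arithmetic is easy to reproduce in a plain 
shell (an illustration, not WebHCat code):

{code}
sleep 60 &        # start a child process
kill -TERM $!     # send it SIGTERM (signal 15)
wait $!           # reap it
echo $?           # prints 143, i.e. 128 + 15
{code}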



[jira] [Updated] (HIVE-5198) WebHCat returns exitcode 143 (w/o an explanation)

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5198:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and 0.12 branch. Thanks Eugene for the contribution!


> WebHCat returns exitcode 143 (w/o an explanation)
> -
>
> Key: HIVE-5198
> URL: https://issues.apache.org/jira/browse/HIVE-5198
> Project: Hive
>  Issue Type: Bug
>  Components: WebHCat
>Affects Versions: 0.11.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 0.12.0
>
> Attachments: HIVE-5198.patch
>
>
> The message might look like this:
> {"statement":"use default; show table extended like xyz;","error":"unable to 
> show table: xyz","exec":{"stdout":"","stderr":"","exitcode":143}} 
> WebHCat has a templeton.exec.timeout property which kills an HCat request 
> (i.e. something like a DDL statement that gets routed to the HCat CLI) if it 
> takes longer than this timeout.
> Since WebHCat does a fork/exec of the 'hcat' script, the timeout is 
> implemented as a SIGTERM sent to the subprocess. SIGTERM's value is 15, so it 
> is reported as 128 + 15 = 143.
> Error logging/reporting should be improved in this case.



[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Status: Patch Available  (was: Open)

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
> HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Where as, "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}



[jira] [Updated] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5122:


Status: Open  (was: Patch Available)

> Add partition for multiple partition ignores locations for non-first 
> partitions
> ---
>
> Key: HIVE-5122
> URL: https://issues.apache.org/jira/browse/HIVE-5122
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D12411.3.patch, D12411.4.patch, 
> HIVE-5122.D12411.1.patch, HIVE-5122.D12411.2.patch
>
>
> http://www.mail-archive.com/user@hive.apache.org/msg09151.html
> When multiple partitions are added in a single ALTER TABLE statement, the 
> location of the first partition is used as the location of all partitions.
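
For reference, the kind of statement the report describes (a hedged sketch with 
made-up table, partition values, and paths):

{code}
ALTER TABLE t ADD
  PARTITION (ds='2013-09-17') LOCATION '/data/t/ds=2013-09-17'
  PARTITION (ds='2013-09-18') LOCATION '/data/t/ds=2013-09-18';
-- Bug: both partitions end up pointing at the first LOCATION.
{code}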



[jira] [Updated] (HIVE-5122) Add partition for multiple partition ignores locations for non-first partitions

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5122:


Status: Patch Available  (was: Open)

Resubmitting patch to kick off precommit tests on the updated patch.


> Add partition for multiple partition ignores locations for non-first 
> partitions
> ---
>
> Key: HIVE-5122
> URL: https://issues.apache.org/jira/browse/HIVE-5122
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: D12411.3.patch, D12411.4.patch, 
> HIVE-5122.D12411.1.patch, HIVE-5122.D12411.2.patch
>
>
> http://www.mail-archive.com/user@hive.apache.org/msg09151.html
> When multiple partitions are added in a single ALTER TABLE statement, the 
> location of the first partition is used as the location of all partitions.



[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Attachment: HIVE-4113.2.patch

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.2.patch, 
> HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Where as, "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}



[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771547#comment-13771547
 ] 

Hive QA commented on HIVE-5306:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603955/HIVE-5306.4.patch

{color:red}ERROR:{color} -1 due to 141 failed/errored test(s), 1175 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan3
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan4
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan5
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan6
org.apache.hadoop.hive.ql.io.TestSymlinkTextInputFormat.testCombine
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_ambiguous_join_col
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_duplicate_alias
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_garbage
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_insert_wrong_number_columns
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_create_table
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_dot
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_function_param2
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_index
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_list_index
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_list_index2
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_map_index
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_map_index2
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_invalid_select
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_macro_reserved_word
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_missing_overwrite
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_nonkey_groupby
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_quoted_string
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column1
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column2
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column3
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column4
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column5
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_column6
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_function1
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unknown_function2
org.apache.hadoop.hive.ql.parse.TestParseNegative.testParseNegative_unkno

Re: Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14221/
---

(Updated Sept. 19, 2013, 2:55 a.m.)


Review request for hive.


Changes
---

1. Removed the column-pruning flag. 2. Set READ_ALL_COLUMNS_DEFAULT to true.


Bugs: HIVE-4113
https://issues.apache.org/jira/browse/HIVE-4113


Repository: hive-git


Description
---

Modifies ColumnProjectionUtils so that there are two flags: one for the column 
ids and one indicating whether all columns should be read. Additionally, the 
patch updates all locations that used the old convention of an empty string 
indicating that all columns should be read.

The automatic formatter configuration generated by ant eclipse-files is fairly 
aggressive, so there is some unrelated import/whitespace cleanup.

This patch is based on https://reviews.apache.org/r/11770/ and has been rebased 
to the latest trunk.
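
A hedged sketch of how the two flags described above could be driven from a job 
configuration (the key strings here are assumptions for illustration, not 
necessarily the exact names in the patch):

{code}
import org.apache.hadoop.conf.Configuration;

public class ColumnProjectionSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Flag 1: explicit toggle replacing the old "empty string means read all"
    // convention. (Key name assumed.)
    conf.setBoolean("hive.io.file.read.all.columns", false);
    // Flag 2: the ids of the columns that actually need to be read.
    conf.set("hive.io.file.readcolumn.ids", "0,3");
    System.out.println(conf.getBoolean("hive.io.file.read.all.columns", true));
  }
}
{code}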


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9f37d0c 
  conf/hive-default.xml.template 545026d 
  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 766056b 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
 553446a 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatRecordReader.java
 3ee6157 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
 1980ef5 
  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java
 577e06d 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
 d38bb8d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ab0494e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java a5a8943 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0f29a0e 
  ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
49145b7 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cccdc1b 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java a83f223 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 50c5093 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 cbdc2db 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java 
ed14e82 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java b97d869 
  ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java 
fb9fca1 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java dd1276d 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
83c5c38 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 
0b3ef7b 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 
11f5f07 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 
1335446 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java 
e1270cc 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
 b717278 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java
 0317024 
  serde/src/test/org/apache/hadoop/hive/serde2/TestColumnProjectionUtils.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/14221/diff/


Testing
---


Thanks,

Yin Huai



[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771539#comment-13771539
 ] 

Jason Dere commented on HIVE-5306:
--

Looking at the latest patch:

1. Actually you may want to consider making inputConverter transient - it 
contains object inspectors, and originally these weren't serializable; I ran 
into issues with UDFs attempting to serialize object inspectors when Kryo 
serialization was added a couple of weeks back. Though I think they may have 
fixed that by making object inspectors serializable. In any case, I think that 
field would be overwritten anyway: if the UDF is serialized as part of a query 
plan and then deserialized, initialize() would be called again on the newly 
deserialized UDF. 

2. How about float/string? I think those can be converted to double. 

3. Looks like initialize()/evaluate() have some code commented out - that can 
be removed.

4. I think those lines with getConverter() probably break the 100-character 
rule in the Hive coding conventions; break that line up into multiple lines.
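
To make point 1 concrete, here is a minimal hedged sketch of the pattern being 
suggested (illustrative only, not the actual HIVE-5306 patch): the converter is 
transient and rebuilt in initialize(), which runs again after the plan is 
deserialized.

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class GenericUDFAbsSketch extends GenericUDF {
  // transient: never serialized with the query plan; initialize() rebuilds it
  // on the deserialized copy.
  private transient ObjectInspectorConverters.Converter inputConverter;

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments)
      throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentException("abs() takes exactly one argument");
    }
    // Convert whatever numeric type comes in (int, float, string, ...) to double.
    inputConverter = ObjectInspectorConverters.getConverter(arguments[0],
        PrimitiveObjectInspectorFactory.writableDoubleObjectInspector);
    return PrimitiveObjectInspectorFactory.writableDoubleObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    Object converted = inputConverter.convert(arguments[0].get());
    if (converted == null) {
      return null;
    }
    return new DoubleWritable(Math.abs(((DoubleWritable) converted).get()));
  }

  @Override
  public String getDisplayString(String[] children) {
    return "abs(" + children[0] + ")";
  }
}
{code}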

> Use new GenericUDF instead of basic UDF for UDFAbs class
> 
>
> Key: HIVE-5306
> URL: https://issues.apache.org/jira/browse/HIVE-5306
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch, 
> HIVE-5306.4.patch
>
>
> GenericUDF is the latest and recommended base class for any UDFs.
> This JIRA is to change the current UDFAbs class to extend GenericUDF.
> The general benefit of GenericUDF is described in its comments as 
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
>  * accept arguments of complex types, and return complex types. 2. It can
>  * accept variable length of arguments. 3. It can accept an infinite number of
>  * function signature - for example, it's easy to write a GenericUDF that
>  * accepts array<int>, array<array<int>> and so on (arbitrary levels of
>  * nesting). 4. It can do short-circuit evaluations using DeferredObject."



[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771536#comment-13771536
 ] 

Thejas M Nair commented on HIVE-5313:
-

Patch committed to 0.12 branch

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.12.0
>
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.



[jira] [Updated] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5313:


Fix Version/s: (was: 0.13.0)
   0.12.0

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.12.0
>
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.



[jira] [Updated] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4487:


Fix Version/s: (was: 0.13.0)
   0.12.0

> Hive does not set explicit permissions on hive.exec.scratchdir
> --
>
> Key: HIVE-4487
> URL: https://issues.apache.org/jira/browse/HIVE-4487
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Joey Echeverria
>Assignee: Chaoyu Tang
> Fix For: 0.12.0
>
> Attachments: HIVE-4487.patch
>
>
> The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
> creates this directory it doesn't set any explicit permission on it. This 
> means that with the default HDFS umask setting of 022, these 
> directories end up being world-readable. These permissions also get applied 
> to the staging directories and their files, thus leaving inter-stage data 
> world-readable.
> This can cause a potential leak of data, especially when operating on a 
> Kerberos-enabled cluster. Hive should probably default these directories to 
> only be readable by the owner.
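
A hedged sketch of the fix direction discussed here (not the attached patch): 
create the scratch directory, then set an explicit owner-only permission. The 
short-based FsPermission constructor is used because HIVE-5313 notes that 
FsPermission(String) does not exist on Hadoop 0.20.2.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class ScratchDirPermSketch {
  public static void main(String[] args) throws Exception {
    Path scratchDir = new Path("/tmp/hive-someuser");  // hypothetical path
    FileSystem fs = scratchDir.getFileSystem(new Configuration());
    fs.mkdirs(scratchDir);
    // setPermission applies the mode exactly, unlike mkdirs(path, perm),
    // which is still subject to the umask.
    fs.setPermission(scratchDir, new FsPermission((short) 0700));
  }
}
{code}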



[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771535#comment-13771535
 ] 

Thejas M Nair commented on HIVE-4487:
-

Patch committed to 0.12 branch


> Hive does not set explicit permissions on hive.exec.scratchdir
> --
>
> Key: HIVE-4487
> URL: https://issues.apache.org/jira/browse/HIVE-4487
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Joey Echeverria
>Assignee: Chaoyu Tang
> Fix For: 0.12.0
>
> Attachments: HIVE-4487.patch
>
>
> The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
> creates this directory it doesn't set any explicit permission on it. This 
> means that with the default HDFS umask setting of 022, these 
> directories end up being world-readable. These permissions also get applied 
> to the staging directories and their files, thus leaving inter-stage data 
> world-readable.
> This can cause a potential leak of data, especially when operating on a 
> Kerberos-enabled cluster. Hive should probably default these directories to 
> only be readable by the owner.



[jira] [Commented] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771533#comment-13771533
 ] 

Thejas M Nair commented on HIVE-5285:
-

Patch committed to 0.12 branch.

> Custom SerDes throw cast exception when there are complex nested structures 
> containing NonSettableObjectInspectors.
> ---
>
> Key: HIVE-5285
> URL: https://issues.apache.org/jira/browse/HIVE-5285
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Fix For: 0.12.0
>
> Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt
>
>
> The approach of the HIVE-5199 fix is correct. However, the fix for HIVE-5199 
> is incomplete. Consider a complex nested structure containing the following 
> object inspector hierarchy:
> SettableStructObjectInspector
> {
>   ListObjectInspector
> }
> In the above case, the cast exception can happen via 
> MapOperator/FetchOperator as below:
> java.io.IOException: java.lang.ClassCastException: 
> com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:294)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529)
> ... 13 more
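
For illustration, a minimal sketch of the guard the fix needs, shown here for 
maps only (this is not the committed patch): cast to a settable inspector only 
when the inspector really is settable, and otherwise substitute the standard 
writable equivalent, which is settable:
{code}
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.ObjectInspectorCopyOption;
import org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector;

public class SettableGuardSketch {
  static SettableMapObjectInspector toSettable(MapObjectInspector outputOI) {
    if (outputOI instanceof SettableMapObjectInspector) {
      // Safe: the custom OI already implements the settable interface.
      return (SettableMapObjectInspector) outputOI;
    }
    // Non-settable custom OI: convert to the standard writable equivalent
    // (which is settable) instead of blindly casting, which throws the
    // ClassCastException seen above.
    return (SettableMapObjectInspector) ObjectInspectorUtils.getStandardObjectInspector(
        outputOI, ObjectInspectorCopyOption.WRITABLE);
  }
}
{code}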

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5267) Use array instead of Collections if possible in DemuxOperator

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5267:


Fix Version/s: (was: 0.13.0)
   0.12.0

> Use array instead of Collections if possible in DemuxOperator
> -
>
> Key: HIVE-5267
> URL: https://issues.apache.org/jira/browse/HIVE-5267
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Fix For: 0.12.0
>
> Attachments: HIVE-5267.D12867.1.patch, HIVE-5267.patch
>
>
> DemuxOperator accesses Maps two or more times for each row; these lookups 
> can be replaced by array indexing.
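
For illustration, a generic sketch of the optimization (names are 
illustrative, not the DemuxOperator code): a per-row lookup keyed by a small 
dense integer tag is cheaper through an array than through a Map, since it 
avoids boxing and hashCode/equals on every row:
{code}
import java.util.HashMap;
import java.util.Map;

public class TagDispatchSketch {
  // Before: a Map lookup with a boxed Integer key on every row.
  private final Map<Integer, Object> handlersByTag = new HashMap<Integer, Object>();
  // After: handlers indexed directly by tag.
  private Object[] handlers;

  // Done once, at operator initialization time.
  void initialize(int maxTag) {
    handlers = new Object[maxTag + 1];
    for (Map.Entry<Integer, Object> e : handlersByTag.entrySet()) {
      handlers[e.getKey()] = e.getValue();
    }
  }

  // Done per row: a plain array index, O(1) with no boxing.
  Object forward(int tag) {
    return handlers[tag];
  }
}
{code}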

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5285) Custom SerDes throw cast exception when there are complex nested structures containing NonSettableObjectInspectors.

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5285:


Fix Version/s: (was: 0.13.0)
   0.12.0

> Custom SerDes throw cast exception when there are complex nested structures 
> containing NonSettableObjectInspectors.
> ---
>
> Key: HIVE-5285
> URL: https://issues.apache.org/jira/browse/HIVE-5285
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.1
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Fix For: 0.12.0
>
> Attachments: HIVE-5285.1.patch.txt, HIVE-5285.2.patch.txt
>
>
> The approach of the HIVE-5199 fix is correct. However, the fix is still 
> incomplete. Consider a complex nested structure containing the following 
> object inspector hierarchy:
> SettableStructObjectInspector
> {
>   ListObjectInspector
> }
> In the above case, the cast exception can happen via 
> MapOperator/FetchOperator as below:
> java.io.IOException: java.lang.ClassCastException: 
> com.skype.data.hadoop.hive.proto.CustomObjectInspector cannot be cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:545)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:294)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$ListConverter.convert(ObjectInspectorConverters.java:251)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.convert(ObjectInspectorConverters.java:316)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:529)
> ... 13 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5267) Use array instead of Collections if possible in DemuxOperator

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771534#comment-13771534
 ] 

Thejas M Nair commented on HIVE-5267:
-

Patch committed to 0.12 branch.


> Use array instead of Collections if possible in DemuxOperator
> -
>
> Key: HIVE-5267
> URL: https://issues.apache.org/jira/browse/HIVE-5267
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Navis
>Priority: Trivial
> Fix For: 0.13.0
>
> Attachments: HIVE-5267.D12867.1.patch, HIVE-5267.patch
>
>
> DemuxOperator accesses Maps twice+ for each row, which can be replaced by 
> array.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771530#comment-13771530
 ] 

Yin Huai commented on HIVE-4113:


Three issues:
# ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR is only used in Hive; 
HCatalog does not appear to set it. So when accessing ORC through HCatalog, we 
cannot do predicate pushdown.
# neededColumnIDs in TableScanOperator can be null when column pruning is 
disabled, in which case we can hit an NPE in 
ColumnAccessAnalyzer.analyzeColumnAccess. For the same reason, we cannot do 
predicate pushdown in Hive when column pruning is disabled, because 
neededColumnIDs will be null.
# With this change, we will assume that an empty neededColumnIDs means no 
needed columns. Selecting all columns can be represented either by 
ColumnProjectionUtils.READ_ALL_COLUMNS=true or by READ_COLUMN_IDS_CONF_STR 
listing all columns.

I will make two changes:
# Remove the column-pruning flag.
# Set READ_ALL_COLUMNS_DEFAULT to true, so that HCatalog users who do not use 
ColumnProjectionUtils still get all columns selected. If 
READ_ALL_COLUMNS_DEFAULT were false, users would have to use 
ColumnProjectionUtils; otherwise no columns would be selected.
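
For illustration, a sketch of the intended projection semantics (the helper 
and its names are hypothetical, not the actual ColumnProjectionUtils API):
{code}
import java.util.Arrays;
import java.util.List;

public class ProjectionSemanticsSketch {
  // readAllColumns defaults to true so callers outside Hive (e.g. HCatalog)
  // that never touch ColumnProjectionUtils still see every column.
  static boolean[] columnsToRead(int numColumns, boolean readAllColumns,
      List<Integer> neededColumnIDs) {
    boolean[] read = new boolean[numColumns];
    if (readAllColumns) {
      Arrays.fill(read, true); // e.g. select *
      return read;
    }
    // An explicitly empty list now means "read no columns", which is what
    // makes select count(1) cheap: no column data has to be decoded.
    for (int id : neededColumnIDs) {
      read[id] = true;
    }
    return read;
  }
}
{code}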

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.patch, 
> HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> That is only about 11% of the data read by the count(1) query.
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5199) Custom SerDe containing a nonSettable complex data type row object inspector throws cast exception with HIVE 0.11

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771525#comment-13771525
 ] 

Thejas M Nair commented on HIVE-5199:
-

Patch committed to 0.12 branch.

> Custom SerDe containing a nonSettable complex data type row object inspector 
> throws cast exception with HIVE 0.11
> -
>
> Key: HIVE-5199
> URL: https://issues.apache.org/jira/browse/HIVE-5199
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Fix For: 0.12.0
>
> Attachments: 0001-HIVE-5199-0.12-branch.patch, HIVE-5199.2.patch.txt, 
> HIVE-5199.3.patch.txt, HIVE-5199.patch.4.txt, HIVE-5199.patch.txt
>
>
> The issue happens because of the changes in HIVE-3833.
> Consider a partitioned table with different custom serdes for the partitions 
> and the table. The serde at the table level, say customSerDe1, has an object 
> inspector of a settable data type, whereas the serde at the partition level, 
> say customSerDe2, has an object inspector of a non-settable data type. The 
> implementation introduced by HIVE-3833 does not convert nested complex data 
> types that extend nonSettableObjectInspector to a settableObjectInspector 
> type inside ObjectInspectorConverters.getConvertedOI(). However, it tries to 
> typecast the nonSettableObjectInspector to a settableObjectInspector inside 
> ObjectInspectorConverters.getConverter(ObjectInspector inputOI, 
> ObjectInspector outputOI).
> The attached patch HIVE-5199.2.patch.txt contains a stand-alone test case.
> The below exception can happen via FetchOperator as well as MapOperator. 
> For example, consider the FetchOperator.
> Inside FetchOperator, consider the following call chain:
> getRecordReader() -> ObjectInspectorConverters.getConverter()
> The stack trace is as follows:
> 2013-08-28 17:57:25,307 ERROR CliDriver (SessionState.java:printError(432)) - 
> Failed with exception java.io.IOException:java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> java.io.IOException: java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:307)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5199) Custom SerDe containing a nonSettable complex data type row object inspector throws cast exception with HIVE 0.11

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5199:


Fix Version/s: (was: 0.13.0)
   0.12.0

> Custom SerDe containing a nonSettable complex data type row object inspector 
> throws cast exception with HIVE 0.11
> -
>
> Key: HIVE-5199
> URL: https://issues.apache.org/jira/browse/HIVE-5199
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Fix For: 0.12.0
>
> Attachments: 0001-HIVE-5199-0.12-branch.patch, HIVE-5199.2.patch.txt, 
> HIVE-5199.3.patch.txt, HIVE-5199.patch.4.txt, HIVE-5199.patch.txt
>
>
> The issue happens because of the changes in HIVE-3833.
> Consider a partitioned table with different custom serdes for the partitions 
> and the table. The serde at the table level, say customSerDe1, has an object 
> inspector of a settable data type, whereas the serde at the partition level, 
> say customSerDe2, has an object inspector of a non-settable data type. The 
> implementation introduced by HIVE-3833 does not convert nested complex data 
> types that extend nonSettableObjectInspector to a settableObjectInspector 
> type inside ObjectInspectorConverters.getConvertedOI(). However, it tries to 
> typecast the nonSettableObjectInspector to a settableObjectInspector inside 
> ObjectInspectorConverters.getConverter(ObjectInspector inputOI, 
> ObjectInspector outputOI).
> The attached patch HIVE-5199.2.patch.txt contains a stand-alone test case.
> The below exception can happen via FetchOperator as well as MapOperator. 
> For example, consider the FetchOperator.
> Inside FetchOperator, consider the following call chain:
> getRecordReader() -> ObjectInspectorConverters.getConverter()
> The stack trace is as follows:
> 2013-08-28 17:57:25,307 ERROR CliDriver (SessionState.java:printError(432)) - 
> Failed with exception java.io.IOException:java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> java.io.IOException: java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:544)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:488)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1412)
> at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:756)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> Caused by: java.lang.ClassCastException: 
> com.skype.data.whaleshark.hadoop.hive.proto.ProtoMapObjectInspector cannot be 
> cast to 
> org.apache.hadoop.hive.serde2.objectinspector.SettableMapObjectInspector
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:144)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:307)
> at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:138)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:406)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Xuefu Zhang
Even if not for production, I think 0.20.2 is very useful for development as
well: it's simple and easy to set up, sparing us a lot of hassle during
development. Thus, I think it makes sense to keep supporting it, especially
when there isn't much cost involved.

--Xuefu


On Wed, Sep 18, 2013 at 7:06 PM, Ashish Thusoo  wrote:

> +1 on what Ed said.
>
> I think 0.20.2 is still very real. Would be a bummer if we do not support
> it as a lot of companies are still on that version.
>
> Ashish
>
> Ashish Thusoo 
> CEO and Co-founder,
> Qubole  - a cloud based service that makes big data
> easy for analysts and data engineers
>
>
>
> On Wed, Sep 18, 2013 at 5:57 PM, Edward Capriolo  >wrote:
>
> > BTW: I am very likely to install hive 0.12 on hadoop 0.20.2 clusters. I
> > have been running hive since version 0.2. I have been running hadoop
> since
> > version 0.17.2. After 0.17.2 I moved to 0.20.2. Since then hadoop has
> > seemingly had tens of releases: 0.21, 0.21.append (dead on arrival),
> > cloudera this, cloudera that, yahoo hadoop distribution (dead on
> arrival),
> > 0.20.2.203 0.20.2.205, 1? 2.0? 2.1. None of them really have much shelf
> > life or a very clear upgrade path.
> >
> > The only thing that has remained constant for our environment is hive and
> > hadoop 0.20.2. I have been happily just upgrading hive on these clusters
> > for years now.
> >
> > So in a nutshell, I'm a long time committer, and I actively support and
> > develop hive on hadoop 0.20.2 clusters, I do not see supporting the shims
> > as complicated or difficult.
> >
> >
> >
> > On Wed, Sep 18, 2013 at 7:02 PM, Owen O'Malley 
> wrote:
> >
> > > On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo <
> edlinuxg...@gmail.com
> > > >wrote:
> > >
> > > > I am not fine with dropping it. I still run it in several places.
> > > >
> > >
> > > The question is not whether you run Hadoop 0.20.2, but whether you are
> > > likely to install Hive 0.12 on those very old clusters.
> > >
> > >
> > > >
> > > > Believe it or not, many people still run 0.20.2. I believe (correct me
> > > > if I am wrong) Facebook is still running a heavily patched 0.20.2.
> > > >
> > >
> > > It is more accurate to say that Facebook is running a fork of Hadoop
> > where
> > > the last common point was Hadoop 0.20.1. I haven't heard anyone (other
> > than
> > > you in this thread) say they are running 0.20.2 in years.
> > >
> > >
> > > > I could see dropping 0.20.2 if it was a huge burden, but I do not see
> > > > it that way: it works, it is reliable, and it is a known quantity.
> > > >
> > >
> > > It is a large burden in that we have relatively complicated shims and a
> > > lack of testing. Unless you are signing up to test every release on
> > 0.20.2
> > > we don't have anyone doing the relevant testing.
> > >
> > > -- Owen
> > >
> >
>


Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Ashish Thusoo
+1 on what Ed said.

I think 0.20.2 is still very real. Would be a bummer if we do not support
it as a lot of companies are still on that version.

Ashish

Ashish Thusoo 
CEO and Co-founder,
Qubole  - a cloud based service that makes big data
easy for analysts and data engineers



On Wed, Sep 18, 2013 at 5:57 PM, Edward Capriolo wrote:

> BTW: I am very likely to install hive 0.12 on hadoop 0.20.2 clusters. I
> have been running hive since version 0.2. I have been running hadoop since
> version 0.17.2. After 0.17.2 I moved to 0.20.2. Since then hadoop has
> seemingly had tens of releases: 0.21, 0.21.append (dead on arrival),
> cloudera this, cloudera that, yahoo hadoop distribution (dead on arrival),
> 0.20.2.203 0.20.2.205, 1? 2.0? 2.1. None of them really have much shelf
> life or a very clear upgrade path.
>
> The only thing that has remained constant for our environment is hive and
> hadoop 0.20.2. I have been happily just upgrading hive on these clusters
> for years now.
>
> So in a nutshell, I'm a long time committer, and I actively support and
> develop hive on hadoop 0.20.2 clusters, I do not see supporting the shims
> as complicated or difficult.
>
>
>
> On Wed, Sep 18, 2013 at 7:02 PM, Owen O'Malley  wrote:
>
> > On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo  > >wrote:
> >
> > > I am not fine with dropping it. I still run it in several places.
> > >
> >
> > The question is not whether you run Hadoop 0.20.2, but whether you are
> > likely to install Hive 0.12 on those very old clusters.
> >
> >
> > >
> > > Believe it or not, many people still run 0.20.2. I believe (correct me
> > > if I am wrong) Facebook is still running a heavily patched 0.20.2.
> > >
> >
> > It is more accurate to say that Facebook is running a fork of Hadoop
> where
> > the last common point was Hadoop 0.20.1. I haven't heard anyone (other
> than
> > you in this thread) say they are running 0.20.2 in years.
> >
> >
> > > I could see dropping 0.20.2 if it was a huge burden, but I do not see it
> > > that way: it works, it is reliable, and it is a known quantity.
> > >
> >
> > It is a large burden in that we have relatively complicated shims and a
> > lack of testing. Unless you are signing up to test every release on
> 0.20.2
> > we don't have anyone doing the relevant testing.
> >
> > -- Owen
> >
>


[jira] [Updated] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-5316:


Fix Version/s: (was: 0.13.0)

> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the Avro serde as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> non-executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771490#comment-13771490
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-5202:
-

Added RB link : https://reviews.apache.org/r/14224/

> Support for SettableUnionObjectInspector and implement 
> isSettable/hasAllFieldsSettable APIs for all data types.
> ---
>
> Key: HIVE-5202
> URL: https://issues.apache.org/jira/browse/HIVE-5202
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5202.2.patch.txt
>
>
> These 3 tasks should be accomplished as part of this jira:
> 1. The current implementation lacks a settable union object inspector. We can 
> run into an exception inside ObjectInspectorConverters.getConvertedOI() if 
> there is a union.
> 2. Implement the following public functions for all data types: 
> isSettable() -> performs a shallow check to see if an object inspector is 
> inherited from a settable OI type, and 
> hasAllFieldsSettable() -> performs a deep check to see if this object 
> inspector and all the underlying object inspectors are inherited from 
> settable OI types.
> 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
> (2) are implemented, add a check on outputOI.hasAllFieldsSettable() that 
> returns outputOI immediately if the object is entirely settable, in order to 
> prevent redundant object instantiation.
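
For illustration, a sketch of the shallow versus deep check, restricted to 
structs for brevity (this is not the attached patch):
{code}
import java.util.List;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.SettableStructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

public class SettableChecksSketch {
  // Shallow check: is this inspector itself settable? A full implementation
  // would test the Settable* interface for every category, not just structs.
  static boolean isSettable(ObjectInspector oi) {
    return oi instanceof SettableStructObjectInspector;
  }

  // Deep check: a settable struct containing a non-settable nested struct
  // must be rejected, hence the recursion into every field.
  static boolean hasAllFieldsSettable(ObjectInspector oi) {
    if (!isSettable(oi)) {
      return false;
    }
    List<? extends StructField> fields =
        ((StructObjectInspector) oi).getAllStructFieldRefs();
    for (StructField f : fields) {
      ObjectInspector fieldOI = f.getFieldObjectInspector();
      // Non-struct fields (primitive/list/map/union) would get analogous checks.
      if (fieldOI instanceof StructObjectInspector && !hasAllFieldsSettable(fieldOI)) {
        return false;
      }
    }
    return true;
  }
}
{code}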

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5202:


Status: Patch Available  (was: Open)

> Support for SettableUnionObjectInspector and implement 
> isSettable/hasAllFieldsSettable APIs for all data types.
> ---
>
> Key: HIVE-5202
> URL: https://issues.apache.org/jira/browse/HIVE-5202
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5202.2.patch.txt
>
>
> These 3 tasks should be accomplished as part of this jira:
> 1. The current implementation lacks a settable union object inspector. We can 
> run into an exception inside ObjectInspectorConverters.getConvertedOI() if 
> there is a union.
> 2. Implement the following public functions for all data types: 
> isSettable() -> performs a shallow check to see if an object inspector is 
> inherited from a settable OI type, and 
> hasAllFieldsSettable() -> performs a deep check to see if this object 
> inspector and all the underlying object inspectors are inherited from 
> settable OI types.
> 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
> (2) are implemented, add a check on outputOI.hasAllFieldsSettable() that 
> returns outputOI immediately if the object is entirely settable, in order to 
> prevent redundant object instantiation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5202:


Attachment: HIVE-5202.2.patch.txt

> Support for SettableUnionObjectInspector and implement 
> isSettable/hasAllFieldsSettable APIs for all data types.
> ---
>
> Key: HIVE-5202
> URL: https://issues.apache.org/jira/browse/HIVE-5202
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-5202.2.patch.txt
>
>
> These 3 tasks should be accomplished as part of this jira:
> 1. The current implementation lacks a settable union object inspector. We can 
> run into an exception inside ObjectInspectorConverters.getConvertedOI() if 
> there is a union.
> 2. Implement the following public functions for all data types: 
> isSettable() -> performs a shallow check to see if an object inspector is 
> inherited from a settable OI type, and 
> hasAllFieldsSettable() -> performs a deep check to see if this object 
> inspector and all the underlying object inspectors are inherited from 
> settable OI types.
> 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
> (2) are implemented, add a check on outputOI.hasAllFieldsSettable() that 
> returns outputOI immediately if the object is entirely settable, in order to 
> prevent redundant object instantiation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5072) [WebHCat] Enable directly invoking Sqoop jobs through Templeton

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771483#comment-13771483
 ] 

Hive QA commented on HIVE-5072:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12598349/HIVE-5072.2.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/811/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/811/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-811/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1524608.

At revision 1524608.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

> [WebHCat] Enable directly invoking Sqoop jobs through Templeton
> ---
>
> Key: HIVE-5072
> URL: https://issues.apache.org/jira/browse/HIVE-5072
> Project: Hive
>  Issue Type: Improvement
>  Components: HCatalog
>Affects Versions: 0.12.0
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-5072.1.patch, HIVE-5072.2.patch, 
> Templeton-Sqoop-Action.pdf
>
>
> Currently it is hard to invoke a Sqoop job through Templeton. The only way is 
> to use the classpath jar generated by a Sqoop job together with the jar 
> delegator in Templeton. We should implement a Sqoop delegator to enable 
> invoking Sqoop jobs directly through Templeton.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4755) Fix Templeton map-only tasks getting killed after 10 minutes by MapReduce

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771482#comment-13771482
 ] 

Hive QA commented on HIVE-4755:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12588499/HIVE-4755.1.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/810/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/810/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-810/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1524608.

At revision 1524607.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

> Fix Templeton map-only tasks getting killed after 10 minutes by MapReduce
> -
>
> Key: HIVE-4755
> URL: https://issues.apache.org/jira/browse/HIVE-4755
> Project: Hive
>  Issue Type: Bug
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4755.1.patch
>
>
> All MapReduce tasks are supposed to report progress back to the MR framework; 
> otherwise, if there is no progress in 10 minutes (by default), the 
> TaskTracker will kill the task.
> The Templeton map-only task has a KeepAlive thread that is supposed to 
> periodically report progress and keep the launcher alive; however, this 
> does not seem to work as expected.
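
For illustration, a minimal sketch of what such a keepalive thread looks like 
(class name hypothetical; this is not the Templeton source): it calls 
Reporter.progress() well inside the 10-minute inactivity timeout so the 
framework does not kill a launcher task that produces no output for a while:
{code}
import org.apache.hadoop.mapred.Reporter;

public class KeepAliveSketch implements Runnable {
  private final Reporter reporter;
  private volatile boolean running = true;

  KeepAliveSketch(Reporter reporter) {
    this.reporter = reporter;
  }

  public void run() {
    while (running) {
      // Resets the framework's inactivity timer for this task attempt.
      reporter.progress();
      try {
        Thread.sleep(60 * 1000L); // once a minute, far under the 10-minute limit
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  void stop() {
    running = false;
  }
}
{code}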

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4760) Templeton occasionally raises bad request (HTTP error 400) for queue/:jobid requests

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771484#comment-13771484
 ] 

Hive QA commented on HIVE-4760:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12588681/HIVE-4760.1.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/812/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/812/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-812/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1524608.

At revision 1524608.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

> Templeton occasionally raises bad request (HTTP error 400) for queue/:jobid 
> requests
> ---
>
> Key: HIVE-4760
> URL: https://issues.apache.org/jira/browse/HIVE-4760
> Project: Hive
>  Issue Type: Bug
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4760.1.patch
>
>
> This issue occurs randomly.
> The repro we have found is to submit a job and then poll queue/:jobid in a 
> loop until the job completes. At some point, depending on the timing, a bad 
> request (jobid is not valid) is raised.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4782) Templeton streaming bug fixes

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771481#comment-13771481
 ] 

Hive QA commented on HIVE-4782:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12589451/HIVE-4782.1.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/809/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/809/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-809/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java'
Reverted 'serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf build hcatalog/build hcatalog/core/build 
hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build 
hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build 
hcatalog/hcatalog-pig-adapter/build common/src/gen
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1524607.

At revision 1524607.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

> Templeton streaming bug fixes
> -
>
> Key: HIVE-4782
> URL: https://issues.apache.org/jira/browse/HIVE-4782
> Project: Hive
>  Issue Type: Bug
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-4782.1.patch
>
>
> 1. There is no means to point to distributed cache binaries for streaming. 
> This blocks the mainstream scenario where a customer uploads binaries to ASV 
> and wants to use them in a job.
> 2. Command-line options passed to hadoop.cmd that can contain an equal sign 
> or a comma must be quoted.
> 3. Fix the "-file" and "-cmdenv" streaming options, which do not seem to work 
> properly through Templeton.
> 4. Also add the -combiner option to enable adding a combiner to the streaming 
> job.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5224) When creating table with AVRO serde, the "avro.schema.url" should be about to load serde schema from file system beside HDFS

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771479#comment-13771479
 ] 

Hive QA commented on HIVE-5224:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603099/HIVE-5224.2.patch

{color:red}ERROR:{color} -1 due to 174 failed/errored test(s), 1241 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_external_table_ppd
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_external_table_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_map_queries_prefix
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_binary_storage_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_ppd_key_range
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_pushdown
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_scan_params
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats2
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats3
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_stats_empty_partition
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_ppd_key_ranges
org.apache.hadoop.hive.cli.TestHBaseNegativeCliDriver.testCliDriver_cascade_dbdrop_hadoop20
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket4
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket5
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_disable_merge_for_bucketing
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_groupby2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_dyn_part
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_merge
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_num_buckets
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_join1
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_list_bucket_dml_10
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_fs2
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate
org.apache.hadoop.hive.hwi.TestHWISessionManager.testHiveDriver
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testConversionsBaseResultSet
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDataTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDatabaseMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDescribeTable
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testDriverProperties
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testErrorMessages
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testExplainStmt
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetCatalogs
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumns
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetColumnsMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetSchemas
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTableTypes
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testMetaDataGetTables
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testNullType
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testPrepareStatement
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testResultSetMetaData
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAll
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllFetchSize
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllMaxRows
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSelectAllPartioned
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testSetCommand
org.apache.hadoop.hive.jdbc.TestJdbcDriver.testShowTables
org.apache.hadoop.hive.ql.TestLocationQueries.testAlterTablePartitionLocation_alter5
org.apache.hadoop.hive.ql.TestMTQueries.testMTQueries1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapPlan2
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan1
org.apache.hadoop.hive.ql.exec.TestExecDriver.testMapRedPlan2
org.apache.hadoop.hive.ql.e

Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Edward Capriolo
BTW: I am very likely to install hive 0.12 on hadoop 0.20.2 clusters. I
have been running hive since version 0.2. I have been running hadoop since
version 0.17.2. After 0.17.2 I moved to 0.20.2. Since then hadoop has
seemingly had tens of releases: 0.21, 0.21.append (dead on arrival),
cloudera this, cloudera that, yahoo hadoop distribution (dead on arrival),
0.20.2.203 0.20.2.205, 1? 2.0? 2.1. None of them really have much shelf
life or a very clear upgrade path.

The only thing that has remained constant for our environment is hive and
hadoop 0.20.2. I have been happily just upgrading hive on these clusters
for years now.

So in a nutshell, I'm a long time committer, and I actively support and
develop hive on hadoop 0.20.2 clusters, I do not see supporting the shims
as complicated or difficult.



On Wed, Sep 18, 2013 at 7:02 PM, Owen O'Malley  wrote:

> On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo  >wrote:
>
> > I am not fine with dropping it. I still run it in several places.
> >
>
> The question is not whether you run Hadoop 0.20.2, but whether you are
> likely to install Hive 0.12 on those very old clusters.
>
>
> >
> > Believe it or not, many people still run 0.20.2. I believe (correct me if
> > I am wrong) Facebook is still running a heavily patched 0.20.2.
> >
>
> It is more accurate to say that Facebook is running a fork of Hadoop where
> the last common point was Hadoop 0.20.1. I haven't heard anyone (other than
> you in this thread) say they are running 0.20.2 in years.
>
>
> > I could see dropping 0.20.2 if it was a huge burden, but I do not see it
> > that way: it works, it is reliable, and it is a known quantity.
> >
>
> It is a large burden in that we have relatively complicated shims and a
> lack of testing. Unless you are signing up to test every release on 0.20.2
> we don't have anyone doing the relevant testing.
>
> -- Owen
>


Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Edward Capriolo
"Unless you are signing up to test every release on 0.20.2 we don't have
anyone doing the relevant testing"

We spend much more time dealing with Windows issues. Unless someone signs
up to test and certify Windows, should we drop that as well?


On Wed, Sep 18, 2013 at 7:02 PM, Owen O'Malley  wrote:

> On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo  >wrote:
>
> > I am not fine with dropping it. I still run it in several places.
> >
>
> The question is not whether you run Hadoop 0.20.2, but whether you are
> likely to install Hive 0.12 on those very old clusters.
>
>
> >
> > Believe it or not, many people still run 0.20.2. I believe (correct me if
> > I am wrong) Facebook is still running a heavily patched 0.20.2.
> >
>
> It is more accurate to say that Facebook is running a fork of Hadoop where
> the last common point was Hadoop 0.20.1. I haven't heard anyone (other than
> you in this thread) say they are running 0.20.2 in years.
>
>
> > I could see dropping 0.20.2 if it was a huge burden, but I do not see it
> > that way: it works, it is reliable, and it is a known quantity.
> >
>
> It is a large burden in that we have relatively complicated shims and a
> lack of testing. Unless you are signing up to test every release on 0.20.2
> we don't have anyone doing the relevant testing.
>
> -- Owen
>


[jira] [Commented] (HIVE-5209) JDBC support for varchar

2013-09-18 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771446#comment-13771446
 ] 

Phabricator commented on HIVE-5209:
---

thejas has commented on the revision "HIVE-5209 [jira] JDBC support for 
varchar".

INLINE COMMENTS
  jdbc/src/java/org/apache/hadoop/hive/jdbc/HiveBaseResultSet.java:57 The focus 
of the community has been on hive-server2; I don't think people who have 
upgraded to newer versions of hive would use hive-server1, given its 
limitations/problems. I don't think we should bother updating the hive-server1 
based jdbc code.
  service/src/java/org/apache/hive/service/cli/ColumnDescriptor.java:53 This 
change does not seem to be necessary; this constructor is only using the type 
from the FieldSchema argument.
  service/src/java/org/apache/hive/service/cli/TypeQualifiers.java:29 Can you 
add a comment describing the class?
  service/src/java/org/apache/hive/service/cli/TypeQualifiers.java:24 
TypeQualifiers is being used from org/apache/hive/jdbc/HiveQueryResultSet. So 
adding a dependency on a serde class here would mean that jdbc applications 
would also need the hive-serde jar. We should avoid that.
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java:36 I think this class 
should move out as a top-level class in this package.

  JdbcColumn seems to be in need of some cleanup, which we can tackle in a 
separate jira. Only the static functions in it are used. We should move the 
static functions to a JdbcColumnUtil class and remove the rest of the class.
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java:187 This function isn't 
used anywhere; we should remove it.
  jdbc/src/java/org/apache/hive/jdbc/JdbcColumn.java:181 This function isn't 
used anywhere; we should remove it.
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:32 This import 
does not seem to be needed. This class uses serde classes; I am worried that 
using this would lead to a dependency on the hive-serde jar as well.
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:157 
this.columnAttributes.addAll() is missing. Maybe call setSchema() here?
  jdbc/src/java/org/apache/hive/jdbc/HiveQueryResultSet.java:240 This is not 
used outside; maybe make it private since you are making changes in this 
part anyway.

REVISION DETAIL
  https://reviews.facebook.net/D12999

To: JIRA, jdere
Cc: cwsteinbach, thejas


> JDBC support for varchar
> 
>
> Key: HIVE-5209
> URL: https://issues.apache.org/jira/browse/HIVE-5209
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC, Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
> HIVE-5209.4.patch, HIVE-5209.D12705.1.patch
>
>
> Support returning varchar length in result set metadata
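
For illustration, what the feature enables on the client side, using only the 
standard JDBC API (connection URL and table are placeholders): with the fix, 
getPrecision() returns the declared varchar length instead of a generic 
maximum string size:
{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class VarcharMetadataExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default");
    try {
      Statement stmt = conn.createStatement();
      // Assume t.name was declared as varchar(50).
      ResultSet rs = stmt.executeQuery("SELECT name FROM t");
      ResultSetMetaData md = rs.getMetaData();
      System.out.println(md.getColumnTypeName(1)); // VARCHAR
      System.out.println(md.getPrecision(1));      // 50, the declared length
    } finally {
      conn.close();
    }
  }
}
{code}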

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14155: HIVE-5297 Hive does not honor type for partition columns

2013-09-18 Thread Harish Butani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14155/#review26248
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


This will add a dependency on antlr.
Try not to introduce this dependency in the exec package.
I suggest you move the 2 new functions to BaseSemanticAnalyzer.



ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


After you convert successfully, use the converted value.
See the comment on the date value below.



ql/src/test/queries/clientpositive/partition_type_check.q


Remove ROW FORMAT DELIMITED FIELDS TERMINATED BY ','.
The data file has default delimiters; your results show nulls for the data 
columns.



ql/src/test/queries/clientpositive/partition_type_check.q


The value stored for day should be 'start of epoch', but your output shows 
'second'. Storing the converted value hopefully fixes this.


- Harish Butani
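
For illustration, a sketch of the kind of validation being discussed 
(hypothetical helper, not the patch itself): convert the literal partition 
value to the column's declared type and reject the DDL when the conversion 
fails, instead of storing a value that later surfaces as NULLs:
{code}
public class PartitionValueCheckSketch {
  static int requireIntPartitionValue(String colName, String value) {
    try {
      return Integer.parseInt(value); // e.g. day=2 is accepted
    } catch (NumberFormatException e) {
      // e.g. day='second' was previously stored as-is and produced NULLs later
      throw new IllegalArgumentException(
          "Partition column " + colName + " is int but got '" + value + "'");
    }
  }
}
{code}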


On Sept. 17, 2013, 8:55 p.m., Vikram Dixit Kumaraswamy wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14155/
> ---
> 
> (Updated Sept. 17, 2013, 8:55 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-5297
> https://issues.apache.org/jira/browse/HIVE-5297
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Hive does not consider the type of the partition column while writing 
> partitions. Consider, for example, the query:
> 
> create table tab1 (id1 int, id2 string) PARTITIONED BY(month string,day int) 
> row format delimited fields terminated by ',';
> alter table tab1 add partition (month='June', day='second');
> 
> Hive accepts this query. However, if you try to select from this table and 
> insert into another table expecting the schemas to match, it will insert 
> nulls instead.
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1af68a6 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 
> a704462 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> fb79823 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ca667d4 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TypeCheckProcFactory.java 
> 767f545 
>   ql/src/test/queries/clientnegative/illegal_partition_type.q PRE-CREATION 
>   ql/src/test/queries/clientnegative/illegal_partition_type2.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/partition_type_check.q PRE-CREATION 
>   ql/src/test/results/clientnegative/alter_table_add_partition.q.out bd9c148 
>   ql/src/test/results/clientnegative/alter_view_failure5.q.out 4edb82c 
>   ql/src/test/results/clientnegative/illegal_partition_type.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/illegal_partition_type2.q.out 
> PRE-CREATION 
>   ql/src/test/results/clientpositive/parititon_type_check.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/partition_type_check.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/14155/diff/
> 
> 
> Testing
> ---
> 
> Ran all tests.
> 
> 
> Thanks,
> 
> Vikram Dixit Kumaraswamy
> 
>



[jira] [Updated] (HIVE-5308) The code generation should be part of the build process.

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5308:
---

Status: Patch Available  (was: Open)

> The code generation should be part of the build process.
> 
>
> Key: HIVE-5308
> URL: https://issues.apache.org/jira/browse/HIVE-5308
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5308.1-vectorization.patch, 
> HIVE-5308.1.vectorization.patch
>
>
> We have committed lots of generated source code. Instead, we should generate 
> this code as part of the build.
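
As a toy illustration of the idea (not Hive's actual generator, whose templates 
live on the vectorization branch): build-time generation expands a template into a 
source file instead of committing the output.

{code}
import java.io.FileWriter;

public class GenExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical template and class name, for illustration only.
    String template = "public class <ClassName> { /* generated arithmetic */ }";
    String source = template.replace("<ClassName>", "LongColAddLongColumn");
    FileWriter out = new FileWriter("LongColAddLongColumn.java");
    try {
      out.write(source);
    } finally {
      out.close();
    }
  }
}
{code}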

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5308) The code generation should be part of the build process.

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5308:
---

Affects Version/s: vectorization-branch

> The code generation should be part of the build process.
> 
>
> Key: HIVE-5308
> URL: https://issues.apache.org/jira/browse/HIVE-5308
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5308.1-vectorization.patch, 
> HIVE-5308.1.vectorization.patch
>
>
> We have committed lots of generated source code. Instead, we should generate 
> this code as part of the build.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5308) The code generation should be part of the build process.

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5308:
---

Attachment: HIVE-5308.1-vectorization.patch

Renamed the patch file to apply correctly on the branch.

> The code generation should be part of the build process.
> 
>
> Key: HIVE-5308
> URL: https://issues.apache.org/jira/browse/HIVE-5308
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5308.1-vectorization.patch, 
> HIVE-5308.1.vectorization.patch
>
>
> We have committed lots of generated source code. Instead, we should generate 
> this code as part of the build.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5309:
---

Affects Version/s: vectorization-branch

> Update hive-default.xml.template for vectorization flag; remove unused 
> imports from MetaStoreUtils.java
> ---
>
> Key: HIVE-5309
> URL: https://issues.apache.org/jira/browse/HIVE-5309
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5309.1-vectorization.patch, 
> HIVE-5309.1.vectorization.patch
>
>
> This jira provides fixes for some of the review comments on HIVE-5283.
> 1) Update hive-default.xml.template for vectorization flag.
> 2) remove unused imports from MetaStoreUtils.
> 3) Add a test to run vectorization with non-orc format. The test must still 
> pass because vectorization optimization should fall back to non-vector mode.
> 4) Hardcode the table name in QTestUtil.java.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5308) The code generation should be part of the build process.

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5308:
---

Status: Open  (was: Patch Available)

> The code generation should be part of the build process.
> 
>
> Key: HIVE-5308
> URL: https://issues.apache.org/jira/browse/HIVE-5308
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: vectorization-branch
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5308.1-vectorization.patch, 
> HIVE-5308.1.vectorization.patch
>
>
> We have committed lots of generated source code. Instead, we should generate 
> this code as part of the build.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5309:
---

Status: Patch Available  (was: Open)

> Update hive-default.xml.template for vectorization flag; remove unused 
> imports from MetaStoreUtils.java
> ---
>
> Key: HIVE-5309
> URL: https://issues.apache.org/jira/browse/HIVE-5309
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5309.1-vectorization.patch, 
> HIVE-5309.1.vectorization.patch
>
>
> This jira provides fixes for some of the review comments on HIVE-5283.
> 1) Update hive-default.xml.template for vectorization flag.
> 2) remove unused imports from MetaStoreUtils.
> 3) Add a test to run vectorization with non-orc format. The test must still 
> pass because vectorization optimization should fall back to non-vector mode.
> 4) Hardcode the table name in QTestUtil.java.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-18 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-5306:


Attachment: HIVE-5306.4.patch

Updated with [~jdere]'s review comments.
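
For reference, a minimal sketch of the GenericUDF shape this change moves to, 
simplified to a single int overload (illustrative, not the actual patch):

{code}
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.IntWritable;

public class GenericUDFAbsSketch extends GenericUDF {
  private final IntWritable result = new IntWritable();

  @Override
  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
    if (arguments.length != 1) {
      throw new UDFArgumentException("abs() takes exactly one argument");
    }
    // Type resolution happens once here, not once per row.
    return PrimitiveObjectInspectorFactory.writableIntObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] arguments) throws HiveException {
    // DeferredObject enables short-circuit evaluation: get() is only called when needed.
    Object value = arguments[0].get();
    if (value == null) {
      return null;
    }
    // Simplified: assumes the argument is already an IntWritable.
    result.set(Math.abs(((IntWritable) value).get()));
    return result;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "abs(" + children[0] + ")";
  }
}
{code}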


> Use new GenericUDF instead of basic UDF for UDFAbs class
> 
>
> Key: HIVE-5306
> URL: https://issues.apache.org/jira/browse/HIVE-5306
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch, 
> HIVE-5306.4.patch
>
>
> GenericUDF class is the latest  and recommended base class for any UDFs.
> This JIRA is to change the current UDFAbs class to extend GenericUDF.
> The general benefit of GenericUDF is described in comments as 
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
>  * accept arguments of complex types, and return complex types. 2. It can 
> accept
>  * variable length of arguments. 3. It can accept an infinite number of 
> function
>  * signature - for example, it's easy to write a GenericUDF that accepts
>  * array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
>  * can do short-circuit evaluations using DeferedObject."  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5309:
---

Status: Open  (was: Patch Available)

> Update hive-default.xml.template for vectorization flag; remove unused 
> imports from MetaStoreUtils.java
> ---
>
> Key: HIVE-5309
> URL: https://issues.apache.org/jira/browse/HIVE-5309
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5309.1-vectorization.patch, 
> HIVE-5309.1.vectorization.patch
>
>
> This jira provides fixes for some of the review comments on HIVE-5283.
> 1) Update hive-default.xml.template for vectorization flag.
> 2) remove unused imports from MetaStoreUtils.
> 3) Add a test to run vectorization with non-orc format. The test must still 
> pass because vectorization optimization should fall back to non-vector mode.
> 4) Hardcode the table name in QTestUtil.java.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5309:
---

Attachment: HIVE-5309.1-vectorization.patch

Renaming the patch file so that jenkins applies it to the right branch.

> Update hive-default.xml.template for vectorization flag; remove unused 
> imports from MetaStoreUtils.java
> ---
>
> Key: HIVE-5309
> URL: https://issues.apache.org/jira/browse/HIVE-5309
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5309.1-vectorization.patch, 
> HIVE-5309.1.vectorization.patch
>
>
> This jira provides fixes for some of the review comments on HIVE-5283.
> 1) Update hive-default.xml.template for vectorization flag.
> 2) remove unused imports from MetaStoreUtils.
> 3) Add a test to run vectorization with non-orc format. The test must still 
> pass because vectorization optimization should fall back to non-vector mode.
> 4) Hardcode the table name in QTestUtil.java.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771414#comment-13771414
 ] 

Thejas M Nair commented on HIVE-5316:
-

I think as we move the documentation to be version controlled (as discussed on the 
mailing list), it would make sense to create jiras for the changes and track them 
with releases. Until then, I think it is best to edit the wiki directly. 
Thanks again for helping improve the hive documentation; we need to invest more on 
that front!


> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.13.0
>
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the avro serde name as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Review Request 14180: HIVE-4531: [WebHCat] Collecting task logs to hdfs

2013-09-18 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14180/#review26246
---



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JobIDParser.java


could this class have a JobIDParser(String statusdir, Configuration conf) 
c'tor so that subclasses can call super(statusdir, conf) and have both member 
vars private (final) ?




trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JobIDParser.java


openStatusFile() can be private



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JobIDParser.java


this can be made private as well (see comment in TestJobIDParser)



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/JobIDParser.java


should this be wrapped in a finally{} block?



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


This needs some javadoc that explains what the class is doing, at least 
the directory structure as explained in the HIVE-4531 description.

It may also be useful to explain why it's parsing JSPs instead of calling some 
API, etc.



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


Would JavaDoc comments at method level be better than // ?



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


This class is only used from TempletonControllerJob.  Can it be moved to 
the same package and be made package scoped?



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


can all 5 member variables be private?  They are not used outside the class



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


could this be a static class (since it's basically a struct)?



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


could the valid values for status & type be 'enum's for readability?



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


I'd include the actual value of 'jobType' in the error msg - it will make 
debugging easier



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


finally {
  if (listWriter != null) {
    listWriter.close();
  }
}
would be better



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


if this exception is caught, the writer will be closed, but the loop will 
continue (and I assume fail trying to write to list.txt)



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


it'd be clearer if 'job' was 'jobID'



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


ArrayList



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


connection not closed



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


it seems better for this method to use try/catch(IOException)/finally and 
handle cleaning up resources here, rather than making every caller do it - all 
they do is write the stack trace to System.err
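
A sketch of that restructuring; class and method names are illustrative, not 
LogRetriever's actual code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

class PageFetchSketch {
  // The method owns its resources and releases them in finally{}, so callers
  // need no cleanup logic of their own.
  static String fetchTaskPage(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    BufferedReader in = null;
    try {
      in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
      StringBuilder page = new StringBuilder();
      String line;
      while ((line = in.readLine()) != null) {
        page.append(line).append('\n');
      }
      return page.toString();
    } finally {
      if (in != null) {
        in.close();        // also closes the underlying stream
      }
      conn.disconnect();   // release the connection, per the comments above
    }
  }
}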



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


shouldn't the connection be closed?



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


close() should be called from finally{}



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


close connection



trunk/hcatalog/webhcat/svr/src/main/java/org/apache/hive/hcatalog/templeton/LogRetriever.java


[jira] [Commented] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771408#comment-13771408
 ] 

Xuefu Zhang commented on HIVE-5316:
---

I didn't know the process. I had some doubts when creating the jira, but 
created it anyway, seeing there is a Documentation component. Now I'm thinking we 
should probably just edit the wiki directly.

> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.13.0
>
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the avro serde name as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771399#comment-13771399
 ] 

Thejas M Nair commented on HIVE-5316:
-

Thanks for the contributions to improve the hive documentation. I really appreciate 
that. 
I am just asking whether we need to create jiras when we are not checking in any 
code changes.


> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.13.0
>
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the avro serde name as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771394#comment-13771394
 ] 

Jason Dere commented on HIVE-5306:
--

True, transient is not really important for those particular fields, I guess 
it's really more of an issue if you have non-serializable fields.

> Use new GenericUDF instead of basic UDF for UDFAbs class
> 
>
> Key: HIVE-5306
> URL: https://issues.apache.org/jira/browse/HIVE-5306
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch
>
>
> GenericUDF class is the latest  and recommended base class for any UDFs.
> This JIRA is to change the current UDFAbs class to extend GenericUDF.
> The general benefit of GenericUDF is described in comments as 
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
>  * accept arguments of complex types, and return complex types. 2. It can 
> accept
>  * variable length of arguments. 3. It can accept an infinite number of 
> function
>  * signature - for example, it's easy to write a GenericUDF that accepts
>  * array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
>  * can do short-circuit evaluations using DeferedObject."  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771392#comment-13771392
 ] 

Thejas M Nair commented on HIVE-5316:
-

Are we following a new process here? Why create jiras for wiki edits? We 
don't follow the review process for them, and the wiki already has a change comment 
log. Things like fix version don't make sense for the wiki, and I'm not sure we 
should include them in the hive release documentation.



> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.13.0
>
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the avro serde name as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5032) Enable hive creating external table at the root directory of DFS

2013-09-18 Thread Shuaishuai Nie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuaishuai Nie updated HIVE-5032:
-

Attachment: HIVE-5032.2.patch

Sorry for the delayed response, [~ashutoshc]; I missed the email. Attached the 
patch with a unit test.
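
A minimal sketch of the direction of the fix (illustrative only, assuming the root 
cause is the scheme/authority leaking into the prefix match):

{code}
import java.net.URI;

class RootPathSketch {
  // "hdfs://nn:8020/" should prefix-match entries keyed by path alone, so
  // strip the scheme and authority before comparing.
  static String pathWithoutSchemeAndAuthority(String location) {
    String path = URI.create(location).getPath();
    return path.isEmpty() ? "/" : path;
  }
}
{code}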

> Enable hive creating external table at the root directory of DFS
> 
>
> Key: HIVE-5032
> URL: https://issues.apache.org/jira/browse/HIVE-5032
> Project: Hive
>  Issue Type: Bug
>Reporter: Shuaishuai Nie
>Assignee: Shuaishuai Nie
> Attachments: HIVE-5032.1.patch, HIVE-5032.2.patch
>
>
> Creating an external table in Hive with a location pointing to the root directory 
> of DFS will fail because the function 
> HiveFileFormatUtils#doGetPartitionDescFromPath treats the authority of the path 
> the same as a folder and cannot find a match in the "pathToPartitionInfo" table 
> when doing the prefix match. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved HIVE-5316.
---

   Resolution: Fixed
Fix Version/s: 0.13.0

Fixed error in the cwiki page.

> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Fix For: 0.13.0
>
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the avro serde name as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-5316:
-

Assignee: Xuefu Zhang

> cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe
> --
>
> Key: HIVE-5316
> URL: https://issues.apache.org/jira/browse/HIVE-5316
> Project: Hive
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
>
> On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
> there are a few examples that refer to the avro serde name as 
> org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
> org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
> not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-18 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771383#comment-13771383
 ] 

Mohammad Kamrul Islam commented on HIVE-5306:
-

Very good feedback. I will incorporate it and upload a new patch.

One thing: why would transient be an issue? I found others using GenericUDF in 
the same context.



> Use new GenericUDF instead of basic UDF for UDFAbs class
> 
>
> Key: HIVE-5306
> URL: https://issues.apache.org/jira/browse/HIVE-5306
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch
>
>
> GenericUDF class is the latest  and recommended base class for any UDFs.
> This JIRA is to change the current UDFAbs class to extend GenericUDF.
> The general benefit of GenericUDF is described in comments as 
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
>  * accept arguments of complex types, and return complex types. 2. It can 
> accept
>  * variable length of arguments. 3. It can accept an infinite number of 
> function
>  * signature - for example, it's easy to write a GenericUDF that accepts
>  * array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
>  * can do short-circuit evaluations using DeferedObject."  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5316) cwiki documentation error: org.apache.hadoop.hive.serde2.AvroSerDe

2013-09-18 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-5316:
-

 Summary: cwiki documentation error: 
org.apache.hadoop.hive.serde2.AvroSerDe
 Key: HIVE-5316
 URL: https://issues.apache.org/jira/browse/HIVE-5316
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Reporter: Xuefu Zhang
Priority: Minor


On https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#
there are a few examples that refer to the avro serde name as 
org.apache.hadoop.hive.serde2.AvroSerDe, but it should be 
org.apache.hadoop.hive.serde2.avro.AvroSerDe instead. This makes the examples 
not executable. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771370#comment-13771370
 ] 

Yin Huai commented on HIVE-4113:


[~brocknoland] I see. Thanks. I am not sure if those changes will affect 
reading RCFile and ORC through HCat (i.e., whether we will read those unnecessary 
columns). Let me check.

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.patch, 
> HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771367#comment-13771367
 ] 

Thejas M Nair commented on HIVE-5313:
-

I will commit this as well to 0.12 along with HIVE-4487


> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.13.0
>
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Owen O'Malley
On Wed, Sep 18, 2013 at 1:54 PM, Edward Capriolo wrote:

> I am not fine with dropping it. I still run it in several places.
>

The question is not whether you run Hadoop 0.20.2, but whether you are
likely to install Hive 0.12 on those very old clusters.


>
> Believe it or not, many people still run 0.20.2. I believe (correct me if I
> am wrong) facebook is still running a heavily patched 0.20.2.
>

It is more accurate to say that Facebook is running a fork of Hadoop where
the last common point was Hadoop 0.20.1. I haven't heard anyone (other than
you in this thread) say they are running 0.20.2 in years.


> I could see dropping 0.20.2 if it was a huge burden, but I do not see it
> that way; it works, it is reliable, and it is a known quantity.
>

It is a large burden in that we have relatively complicated shims and a
lack of testing. Unless you are signing up to test every release on 0.20.2,
we don't have anyone doing the relevant testing.

-- Owen


[jira] [Updated] (HIVE-5070) Need to implement listLocatedStatus() in ProxyFileSystem

2013-09-18 Thread shanyu zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shanyu zhao updated HIVE-5070:
--

Attachment: HIVE-5070-v3.patch

Uploaded patch v3, which uses a shim to get ProxyFileSystem. 
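
For context, a hedged sketch of the override the issue calls for (not the 
committed patch; the swizzle helper below is a stand-in for ProxyFileSystem's real 
path translation):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FilterFileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class ProxyFileSystemSketch extends FilterFileSystem {
  // Stand-in for ProxyFileSystem's real path translation (pfile:// -> file://).
  private Path swizzleParamPath(Path p) {
    return new Path("file", null, p.toUri().getPath());
  }

  @Override
  public RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f) throws IOException {
    // Without an override, FileInputFormat.getSplits() hands the raw
    // filesystem a pfile: path and fails the "Wrong FS" check shown below.
    return super.listLocatedStatus(swizzleParamPath(f));
  }
}
{code}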

> Need to implement listLocatedStatus() in ProxyFileSystem
> 
>
> Key: HIVE-5070
> URL: https://issues.apache.org/jira/browse/HIVE-5070
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.12.0
>Reporter: shanyu zhao
> Fix For: 0.13.0
>
> Attachments: HIVE-5070.patch.txt, HIVE-5070-v2.patch, 
> HIVE-5070-v3.patch
>
>
> MAPREDUCE-1981 introduced a new API for FileSystem - listLocatedStatus. It is 
> used in Hadoop's FileInputFormat.getSplits(). Hive's ProxyFileSystem class 
> needs to implement this API in order to make the Hive unit tests work.
> Otherwise, you'll see these exceptions when running TestCliDriver test case, 
> e.g. results of running allcolref_in_udf.q:
> [junit] Running org.apache.hadoop.hive.cli.TestCliDriver
> [junit] Begin query: allcolref_in_udf.q
> [junit] java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/GitHub/Monarch/project/hive-monarch/build/ql/test/data/warehouse/src, 
> expected: file:///
> [junit]   at 
> org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642)
> [junit]   at 
> org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:69)
> [junit]   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:375)
> [junit]   at 
> org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1482)
> [junit]   at 
> org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1522)
> [junit]   at 
> org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1798)
> [junit]   at 
> org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1797)
> [junit]   at 
> org.apache.hadoop.fs.ChecksumFileSystem.listLocatedStatus(ChecksumFileSystem.java:579)
> [junit]   at 
> org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
> [junit]   at 
> org.apache.hadoop.fs.FilterFileSystem.listLocatedStatus(FilterFileSystem.java:235)
> [junit]   at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
> [junit]   at 
> org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
> [junit]   at 
> org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:69)
> [junit]   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:385)
> [junit]   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:351)
> [junit]   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:389)
> [junit]   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:503)
> [junit]   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:495)
> [junit]   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:390)
> [junit]   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1268)
> [junit]   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1265)
> [junit]   at java.security.AccessController.doPrivileged(Native Method)
> [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
> [junit]   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1265)
> [junit]   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> [junit]   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:552)
> [junit]   at java.security.AccessController.doPrivileged(Native Method)
> [junit]   at javax.security.auth.Subject.doAs(Subject.java:396)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1481)
> [junit]   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:552)
> [junit]   at 
> org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:543)
> [junit]   at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:448)
> [junit]   at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:688)
> [junit]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit]   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> [junit]   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> [junit]   at java.lang.reflect.Method.inv

Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Owen O'Malley
On Wed, Sep 18, 2013 at 1:04 PM, Brock Noland  wrote:

> Hi,
>
> At present we require compatibility with Hadoop 0.20.2. See:
> https://issues.apache.org/jira/browse/HIVE-5313
>
> Considering 0.20.2 was released 4 years ago, how long are we going to
> continue to support it?
>

I think it would be good to drop 0.20.2 support.

-- Owen


>
> Brock
>


[jira] [Updated] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4113:
---

Attachment: HIVE-4113.1.patch

review board https://reviews.apache.org/r/14221/

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.1.patch, HIVE-4113.patch, 
> HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Review Request 14221: HIVE-4113: Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14221/
---

Review request for hive.


Bugs: HIVE-4113
https://issues.apache.org/jira/browse/HIVE-4113


Repository: hive-git


Description
---

Modifies ColumnProjectionUtils such that there are two flags: one for the column ids 
and one indicating whether all columns should be read. Additionally, the patch 
updates all locations that used the old convention of an empty string indicating all 
columns should be read.

The automatic formatter generated by ant eclipse-files is fairly aggressive, so 
there is some unrelated import/whitespace cleanup.

This one is based on https://reviews.apache.org/r/11770/ and has been rebased 
to the latest trunk.
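
A hedged sketch of the two-flag scheme described above; the constant names 
approximate the real ones in ColumnProjectionUtils and may differ from the patch:

import java.util.List;
import org.apache.hadoop.conf.Configuration;

class ColumnProjectionSketch {
  static final String READ_ALL_COLUMNS = "hive.io.file.read.all.columns";
  static final String READ_COLUMN_IDS = "hive.io.file.readcolumn.ids";

  // select * : read everything.
  static void setReadAllColumns(Configuration conf) {
    conf.setBoolean(READ_ALL_COLUMNS, true);
  }

  // select c1, c2 : read only the listed ids. For select count(1) the id list
  // stays empty and READ_ALL_COLUMNS stays false, so readers skip every column.
  static void appendReadColumns(Configuration conf, List<Integer> ids) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < ids.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(ids.get(i));
    }
    conf.setBoolean(READ_ALL_COLUMNS, false);
    conf.set(READ_COLUMN_IDS, sb.toString());
  }
}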


Diffs
-

  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseTableInputFormat.java
 766056b 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
 553446a 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatRecordReader.java
 3ee6157 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
 1980ef5 
  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatMultiOutputFormat.java
 2ea5b9e 
  
hcatalog/core/src/test/java/org/apache/hive/hcatalog/mapreduce/TestHCatPartitioned.java
 577e06d 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/TestHCatLoader.java
 d38bb8d 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java ab0494e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 0f29a0e 
  ql/src/java/org/apache/hadoop/hive/ql/io/BucketizedHiveInputFormat.java 
49145b7 
  ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java cccdc1b 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFile.java a83f223 
  ql/src/java/org/apache/hadoop/hive/ql/io/RCFileRecordReader.java 9521060 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 50c5093 
  
ql/src/java/org/apache/hadoop/hive/ql/io/rcfile/merge/RCFileBlockMergeRecordReader.java
 cbdc2db 
  ql/src/test/org/apache/hadoop/hive/ql/io/PerformTestRCFileAndSeqFile.java 
fb9fca1 
  ql/src/test/org/apache/hadoop/hive/ql/io/TestRCFile.java dd1276d 
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 
83c5c38 
  serde/src/java/org/apache/hadoop/hive/serde2/ColumnProjectionUtils.java 
0b3ef7b 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarSerDe.java 
11f5f07 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStruct.java 
1335446 
  serde/src/java/org/apache/hadoop/hive/serde2/columnar/ColumnarStructBase.java 
e1270cc 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarSerDe.java
 b717278 
  
serde/src/java/org/apache/hadoop/hive/serde2/columnar/LazyBinaryColumnarStruct.java
 0317024 
  serde/src/test/org/apache/hadoop/hive/serde2/TestColumnProjectionUtils.java 
PRE-CREATION 
  serde/src/test/org/apache/hadoop/hive/serde2/TestStatsSerde.java 9aa3c45 
  
serde/src/test/org/apache/hadoop/hive/serde2/columnar/TestLazyBinaryColumnarSerDe.java
 99420ca 

Diff: https://reviews.apache.org/r/14221/diff/


Testing
---


Thanks,

Yin Huai



[jira] [Commented] (HIVE-4487) Hive does not set explicit permissions on hive.exec.scratchdir

2013-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771336#comment-13771336
 ] 

Hudson commented on HIVE-4487:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #439 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/439/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


> Hive does not set explicit permissions on hive.exec.scratchdir
> --
>
> Key: HIVE-4487
> URL: https://issues.apache.org/jira/browse/HIVE-4487
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.10.0
>Reporter: Joey Echeverria
>Assignee: Chaoyu Tang
> Fix For: 0.13.0
>
> Attachments: HIVE-4487.patch
>
>
> The hive.exec.scratchdir defaults to /tmp/hive-$\{user.name\}, but when Hive 
> creates this directory it doesn't set any explicit permission on it. This 
> means if you have the default HDFS umask setting of 022, then these 
> directories end up being world readable. These permissions also get applied 
> to the staging directories and their files, thus leaving inter-stage data 
> world readable.
> This can cause a potential leak of data especially when operating on a 
> Kerberos enabled cluster. Hive should probably default these directories to 
> only be readable by the owner.
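
A minimal sketch of the fix direction (illustrative, not the committed patch): 
create the scratch directory with explicit owner-only permissions instead of 
relying on the umask.

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

class ScratchDirSketch {
  static void makeScratchDir(FileSystem fs, Path scratchDir) throws IOException {
    // 0700: readable only by the owner. The short-based constructor also
    // sidesteps the FsPermission(String) constructor that Hadoop 0.20.2
    // lacks (see HIVE-5313).
    fs.mkdirs(scratchDir, new FsPermission((short) 0700));
  }
}
{code}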

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771335#comment-13771335
 ] 

Hudson commented on HIVE-5313:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #439 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/439/])
HIVE-5313 - HIVE-4487 breaks build because 0.20.2 is missing 
FSPermission(string) (brock: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524578)
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java


> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.13.0
>
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5209) JDBC support for varchar

2013-09-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5209:
-

Attachment: (was: HIVE-5209.3.patch)

> JDBC support for varchar
> 
>
> Key: HIVE-5209
> URL: https://issues.apache.org/jira/browse/HIVE-5209
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC, Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
> HIVE-5209.4.patch, HIVE-5209.D12705.1.patch
>
>
> Support returning varchar length in result set metadata

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5209) JDBC support for varchar

2013-09-18 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-5209:
-

Attachment: HIVE-5209.4.patch

Patch v3 did not include the new files; uploading v4, which should include them.

> JDBC support for varchar
> 
>
> Key: HIVE-5209
> URL: https://issues.apache.org/jira/browse/HIVE-5209
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC, Types
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: D12999.1.patch, HIVE-5209.1.patch, HIVE-5209.2.patch, 
> HIVE-5209.4.patch, HIVE-5209.D12705.1.patch
>
>
> Support returning varchar length in result set metadata

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771309#comment-13771309
 ] 

Brock Noland commented on HIVE-4113:


Remove it and see what happens? I don't remember exactly, but I thought I put 
that in there because, if you don't specify anything, we now won't read any 
columns, whereas callers were expecting all columns to be read.

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5202) Support for SettableUnionObjectInspector and implement isSettable/hasAllFieldsSettable APIs for all data types.

2013-09-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-5202:


Attachment: (was: HIVE-5202.1.patch.txt)

> Support for SettableUnionObjectInspector and implement 
> isSettable/hasAllFieldsSettable APIs for all data types.
> ---
>
> Key: HIVE-5202
> URL: https://issues.apache.org/jira/browse/HIVE-5202
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> These 3 tasks should be accomplished as part of the following jira:
> 1. The current implementation lacks a settable union object inspector. We can 
> run into an exception inside ObjectInspectorConverters.getConvertedOI() if there 
> is a union.
> 2. Implement the following public functions for all datatypes: 
> isSettable()-> Perform shallow check to see if an object inspector is 
> inherited from settableOI type and 
> hasAllFieldsSettable() -> Perform deep check to see if this objectInspector 
> and all the underlying object inspectors are inherited from settableOI type.
> 3. ObjectInspectorConverters.getConvertedOI() is inefficient. Once (1) and 
> (2) are implemented, add the following check: outputOI.hasAllSettableFields() 
> should be added to return outputOI immediately if the object is entirely 
> settable in order to prevent redundant object instantiation.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771307#comment-13771307
 ] 

Yin Huai commented on HIVE-4113:


[~brocknoland] I have one question. Why do we need 
"ColumnProjectionUtils.setReadAllColumns(jobConf);" in those hcat classes (e.g. 
InitializeInput)?

> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 
> HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far 
> less
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 
> HDFS Write: 8 SUCCESS
> {code}
> Which is 11% of the data size read by the COUNT(1).
> This was tracked down to the following code in RCFile.java
> {code}
>   } else {
> // TODO: if no column name is specified e.g, in select count(1) from 
> tt;
> // skip all columns, this should be distinguished from the case:
> // select * from tt;
> for (int i = 0; i < skippedColIDs.length; i++) {
>   skippedColIDs[i] = false;
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5306) Use new GenericUDF instead of basic UDF for UDFAbs class

2013-09-18 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771299#comment-13771299
 ] 

Jason Dere commented on HIVE-5306:
--

Couple of comments:

1. You can probably get away with making all of the fields transient.
2. Since you've only defined abs() to work with int/long/double/decimal, I 
believe initialize() will fail if abs() is called with other types that one 
might expect to work, such as tinyint/smallint/float/string. For non-generic 
UDFs, FunctionRegistry.getMethodInternal() would determine that 
tinyint/smallint/float/string could be converted to one of int/long/double/decimal 
and make the conversion at the time the UDF is called. Unfortunately, for 
GenericUDFs I think you have to do that logic yourself.

Maybe you can do something like so:

Converter inputConverter;   // declare as a (transient) field
switch (inputType) {
case BYTE:
case SHORT:
case INT:
  // Use a converter to convert the input type to your supported type
  inputConverter = ObjectInspectorConverters.getConverter(arguments[0],
      PrimitiveObjectInspectorFactory.writableIntObjectInspector);
  outputOI = PrimitiveObjectInspectorFactory.writableIntObjectInspector;
  break;
case LONG:
  inputConverter = ObjectInspectorConverters.getConverter(arguments[0],
      PrimitiveObjectInspectorFactory.writableLongObjectInspector);
  outputOI = PrimitiveObjectInspectorFactory.writableLongObjectInspector;
  break;
...
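
evaluate() could then run the argument through the converter before using it; a 
minimal sketch, continuing the int case above (IntWritable from org.apache.hadoop.io):

Object converted = inputConverter.convert(arguments[0].get());
int v = ((IntWritable) converted).get();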



> Use new GenericUDF instead of basic UDF for UDFAbs class
> 
>
> Key: HIVE-5306
> URL: https://issues.apache.org/jira/browse/HIVE-5306
> Project: Hive
>  Issue Type: Improvement
>  Components: UDF
>Reporter: Mohammad Kamrul Islam
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-5306.1.patch, HIVE-5306.2.patch, HIVE-5306.3.patch
>
>
> GenericUDF class is the latest  and recommended base class for any UDFs.
> This JIRA is to change the current UDFAbs class to extend GenericUDF.
> The general benefit of GenericUDF is described in comments as 
> "* The GenericUDF are superior to normal UDFs in the following ways: 1. It can
>  * accept arguments of complex types, and return complex types. 2. It can 
> accept
>  * variable length of arguments. 3. It can accept an infinite number of 
> function
>  * signature - for example, it's easy to write a GenericUDF that accepts
>  * array<int>, array<array<int>> and so on (arbitrary levels of nesting). 4. It
>  * can do short-circuit evaluations using DeferedObject."  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-5313:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.13.0
>
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.
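
For illustration only, shimming it out might look roughly like the following 
(a hypothetical sketch, not the attached HIVE-5313.patch; the class and method 
names are invented, and an octal permission string is assumed):

{code}
import org.apache.hadoop.fs.permission.FsPermission;

// Invented shim helper: callers go through the shim instead of calling
// new FsPermission(String), which Hadoop 0.20.2 does not provide.
public final class FsPermissionShimSketch {
  private FsPermissionShimSketch() {
  }

  // Assumes a "755"-style octal string; FsPermission(short) exists on
  // 0.20.2 and later, so parsing in base 8 sidesteps the missing constructor.
  public static FsPermission fromOctalString(String octal) {
    return new FsPermission(Short.parseShort(octal, 8));
  }
}
{code}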

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771253#comment-13771253
 ] 

Brock Noland commented on HIVE-5313:


Hey thanks! I will commit.

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771248#comment-13771248
 ] 

Edward Capriolo commented on HIVE-5313:
---

Since the build is broken, we can forgo the 24-hour embargo. I won't be able 
to commit before ~8:00pm EST, but someone else can.

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771244#comment-13771244
 ] 

Edward Capriolo commented on HIVE-5313:
---

Seems fine +1. 

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4113) Optimize select count(1) with RCFile and Orc

2013-09-18 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771243#comment-13771243
 ] 

Yin Huai commented on HIVE-4113:


I thought there was no flag for column pruning, so 
"tableScan.getNeededColumnIDs();" would never return null... But there is a 
flag (hive.optimize.cp)... So, when "hive.optimize.cp=false", neededColumnIDs 
in TableScanOperator will not be set... I am so sorry that I have blocked this 
jira for such a long time... I think Brock's patch is good. I will just rebase 
it and also make a minor change to the comments in ColumnProjectionUtils.
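
To illustrate the distinction at stake (a hypothetical sketch, not the actual 
patch; the helper and its names are invented): a null needed-column list 
should mean "pruning not configured, read everything", while an explicitly 
empty list should mean "no columns referenced, skip them all".

{code}
import java.util.Arrays;
import java.util.List;

// Invented helper illustrating null-vs-empty needed-column semantics.
public final class ColumnSkipSketch {
  private ColumnSkipSketch() {
  }

  public static boolean[] computeSkippedColumns(List<Integer> neededColumnIDs,
      int totalColumns) {
    boolean[] skipped = new boolean[totalColumns];
    if (neededColumnIDs == null) {
      // hive.optimize.cp=false: pruning never ran, so read every column.
      return skipped;
    }
    // Pruning ran: skip everything except the explicitly needed columns,
    // so "select count(1)" (an empty list) reads no column data at all.
    Arrays.fill(skipped, true);
    for (int id : neededColumnIDs) {
      skipped[id] = false;
    }
    return skipped;
  }
}
{code}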



> Optimize select count(1) with RCFile and Orc
> 
>
> Key: HIVE-4113
> URL: https://issues.apache.org/jira/browse/HIVE-4113
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Gopal V
>Assignee: Yin Huai
> Fix For: 0.12.0
>
> Attachments: HIVE-4113-0.patch, HIVE-4113.patch, HIVE-4113.patch
>
>
> select count(1) loads up every column & every row when used with RCFile.
> "select count(1) from store_sales_10_rc" gives
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 31.73 sec   HDFS Read: 234914410 HDFS Write: 8 SUCCESS
> {code}
> Whereas "select count(ss_sold_date_sk) from store_sales_10_rc;" reads far less:
> {code}
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 29.75 sec   HDFS Read: 28145994 HDFS Write: 8 SUCCESS
> {code}
> That is 11% of the data read by the COUNT(1).
> This was tracked down to the following code in RCFile.java:
> {code}
>   } else {
>     // TODO: if no column name is specified, e.g. in select count(1) from tt;
>     // skip all columns, this should be distinguished from the case:
>     // select * from tt;
>     for (int i = 0; i < skippedColIDs.length; i++) {
>       skippedColIDs[i] = false;
>     }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5313) HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)

2013-09-18 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771239#comment-13771239
 ] 

Brock Noland commented on HIVE-5313:


[~appodictic] Can I ask you to review this?

> HIVE-4487 breaks build because 0.20.2 is missing FSPermission(string)
> -
>
> Key: HIVE-5313
> URL: https://issues.apache.org/jira/browse/HIVE-5313
> Project: Hive
>  Issue Type: Task
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-5313.patch
>
>
> As per HIVE-4487, 0.20.2 does not contain FSPermission(string) so we'll have 
> to shim it out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Edward Capriolo
If anyone wants to clean up some tech debt: there are still places in the code
that say something like:

"// hack for 17.X, removed when we only support hadoop 0.20"

:)




On Wed, Sep 18, 2013 at 4:54 PM, Edward Capriolo wrote:

> I am not fine with dropping it. I still run it in several places.
>
> Believe it or not, many people still run 0.20.2. I believe (correct me if I
> am wrong) Facebook is still running a heavily patched 0.20.2.
>
> Mainly, most of the things Hive uses were designed around the 0.20.2 mapred
> API. In fact, Hive's components are generally layered on top of mapred.*
> and nothing really uses mapreduce.
>
> I think Hive should go on supporting 0.20.2 indefinitely. 0.20.2 is solid.
> If you look at the shim layer, it turns out that supporting 0.20.2 is
> really not much more work than supporting any other version.
>
> Since the burden is not that high, I do not even see why we would need to
> drop it; there is just as much or more shimming to deal with the
> differences between 0.20.205, 0.23, 0.20, and 2.0.
>
> I could see dropping 0.20.2 if it were a huge burden, but I do not see it
> that way: it works, it is reliable, and it is a known quantity.
>
>
>
>
> On Wed, Sep 18, 2013 at 4:22 PM, Ashutosh Chauhan wrote:
>
>> I am fine with dropping support for 0.20 line. 4 years is a long time. We
>> cannot keep accumulating tech debt forever.
>>
>> Ashutosh
>>
>>
>> On Wed, Sep 18, 2013 at 1:04 PM, Brock Noland  wrote:
>>
>> > Hi,
>> >
>> > At present we require compatibility with Hadoop 0.20.2. See:
>> > https://issues.apache.org/jira/browse/HIVE-5313
>> >
>> > Considering 0.20.2 was released 4 years ago, how long are we going to
>> > continue to support it?
>> >
>> > Brock
>> >
>>
>
>


Re: How long will we support Hadoop 0.20.2?

2013-09-18 Thread Edward Capriolo
I am not fine with dropping it. I still run it in several places.

Believe it or not, many people still run 0.20.2. I believe (correct me if I
am wrong) Facebook is still running a heavily patched 0.20.2.

Mainly, most of the things Hive uses were designed around the 0.20.2 mapred
API. In fact, Hive's components are generally layered on top of mapred.*
and nothing really uses mapreduce.

I think Hive should go on supporting 0.20.2 indefinitely. 0.20.2 is solid.
If you look at the shim layer, it turns out that supporting 0.20.2 is
really not much more work than supporting any other version.

Since the burden is not that high, I do not even see why we would need to
drop it; there is just as much or more shimming to deal with the
differences between 0.20.205, 0.23, 0.20, and 2.0.

I could see dropping 0.20.2 if it were a huge burden, but I do not see it
that way: it works, it is reliable, and it is a known quantity.




On Wed, Sep 18, 2013 at 4:22 PM, Ashutosh Chauhan wrote:

> I am fine with dropping support for 0.20 line. 4 years is a long time. We
> cannot keep accumulating tech debt forever.
>
> Ashutosh
>
>
> On Wed, Sep 18, 2013 at 1:04 PM, Brock Noland  wrote:
>
> > Hi,
> >
> > At present we require compatibility with Hadoop 0.20.2. See:
> > https://issues.apache.org/jira/browse/HIVE-5313
> >
> > Considering 0.20.2 was released 4 years ago, how long are we going to
> > continue to support it?
> >
> > Brock
> >
>


Re: Review Request 14130: Merge vectorization branch to trunk

2013-09-18 Thread Jitendra Pandey


> On Sept. 16, 2013, 12:10 a.m., Carl Steinbach wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java, line 111
> > 
> >
> > AllVectorTypesRecord seems to add an additional level of indirection 
> > without providing any real benefit. I'd recommend following convention and 
> > just hardcoding it for now.

Addressed in HIVE-5309.


- Jitendra


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14130/#review26120
---


On Sept. 13, 2013, 5:51 p.m., Jitendra Pandey wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14130/
> ---
> 
> (Updated Sept. 13, 2013, 5:51 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Bugs: HIVE-5283
> https://issues.apache.org/jira/browse/HIVE-5283
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Merge vectorization branch to trunk.
> 
> 
> Diffs
> -
> 
>   .gitignore c0e9b3c 
>   build-common.xml ee219a9 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c5a8ff3 
>   metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 
> 15a2a81 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticScalar.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnCompareScalar.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/ColumnUnaryMinus.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/FilterColumnCompareColumn.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/FilterColumnCompareScalar.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/FilterScalarCompareColumn.txt 
> PRE-CREATION 
>   
> ql/src/gen/vectorization/ExpressionTemplates/FilterStringColumnCompareColumn.txt
>  PRE-CREATION 
>   
> ql/src/gen/vectorization/ExpressionTemplates/FilterStringColumnCompareScalar.txt
>  PRE-CREATION 
>   
> ql/src/gen/vectorization/ExpressionTemplates/FilterStringScalarCompareColumn.txt
>  PRE-CREATION 
>   ql/src/gen/vectorization/ExpressionTemplates/ScalarArithmeticColumn.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/TestTemplates/TestClass.txt PRE-CREATION 
>   
> ql/src/gen/vectorization/TestTemplates/TestColumnColumnFilterVectorExpressionEvaluation.txt
>  PRE-CREATION 
>   
> ql/src/gen/vectorization/TestTemplates/TestColumnColumnOperationVectorExpressionEvaluation.txt
>  PRE-CREATION 
>   
> ql/src/gen/vectorization/TestTemplates/TestColumnScalarFilterVectorExpressionEvaluation.txt
>  PRE-CREATION 
>   
> ql/src/gen/vectorization/TestTemplates/TestColumnScalarOperationVectorExpressionEvaluation.txt
>  PRE-CREATION 
>   ql/src/gen/vectorization/UDAFTemplates/VectorUDAFAvg.txt PRE-CREATION 
>   ql/src/gen/vectorization/UDAFTemplates/VectorUDAFMinMax.txt PRE-CREATION 
>   ql/src/gen/vectorization/UDAFTemplates/VectorUDAFMinMaxString.txt 
> PRE-CREATION 
>   ql/src/gen/vectorization/UDAFTemplates/VectorUDAFSum.txt PRE-CREATION 
>   ql/src/gen/vectorization/UDAFTemplates/VectorUDAFVar.txt PRE-CREATION 
>   
> ql/src/gen/vectorization/org/apache/hadoop/hive/ql/exec/vector/gen/CodeGen.java
>  PRE-CREATION 
>   
> ql/src/gen/vectorization/org/apache/hadoop/hive/ql/exec/vector/gen/TestCodeGen.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 393ef57 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java d2265e2 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java bcee201 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FilterOperator.java d2c981d 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java e498327 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/KeyWrapper.java c303b30 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 3b15667 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 85a22b7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 869417f 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 2ece97e 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 4cc7129 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/BytesColumnVector.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/DoubleColumnVector.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/LongColumnVector.java 
> PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/TimestampUtils.java 
> PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/exec/vector/Vec

Re: Review Request 14130: Merge vectorization branch to trunk

2013-09-18 Thread Jitendra Pandey


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> >

I have filed jiras HIVE-5308, HIVE-5309, and HIVE-5314 to address some of the 
review comments. I will upload an updated patch as soon as those jiras are 
committed. The individual comments are answered inline.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 805
> > 
> >
> > Please add this to hive-default.xml.template along with a description.

Patch on HIVE-5309 addresses this.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt, 
> > line 1
> > 
> >
> > We currently use Apache Velocity to generate test code at compile-time 
> > (e.g. TestCliDriver, ...). I realize that the templating code in CodeGen 
> > and TestCodeGen is pretty simple, but was wondering if it might be better 
> > from a build and maintenance standpoint to use Velocity instead.
> > 
> > Also, is it possible to select a less generic file suffix for the 
> > template files, e.g. *.t or *.tmpl?

  I have added a patch on HIVE-5308 that removes the generated code for both 
expressions and tests; instead, the code is generated as an ant task during 
the build.
  The patch doesn't, however, use Velocity. Do you think it's OK to move to 
Velocity as a follow-up task post-merge? I will take care of the template file 
suffixes as part of the move to Velocity, because I believe Velocity requires 
its own suffix.
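
For what it's worth, the Velocity-based generation being discussed could look 
roughly like this (a sketch only; the template name, context keys, and class 
name here are hypothetical, not part of the patch):

{code}
import java.io.StringWriter;

import org.apache.velocity.Template;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.VelocityEngine;

// Hypothetical generator: renders one expression class from a template.
public class VelocityCodeGenSketch {
  public static void main(String[] args) throws Exception {
    VelocityEngine engine = new VelocityEngine();
    // Standard Velocity 1.x classpath resource-loader configuration.
    engine.setProperty("resource.loader", "class");
    engine.setProperty("class.resource.loader.class",
        "org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader");
    engine.init();

    // Template name and variables are invented for illustration.
    Template template = engine.getTemplate("ColumnArithmeticColumn.vm");
    VelocityContext context = new VelocityContext();
    context.put("className", "LongColAddLongColumn");
    context.put("operatorSymbol", "+");

    StringWriter writer = new StringWriter();
    template.merge(context, writer);
    System.out.println(writer);  // the generated Java source
  }
}
{code}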


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/gen/vectorization/ExpressionTemplates/ColumnArithmeticColumn.txt, 
> > line 27
> > 
> >
> > In addition to the name (and preferably path) of the template I think 
> > this comment should also include the name and path of the code generator, 
> > and a warning that it should not be modified by hand.

The generated code will not be committed anymore. The code will be generated in 
the build directory and a clean will remove it.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/gen/vectorization/org/apache/hadoop/hive/ql/exec/vector/gen/TestCodeGen.java,
> >  line 1
> > 
> >
> > Maybe this should go in ql/src/test/gen. Thoughts?

I have renamed this file to GenVectorTestCode and moved it to 
ant/src/org/apache/hadoop/hive/ant/ so that it runs along with the other ant 
tasks. Is that OK? The jira with this change is HIVE-5308.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/vectorization_0.q, line 20
> > 
> >
> > What is the expected behavior when vectorized.execution=enabled and the 
> > source table is not reading ORC formatted data? I think it's worth adding 
> > some additional tests (either positive or negative) to lock down this 
> > behavior.

Any query that vectorization cannot execute must fall back to non-vector mode. 
If the input format doesn't provide vectorized input, the query must still 
succeed. I have added a test in my patch on HIVE-5309 that runs a query on an 
RCFile table with vectorization turned on; the query executes successfully.
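
For illustration, the kind of .q test this describes might look like the 
following (a sketch only; the table name and columns are hypothetical, not the 
actual HIVE-5309 test):

{code}
SET hive.vectorized.execution.enabled=true;

-- RCFile does not produce vectorized row batches, so execution should fall
-- back to the row-mode pipeline and the query should still succeed.
CREATE TABLE vector_fallback_rc (cint INT, cdouble DOUBLE) STORED AS RCFILE;
SELECT cint, COUNT(1) FROM vector_fallback_rc GROUP BY cint;
{code}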


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java, 
> > line 55
> > 
> >
> > Necessary?

The patch on HIVE-5309 addresses this.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > ql/src/test/queries/clientpositive/vectorization_0.q, line 4
> > 
> >
> > There are a lot of magic numbers in these these new test files. Do they 
> > have any special meaning or are they effectively random?

Tony has provided an answer on the jira.


> On Sept. 16, 2013, 12:02 a.m., Carl Steinbach wrote:
> > .gitignore, line 20
> > 
> >
> > I think we should follow established convention of checking these file 
> > instead of generating them since it serves as a useful canary for catching 
> > accidental changes to the ORC format.

I have filed HIVE-5314 to address this.


- Jitendra


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14130/#review26119
---


On Sept. 13, 2013, 5:51 p.m., Jitendra Pandey wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, vi

[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION better way.

2013-09-18 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5315:
-

Attachment: HIVE-5315.patch

I've attached a patch for this issue.
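
For reference, one possible shape of the fix (a sketch only, which may differ 
from the attached HIVE-5315.patch): match the line that actually starts with 
"Hadoop" instead of assuming the version is on the first line.

{code}
# Sketch: pick the "Hadoop <version>" line wherever it appears, so stray
# JVM output such as -Xdebug lines cannot shift it off line 1.
HADOOP_VERSION=$($HADOOP version 2>&1 | awk '/^Hadoop [0-9]/ {print $2; exit}')
{code}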

> bin/hive should retrieve HADOOP_VERSION better way.
> ---
>
> Key: HIVE-5315
> URL: https://issues.apache.org/jira/browse/HIVE-5315
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Kousuke Saruta
> Fix For: 0.11.1
>
> Attachments: HIVE-5315.patch
>
>
> In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:
> {code}
> HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
> {code}
> But sometimes "hadoop version" doesn't show the version information on the 
> first line.
> If HADOOP_VERSION is not retrieved correctly, Hive or related processes will 
> not start up.
> I faced this situation when I tried to debug HiveServer2 with a debug option 
> like the following:
> {code}
> -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
> {code}
> Then "hadoop version" shows -Xdebug... on the first line.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION better way.

2013-09-18 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5315:
-

Status: Patch Available  (was: Open)

> bin/hive should retrieve HADOOP_VERSION better way.
> ---
>
> Key: HIVE-5315
> URL: https://issues.apache.org/jira/browse/HIVE-5315
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Kousuke Saruta
> Fix For: 0.11.1
>
> Attachments: HIVE-5315.patch
>
>
> In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:
> {code}
> HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
> {code}
> But sometimes "hadoop version" doesn't show the version information on the 
> first line.
> If HADOOP_VERSION is not retrieved correctly, Hive or related processes will 
> not start up.
> I faced this situation when I tried to debug HiveServer2 with a debug option 
> like the following:
> {code}
> -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
> {code}
> Then "hadoop version" shows -Xdebug... on the first line.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION better way.

2013-09-18 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-5315:
-

Description: 
In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:

{code}
HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
{code}

But sometimes "hadoop version" doesn't show the version information on the 
first line.
If HADOOP_VERSION is not retrieved correctly, Hive or related processes will 
not start up.
I faced this situation when I tried to debug HiveServer2 with a debug option 
like the following:

{code}
-Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
{code}

Then "hadoop version" shows -Xdebug... on the first line.


  was:
In current implementation, bin/hive retrieve HADOOP_VERSION like as follows

{code}
HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
{code}

But, sometimes, "hadoop version" doesn't show version information at first line.
If HADOOP_VERSION is not retrieve collectly, Hive or related processes will not 
be up.
I faced this situation when I try to debug Hiveserver2 with debug option like 
as follows 

{code}
-Xdebug -Xrunjdwp:trunsport=dt_socket,suspend=n,server=y,address=9876
{code}

Then, "hadoop version" shows -Xdebug... at the first line.



> bin/hive should retrieve HADOOP_VERSION better way.
> ---
>
> Key: HIVE-5315
> URL: https://issues.apache.org/jira/browse/HIVE-5315
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.11.0
>Reporter: Kousuke Saruta
> Fix For: 0.11.1
>
>
> In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:
> {code}
> HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
> {code}
> But sometimes "hadoop version" doesn't show the version information on the 
> first line.
> If HADOOP_VERSION is not retrieved correctly, Hive or related processes will 
> not start up.
> I faced this situation when I tried to debug HiveServer2 with a debug option 
> like the following:
> {code}
> -Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
> {code}
> Then "hadoop version" shows -Xdebug... on the first line.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5315) bin/hive should retrieve HADOOP_VERSION better way.

2013-09-18 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created HIVE-5315:


 Summary: bin/hive should retrieve HADOOP_VERSION better way.
 Key: HIVE-5315
 URL: https://issues.apache.org/jira/browse/HIVE-5315
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Kousuke Saruta
 Fix For: 0.11.1


In the current implementation, bin/hive retrieves HADOOP_VERSION as follows:

{code}
HADOOP_VERSION=$($HADOOP version | awk '{if (NR == 1) {print $2;}}');
{code}

But sometimes "hadoop version" doesn't show the version information on the 
first line.
If HADOOP_VERSION is not retrieved correctly, Hive or related processes will 
not start up.
I faced this situation when I tried to debug HiveServer2 with a debug option 
like the following:

{code}
-Xdebug -Xrunjdwp:transport=dt_socket,suspend=n,server=y,address=9876
{code}

Then "hadoop version" shows -Xdebug... on the first line.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-5314) Commit vectorization test data, comment/rename vectorization tests.

2013-09-18 Thread Jitendra Nath Pandey (JIRA)
Jitendra Nath Pandey created HIVE-5314:
--

 Summary: Commit vectorization test data, comment/rename 
vectorization tests.
 Key: HIVE-5314
 URL: https://issues.apache.org/jira/browse/HIVE-5314
 Project: Hive
  Issue Type: Sub-task
Reporter: Jitendra Nath Pandey
Assignee: Tony Murphy


Based on comments on HIVE-5283, we should commit 'alltypesorc' and provide 
some comments on the vectorization tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-5309) Update hive-default.xml.template for vectorization flag; remove unused imports from MetaStoreUtils.java

2013-09-18 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-5309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HIVE-5309:
---

Status: Patch Available  (was: Open)

> Update hive-default.xml.template for vectorization flag; remove unused 
> imports from MetaStoreUtils.java
> ---
>
> Key: HIVE-5309
> URL: https://issues.apache.org/jira/browse/HIVE-5309
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HIVE-5309.1.vectorization.patch
>
>
> This jira provides fixes for some of the review comments on HIVE-5283.
> 1) Update hive-default.xml.template for vectorization flag.
> 2) remove unused imports from MetaStoreUtils.
> 3) Add a test to run vectorization with non-orc format. The test must still 
> pass because vectorization optimization should fall back to non-vector mode.
> 4) Hardcode the table name in QTestUtil.java.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-5298) AvroSerde performance problem caused by HIVE-3833

2013-09-18 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771208#comment-13771208
 ] 

Hive QA commented on HIVE-5298:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12603614/HIVE-5298.1.patch

{color:green}SUCCESS:{color} +1 3126 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/806/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/806/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

> AvroSerde performance problem caused by HIVE-3833
> -
>
> Key: HIVE-5298
> URL: https://issues.apache.org/jira/browse/HIVE-5298
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.11.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Fix For: 0.13.0
>
> Attachments: HIVE-5298.1.patch, HIVE-5298.patch
>
>
> HIVE-3833 fixed the targeted problem and made Hive use partition-level 
> metadata to initialize object inspectors. In doing that, however, it goes 
> through every file under the table to access the partition metadata, which 
> is very inefficient, especially in the case of multiple files per partition. 
> This causes more problems for AvroSerde, because AvroSerde initialization 
> accesses the schema, which is located on the file system. As a result, 
> before Hive can process any data, it needs to access every file of the 
> table, which can take long enough to cause job failure for lack of job 
> progress.
> The improvement can be made so that partition metadata is only accessed once 
> per partition.
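
For illustration, the "once per partition" idea amounts to keying the metadata 
lookup by partition rather than by file, roughly like this (a hypothetical 
sketch, not the attached patch; the class and method names are invented):

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Invented helper: caches partition metadata by partition path so a
// partition with many files triggers only one (possibly expensive,
// e.g. Avro schema-fetching) metadata load.
public class PartitionMetadataCacheSketch {
  private final Map<String, Properties> cache = new HashMap<>();

  public Properties metadataFor(String partitionPath, Supplier<Properties> loader) {
    // Load at most once per partition, not once per file.
    return cache.computeIfAbsent(partitionPath, path -> loader.get());
  }
}
{code}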

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

