[jira] [Commented] (HIVE-3488) Issue trying to use the thick client (embedded) from windows.

2013-07-11 Thread Kanwaljit Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706668#comment-13706668
 ] 

Kanwaljit Singh commented on HIVE-3488:
---

We are getting a similar error after dropping all partitions:

java.io.IOException: cannot find dir = 
hdfs://HVEname:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1/emptyFile 
in pathToPartitionInfo: 
[hdfs://192.168.156.229:9000/tmp/hive-admin/hive_2013-07-12_05-31-36_471_3980214249511966905/-mr-10002/1]
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:298)
	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:260)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:104)
	at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407)
	at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:929)
	at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:921)
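For anyone hitting this after dropping partitions: a workaround often suggested for pathToPartitionInfo mismatches in CombineHiveInputFormat (not an official fix, just something worth trying) is to fall back to the plain input format for the failing session:

```sql
-- Hedged workaround: bypass CombineHiveInputFormat for this session only.
-- HiveInputFormat builds one split per file and does not consult the
-- combined pathToPartitionInfo map that is failing here.
SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
```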

> Issue trying to use the thick client (embedded) from windows.
> -
>
> Key: HIVE-3488
> URL: https://issues.apache.org/jira/browse/HIVE-3488
> Project: Hive
>  Issue Type: Bug
>  Components: Windows
>Affects Versions: 0.8.1
>Reporter: Rémy DUBOIS
>Priority: Critical
>
> I'm trying to execute a very simple SELECT query against my remote hive 
> server.
> If I'm doing a SELECT * from table, everything works well. If I'm trying to 
> execute a SELECT name from table, this error appears:
> {code:java}
> Job Submission failed with exception 'java.io.IOException(cannot find dir = 
> /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: 
> [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])'
> 12/09/19 17:18:44 ERROR exec.Task: Job Submission failed with exception 
> 'java.io.IOException(cannot find dir = 
> /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: 
> [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris])'
> java.io.IOException: cannot find dir = 
> /user/hive/warehouse/test/city=paris/out.csv in pathToPartitionInfo: 
> [hdfs://cdh-four:8020/user/hive/warehouse/test/city=paris]
>   at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:290)
>   at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:257)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:104)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:407)
>   at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:989)
>   at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:981)
>   at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:891)
>   at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Unknown Source)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:818)
>   at 
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)
>   at 
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
>   at 
> org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
>   at 
> org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:187)
> {code}
> Indeed, this "dir" (/user/hive/warehouse/test/city=paris/out.csv) can't be 
> found since it refers to my data file, not a directory.
> Could you please help me?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Hiveserver2 JDBC Client - SQL Select exceptions

2013-07-11 Thread Varunkumar Manohar
I am trying to execute a Hive query from a JDBC client. I am using HiveServer2
currently. A very basic query throws an SQLException only from the JDBC client,
not from the CLI. The queries shown below execute successfully on the CLI.

From the JDBC client, "select * from tableA" works fine, whereas if I provide
column names and execute the query from the JDBC client,

select col1,col2 from tableA

throws the following SQLException. Is anyone facing the same issue?

Exception in thread "main" java.sql.SQLException: Error while processing
statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MapRedTask
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:159)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:147)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:182)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:246)
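The asymmetry described above is expected behavior in Hive: a plain `select *` can be served by a fetch task with no MapReduce job, while a column projection compiles to a MapRedTask, so a "return code 1" here usually points at a cluster-side problem rather than the JDBC driver. Roughly (table and column names are the poster's):

```sql
-- Served directly by a fetch task; no MapReduce job is launched,
-- which is why it succeeds even when job submission is broken.
select * from tableA;

-- Compiles to a MapReduce job (MapRedTask); "return code 1" means the
-- job itself failed. Check the HiveServer2 and task logs for the cause.
select col1, col2 from tableA;
```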


Is there a fix for the issue?

Thanks,
Varun

-- 
_
Regards,
Varun


[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-2608:
--

Status: Open  (was: Patch Available)

Patch needs to be rebased.

> Do not require AS a,b,c part in LATERAL VIEW
> 
>
> Key: HIVE-2608
> URL: https://issues.apache.org/jira/browse/HIVE-2608
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor, UDF
>Affects Versions: 0.10.0
>Reporter: Igor Kabiljo
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-2608.D4317.5.patch
>
>
> Currently, it is required to state column names when LATERAL VIEW is used.
> That shouldn't be necessary, since UDTF returns struct which contains column 
> names - and they should be used by default.
> For example, it would be great if this was possible:
> SELECT t.*, t.key1 + t.key4
> FROM some_table
> LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key3') t;
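For contrast, the syntax Hive currently requires, including the AS clause this issue proposes to make optional, looks like this (the alias and column names are illustrative):

```sql
-- Today the AS a,b,c column list is mandatory after a LATERAL VIEW:
SELECT t.key1, t.key2
FROM some_table
LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2') t AS key1, key2;
```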



[jira] [Resolved] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-2989.
---

Resolution: Won't Fix

> Adding Table Links to Hive
> --
>
> Key: HIVE-2989
> URL: https://issues.apache.org/jira/browse/HIVE-2989
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore, Query Processor, Security
>Affects Versions: 0.10.0
>Reporter: Bhushan Mandhani
>Assignee: Bhushan Mandhani
> Attachments: HIVE-2989.10.patch.txt, HIVE-2989.1.patch.txt, 
> HIVE-2989.2.patch.txt, HIVE-2989.3.patch.txt, HIVE-2989.4.patch.txt, 
> HIVE-2989.5.patch.txt, HIVE-2989.6.patch.txt, HIVE-2989.9.patch.txt
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> This will add Table Links to Hive. This will be an alternate mechanism for a 
> user to access tables and data in a database that is different from the one 
> he is associated with. This feature can be used to provide access control (if 
> access to databasename.tablename in queries and "use database X" is turned 
> off in conjunction).
> If db X wants to access one or more partitions from table T in db Y, the user 
> will issue:
> CREATE [STATIC] LINK TO T@Y LINKPROPERTIES ('RETENTION'='N')
> New partitions added to T will automatically be added to the link as well and 
> become available to X. However, if the link is specified to be static, that 
> will not be the case. The X user will then have to explicitly import each 
> partition of T that he needs. The command above will not actually make any 
> existing partitions of T available to X. Instead, we provide the following 
> command to add an existing partition to a link:
> ALTER LINK T@Y ADD PARTITION (ds='2012-04-27')
> The user will need to execute the above for each existing partition that 
> needs to be imported. For future partitions, Hive will take care of this. An 
> imported partition can be dropped from a link using a similar command. We 
> just specify "DROP" instead of "ADD". For querying the linked table, the X 
> user will refer to it as T@Y. Link Tables will only have read access and not 
> be writable. The entire Table Link, along with all its imported partitions, can 
> be dropped as follows:
> DROP LINK TO T@Y
> The above commands are purely MetaStore operations. The implementation will 
> rely on replicating the entire partition metadata when a partition is added 
> to a link.  For every link that is created, we will add a new row to table 
> TBLS. The TBL_TYPE column will have a new kind of value "LINK_TABLE" (or 
> "STATIC_LINK_TABLE" if the link has been specified as static). A new column 
> LINK_TBL_ID will be added which will contain the id of the imported table. It 
> will be NULL for all other table types including the regular managed tables. 
> When a partition is added to a link, the new row in the table PARTITIONS will 
> point to the LINK_TABLE in the same database  and not the master table in the 
> other database. We will replicate all the metadata for this partition from 
> the master database. The advantage of this approach is that fewer changes 
> will be needed in query processing and DDL for LINK_TABLEs. Also, commands 
> like "SHOW TABLES" and "SHOW PARTITIONS" will work as expected for 
> LINK_TABLEs too. Of course, even though the metadata is not shared, the 
> underlying data on disk is still shared. Hive still needs to know that when 
> dropping a partition which belongs to a LINK_TABLE, it should not drop the 
> underlying data from HDFS. Views and external tables cannot be imported from 
> one database to another.
>  
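Taken together, the lifecycle of the proposed syntax (which, per the Won't Fix resolution above, was never committed to Hive) would read:

```sql
-- Proposed DDL from this issue; resolved Won't Fix, so none of this shipped.
CREATE STATIC LINK TO T@Y LINKPROPERTIES ('RETENTION'='N');
ALTER LINK T@Y ADD PARTITION (ds='2012-04-27');
-- queries would refer to the linked, read-only table as T@Y
ALTER LINK T@Y DROP PARTITION (ds='2012-04-27');
DROP LINK TO T@Y;
```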



[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706598#comment-13706598
 ] 

Bhushan Mandhani commented on HIVE-2989:


Hi, Bhushan Mandhani is no longer at Facebook so this email address is no 
longer being monitored. If you need assistance, please contact another person 
who is currently at the company.


> Adding Table Links to Hive
> --
>
> Key: HIVE-2989
> URL: https://issues.apache.org/jira/browse/HIVE-2989



[jira] [Resolved] (HIVE-2591) Hive 0.7.1 fails with "Exception in thread "main" java.lang.NoSuchFieldError: type"

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-2591.
---

Resolution: Won't Fix

> Hive 0.7.1 fails with "Exception in thread "main" java.lang.NoSuchFieldError: 
> type"
> ---
>
> Key: HIVE-2591
> URL: https://issues.apache.org/jira/browse/HIVE-2591
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, JDBC, SQL
>Affects Versions: 0.7.1
> Environment: Intel Core2 Quad CPU Q8400 @2.66GHz
> 4 GB RAM
> Ubuntu 10.10 32 bit
> JDK 6.0_27
> Apache Ant 1.8.0
> Apache Hive 0.7.1
> Apache Hadoop 0.20.203.0
>Reporter: Prashanth
>Priority: Blocker
>  Labels: hive
>
> Hi,
> When I try to invoke hive and type in "SHOW TABLES" in cli in the environment 
> as explained above, I get "Exception in thread "main" 
> java.lang.NoSuchFieldError: type" and I am not able to use it at all.
> Is there any temporary fix for this? Please let me know, if I am making any 
> mistake here.
> I have downloaded Hive 0.7.1 from the download link as mentioned in the Hive 
> Wiki. The download url is http://hive.apache.org/releases.html.
> /opt/hive-0.7.1$ hive
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use 
> org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> Hive history file=/tmp/hadoop/hive_job_log_hduser_20190121_764439225.txt
> hive> SHOW TABLES;
> Exception in thread "main" java.lang.NoSuchFieldError: type
> at 
> org.apache.hadoop.hive.ql.parse.HiveLexer.mKW_SHOW(HiveLexer.java:1234)
> at 
> org.apache.hadoop.hive.ql.parse.HiveLexer.mTokens(HiveLexer.java:5942)
> at org.antlr.runtime.Lexer.nextToken(Lexer.java:89)
> at 
> org.antlr.runtime.BufferedTokenStream.fetch(BufferedTokenStream.java:133)
> at 
> org.antlr.runtime.BufferedTokenStream.sync(BufferedTokenStream.java:127)
> at 
> org.antlr.runtime.CommonTokenStream.setup(CommonTokenStream.java:127)
> at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:91)
> at 
> org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:521)
> at 
> org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:436)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:327)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> I am not sure what the actual issue is here, or how to fix it.
> Can you please let me know if there is any workaround for this?
> Alternatively, I tried building Hive from the SVN source repo.
> I am not able to build Hive from SVN either. I get the following error.
> [datanucleusenhancer] >>  D:\hive\build\ivy\lib\default\zookeeper-3.3.1.jar
> [datanucleusenhancer] Exception in thread "main" java.lang.VerifyError: 
> Expecting a stackmap frame at branch target 76 in method 
> org.apache.hadoop.hive.metastore.model.MDatabase.jdoCopyField(Lorg/apache/hadoop/hive/metastore/model/MDatabase;I)V
>  at offset 1
> [datanucleusenhancer]   at java.lang.Class.getDeclaredFields0(Native Method)
> [datanucleusenhancer]   at 
> java.lang.Class.privateGetDeclaredFields(Class.java:2308)
> [datanucleusenhancer]   at java.lang.Class.getDeclaredFields(Class.java:1760)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.ClassMetaData.addMetaDataForMembersNotInMetaData(ClassMetaData.java:358)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.ClassMetaData.populate(ClassMetaData.java:199)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.MetaDataManager$1.run(MetaDataManager.java:2394)
> [datanucleusenhancer]   at java.security.AccessController.doPrivileged(Native 
> Method)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.MetaDataManager.populateAbstractClassMetaData(MetaDataManager.java:2388)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.MetaDataManager.populateFileMetaData(MetaDataManager.java:2225)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.MetaDataManager.initialiseFileMetaDataForUse(MetaDataManager.java:925)
> [datanucleusenhancer]   at 
> org.datanucleus.metadata.MetaDataManager.loadMetadata

[jira] [Commented] (HIVE-2989) Adding Table Links to Hive

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706597#comment-13706597
 ] 

Edward Capriolo commented on HIVE-2989:
---

Did we ditch this idea? Should we close up shop?

> Adding Table Links to Hive
> --
>
> Key: HIVE-2989
> URL: https://issues.apache.org/jira/browse/HIVE-2989



[jira] [Resolved] (HIVE-1446) Move Hive Documentation from the wiki to version control

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-1446.
---

Resolution: Fixed

> Move Hive Documentation from the wiki to version control
> 
>
> Key: HIVE-1446
> URL: https://issues.apache.org/jira/browse/HIVE-1446
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Carl Steinbach
>Assignee: Carl Steinbach
> Attachments: hive-1446.diff, hive-1446-part-1.diff, hive-logo-wide.png
>
>
> Move the Hive Language Manual (and possibly some other documents) from the 
> Hive wiki to version control. This work needs to be coordinated with the 
> hive-dev and hive-user community in order to avoid missing any edits as well 
> as to avoid or limit unavailability of the docs.



[jira] [Resolved] (HIVE-848) support mutilpe databse support in HWI

2013-07-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-848.
--

Resolution: Won't Fix

> support mutilpe databse support in HWI
> --
>
> Key: HIVE-848
> URL: https://issues.apache.org/jira/browse/HIVE-848
> Project: Hive
>  Issue Type: New Feature
>Reporter: Namit Jain
>




[jira] [Commented] (HIVE-4518) Counter Strike: Operation Operator

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706596#comment-13706596
 ] 

Edward Capriolo commented on HIVE-4518:
---

Are you going to submit this patch?

> Counter Strike: Operation Operator
> --
>
> Key: HIVE-4518
> URL: https://issues.apache.org/jira/browse/HIVE-4518
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-4518.1.patch, HIVE-4518.2.patch, HIVE-4518.3.patch, 
> HIVE-4518.4.patch
>
>
> Queries of the form:
> from foo
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> insert overwrite table bar partition (p) select ...
> Generate a huge amount of counters. The reason is that task.progress is 
> turned on for dynamic partitioning queries.
> The counters not only make queries slower than necessary (up to 50%); you will 
> also eventually run out of them. That's because we're wrapping them in enum 
> values to comply with hadoop 0.17.
> The real reason we turn task.progress on is that we need CREATED_FILES and 
> FATAL counters to ensure dynamic partitioning queries don't go haywire.
> The counters have counter-intuitive names like C1 through C1000 and don't 
> seem really useful by themselves.
> With hadoop 20+ you don't need to wrap the counters anymore, each operator 
> can simply create and increment counters. That should simplify the code a lot.



[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of an year if a date or timestamp is given .

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706593#comment-13706593
 ] 

Edward Capriolo commented on HIVE-3404:
---

You also need to update show_functions.q.

> UDF to obtain the quarter of an year if a date or timestamp is given .
> --
>
> Key: HIVE-3404
> URL: https://issues.apache.org/jira/browse/HIVE-3404
> Project: Hive
>  Issue Type: New Feature
>  Components: UDF
>Reporter: Sanam Naz
> Attachments: HIVE-3404.1.patch.txt
>
>
> Current Hive releases lack a function that returns the quarter of a year 
> when given a date or timestamp. The function QUARTER(date) would return the 
> quarter from a date/timestamp and can be used in HiveQL. This will be 
> useful for different domains like retail, finance, etc.
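Until a QUARTER() UDF lands, the same value can be derived from the existing MONTH() built-in; a quick sketch:

```sql
-- Quarter from month: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
-- Integer truncation of (month + 2) / 3 gives the quarter.
SELECT CAST((MONTH('2013-07-11') + 2) / 3 AS INT);  -- 3 (July is in Q3)
```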



[jira] [Commented] (HIVE-3404) UDF to obtain the quarter of an year if a date or timestamp is given .

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706586#comment-13706586
 ] 

Edward Capriolo commented on HIVE-3404:
---

+1 

> UDF to obtain the quarter of an year if a date or timestamp is given .
> --
>
> Key: HIVE-3404
> URL: https://issues.apache.org/jira/browse/HIVE-3404



[jira] [Commented] (HIVE-4658) Make KW_OUTER optional in outer joins

2013-07-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706581#comment-13706581
 ] 

Edward Capriolo commented on HIVE-4658:
---

Can we go +1?

> Make KW_OUTER optional in outer joins
> -
>
> Key: HIVE-4658
> URL: https://issues.apache.org/jira/browse/HIVE-4658
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Navis
>Assignee: Edward Capriolo
>Priority: Trivial
> Attachments: hive-4658.2.patch.txt, HIVE-4658.D11091.1.patch
>
>
> For really trivial migration issue.
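Concretely, making KW_OUTER optional would let the two forms below parse identically, matching what most SQL dialects already accept (table and column names are illustrative):

```sql
-- Accepted today:
SELECT * FROM a LEFT OUTER JOIN b ON (a.id = b.id);
-- Would become equivalent once KW_OUTER is optional:
SELECT * FROM a LEFT JOIN b ON (a.id = b.id);
```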



[jira] [Updated] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-11 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated HIVE-4331:
-

Attachment: HIVE_4331.patch

Initial patch; will put it on Review Board.

> Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
> 
>
> Key: HIVE-4331
> URL: https://issues.apache.org/jira/browse/HIVE-4331
> Project: Hive
>  Issue Type: Task
>  Components: HCatalog
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Viraj Bhat
> Attachments: HIVE_4331.patch, StorageHandlerDesign_HIVE4331.pdf
>
>
> 1) Deprecate the HCatHBaseStorageHandler and "RevisionManager" from HCatalog. 
> These will continue to function, but internally they will use the 
> "DefaultStorageHandler" from Hive. They will be removed in a future release of 
> Hive.
> 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
> bypass the HiveOutputFormat. We will use this class in Hive's 
> "HBaseStorageHandler" instead of the "HiveHBaseTableOutputFormat".
> 3) Write new unit tests in the HCat's "storagehandler" so that systems such 
> as Pig and Map Reduce can use the Hive's "HBaseStorageHandler" instead of the 
> "HCatHBaseStorageHandler".
> 4) Make sure all the old and new unit tests pass without backward 
> compatibility (except known issues as described in the Design Document).
> 5) Replace all instances of the HCat source code, which point to 
> "HCatStorageHandler" to use the"HiveStorageHandler" including the 
> "FosterStorageHandler".
> I have attached the design document for the same and will attach a patch to 
> this Jira.
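
Point 2 above amounts to a delegation pattern; below is a rough, hedged sketch using made-up minimal interfaces (the real classes are Hive's OutputFormat/RecordWriter hierarchy, not these stand-ins):

```java
// Sketch of a pass-through format: wrap an underlying writer and forward
// calls unchanged, so a storage handler's own format can bypass
// HiveOutputFormat. Interfaces here are minimal stand-ins, not Hive's.
public class PassThroughSketch {
    interface RecordWriter { void write(String record); }

    // The "real" underlying writer a storage handler provides.
    static class UnderlyingWriter implements RecordWriter {
        final StringBuilder out = new StringBuilder();
        public void write(String record) { out.append(record).append('\n'); }
    }

    // Pass-through wrapper: adds nothing, just forwards to the delegate.
    static class PassThroughWriter implements RecordWriter {
        private final RecordWriter delegate;
        PassThroughWriter(RecordWriter delegate) { this.delegate = delegate; }
        public void write(String record) { delegate.write(record); }
    }

    public static void main(String[] args) {
        UnderlyingWriter base = new UnderlyingWriter();
        RecordWriter w = new PassThroughWriter(base);
        w.write("row1");
        System.out.println(base.out.toString().trim()); // row1
    }
}
```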



[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook

2013-07-11 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4841:


Status: Patch Available  (was: Open)

> Add partition level hook to HiveMetaHook
> 
>
> Key: HIVE-4841
> URL: https://issues.apache.org/jira/browse/HIVE-4841
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4841.D11673.1.patch
>
>
> Current HiveMetaHook provides hooks for tables only. With partition level 
> hook, external storages also could be revised to exploit PPR.



[jira] [Commented] (HIVE-4841) Add partition level hook to HiveMetaHook

2013-07-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706557#comment-13706557
 ] 

Navis commented on HIVE-4841:
-

I've consolidated the various methods,

add_partition_with_environment_context()
append_partition_with_environment_context()
append_partition_by_name_with_environment_context()

into a single entry point,

add_partition_with_environment_context()

and passed all tests.
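
The consolidation described above can be sketched roughly as follows (hypothetical, heavily simplified signatures; the real Thrift-generated metastore client takes many more parameters):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: legacy entry points delegate to one consolidated method, so a
// partition-level hook only needs to be invoked in a single place.
public class MetastoreClientSketch {
    final List<String> hookCalls = new ArrayList<>();

    // Single consolidated entry point: the partition-level hook fires here.
    public void addPartitionWithEnvironmentContext(String partName) {
        hookCalls.add("preAddPartition:" + partName); // hook invocation point
    }

    // Legacy variants now delegate instead of duplicating the hook logic.
    public void appendPartitionWithEnvironmentContext(String partName) {
        addPartitionWithEnvironmentContext(partName);
    }

    public void appendPartitionByNameWithEnvironmentContext(String partName) {
        addPartitionWithEnvironmentContext(partName);
    }

    public static void main(String[] args) {
        MetastoreClientSketch c = new MetastoreClientSketch();
        c.addPartitionWithEnvironmentContext("ds=2013-07-11");
        c.appendPartitionWithEnvironmentContext("ds=2013-07-12");
        c.appendPartitionByNameWithEnvironmentContext("ds=2013-07-13");
        System.out.println(c.hookCalls.size()); // 3: every path went through the hook
    }
}
```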

> Add partition level hook to HiveMetaHook
> 
>
> Key: HIVE-4841
> URL: https://issues.apache.org/jira/browse/HIVE-4841
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4841.D11673.1.patch
>
>
> Current HiveMetaHook provides hooks for tables only. With partition level 
> hook, external storages also could be revised to exploit PPR.



[jira] [Updated] (HIVE-4841) Add partition level hook to HiveMetaHook

2013-07-11 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4841:
--

Attachment: HIVE-4841.D11673.1.patch

navis requested code review of "HIVE-4841 [jira] Add partition level hook to 
HiveMetaHook".

Reviewers: JIRA

HIVE-4841 Add partition level hook to HiveMetaHook

Current HiveMetaHook provides hooks for tables only. With partition level hook, 
external storages also could be revised to exploit PPR.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11673

AFFECTED FILES
  hbase-handler/src/java/org/apache/hadoop/hive/hbase/HBaseStorageHandler.java
  
hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/HBaseHCatStorageHandler.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaHook.java
  metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/27615/

To: JIRA, navis


> Add partition level hook to HiveMetaHook
> 
>
> Key: HIVE-4841
> URL: https://issues.apache.org/jira/browse/HIVE-4841
> Project: Hive
>  Issue Type: Improvement
>  Components: StorageHandler
>Reporter: Navis
>Assignee: Navis
>Priority: Minor
> Attachments: HIVE-4841.D11673.1.patch
>
>
> Current HiveMetaHook provides hooks for tables only. With partition level 
> hook, external storages also could be revised to exploit PPR.



[jira] [Commented] (HIVE-3745) Hive does improper "=" based string comparisons for strings with trailing whitespaces

2013-07-11 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706519#comment-13706519
 ] 

Jason Dere commented on HIVE-3745:
--

Would it make more sense to support the SQL comparison semantics using new char 
data types, so that we don't break existing behavior for strings? I've created 
HIVE-4844.

> Hive does improper "=" based string comparisons for strings with trailing 
> whitespaces
> -
>
> Key: HIVE-3745
> URL: https://issues.apache.org/jira/browse/HIVE-3745
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 0.9.0
>Reporter: Harsh J
>Assignee: Gang Tim Liu
>
> Compared to other systems such as DB2, MySQL, etc., which disregard trailing 
> whitespace when comparing two strings with the "{{=}}" relational operator, 
> Hive does not do this.
> For example, note the following line from the MySQL manual: 
> http://dev.mysql.com/doc/refman/5.1/en/char.html
> {quote}
> All MySQL collations are of type PADSPACE. This means that all CHAR and 
> VARCHAR values in MySQL are compared without regard to any trailing spaces. 
> {quote}
> Hive is still whitespace-sensitive and treats trailing spaces as significant 
> when comparing. Ideally {{LIKE}} should consider trailing spaces 
> significant, but {{=}} should not.
> Is there a specific reason behind this difference of implementation in Hive's 
> SQL?
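
The PADSPACE semantics quoted above can be illustrated with a small sketch (illustrative only, not Hive code; method names are made up):

```java
// Sketch of PADSPACE-style comparison: trailing spaces are ignored, as in
// MySQL/DB2 CHAR comparison, unlike Java's (and Hive's) raw equals().
public class PadSpaceSketch {
    // Strip trailing spaces only (leading spaces remain significant).
    static String rtrim(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') end--;
        return s.substring(0, end);
    }

    // PADSPACE-style '=': compare after stripping trailing spaces.
    static boolean padSpaceEquals(String a, String b) {
        return rtrim(a).equals(rtrim(b));
    }

    public static void main(String[] args) {
        System.out.println("paris".equals("paris "));          // false: current Hive behavior
        System.out.println(padSpaceEquals("paris", "paris ")); // true: PADSPACE behavior
    }
}
```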



[jira] [Updated] (HIVE-4844) Add char/varchar data types

2013-07-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4844:
-

Assignee: Jason Dere

> Add char/varchar data types
> ---
>
> Key: HIVE-4844
> URL: https://issues.apache.org/jira/browse/HIVE-4844
> Project: Hive
>  Issue Type: New Feature
>  Components: Types
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> Add new char/varchar data types which have support for more SQL-compliant 
> behavior, such as SQL string comparison semantics, max length, etc.



[jira] [Created] (HIVE-4844) Add char/varchar data types

2013-07-11 Thread Jason Dere (JIRA)
Jason Dere created HIVE-4844:


 Summary: Add char/varchar data types
 Key: HIVE-4844
 URL: https://issues.apache.org/jira/browse/HIVE-4844
 Project: Hive
  Issue Type: New Feature
  Components: Types
Reporter: Jason Dere


Add new char/varchar data types which have support for more SQL-compliant 
behavior, such as SQL string comparison semantics, max length, etc.



[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706462#comment-13706462
 ] 

Vikram Dixit K commented on HIVE-4843:
--

https://reviews.apache.org/r/12476/

> Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
> readability
> ---
>
> Key: HIVE-4843
> URL: https://issues.apache.org/jira/browse/HIVE-4843
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-4843.1.patch
>
>
> Currently, there are static apis in multiple locations in ExecDriver and 
> MapRedTask that can be leveraged if put in the already existing utility class 
> in the exec package. This would help making the code more maintainable, 
> readable and also re-usable by other run-time infra such as tez.



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-11 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706460#comment-13706460
 ] 

Rajesh Balamohan commented on HIVE-4331:


This will be extremely beneficial for many use cases involving Hive, HBase, 
HCatalog and Pig. For example, one can host frequently changing data in HBase 
and access it from Hive/Pig/MapReduce via HCatalog.


> Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
> 
>
> Key: HIVE-4331
> URL: https://issues.apache.org/jira/browse/HIVE-4331
> Project: Hive
>  Issue Type: Task
>  Components: HCatalog
>Affects Versions: 0.11.0, 0.12.0
>Reporter: Ashutosh Chauhan
>Assignee: Viraj Bhat
> Attachments: StorageHandlerDesign_HIVE4331.pdf
>
>
> 1) Deprecate the HCatHBaseStorageHandler and "RevisionManager" from HCatalog. 
> These will continue to function, but internally they will use the 
> "DefaultStorageHandler" from Hive. They will be removed in a future release of 
> Hive.
> 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
> bypass the HiveOutputFormat. We will use this class in Hive's 
> "HBaseStorageHandler" instead of the "HiveHBaseTableOutputFormat".
> 3) Write new unit tests in HCat's "storagehandler" so that systems such 
> as Pig and MapReduce can use Hive's "HBaseStorageHandler" instead of the 
> "HCatHBaseStorageHandler".
> 4) Make sure all the old and new unit tests pass without breaking backward 
> compatibility (except for known issues as described in the Design Document).
> 5) Replace all instances in the HCat source code which point to 
> "HCatStorageHandler" with the "HiveStorageHandler", including the 
> "FosterStorageHandler".
> I have attached the design document for the same and will attach a patch to 
> this Jira.



[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706419#comment-13706419
 ] 

Gunther Hagleitner commented on HIVE-4843:
--

can you create a review on rb or phabricator please?

> Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
> readability
> ---
>
> Key: HIVE-4843
> URL: https://issues.apache.org/jira/browse/HIVE-4843
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-4843.1.patch
>
>
> Currently, there are static apis in multiple locations in ExecDriver and 
> MapRedTask that can be leveraged if put in the already existing utility class 
> in the exec package. This would help making the code more maintainable, 
> readable and also re-usable by other run-time infra such as tez.



[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706360#comment-13706360
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

New patch is uploaded in RB: https://reviews.apache.org/r/12480/

Description copied from RB:
From our performance analysis, we found AvroSerde's schema.equals() call 
consumed a substantial amount (nearly 40%) of time. This patch intends to 
minimize the number of schema.equals() calls by pushing the check as late as 
possible and making it as infrequent as possible.

First, we added a unique ID for each record reader, which is then included in 
every AvroGenericRecordWritable. Then we introduced two new data structures 
(one HashSet and one HashMap) to store intermediate results and avoid duplicate 
checks. The HashSet contains the IDs of all record readers that don't need any 
re-encoding. The HashMap contains the already-created re-encoders; it works as 
a cache and allows re-encoder reuse. With this change, our test shows nearly a 
40% reduction in Avro record reading time.
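
A rough sketch of the two data structures described above (hypothetical simplified types; the real patch works with Avro Schema objects, reader UIDs, and actual re-encoders):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch: avoid repeated schema.equals() by keying decisions on a per-reader ID.
public class ReencoderCacheSketch {
    // Readers whose file schema already matches: no re-encoding needed.
    private final Set<String> noReencodeNeeded = new HashSet<>();
    // Readers that do need re-encoding: cache and reuse their re-encoder.
    private final Map<String, String> reencoderCache = new HashMap<>();
    int expensiveChecks = 0; // counts simulated schema.equals() calls

    // Returns a (stand-in) re-encoder, or null when none is needed.
    public String reencoderFor(String readerId, String fileSchema, String readerSchema) {
        if (noReencodeNeeded.contains(readerId)) return null;   // fast path, no compare
        String cached = reencoderCache.get(readerId);
        if (cached != null) return cached;                      // fast path, reuse
        expensiveChecks++; // the schema comparison happens at most once per reader
        if (fileSchema.equals(readerSchema)) {
            noReencodeNeeded.add(readerId);
            return null;
        }
        String reencoder = fileSchema + "->" + readerSchema; // stand-in for a real re-encoder
        reencoderCache.put(readerId, reencoder);
        return reencoder;
    }
}
```

Per record, only a hash lookup remains on the hot path; the expensive comparison runs once per reader rather than once per row.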


> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> -
>
> Key: HIVE-4732
> URL: https://issues.apache.org/jira/browse/HIVE-4732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. 
> Changing to compare hashcodes (which can be computed once then reused) will 
> improve performance.



[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Status: Patch Available  (was: Open)

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> -
>
> Key: HIVE-4732
> URL: https://issues.apache.org/jira/browse/HIVE-4732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. 
> Changing to compare hashcodes (which can be computed once then reused) will 
> improve performance.



[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Attachment: HIVE-4732.v1.patch

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> -
>
> Key: HIVE-4732
> URL: https://issues.apache.org/jira/browse/HIVE-4732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-4732.1.patch, HIVE-4732.v1.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. 
> Changing to compare hashcodes (which can be computed once then reused) will 
> improve performance.



Review Request 12480: HIVE-4732 Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Islam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/12480/
---

Review request for hive, Ashutosh Chauhan and Jakob Homan.


Bugs: HIVE-4732
https://issues.apache.org/jira/browse/HIVE-4732


Repository: hive-git


Description
---

From our performance analysis, we found AvroSerde's schema.equals() call 
consumed a substantial amount (nearly 40%) of time. This patch intends to 
minimize the number of schema.equals() calls by pushing the check as late as 
possible and making it as infrequent as possible.

First, we added a unique ID for each record reader, which is then included in 
every AvroGenericRecordWritable. Then we introduced two new data structures 
(one HashSet and one HashMap) to store intermediate results and avoid duplicate 
checks. The HashSet contains the IDs of all record readers that don't need any 
re-encoding. The HashMap contains the already-created re-encoders; it works as 
a cache and allows re-encoder reuse. With this change, our test shows nearly a 
40% reduction in Avro record reading time.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/io/avro/AvroGenericRecordReader.java 
dbc999f 
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroDeserializer.java 
c85ef15 
  
serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroGenericRecordWritable.java
 66f0348 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/TestSchemaReEncoder.java 
9af751b 
  serde/src/test/org/apache/hadoop/hive/serde2/avro/Utils.java 2b948eb 

Diff: https://reviews.apache.org/r/12480/diff/


Testing
---


Thanks,

Mohammad Islam



[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4843:
-

Attachment: HIVE-4843.1.patch

> Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
> readability
> ---
>
> Key: HIVE-4843
> URL: https://issues.apache.org/jira/browse/HIVE-4843
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.12.0, tez-branch
>Reporter: Vikram Dixit K
>Assignee: Vikram Dixit K
> Attachments: HIVE-4843.1.patch
>
>
> Currently, there are static apis in multiple locations in ExecDriver and 
> MapRedTask that can be leveraged if put in the already existing utility class 
> in the exec package. This would help making the code more maintainable, 
> readable and also re-usable by other run-time infra such as tez.



[jira] [Created] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-11 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-4843:


 Summary: Refactoring MapRedTask and ExecDriver for better 
re-usability (for tez) and readability
 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch

Currently, there are static apis in multiple locations in ExecDriver and 
MapRedTask that can be leveraged if put in the already existing utility class 
in the exec package. This would help making the code more maintainable, 
readable and also re-usable by other run-time infra such as tez.



[jira] [Commented] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706299#comment-13706299
 ] 

Mohammad Kamrul Islam commented on HIVE-4732:
-

Thanks Edward for the comments.
We are now trying to take a different approach to address the same issue.
A new patch is coming soon.

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> -
>
> Key: HIVE-4732
> URL: https://issues.apache.org/jira/browse/HIVE-4732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-4732.1.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. 
> Changing to compare hashcodes (which can be computed once then reused) will 
> improve performance.



[jira] [Updated] (HIVE-4055) add Date data type

2013-07-11 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-4055:
-

Status: Patch Available  (was: Open)

> add Date data type
> --
>
> Key: HIVE-4055
> URL: https://issues.apache.org/jira/browse/HIVE-4055
> Project: Hive
>  Issue Type: Sub-task
>  Components: JDBC, Query Processor, Serializers/Deserializers, UDF
>Reporter: Sun Rui
> Attachments: Date.pdf, HIVE-4055.1.patch.txt, HIVE-4055.2.patch.txt, 
> HIVE-4055.D11547.1.patch
>
>
> Add Date data type, a new primitive data type which supports the standard SQL 
> date type.
> Basically, the implementation can take HIVE-2272 and HIVE-2957 as references.



[jira] [Commented] (HIVE-3756) "LOAD DATA" does not honor permission inheritence

2013-07-11 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706269#comment-13706269
 ] 

Sushanth Sowmyan commented on HIVE-3756:


I have a few more thoughts on this. Let's walk through an example:

Let's say Parent Dir d1 has permission/group combination A.
Let's say directory d2 inside Parent Dir has permission/group combination B.

In the case of non-partitioned tables, d1 will be the database/warehouse dir, 
and d2 the table dir.
In the case of partitioned tables, d1 will be the table directory and d2 the 
appropriate partition directories.

If we did not have the flag to inherit permissions on, then whatever data is 
loaded, be it files inside d2 (as during a load operation) or replacing d2 and 
everything in it (as during an insert overwrite operation), will have yet 
another permission/group combination C, which is a function of the user's 
current umask and the user's default group.

The purpose behind the subdir inherit permissions flag is to make this 
behaviour go away, and to be able to use the parent dir's permissions/group 
when possible. So far, so good.

Let's say, for purposes of this entire discussion from now onwards, the flag to 
inherit permissions is on.

Now, if we load data into d2, without using overwrite, files inside d2 get 
permission B.
If we load data into d2, using overwrite, we now overwrite d2, and thus, d2 
takes on d1's permissions, and so do the files inside, thus resulting in d2 and 
files inside d2 having permissions/group combination A.

--

While this behaviour is consistent, I find that, from a user's perspective, if 
they create a table (say unpartitioned), then chmod/chgrp it to B, and then 
try to load data into it using an insert-overwrite, they still expect 
that they're only overwriting data inside the table dir, and their expectation 
is that the table still has permissions/group combination B. They don't want 
it to be replaced by "A", the parent db dir's permissions/group, and they 
don't want "C", the umask/current-user default group.

Now, whether this requires a new flag that overrides 
"hive.warehouse.subdir.inherit.perms", or whether users want 
"hive.warehouse.subdir.inherit.perms" itself to work this way, is still up for 
discussion. But there is now a need for an additional requirement, the 
following:

"If the directory being moved in already exists, and will be deleted so that 
the new one can be placed, then instead of taking the parent's permissions, it 
should take the previous dir's permissions."

Thoughts?

This can be a separate jira if people feel like it should be, but I think it's 
also a minor modification of this current jira.
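
The proposed rule can be sketched as a small decision function (hypothetical names and string stand-ins for permissions; the real logic would live in Hive's move/load path):

```java
// Sketch of the permission-selection rule proposed above: when the target
// directory already exists and is being replaced, keep its old permissions
// instead of inheriting the parent's.
public class PermInheritSketch {
    static String permsForMovedDir(boolean inheritPerms,
                                   String parentPerms,      // "A" in the example
                                   String existingDirPerms, // "B", or null if dir is new
                                   String umaskPerms) {     // "C"
        if (!inheritPerms) return umaskPerms;                  // flag off: umask/default group wins
        if (existingDirPerms != null) return existingDirPerms; // proposed: keep previous dir's perms
        return parentPerms;                                    // otherwise inherit from parent
    }

    public static void main(String[] args) {
        // Insert-overwrite into an existing chmod'ed table dir keeps "B".
        System.out.println(permsForMovedDir(true, "A", "B", "C")); // B
    }
}
```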

> "LOAD DATA" does not honor permission inheritence
> -
>
> Key: HIVE-3756
> URL: https://issues.apache.org/jira/browse/HIVE-3756
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Security
>Affects Versions: 0.9.0
>Reporter: Johndee Burks
>Assignee: Chaoyu Tang
> Attachments: HIVE-3756_1.patch, HIVE-3756.patch
>
>
> When a "LOAD DATA" operation is performed the resulting data in hdfs for the 
> table does not maintain permission inheritance. This remains true even with 
> the "hive.warehouse.subdir.inherit.perms" set to true.
> The issue is easily reproducible by creating a table and loading some data 
> into it. After the load is complete just do a "dfs -ls -R" on the warehouse 
> directory and you will see that the inheritance of permissions worked for the 
> table directory but not for the data. 



[jira] [Created] (HIVE-4842) Hive parallel test framework 2 needs to summarize failures

2013-07-11 Thread Vikram Dixit K (JIRA)
Vikram Dixit K created HIVE-4842:


 Summary: Hive parallel test framework 2 needs to summarize failures
 Key: HIVE-4842
 URL: https://issues.apache.org/jira/browse/HIVE-4842
 Project: Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.12.0
Reporter: Vikram Dixit K
Assignee: Brock Noland
Priority: Minor
 Fix For: 0.12.0


Currently, when unit tests are run, there are several simple ways to consume 
the results, particularly the ant testreport target, which generates an HTML 
file for easily locating failures. The ptest2 changes coming from HIVE-4675 
are great for running the tests in parallel, but they make it hard to figure 
out which tests failed. It would be great to have output similar to that of 
the testreport target for easy consumption.



[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-07-11 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706236#comment-13706236
 ] 

Vikram Dixit K commented on HIVE-4675:
--

[~brocknoland] I have raised HIVE-4842 for the same. 

Thanks!

> Create new parallel unit test environment
> -
>
> Key: HIVE-4675
> URL: https://issues.apache.org/jira/browse/HIVE-4675
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.12.0
>
> Attachments: HIVE-4675.patch
>
>
> The current ptest tool is great, but it has the following limitations:
> - Requires an NFS filer
> - Unless the NFS filer is dedicated, ptests can become IO bound easily
> - Investigating failures is troublesome because the source directory for 
> the failure is not saved
> - Ignoring or isolating tests is not supported
> - No unit tests for the ptest framework exist
> It'd be great to have a ptest tool that addresses these limitations.



[jira] [Updated] (HIVE-4732) Reduce or eliminate the expensive Schema equals() check for AvroSerde

2013-07-11 Thread Mohammad Kamrul Islam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohammad Kamrul Islam updated HIVE-4732:


Summary: Reduce or eliminate the expensive Schema equals() check for 
AvroSerde  (was: Speed up AvroSerde by checking hashcodes instead of equality)

> Reduce or eliminate the expensive Schema equals() check for AvroSerde
> -
>
> Key: HIVE-4732
> URL: https://issues.apache.org/jira/browse/HIVE-4732
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Reporter: Mark Wagner
>Assignee: Mohammad Kamrul Islam
> Attachments: HIVE-4732.1.patch
>
>
> The AvroSerde spends a significant amount of time checking schema equality. 
> Changing to compare hashcodes (which can be computed once then reused) will 
> improve performance.



[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive

2013-07-11 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706190#comment-13706190
 ] 

Dmitriy V. Ryaboy commented on HIVE-4160:
-

Jitendra,
I believe physical plan primitives for both Hive and Pig (and potentially 
others) are going to come in via Tez, as both Pig and Hive want to get off 
strict MR in the long-term.

I'll take a crack at extracting what's extractable. Right now Hive's UDAF 
reaches fairly deeply into this code, as you noted, but I think with a little 
restructuring this can be factored out.

> Vectorized Query Execution in Hive
> --
>
> Key: HIVE-4160
> URL: https://issues.apache.org/jira/browse/HIVE-4160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also, Hive currently relies heavily on lazy deserialization, 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data, and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down 
> processing. 
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.
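The batch-of-vectors idea in the description above can be illustrated with a minimal sketch. These are invented names, not Hive's actual classes; the batch size of 1024 reflects the "about a thousand rows" mentioned in the description.

```java
// Illustrative sketch: a column represented as a primitive vector, processed
// in a tight inner loop with no per-row method calls or deserialization.
final class LongColumnVector {
    static final int BATCH_SIZE = 1024;  // "about a thousand rows at a time"
    final long[] vector = new long[BATCH_SIZE];
    int size;                            // number of rows actually filled
}

final class ColumnOps {
    // The inner loop scans the primitive array directly: no object
    // inspectors, no virtual calls, no per-row branching.
    static long sum(LongColumnVector col) {
        long total = 0;
        for (int i = 0; i < col.size; i++) {
            total += col.vector[i];
        }
        return total;
    }
}
```

A loop like this is friendly to the processor pipeline: the JIT can keep it branch-free and the data access is sequential over a primitive array.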



[jira] [Commented] (HIVE-2991) Integrate Clover with Hive

2013-07-11 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706097#comment-13706097
 ] 

Ashutosh Chauhan commented on HIVE-2991:


[~iveselovsky] It seems you have expanded the scope of this jira quite a bit. 
Your other changes (introducing targets in the build system) are quite useful, 
but they are orthogonal to Clover integration (as far as I understand). I 
would suggest splitting the patch into three parts: one for Clover 
integration, a second for improvements to the test infrastructure, and a 
third for improvements to the build infrastructure.

> Integrate Clover with Hive
> --
>
> Key: HIVE-2991
> URL: https://issues.apache.org/jira/browse/HIVE-2991
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 0.9.0
>Reporter: Ashutosh Chauhan
>Assignee: Ivan A. Veselovsky
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
> hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
> hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
> hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
> hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
> hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, 
> hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, 
> HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, 
> HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip
>
>
> Atlassian has donated a license for their code coverage tool Clover to the 
> ASF. Let's use it to generate a code coverage report and figure out which 
> areas of Hive are well tested and which are not. More information about the 
> license can be found in the Hadoop jira HADOOP-1718 



[jira] [Commented] (HIVE-4160) Vectorized Query Execution in Hive

2013-07-11 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706089#comment-13706089
 ] 

Jitendra Nath Pandey commented on HIVE-4160:


Dmitriy, Vinod,
  There is a significant amount of vectorization work in expression 
evaluation, for example arithmetic expressions, logical expressions, and 
aggregations. Many of these expressions are fairly generic, and different 
systems are likely to have similar semantics for them, so it should be 
possible to re-use this code in Pig or other systems with little change. 
Re-using these expressions would require using the same vectorized 
representation of data in the processing engine, but that part of the code is 
also generic and re-usable. I think that could be a good starting point.
  However, a bunch of the vectorization work is in operator code, where we 
have vectorized versions of the Hive operators. These operators are closely 
tied to Hive semantics and implementation, so generalizing them for re-use in 
other projects will also require some restructuring of the Hive code base. 
Also, at this point we should be thinking more generally about a common 
physical layer shared between Pig and Hive. The languages can continue to 
have different logical plans, but it would be desirable for them to share a 
common physical plan structure, because they both use the same map-reduce 
runtime.
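The engine-neutral expression re-use discussed above might look like the following sketch. `VectorExpr` and `AddLongScalar` are hypothetical names chosen for illustration: the point is that an expression written against a shared primitive-vector representation has nothing Hive-specific in it.

```java
// Hypothetical engine-neutral vectorized expression: any engine that agrees
// on the vector representation (here, a plain long[]) could reuse it.
interface VectorExpr {
    void evaluate(long[] input, long[] output, int size);
}

final class AddLongScalar implements VectorExpr {
    private final long scalar;

    AddLongScalar(long scalar) {
        this.scalar = scalar;
    }

    @Override
    public void evaluate(long[] input, long[] output, int size) {
        // Generic arithmetic over a primitive vector; the semantics of
        // "add a constant to a column" are the same in Hive, Pig, etc.
        for (int i = 0; i < size; i++) {
            output[i] = input[i] + scalar;
        }
    }
}
```

Operator-level code (joins, group-by) would still be engine-specific, which matches the comment's distinction between reusable expressions and Hive-tied operators.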

> Vectorized Query Execution in Hive
> --
>
> Key: HIVE-4160
> URL: https://issues.apache.org/jira/browse/HIVE-4160
> Project: Hive
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Hive-Vectorized-Query-Execution-Design.docx, 
> Hive-Vectorized-Query-Execution-Design-rev2.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.docx, 
> Hive-Vectorized-Query-Execution-Design-rev3.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev4.docx, 
> Hive-Vectorized-Query-Execution-Design-rev4.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev5.docx, 
> Hive-Vectorized-Query-Execution-Design-rev5.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev6.docx, 
> Hive-Vectorized-Query-Execution-Design-rev6.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev7.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.docx, 
> Hive-Vectorized-Query-Execution-Design-rev8.pdf, 
> Hive-Vectorized-Query-Execution-Design-rev9.docx, 
> Hive-Vectorized-Query-Execution-Design-rev9.pdf
>
>
> The Hive query execution engine currently processes one row at a time. A 
> single row of data goes through all the operators before the next row can be 
> processed. This mode of processing is very inefficient in terms of CPU usage. 
> Research has demonstrated that this yields very low instructions per cycle 
> [MonetDB X100]. Also, Hive currently relies heavily on lazy deserialization, 
> and data columns go through a layer of object inspectors that identify column 
> type, deserialize data, and determine appropriate expression routines in the 
> inner loop. These layers of virtual method calls further slow down 
> processing. 
> This work will add support for vectorized query execution to Hive, where, 
> instead of individual rows, batches of about a thousand rows at a time are 
> processed. Each column in the batch is represented as a vector of a primitive 
> data type. The inner loop of execution scans these vectors very fast, 
> avoiding method calls, deserialization, unnecessary if-then-else, etc. This 
> substantially reduces CPU time used, and gives excellent instructions per 
> cycle (i.e. improved processor pipeline utilization). See the attached design 
> specification for more details.



[jira] [Assigned] (HIVE-2991) Integrate Clover with Hive

2013-07-11 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky reassigned HIVE-2991:


Assignee: Ivan A. Veselovsky

> Integrate Clover with Hive
> --
>
> Key: HIVE-2991
> URL: https://issues.apache.org/jira/browse/HIVE-2991
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 0.9.0
>Reporter: Ashutosh Chauhan
>Assignee: Ivan A. Veselovsky
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
> hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
> hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
> hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
> hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
> hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, 
> hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, 
> HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, 
> HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip
>
>
> Atlassian has donated a license for their code coverage tool Clover to the 
> ASF. Let's use it to generate a code coverage report and figure out which 
> areas of Hive are well tested and which are not. More information about the 
> license can be found in the Hadoop jira HADOOP-1718 



[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-07-11 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705947#comment-13705947
 ] 

Brock Noland commented on HIVE-4675:


Hi,

Great to hear! The TEST-*.xml files should be in the "logs" directory in the 
working dir. Typically we run this via Jenkins, and the Jenkins build script 
then copies the TEST-*.xml files into a directory for Jenkins to parse.

I think we could generate some kind of report as well; would you like to 
create an enhancement request describing what you'd like?

Brock

> Create new parallel unit test environment
> -
>
> Key: HIVE-4675
> URL: https://issues.apache.org/jira/browse/HIVE-4675
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.12.0
>
> Attachments: HIVE-4675.patch
>
>
> The current ptest tool is great, but it has the following limitations:
> - Requires an NFS filer
> - Unless the NFS filer is dedicated, ptests can easily become IO bound
> - Investigating failures is troublesome because the source directory for 
> the failure is not saved
> - Ignoring or isolating tests is not supported
> - No unit tests for the ptest framework exist
> It'd be great to have a ptest tool that addresses these limitations.



[jira] [Commented] (HIVE-4675) Create new parallel unit test environment

2013-07-11 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705943#comment-13705943
 ] 

Vikram Dixit K commented on HIVE-4675:
--

I used this framework to run the Hive tests on a single node. It took about 
half the time it normally takes, which is great. However, I am unable to 
figure out which tests are failing. I got a message along the lines of:

TestOrcHCatLoader has one or more failing tests... Also, it doesn't seem like 
the output is integrated with the ant testreport target. It would be great to 
see a summary of failing tests. Could you please elaborate on how to identify 
the failing tests?

Thanks!

> Create new parallel unit test environment
> -
>
> Key: HIVE-4675
> URL: https://issues.apache.org/jira/browse/HIVE-4675
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Reporter: Brock Noland
>Assignee: Brock Noland
> Fix For: 0.12.0
>
> Attachments: HIVE-4675.patch
>
>
> The current ptest tool is great, but it has the following limitations:
> - Requires an NFS filer
> - Unless the NFS filer is dedicated, ptests can easily become IO bound
> - Investigating failures is troublesome because the source directory for 
> the failure is not saved
> - Ignoring or isolating tests is not supported
> - No unit tests for the ptest framework exist
> It'd be great to have a ptest tool that addresses these limitations.



[jira] [Updated] (HIVE-2991) Integrate Clover with Hive

2013-07-11 Thread Ivan A. Veselovsky (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan A. Veselovsky updated HIVE-2991:
-

Attachment: HIVE-clover-trunk--N1.patch
HIVE-clover-branch-0.11--N1.patch
HIVE-clover-branch-0.10--N1.patch

The attached "HIVE-clover-xxx.patch" patches are somewhat updated versions of 
the clovering; we used them in parallel builds.

Besides the clovering itself, the patches introduce the following changes:
1) The .q-file test generator was changed to split the generated tests into 
groups (10 test cases per class is the default). This avoids huge test 
classes, which is necessary for parallelized and distributed builds.
2) Added a "test-lightweight" target that allows running a batch of tests 
without re-generation/re-compilation. This is badly needed in parallelized 
and distributed builds.
3) Introduced a "testcase-list" parameter that allows passing several test 
class names to execute. The names are passed as a comma-separated list, with 
each name in the form "**/a/b/c/TestFoo.*". The trailing asterisk is needed 
because the main project accepts .class names while HCatalog accepts .java 
names.
4) Several more improvements related to Clover instrumentation, reporting, 
etc. 
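The "testcase-list" format described in point 3 could be parsed along these lines. `TestCaseList` is a hypothetical helper written for illustration; the actual patch implements this in the Ant build, not in Java code like this.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating the parameter format described above: a
// comma-separated list of patterns such as "**/a/b/c/TestFoo.*", where the
// trailing ".*" matches both .class and .java file names.
final class TestCaseList {
    static List<String> parse(String testcaseList) {
        return Arrays.stream(testcaseList.split(","))
                .map(String::trim)              // tolerate spaces after commas
                .filter(s -> !s.isEmpty())      // ignore empty entries
                .collect(Collectors.toList());
    }
}
```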

> Integrate Clover with Hive
> --
>
> Key: HIVE-2991
> URL: https://issues.apache.org/jira/browse/HIVE-2991
> Project: Hive
>  Issue Type: Test
>  Components: Testing Infrastructure
>Affects Versions: 0.9.0
>Reporter: Ashutosh Chauhan
> Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2991.D2985.1.patch, 
> hive.2991.1.branch-0.10.patch, hive.2991.1.branch-0.9.patch, 
> hive.2991.1.trunk.patch, hive.2991.2.branch-0.10.patch, 
> hive.2991.2.branch-0.9.patch, hive.2991.2.trunk.patch, 
> hive.2991.3.branch-0.10.patch, hive.2991.3.branch-0.9.patch, 
> hive.2991.3.trunk.patch, hive.2991.4.branch-0.10.patch, 
> hive.2991.4.branch-0.9.patch, hive.2991.4.trunk.patch, 
> HIVE-clover-branch-0.10--N1.patch, HIVE-clover-branch-0.11--N1.patch, 
> HIVE-clover-trunk--N1.patch, hive-trunk-clover-html-report.zip
>
>
> Atlassian has donated license of their code coverage tool Clover to ASF. Lets 
> make use of it to generate code coverage report to figure out which areas of 
> Hive are well tested and which ones are not. More information about license 
> can be found in Hadoop jira HADOOP-1718 



[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-11 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Attachment: HIVE-4838.patch

Running tests on the attached patch.

> Refactor MapJoin HashMap code to improve testability and readability
> 
>
> Key: HIVE-4838
> URL: https://issues.apache.org/jira/browse/HIVE-4838
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-4838.patch
>
>
> MapJoin is an essential component for high-performance joins in Hive, and 
> the current code has done great service for many years. However, the code 
> is showing its age and currently suffers from the following issues:
> * Uses static state via the MapJoinMetaData class to pass serialization 
> metadata to the Key and Row classes.
> * The API of a logical "Table Container" is not defined, so it is unclear 
> which APIs HashMapWrapper needs to publicize. Additionally, HashMapWrapper 
> has many unused public methods.
> * HashMapWrapper contains logic to serialize, test memory bounds, and 
> implement the table container. Ideally these logical units would be 
> separated.
> * HashTableSinkObjectCtx has unused fields and unused methods.
> * CommonJoinOperator and its children use ArrayList on the left-hand side 
> when only List is required.
> * There are unused classes (MRU, DCLLItem) and classes which duplicate 
> functionality (MapJoinSingleKey and MapJoinDoubleKeys)
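The refactoring direction described above can be sketched as an explicit, minimal interface for the logical "Table Container". These are hypothetical names, not the classes the patch actually introduces; the point is that serialization and memory-bound checks would live in separate collaborators rather than inside one hash-map wrapper.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: define the table container's API explicitly instead
// of exposing the internals of one hash-map implementation.
interface TableContainer<K, V> {
    V get(K key);
    void put(K key, V value);
    int size();
    void clear();
}

// One possible backing implementation. Serialization logic and memory-bound
// testing are deliberately not part of this class.
final class HashMapContainer<K, V> implements TableContainer<K, V> {
    private final Map<K, V> map = new HashMap<>();

    @Override public V get(K key) { return map.get(key); }
    @Override public void put(K key, V value) { map.put(key, value); }
    @Override public int size() { return map.size(); }
    @Override public void clear() { map.clear(); }
}
```

With an interface like this, tests can target the contract directly and alternative containers (e.g. one that spills to disk) can be swapped in behind the same API.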



[jira] [Commented] (HIVE-4807) Hive metastore hangs

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705678#comment-13705678
 ] 

Hudson commented on HIVE-4807:
--

Integrated in Hive-trunk-hadoop2 #282 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/282/])
HIVE-4807 : Hive metastore hangs (Sarvesh Sakalanaga via Ashutosh Chauhan) 
(Revision 1501675)

 Result = ABORTED
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501675
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/ivy/libraries.properties
* /hive/trunk/jdbc/build.xml
* /hive/trunk/metastore/ivy.xml


> Hive metastore hangs
> 
>
> Key: HIVE-4807
> URL: https://issues.apache.org/jira/browse/HIVE-4807
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0
>Reporter: Sarvesh Sakalanaga
>Assignee: Sarvesh Sakalanaga
> Fix For: 0.12.0
>
> Attachments: Hive-4807.0.patch, Hive-4807.1.patch, Hive-4807.2.patch
>
>
>  Hive metastore hangs (does not accept any new connections) due to a bug in 
> DBCP. The root cause analysis is at 
> https://issues.apache.org/jira/browse/DBCP-398. The fix is to change the 
> Hive connection pool to BoneCP, which is natively supported by DataNucleus.
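For reference, DataNucleus selects its connection pool via a configuration property, so a fix along these lines would amount to a setting like the following in hive-site.xml. Treat the exact property name and accepted values as an assumption to verify against the DataNucleus version bundled with your Hive release.

```xml
<!-- Assumption: DataNucleus reads the pool implementation from this
     property; confirm the name and value for your DataNucleus version. -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>BoneCP</value>
</property>
```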



[jira] [Commented] (HIVE-3691) TestDynamicSerDe failed with IBM JDK

2013-07-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705679#comment-13705679
 ] 

Hudson commented on HIVE-3691:
--

Integrated in Hive-trunk-hadoop2 #282 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/282/])
HIVE-3691 : TestDynamicSerDe failed with IBM JDK (Bing Li & Renata Ghisloti 
via Ashutosh Chauhan) (Revision 1501687)

 Result = ABORTED
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1501687
Files : 
* 
/hive/trunk/serde/src/test/org/apache/hadoop/hive/serde2/dynamic_type/TestDynamicSerDe.java


> TestDynamicSerDe failed with IBM JDK
> 
>
> Key: HIVE-3691
> URL: https://issues.apache.org/jira/browse/HIVE-3691
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.8.0, 0.9.0
> Environment: ant-1.8.2, IBM JDK 1.6
>Reporter: Bing Li
>Assignee: Bing Li
>Priority: Minor
> Fix For: 0.12.0
>
> Attachments: HIVE-3691.1.patch-trunk.txt, HIVE-3691.1.patch.txt
>
>
> The order of the output in the golden file differs between JDKs.
> The root cause is the implementation of HashMap in the JDK.
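The root cause noted above is that `java.util.HashMap` makes no guarantee about iteration order, so two JDK implementations can legitimately emit entries in different orders. A common fix for golden-file tests is to render through a sorted view; `StableOutput` below is a hypothetical illustration of that idea, not the test's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// HashMap iteration order is an implementation detail, so golden files tied
// to one JDK's order break on another. A TreeMap view iterates in key order
// regardless of which JDK built the source map.
final class StableOutput {
    static String render(Map<String, Integer> m) {
        return new TreeMap<>(m).toString();
    }

    static String demo() {
        Map<String, Integer> m = new HashMap<>();
        m.put("b", 2);
        m.put("a", 1);
        m.put("c", 3);
        return render(m);  // order is deterministic: sorted by key
    }
}
```

An alternative with the same effect is to build the expected data in a `LinkedHashMap`, which preserves insertion order on every JDK.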



[jira] [Commented] (HIVE-784) Support uncorrelated subqueries in the WHERE clause

2013-07-11 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13705581#comment-13705581
 ] 

Navis commented on HIVE-784:


Added some comments

> Support uncorrelated subqueries in the WHERE clause
> ---
>
> Key: HIVE-784
> URL: https://issues.apache.org/jira/browse/HIVE-784
> Project: Hive
>  Issue Type: New Feature
>  Components: Query Processor
>Reporter: Ning Zhang
>Assignee: Matthew Weaver
> Attachments: HIVE-784.1.patch.txt
>
>
> Hive currently only supports views in the FROM clause; some Facebook use 
> cases suggest that Hive should support subqueries, such as those connected 
> by IN/EXISTS, in the WHERE clause. 
