[jira] [Commented] (HIVE-8038) Decouple ORC files split calculation logic from Filesystem's get file location implementation

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136886#comment-14136886
 ] 

Hive QA commented on HIVE-8038:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669113/HIVE-8038.3.patch

{color:green}SUCCESS:{color} +1 6279 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/835/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/835/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-835/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669113

> Decouple ORC files split calculation logic from Filesystem's get file 
> location implementation
> -
>
> Key: HIVE-8038
> URL: https://issues.apache.org/jira/browse/HIVE-8038
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 0.13.1
>Reporter: Pankit Thapar
>Assignee: Pankit Thapar
> Fix For: 0.14.0
>
> Attachments: HIVE-8038.2.patch, HIVE-8038.3.patch, HIVE-8038.patch
>
>
> What is the Current Logic
> ==
> 1. Get the file blocks from FileSystem.getFileBlockLocations(), which returns 
> an array of BlockLocation.
> 2. In SplitGenerator.createSplit(), check whether the split spans one block or 
> multiple blocks.
> 3. If the split spans just one block, use the array index (index = 
> offset/blockSize) to get the host holding that blockLocation.
> 4. If the split spans multiple blocks, get all hosts that hold at least 
> 80% of the maximum amount of split data hosted by any single host.
> 5. Add the split to a list of splits.
> Issue with Current Logic
> =
> This depends on the FileSystem API's logic for block location calculation. It 
> returns an array, so we have to rely on the FileSystem 
> making all blocks the same size if we want to index directly into the 
> array.
>
> What is the Fix
> =
> 1a. Get the file blocks from FileSystem.getFileBlockLocations(), which returns 
> an array of BlockLocation.
> 1b. Convert the array into a TreeMap and return it 
> through getLocationsWithOffSet().
> 2. In SplitGenerator.createSplit(), check whether the split spans one block or 
> multiple blocks.
> 3. If the split spans just one block, use TreeMap.floorEntry(key) to get the 
> highest entry whose offset is not greater than the split's offset, and get the 
> corresponding host.
> 4a. If the split spans multiple blocks, get a submap containing all 
> entries whose blockLocations fall between offset and offset + length.
> 4b. Get all hosts that hold at least 80% of the maximum amount of split data 
> hosted by any single host.
> 5. Add the split to a list of splits.
> What are the major changes in logic
> ==
> 1. Store BlockLocations in a map instead of an array.
> 2. Call SHIMS.getLocationsWithOffSet() instead of getLocations().
> 3. The one-block case is checked by "if (offset + length <= start.getOffset() + 
> start.getLength())" instead of "if ((offset % blockSize) + length <= 
> blockSize)".
> What is the effect on complexity (Big O)
> =
> 1. We add an O(n) loop to build the TreeMap from the array, but it is a 
> one-time cost and is not paid per split.
> 2. In the one-block case, the block lookup is O(log n) worst case, 
> where it was O(1) before.
> 3. Getting the submap is O(log n).
> 4. In the multiple-block case, building the list of hosts is O(m), where 
> it was O(n) and m < n: previously we iterated 
> over all block locations, but now we iterate only over the blocks 
> in the range of offsets that the split needs.
> What are the benefits of the change
> ==
> 1. With this fix, we no longer depend on the blockLocations array returned by 
> the FileSystem to figure out the block corresponding to an offset and blockSize.
> 2. Block lengths no longer need to be the same for all blocks 
> across FileSystems.
> 3. Previously we used blockSize for the one-block case and block.length for 
> the multiple-block case; now we determine the block 
> from its actual length and offset.
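The floorEntry/subMap lookups described above can be sketched with a plain java.util.TreeMap keyed by block start offset. This is a simplified illustration with made-up offsets and host names; the actual patch maps offsets to BlockLocation objects returned through the shim's getLocationsWithOffSet():

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class BlockLookup {
    public static void main(String[] args) {
        // Hypothetical blocks: start offset -> host holding that block.
        // Note the block sizes are not uniform, which is the point of the fix.
        TreeMap<Long, String> blocks = new TreeMap<>();
        blocks.put(0L, "hostA");   // block [0, 100)
        blocks.put(100L, "hostB"); // block [100, 250)
        blocks.put(250L, "hostC"); // block [250, 400)

        // One-block case: highest entry whose offset is <= the split offset.
        long splitOffset = 120L;
        Map.Entry<Long, String> start = blocks.floorEntry(splitOffset);
        System.out.println(start.getValue()); // hostB

        // Multiple-block case: all blocks overlapping [offset, offset + length).
        long length = 200L;
        NavigableMap<Long, String> sub =
            blocks.subMap(start.getKey(), true, splitOffset + length, false);
        System.out.println(sub.keySet()); // [100, 250]
    }
}
```

Both lookups are O(log n) regardless of block size, which is what removes the dependency on uniform block sizes.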



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7508) Kerberos support for streaming

2014-09-17 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7508:
-
Labels: Streaming  (was: Streaming TODOC14)

> Kerberos support for streaming
> --
>
> Key: HIVE-7508
> URL: https://issues.apache.org/jira/browse/HIVE-7508
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: Streaming
> Fix For: 0.14.0
>
> Attachments: HIVE-7508.patch
>
>
> Add kerberos support for streaming to secure Hive cluster.





[jira] [Commented] (HIVE-7508) Kerberos support for streaming

2014-09-17 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136892#comment-14136892
 ] 

Lefty Leverenz commented on HIVE-7508:
--

Thanks Roshan, looks good.  I made a few trivial edits.

* [Streaming Data Ingest | 
https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest]

> Kerberos support for streaming
> --
>
> Key: HIVE-7508
> URL: https://issues.apache.org/jira/browse/HIVE-7508
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 0.13.1
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: Streaming
> Fix For: 0.14.0
>
> Attachments: HIVE-7508.patch
>
>
> Add kerberos support for streaming to secure Hive cluster.





[jira] [Updated] (HIVE-8045) SQL standard auth with cli - Errors and configuration issues

2014-09-17 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8045:

Status: Patch Available  (was: Open)

> SQL standard auth with cli - Errors and configuration issues
> 
>
> Key: HIVE-8045
> URL: https://issues.apache.org/jira/browse/HIVE-8045
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Jagruti Varia
>Assignee: Thejas M Nair
>
> HIVE-7533 enabled sql std authorization to be set in the hive cli (without 
> enabling authorization checks). This updates the hive configuration so that 
> create-table and create-view set permissions appropriately for the owner of 
> the table.
> HIVE-7209 added a metastore authorization provider that can be used to 
> restrict calls made to the authorization api, so that only HS2 can make 
> those calls (when HS2 uses an embedded metastore).
> Some issues were found with this:
> # Even if hive.security.authorization.enabled=false, authorization checks 
> happen for non-sql statements such as add/delete/dfs/compile, which 
> results in MetaStoreAuthzAPIAuthorizerEmbedOnly throwing an error.
> # Create table from the hive-cli ends up calling a metastore server api 
> (getRoles), which results in MetaStoreAuthzAPIAuthorizerEmbedOnly throwing an 
> error.
> # Some users prefer to enable authorization using hive-site.xml for 
> hive-server2 (the hive.security.authorization.enabled param). If this file is 
> shared by hive-cli and hive-server2, the SQL std authorizer throws an error 
> because its use in hive-cli is not allowed.





[jira] [Commented] (HIVE-7980) Hive on spark issue..

2014-09-17 Thread alton.jung (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136931#comment-14136931
 ] 

alton.jung commented on HIVE-7980:
--

Mr. Xuefu Zhang,

I have a question about HiveServer2 support for Spark.
I tested with HiveServer2 while Spark (master and worker) was not active.

I submitted the commands below through Beeline to HiveServer2,
but they worked.
I expected the commands to fail because I had deactivated the Spark master
and worker.
When I changed the environment to HiveServer, it worked as I expected (it
failed, since I had deactivated the Spark master and worker).


[command in beeline]
set hive.execution.engine=spark;
set spark.master=spark://localhost.localdomain:7077;
set spark.eventLog.enabled=true;
set spark.executor.memory=256m;
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
select * from test where id=1 order by id;


Best regards.

> Hive on spark issue..
> -
>
> Key: HIVE-7980
> URL: https://issues.apache.org/jira/browse/HIVE-7980
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Spark
>Affects Versions: spark-branch
> Environment: Test Environment is..
> . hive 0.14.0(spark branch version)
> . spark 
> (http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar)
> . hadoop 2.4.0 (yarn)
>Reporter: alton.jung
> Fix For: spark-branch
>
>
> I followed this 
> guide (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started) 
> and compiled Hive from the spark branch. In the next step I hit the 
> error below.
> (*I typed the Hive query in Beeline; I used a simple query with "order 
> by" to invoke the parallel work:
> ex) select * from test where id = 1 order by id;
> )
> [Error list is]
> 2014-09-04 02:58:08,796 ERROR spark.SparkClient 
> (SparkClient.java:execute(158)) - Error generating Spark Plan
> java.lang.NullPointerException
>   at 
> org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1262)
>   at 
> org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1269)
>   at 
> org.apache.spark.SparkContext.hadoopRDD$default$5(SparkContext.scala:537)
>   at 
> org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:318)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateRDD(SparkPlanGenerator.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:88)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:156)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:77)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
> 2014-09-04 02:58:11,108 ERROR ql.Driver (SessionState.java:printError(696)) - 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> 2014-09-04 02:58:11,182 INFO  log.PerfLogger 
> (PerfLogger.java:PerfLogEnd(135)) -  start=1409824527954 end=1409824691182 duration=163228 
> from=org.apache.hadoop.hive.ql.Driver>
> 2014-09-04 02:58:11,223 INFO  log.PerfLogger 
> (PerfLogger.java:PerfLogBegin(108)) -  from=org.apache.hadoop.hive.ql.Driver>
> 2014-09-04 02:58:11,224 INFO  log.PerfLogger 
> (PerfLogger.java:PerfLogEnd(135)) -  start=1409824691223 end=1409824691224 duration=1 
> from=org.apache.hadoop.hive.ql.Driver>
> 2014-09-04 02:58:11,306 ERROR operation.Operation 
> (SQLOperation.java:run(199)) - Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:284)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:146)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:508)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:20

[jira] [Commented] (HIVE-8083) Authorization DDLs should not enforce hive identifier syntax for user or group

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136957#comment-14136957
 ] 

Hive QA commented on HIVE-8083:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669125/HIVE-8083.2.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6279 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.streaming.TestStreaming.testTransactionBatchCommit_Json
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/836/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/836/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-836/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669125

> Authorization DDLs should not enforce hive identifier syntax for user or group
> --
>
> Key: HIVE-8083
> URL: https://issues.apache.org/jira/browse/HIVE-8083
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, SQLStandardAuthorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-8083.1.patch, HIVE-8083.2.patch
>
>
> The compiler expects principals (user, group and role) to be hive identifiers 
> for authorization DDLs. Users and groups are entities that belong to an 
> external namespace, and we can't expect them to follow hive identifier syntax 
> rules. For example, a userid or group can contain '-', which is not allowed by 
> the compiler.
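As a minimal illustration of the problem (the pattern below is hypothetical, not Hive's actual grammar): if principal names are validated with unquoted-identifier-style rules, a group name containing '-' is rejected even though it is a perfectly valid OS-level name:

```java
import java.util.regex.Pattern;

public class PrincipalCheck {
    // Hypothetical identifier rule, roughly what an unquoted identifier
    // allows: letters, digits, and underscore only.
    private static final Pattern IDENTIFIER =
        Pattern.compile("[A-Za-z0-9_]+");

    public static void main(String[] args) {
        System.out.println(IDENTIFIER.matcher("hive_users").matches()); // true
        // A unix group like "dev-team" fails the identifier rule, so a
        // GRANT statement naming it would be rejected by the compiler.
        System.out.println(IDENTIFIER.matcher("dev-team").matches());   // false
    }
}
```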





[jira] [Commented] (HIVE-8106) Enable vectorization for spark [spark branch]

2014-09-17 Thread Remus Rusanu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136959#comment-14136959
 ] 

Remus Rusanu commented on HIVE-8106:


Can you add review board link? ty

> Enable vectorization for spark [spark branch]
> -
>
> Key: HIVE-8106
> URL: https://issues.apache.org/jira/browse/HIVE-8106
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Chinna Rao Lalam
>Assignee: Chinna Rao Lalam
> Attachments: HIVE-8106-spark.patch, HIVE-8106.1-spark.patch
>
>
> Enable the vectorization optimization on spark





[jira] [Commented] (HIVE-7762) Enhancement while getting partitions via webhcat client

2014-09-17 Thread Suhas Vasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137021#comment-14137021
 ] 

Suhas Vasu commented on HIVE-7762:
--

Sure, I'll add the test cases; it makes sense to have them.

I guess {noformat}HCatClient.getPartitions(){noformat} behaves as expected, but 
I'll check it once.

I'll update the JIRA with the findings.

> Enhancement while getting partitions via webhcat client
> ---
>
> Key: HIVE-7762
> URL: https://issues.apache.org/jira/browse/HIVE-7762
> Project: Hive
>  Issue Type: Improvement
>  Components: WebHCat
>Reporter: Suhas Vasu
>Priority: Minor
> Attachments: HIVE-7762.2.patch, HIVE-7762.patch
>
>
> HCatalog creates partitions in lower case, whereas getting partitions from 
> HCatalog via the WebHCat client doesn't handle this, so the client starts 
> throwing exceptions.
> Ex:
> CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year 
> STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS 
> TEXTFILE LOCATION '/user/suhas/hcat-data/in/';
> Then I try to get partitions by:
> {noformat}
> String inputTableName = "in_table";
> String database = "default";
> Map<String, String> partitionSpec = new HashMap<String, String>();
> partitionSpec.put("Year", "2014");
> partitionSpec.put("Month", "08");
> partitionSpec.put("Date", "11");
> partitionSpec.put("Hour", "00");
> partitionSpec.put("Minute", "00");
> HCatClient client = get(catalogUrl);
> HCatPartition hCatPartition = client.getPartition(database, 
> inputTableName, partitionSpec);
> {noformat}
> This throws up saying:
> {noformat}
> Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : 
> Exception occurred while processing HCat request : Invalid partition-key 
> specified: year
>   at 
> org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
>   at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
> {noformat}
> The same code works if I do
> {noformat}
> partitionSpec.put("year", "2014");
> partitionSpec.put("month", "08");
> partitionSpec.put("date", "11");
> partitionSpec.put("hour", "00");
> partitionSpec.put("minute", "00");
> {noformat}
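Until the server side handles the case mismatch, one possible client-side workaround (a sketch, not part of the attached patch) is to normalize the partition-spec keys to lower case before calling the client, since HCatalog stores partition keys in lower case:

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionSpecNormalizer {
    // Lower-case every key so it matches HCatalog's stored partition keys.
    public static Map<String, String> lowerKeys(Map<String, String> spec) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            out.put(e.getKey().toLowerCase(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> spec = new HashMap<>();
        spec.put("Year", "2014");
        spec.put("Month", "08");
        // The normalized spec uses "year"/"month", which HCatalog accepts.
        System.out.println(lowerKeys(spec).get("year")); // 2014
    }
}
```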





[jira] [Commented] (HIVE-8139) Upgrade commons-lang from 2.4 to 2.6

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137022#comment-14137022
 ] 

Hive QA commented on HIVE-8139:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669123/HIVE-8139.1.patch

{color:green}SUCCESS:{color} +1 6279 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/837/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/837/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-837/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669123

> Upgrade commons-lang from 2.4 to 2.6
> 
>
> Key: HIVE-8139
> URL: https://issues.apache.org/jira/browse/HIVE-8139
> Project: Hive
>  Issue Type: Bug
>  Components: Build Infrastructure
>Affects Versions: 0.14.0
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Fix For: 0.14.0
>
> Attachments: HIVE-8139.1.patch
>
>
> Upgrade commons-lang version from 2.4 to latest 2.6





[jira] [Created] (HIVE-8158) Optimize writeValue/setValue in VectorExpressionWriterFactory (in VectorReduceSinkOperator codepath)

2014-09-17 Thread Rajesh Balamohan (JIRA)
Rajesh Balamohan created HIVE-8158:
--

 Summary: Optimize writeValue/setValue in 
VectorExpressionWriterFactory (in VectorReduceSinkOperator codepath)
 Key: HIVE-8158
 URL: https://issues.apache.org/jira/browse/HIVE-8158
 Project: Hive
  Issue Type: Bug
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan


VectorReduceSinkOperator --> ProcessOp --> makeValueWritable --> 
VectorExpressionWriterFactory --> writeValue(byte[], int, int) / setValue.

It appears that this goes through an additional layer of Text.encode/decode, 
causing CPU pressure (profiler output attached).

SettableStringObjectInspector / WritableStringObjectInspector has a "set(Object 
o, Text value)" method. It would be beneficial to use set(Object, Text) 
directly to save CPU cycles.
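The saving can be sketched without Hadoop on the classpath. MutableText below is a stand-in for org.apache.hadoop.io.Text (which likewise stores UTF-8 bytes directly); the actual fix would pass the existing Text to the inspector's set(Object, Text) instead of round-tripping through String:

```java
import java.nio.charset.StandardCharsets;

public class TextReuseSketch {
    // Stand-in for org.apache.hadoop.io.Text: holds UTF-8 bytes directly
    // in a reusable buffer, growing it only when needed.
    static final class MutableText {
        byte[] bytes = new byte[0];
        int length;
        void set(byte[] src, int off, int len) {
            if (bytes.length < len) bytes = new byte[len];
            System.arraycopy(src, off, bytes, 0, len);
            length = len;
        }
    }

    public static void main(String[] args) {
        byte[] column = "hello".getBytes(StandardCharsets.UTF_8);

        // Wasteful path: decode the bytes into a String, which the writer
        // would then re-encode back into a Text (decode + encode per value).
        String decoded = new String(column, StandardCharsets.UTF_8);
        System.out.println(decoded);

        // Direct path: copy the bytes into the reusable buffer, skipping
        // the decode/encode round trip and the per-value allocations.
        MutableText text = new MutableText();
        text.set(column, 0, column.length);
        System.out.println(text.length); // 5
    }
}
```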






[jira] [Updated] (HIVE-8158) Optimize writeValue/setValue in VectorExpressionWriterFactory (in VectorReduceSinkOperator codepath)

2014-09-17 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-8158:
---
Attachment: profiler_output.png

> Optimize writeValue/setValue in VectorExpressionWriterFactory (in 
> VectorReduceSinkOperator codepath)
> 
>
> Key: HIVE-8158
> URL: https://issues.apache.org/jira/browse/HIVE-8158
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: profiler_output.png
>
>
> VectorReduceSinkOperator --> ProcessOp --> makeValueWritable --> 
> VectorExpressionWriterFactory --> writeValue(byte[], int, int) / setValue.
> It appears that this goes through an additional layer of Text.encode/decode 
> causing CPU pressure (profiler output attached).
> SettableStringObjectInspector / WritableStringObjectInspector has "set(Object 
> o, Text value)" method. It would be beneficial to use set(Object, Text) 
> directly to save CPU cycles.





[jira] [Updated] (HIVE-8158) Optimize writeValue/setValue in VectorExpressionWriterFactory (in VectorReduceSinkOperator codepath)

2014-09-17 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-8158:
---
Attachment: HIVE-8158.1.patch

> Optimize writeValue/setValue in VectorExpressionWriterFactory (in 
> VectorReduceSinkOperator codepath)
> 
>
> Key: HIVE-8158
> URL: https://issues.apache.org/jira/browse/HIVE-8158
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: HIVE-8158.1.patch, profiler_output.png
>
>
> VectorReduceSinkOperator --> ProcessOp --> makeValueWritable --> 
> VectorExpressionWriterFactory --> writeValue(byte[], int, int) / setValue.
> It appears that this goes through an additional layer of Text.encode/decode 
> causing CPU pressure (profiler output attached).
> SettableStringObjectInspector / WritableStringObjectInspector has "set(Object 
> o, Text value)" method. It would be beneficial to use set(Object, Text) 
> directly to save CPU cycles.





[jira] [Commented] (HIVE-6936) Provide table properties to InputFormats

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137102#comment-14137102
 ] 

Hive QA commented on HIVE-6936:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669225/HIVE-6936.patch

{color:green}SUCCESS:{color} +1 6280 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/838/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/838/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-838/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669225

> Provide table properties to InputFormats
> 
>
> Key: HIVE-6936
> URL: https://issues.apache.org/jira/browse/HIVE-6936
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, 
> HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, HIVE-6936.patch, 
> HIVE-6936.patch, HIVE-6936.patch
>
>
> Some advanced file formats need the table properties made available to them. 
> Additionally, it would be convenient to provide a unique id for fetch 
> operators and the complete list of directories.





[jira] [Commented] (HIVE-8115) Hive select query hang when fields contain map

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137191#comment-14137191
 ] 

Hive QA commented on HIVE-8115:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669133/HIVE-8115.1.patch

{color:green}SUCCESS:{color} +1 6279 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/839/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/839/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-839/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669133

> Hive select query hang when fields contain map
> --
>
> Key: HIVE-8115
> URL: https://issues.apache.org/jira/browse/HIVE-8115
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Xiaobing Zhou
>Assignee: Xiaobing Zhou
> Attachments: HIVE-8115.1.patch, createTable.hql, data
>
>
> Attached is a repro of the issue. After creating a table and loading the 
> attached data, every Hive query hangs, even a plain select * from the table.
> Repro steps:
> 1. run createTable.hql
> 2. hadoop fs -put data /data
> 3. LOAD DATA INPATH '/data' OVERWRITE INTO TABLE testtable;
> 4. SELECT * FROM testtable;





[jira] [Updated] (HIVE-8103) Read ACID tables with FetchOperator returns no rows

2014-09-17 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8103:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thank you Ashutosh for the review.

> Read ACID tables with FetchOperator returns no rows
> ---
>
> Key: HIVE-8103
> URL: https://issues.apache.org/jira/browse/HIVE-8103
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-8103.patch
>
>
> After inserting records into a table that uses OrcOutputFormat with the 
> transaction manager set to DbTxnManager, {{select count ( * )}} will return 
> the number of rows inserted, while {{select *}} returns nothing.





[jira] [Commented] (HIVE-7980) Hive on spark issue..

2014-09-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137302#comment-14137302
 ] 

Xuefu Zhang commented on HIVE-7980:
---

[~alton.jung] Thanks for reporting the problem. I'll find a developer to look 
at this issue.

> Hive on spark issue..
> -
>
> Key: HIVE-7980
> URL: https://issues.apache.org/jira/browse/HIVE-7980
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Spark
>Affects Versions: spark-branch
> Environment: Test Environment is..
> . hive 0.14.0(spark branch version)
> . spark 
> (http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar)
> . hadoop 2.4.0 (yarn)
>Reporter: alton.jung
> Fix For: spark-branch
>
>
> I followed this 
> guide (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started) 
> and compiled Hive from the spark branch. In the next step I hit the 
> error below.
> (*I typed the Hive query in Beeline; I used a simple query with "order 
> by" to invoke the parallel work:
> ex) select * from test where id = 1 order by id;
> )
> [Error list is]
> 2014-09-04 02:58:08,796 ERROR spark.SparkClient 
> (SparkClient.java:execute(158)) - Error generating Spark Plan
> java.lang.NullPointerException
>   at 
> org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1262)
>   at 
> org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1269)
>   at 
> org.apache.spark.SparkContext.hadoopRDD$default$5(SparkContext.scala:537)
>   at 

[jira] [Assigned] (HIVE-7980) Hive on spark issue..

2014-09-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-7980:
-

Assignee: Chao

> Hive on spark issue..
> -
>
> Key: HIVE-7980
> URL: https://issues.apache.org/jira/browse/HIVE-7980
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Spark
>Affects Versions: spark-branch
> Environment: Test Environment is..
> . hive 0.14.0(spark branch version)
> . spark 
> (http://ec2-50-18-79-139.us-west-1.compute.amazonaws.com/data/spark-assembly-1.1.0-SNAPSHOT-hadoop2.3.0.jar)
> . hadoop 2.4.0 (yarn)
>Reporter: alton.jung
>Assignee: Chao
> Fix For: spark-branch
>
>
> I followed this 
> guide (https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started) 
> and compiled Hive from the spark branch. In the next step I hit the error 
> below. (I typed the Hive query on Beeline, using a simple query with "order 
> by" to trigger the parallel work, e.g.
>select * from test where id = 1 order by id;
> )
> [Error list is]
> 2014-09-04 02:58:08,796 ERROR spark.SparkClient 
> (SparkClient.java:execute(158)) - Error generating Spark Plan
> java.lang.NullPointerException
>   at 
> org.apache.spark.SparkContext.defaultParallelism(SparkContext.scala:1262)
>   at 
> org.apache.spark.SparkContext.defaultMinPartitions(SparkContext.scala:1269)
>   at 
> org.apache.spark.SparkContext.hadoopRDD$default$5(SparkContext.scala:537)
>   at 
> org.apache.spark.api.java.JavaSparkContext.hadoopRDD(JavaSparkContext.scala:318)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generateRDD(SparkPlanGenerator.java:160)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkPlanGenerator.generate(SparkPlanGenerator.java:88)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkClient.execute(SparkClient.java:156)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.submit(SparkSessionImpl.java:52)
>   at 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask.execute(SparkTask.java:77)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:161)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:72)
> 2014-09-04 02:58:11,108 ERROR ql.Driver (SessionState.java:printError(696)) - 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
> 2014-09-04 02:58:11,182 INFO  log.PerfLogger 
> (PerfLogger.java:PerfLogEnd(135)) -  start=1409824527954 end=1409824691182 duration=163228 
> from=org.apache.hadoop.hive.ql.Driver>
> 2014-09-04 02:58:11,223 INFO  log.PerfLogger 
> (PerfLogger.java:PerfLogBegin(108)) -  from=org.apache.hadoop.hive.ql.Driver>
> 2014-09-04 02:58:11,224 INFO  log.PerfLogger 
> (PerfLogger.java:PerfLogEnd(135)) -  start=1409824691223 end=1409824691224 duration=1 
> from=org.apache.hadoop.hive.ql.Driver>
> 2014-09-04 02:58:11,306 ERROR operation.Operation 
> (SQLOperation.java:run(199)) - Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.spark.SparkTask
>   at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:284)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:146)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:69)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:196)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:508)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:208)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:722)
> 2014-09-04 02:58:11,634 INFO  exec.ListSinkOperator 
> (Operator.java:close(580)) - 47 finished. closing... 
> 2014-09-04 02:58:11,683 INFO  exec.ListSinkOperator 
> (Operator.java:close(598)) - 47 Close done
> 2014-09-04 02:58:12,190 INFO  log.PerfLogger 
> (PerfLog

[jira] [Updated] (HIVE-8105) booleans and nulls not handled properly in insert/values

2014-09-17 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-8105:
-
Status: Open  (was: Patch Available)

> booleans and nulls not handled properly in insert/values
> 
>
> Key: HIVE-8105
> URL: https://issues.apache.org/jira/browse/HIVE-8105
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Attachments: HIVE-8105.2.patch, HIVE-8105.patch
>
>
> Doing an insert/values with a boolean always results in a value of true, 
> regardless of whether true or false is given in the query.
> Doing an insert/values with a null for a column value results in a semantic 
> error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8105) booleans and nulls not handled properly in insert/values

2014-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137320#comment-14137320
 ] 

Alan Gates commented on HIVE-8105:
--

I thought it was just boolean that wouldn't handle null properly here.  But 
you're right, string doesn't either.  So I agree we should disable the null 
handling for now.  I'll rework this patch to just fix the true/false issue and 
file a separate JIRA to fix the null issue later.
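
For reference, a minimal sketch of the two reported behaviors (the table and column names here are hypothetical, not taken from the patch):

{code}
-- Hypothetical repro for the boolean issue: per the report, the inserted
-- value always came back as true, regardless of the literal given.
CREATE TABLE bool_test (b BOOLEAN);
INSERT INTO TABLE bool_test VALUES (false);
SELECT b FROM bool_test;   -- reported to return true before this patch

-- Per the report, inserting a null column value failed with a semantic error:
INSERT INTO TABLE bool_test VALUES (null);
{code}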

> booleans and nulls not handled properly in insert/values
> 
>
> Key: HIVE-8105
> URL: https://issues.apache.org/jira/browse/HIVE-8105
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Attachments: HIVE-8105.2.patch, HIVE-8105.patch
>
>
> Doing an insert/values with a boolean always results in a value of true, 
> regardless of whether true or false is given in the query.
> Doing an insert/values with a null for a column value results in a semantic 
> error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25720: HIVE-8141:Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch]

2014-09-17 Thread Xuefu Zhang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25720/#review53676
---

Ship it!


Ship It!

- Xuefu Zhang


On Sept. 16, 2014, 11:56 p.m., Na Yang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25720/
> ---
> 
> (Updated Sept. 16, 2014, 11:56 p.m.)
> 
> 
> Review request for hive, Brock Noland and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8141
> https://issues.apache.org/jira/browse/HIVE-8141
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Refactor the GraphTran code by moving union handling logic to UnionTran
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/GraphTran.java 93674c1 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/UnionTran.java 40f22a0 
> 
> Diff: https://reviews.apache.org/r/25720/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Na Yang
> 
>



[jira] [Commented] (HIVE-8141) Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137322#comment-14137322
 ] 

Xuefu Zhang commented on HIVE-8141:
---

+1

> Refactor the GraphTran code by moving union handling logic to UnionTran 
> [Spark Branch]
> --
>
> Key: HIVE-8141
> URL: https://issues.apache.org/jira/browse/HIVE-8141
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>  Labels: Spark-M1
> Attachments: HIVE-8141.1-spark.patch
>
>
> In the current hive on spark code, union logic is handled in the GraphTran 
> class. The Union logic could be moved to the UnionTran class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-5764) Stopping Metastore and HiveServer2 from command line

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-5764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137332#comment-14137332
 ] 

Hive QA commented on HIVE-5764:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669154/HIVE-5764.1.patch

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 6279 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
org.apache.hive.hcatalog.streaming.TestStreaming.testAddPartition
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/840/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/840/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-840/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669154

> Stopping Metastore and HiveServer2 from command line
> 
>
> Key: HIVE-5764
> URL: https://issues.apache.org/jira/browse/HIVE-5764
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Vaibhav Gumashta
>Assignee: Xiaobing Zhou
>  Labels: patch
> Attachments: HIVE-5764.1.patch
>
>
> Currently a user needs to kill the process. Ideally there should be something 
> like:
> hive --service metastore stop
> hive --service hiveserver2 stop



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8138) Global Init file should allow specifying file name not only directory

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137334#comment-14137334
 ] 

Hive QA commented on HIVE-8138:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669159/HIVE-8138.patch

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/841/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/841/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-841/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ [[ -n /usr/java/jdk1.7.0_45-cloudera ]]
+ export JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ JAVA_HOME=/usr/java/jdk1.7.0_45-cloudera
+ export 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ 
PATH=/usr/java/jdk1.7.0_45-cloudera/bin/:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-maven-3.0.5/bin:/usr/local/apache-maven-3.0.5/bin:/usr/java/jdk1.6.0_34/bin:/usr/local/apache-ant-1.9.1/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hiveptest/bin
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-841/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 
'metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java'
Reverted 'bin/ext/metastore.sh'
Reverted 'bin/ext/hiveserver2.sh'
Reverted 'bin/hive'
Reverted 
'service/src/java/org/apache/hive/service/server/ServerOptionsProcessor.java'
++ awk '{print $2}'
++ egrep -v '^X|^Performing status on external'
++ svn status --no-ignore
+ rm -rf target datanucleus.log ant/target shims/target shims/0.20/target 
shims/0.20S/target shims/0.23/target shims/aggregator/target 
shims/common/target shims/common-secure/target packaging/target 
hbase-handler/target testutils/target jdbc/target metastore/target 
metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java.orig 
bin/hive.orig itests/target itests/hcatalog-unit/target 
itests/test-serde/target itests/qtest/target itests/hive-unit-hadoop2/target 
itests/hive-minikdc/target itests/hive-unit/target itests/custom-serde/target 
itests/util/target hcatalog/target hcatalog/core/target 
hcatalog/streaming/target hcatalog/server-extensions/target 
hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target 
hcatalog/hcatalog-pig-adapter/target accumulo-handler/target hwi/target 
common/target common/src/gen contrib/target service/target serde/target 
beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target
+ svn update
U    ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java
U    ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java

Fetching external item into 'hcatalog/src/test/e2e/harness'
Updated external to revision 1625614.

Updated to revision 1625614.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669159

> Global Init file should allow specifying file name  not only directory
> --
>
> Key: HIVE-8138
> URL: https://issues.apache.org/jira/browse/HIVE-8138
> Project: Hive
>  Issue Type: Bug
>Reporter: Brock Noland
>Assignee: Brock Noland
> Attachments: HIVE-8138.patch
>
>
> HIVE-5160 allows you to specify a directory where a .hiverc file exists. 
> However since .hiverc is a hidden file this can be confusing. The property 
> should allow a path to a file or

[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-17 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137365#comment-14137365
 ] 

Rui Li commented on HIVE-8043:
--

Hi [~xuefuz],

I looked into the patch in HIVE-7704. My understanding is that the newly added 
operator, mapper etc. is just for (fast) merging RC and Orc files. Other file 
formats will still be merged by the {{TS -> FS}} work. For RC and Orc files, 
this work is a {{MergeFileWork}}, for others, this work is a {{MapWork}}. And 
according to the execution engine, this work will be wrapped in a MapredWork, 
TezWork or SparkWork.

For RC and Orc files, {{MergeFileMapper}} is used instead of {{ExecMapper}}. 
The main difference between the two mappers is that {{MergeFileMapper}} wraps 
and uses {{AbstractFileMergeOperator}} (two implementations for RC and Orc file 
respectively) as the top operator, while {{ExecMapper}} uses {{MapOperator}}.

I think the following needs to be considered on spark side:
* For non-RC files, I think it should work out of the box, at least for simple 
cases. We may need to take extra care of dynamically partitioned tables, 
multi-insert and union queries etc. I tested some simple insert queries where I 
increased {{mapreduce.job.reduces}} to generate many small files. With 
{{hive.merge.sparkfiles=false}}, the destination table consists of all these 
small files, and when turned on, all the small files get merged. I noticed the 
merging feature caused some issue in HIVE-7810. I'll verify if it's still a 
problem now that we have union-remove disabled for spark.
* For RC and Orc files, we need to be aware of the {{MergeFileWork}}. And since 
{{SparkMapRecordHandler}} is our counterpart for {{ExecMapper}}, we'll need 
another record handler as counterpart for {{MergeFileMapper}}, maybe another 
hive function as well. I'm working to implement this to do some tests.
* MR distinguishes map-only and map-reduce jobs for merging. Not sure if we 
should do something similar for Spark.
* Besides, it seems there're two scenarios where merging is needed: at the end 
of a job (map-only or map-reduce), and in DDL task. I'll investigate more into 
this.

Any idea or suggestion is appreciated. Thanks.
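
The test described above can be sketched from a Hive session as follows; the property names come from this thread, but the table names and reducer count are illustrative only:

{code}
-- Force many reducers so the insert produces many small output files.
SET mapreduce.job.reduces=50;

-- With merging off, the destination table keeps all the small files:
SET hive.merge.sparkfiles=false;
INSERT OVERWRITE TABLE dest SELECT * FROM src ORDER BY id;

-- With merging on, the small files get merged at the end of the job:
SET hive.merge.sparkfiles=true;
INSERT OVERWRITE TABLE dest SELECT * FROM src ORDER BY id;
{code}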

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> Hive.merge.sparkfiles is already introduced in HIVE-7810. To make it work, we 
> might need a little more research and design on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8126) Standalone hive-jdbc jar is not packaged in the Hive distribution

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8126:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Deepesh!

> Standalone hive-jdbc jar is not packaged in the Hive distribution
> -
>
> Key: HIVE-8126
> URL: https://issues.apache.org/jira/browse/HIVE-8126
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 0.14.0
>Reporter: Deepesh Khandelwal
>Assignee: Deepesh Khandelwal
> Fix For: 0.14.0
>
> Attachments: HIVE-8126.1.patch
>
>
> With HIVE-538 we started creating the hive-jdbc-*-standalone.jar but the 
> packaging/distribution does not contain the standalone jdbc jar. I would have 
> expected it to locate under the lib folder of the distribution.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8150) [CBO] Type coercion in union queries

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8150:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to cbo branch.

> [CBO] Type coercion in union queries
> 
>
> Key: HIVE-8150
> URL: https://issues.apache.org/jira/browse/HIVE-8150
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8150.cbo.patch
>
>
> If we can't get common type from Optiq, bail out for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7787) Reading Parquet file with enum in Thrift Encoding throws NoSuchFieldError

2014-09-17 Thread Svend Vanderveken (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137401#comment-14137401
 ] 

Svend Vanderveken commented on HIVE-7787:
-


I encounter a very similar issue with importing data from a hive external table 
in raw CSV format into a parquet table with CDH 5.1

{code}
create external table if not exists testsv.objects_raw (
  objectid string,
  model string,
  owner string,
  attributes map)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE
>  location '/test/somefolder';
{code}

(load some data in csv format in /test/somefolder)

{code}
create table if not exists testsv.objects (
  objectid string,
  model string,
  owner string,
  attributes map)
 ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
 STORED AS
 INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat'
 OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat';
{code}

{code}
 insert overwrite table testsv.objects select source.* from testsv.objects_raw 
source;
{code}


{code}
2014-09-17 10:58:39,436 Stage-3 map = 100%,  reduce = 0%
Ended Job = job_1410534905977_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1410534905977_0011_m_00 (and more) from job 
job_1410534905977_0011

Task with the most failures(4):
-
Task ID:
  task_1410534905977_0011_m_00

URL:
  
http://vm28-hulk-priv:8088/taskdetails.jsp?jobid=job_1410534905977_0011&tipid=task_1410534905977_0011_m_00
-
Diagnostic Messages for this Task:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:346)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:293)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:407)
at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:560)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:168)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:409)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:332)
... 11 more
Caused by: java.lang.NoSuchFieldError: DECIMAL
at 
org.apache.hadoop.hive.ql.io.parquet.convert.ETypeConverter.getNewConverter(ETypeConverter.java:146)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.HiveGroupConverter.getConverterFromDescription(HiveGroupConverter.java:31)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.(DataWritableGroupConverter.java:64)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableGroupConverter.(DataWritableGroupConverter.java:40)
at 
org.apache.hadoop.hive.ql.io.parquet.convert.DataWritableRecordConverter.(DataWritableRecordConverter.java:32)
at 
org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.prepareForRead(DataWritableReadSupport.java:128)
at 
parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
at 
parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
at 
parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:92)
at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:66)
at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFo

[jira] [Created] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-8159:
--

 Summary: CBO: bail from Optiq planning if a Select list contains 
multiple references to the same name
 Key: HIVE-8159
 URL: https://issues.apache.org/jira/browse/HIVE-8159
 Project: Hive
  Issue Type: Sub-task
  Components: CBO
Reporter: Harish Butani
Assignee: Harish Butani
 Fix For: 0.14.0


Optiq explicitly disallows this:
{code}
select x, x from t1
{code}

This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned HIVE-8159:
--

Assignee: Ashutosh Chauhan  (was: Harish Butani)

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Harish Butani
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8159:
---
Reporter: Ashutosh Chauhan  (was: Harish Butani)

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: moving Hive to git

2014-09-17 Thread Lefty Leverenz
I'm not convinced git would be better.  Could someone please spell out the
advantages?

In particular:


   1. "... git is more powerful and easy to use (once you go past the
   learning curve!)"


-- Lefty

On Tue, Sep 16, 2014 at 4:27 PM, Sergey Shelukhin 
wrote:

> It seems there was consensus that we should move. Any volunteers to do it?
> I can try to find the details on how HBase migrated.
>
> On Fri, Sep 12, 2014 at 7:10 PM, Lefty Leverenz 
> wrote:
>
> > We had a related discussion March 5 - 10:  llsmugcwkuryr5tb
> >  through rq66qe2cpfgw7o5s
> > .
> >
> >
> > -- Lefty
> >
> > On Fri, Sep 12, 2014 at 9:33 PM, Sergey Shelukhin <
> ser...@hortonworks.com>
> > wrote:
> >
> > > Hi.
> > > Many Apache projects are moving to git from svn; HBase moved recently,
> > and
> > > as far as I have heard Hadoop has moved too.
> > > Are there any objections to moving Hive to use git too?
> > > I wanted to start a preliminary discussion.
> > >
> > > --
> > > CONFIDENTIALITY NOTICE
> > > NOTICE: This message is intended for the use of the individual or
> entity
> > to
> > > which it is addressed and may contain information that is confidential,
> > > privileged and exempt from disclosure under applicable law. If the
> reader
> > > of this message is not the intended recipient, you are hereby notified
> > that
> > > any printing, copying, dissemination, distribution, disclosure or
> > > forwarding of this communication is strictly prohibited. If you have
> > > received this communication in error, please contact the sender
> > immediately
> > > and delete it from your system. Thank You.
> > >
> >
>
>


Review Request 25739: CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25739/
---

Review request for hive and Harish Butani.


Bugs: HIVE-8159
https://issues.apache.org/jira/browse/HIVE-8159


Repository: hive-git


Description
---

CBO: bail from Optiq planning if a Select list contains multiple references to 
the same name


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java
 4f760a2 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java
 0ba320c 
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/DerivedTableInjector.java
 be740cc 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 465aebd 

Diff: https://reviews.apache.org/r/25739/diff/


Testing
---

timestamp_udf.q


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8159:
---
Attachment: HIVE-8159.cbo.patch

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
> Attachments: HIVE-8159.cbo.patch
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137429#comment-14137429
 ] 

Harish Butani commented on HIVE-8159:
-

[~ashutoshc] I have filed OPTIQ-411. [~julianhyde] pointed out that this is 
supported in Optiq at the AST level, but not at the RelNode level. 
He suggested we look at how this is done in Optiq and do the same in Hive when 
we go from Hive AST to Optiq RelNodes.

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
> Attachments: HIVE-8159.cbo.patch
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25739: CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Harish Butani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25739/#review53683
---

Ship it!


Ship It!

- Harish Butani


On Sept. 17, 2014, 3:47 p.m., Ashutosh Chauhan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25739/
> ---
> 
> (Updated Sept. 17, 2014, 3:47 p.m.)
> 
> 
> Review request for hive and Harish Butani.
> 
> 
> Bugs: HIVE-8159
> https://issues.apache.org/jira/browse/HIVE-8159
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/reloperators/HiveProjectRel.java
>  4f760a2 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/ASTConverter.java
>  0ba320c 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/DerivedTableInjector.java
>  be740cc 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 465aebd 
> 
> Diff: https://reviews.apache.org/r/25739/diff/
> 
> 
> Testing
> ---
> 
> timestamp_udf.q
> 
> 
> Thanks,
> 
> Ashutosh Chauhan
> 
>



[jira] [Commented] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137435#comment-14137435
 ] 

Harish Butani commented on HIVE-8159:
-

+1

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
> Attachments: HIVE-8159.cbo.patch
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137429#comment-14137429
 ] 

Harish Butani edited comment on HIVE-8159 at 9/17/14 3:51 PM:
--

[~ashutoshc] I have filed OPTIQ-411. [~julianhyde] pointed out that this is 
supported in Optiq at the AST level, but not at the RelNode level. 
He suggested we look at how this is done in Optiq and do the same in Hive when 
we go from Hive AST to Optiq RelNodes


was (Author: rhbutani):
[~ashutoshc] I have filed OPTIQ-411. [~julianhyde] pointed point that this is 
supported in Optiq at the AST level, but not at the RelNode level. 
He suggested we look at how this is done in Optiq and do the same in Hive when 
we go from Hive AST to Optiq RelNodes

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
> Attachments: HIVE-8159.cbo.patch
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: moving Hive to git

2014-09-17 Thread Owen O'Malley
For me, the advantages of git are:
1. Each user's working copy contains the global history of the project. So
while I'm disconnected on an airplane, I can look at history and logs,
switch between branches, and do merges.
2. It makes it very easy to work on development branches and rebase off of
trunk to incorporate other people's changes. Tracking of which commits have
been merged into a branch is much better in git than in subversion.
3. I can easily share my development branches with others via github (or
Apache if we switch).
4. Tags can be signed with pgp keys and unlike subversion, you can't commit
into tags by mistake.

Thanks,
   Owen


On Wed, Sep 17, 2014 at 8:47 AM, Lefty Leverenz 
wrote:

> I'm not convinced git would be better.  Could someone please spell out the
> advantages?
>
> In particular:
>
>
>1. "... git is more powerful and easy to use (once you go past the
>learning curve!)"
>
>
> -- Lefty
>
> On Tue, Sep 16, 2014 at 4:27 PM, Sergey Shelukhin 
> wrote:
>
> > It seems there was consensus that we should move. Any volunteers to do
> it?
> > I can try to find the details on how HBase migrated.
> >
> > On Fri, Sep 12, 2014 at 7:10 PM, Lefty Leverenz  >
> > wrote:
> >
> > > We had a related discussion March 5 - 10:  llsmugcwkuryr5tb
> > >  through
> rq66qe2cpfgw7o5s
> > > .
> > >
> > >
> > > -- Lefty
> > >
> > > On Fri, Sep 12, 2014 at 9:33 PM, Sergey Shelukhin <
> > ser...@hortonworks.com>
> > > wrote:
> > >
> > > > Hi.
> > > > Many Apache projects are moving to git from svn; HBase moved
> recently,
> > > and
> > > > as far as I have heard Hadoop has moved too.
> > > > Are there any objections to moving Hive to use git too?
> > > > I wanted to start a preliminary discussion.
> > > >
> > >
> >
> >
>


[jira] [Resolved] (HIVE-8159) CBO: bail from Optiq planning if a Select list contains multiple references to the same name

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan resolved HIVE-8159.

Resolution: Fixed

Patch checked-in to trunk.
yeah.. we need to revisit this later. 

> CBO: bail from Optiq planning if a Select list contains multiple references 
> to the same name
> 
>
> Key: HIVE-8159
> URL: https://issues.apache.org/jira/browse/HIVE-8159
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 0.14.0
>
> Attachments: HIVE-8159.cbo.patch
>
>
> Optiq explicitly disallows this:
> {code}
> select x, x from t1
> {code}
> This is allowed in Hive (and also in mysql). Will check in Postgres. For now 
> bailing from Optiq planning if we see this. Will file an Optiq issue. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8071) hive shell tries to write hive-exec.jar for each run

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137456#comment-14137456
 ] 

Hive QA commented on HIVE-8071:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669168/HIVE-8071.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6279 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.streaming.TestStreaming.testRemainingTransactions
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/842/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/842/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-842/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669168

> hive shell tries to write hive-exec.jar for each run
> 
>
> Key: HIVE-8071
> URL: https://issues.apache.org/jira/browse/HIVE-8071
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
> Attachments: HIVE-8071.1.patch, HIVE-8071.2.patch
>
>
> For every run of the hive CLI there is a delay for the shell startup
> {noformat}
> 14/07/31 23:07:19 INFO Configuration.deprecation: fs.default.name is 
> deprecated. Instead, use fs.defaultFS
> 14/07/31 23:07:19 INFO tez.DagUtils: Hive jar directory is 
> hdfs://mac-10:8020/user/gopal/apps/2014-Jul-31/hive/
> 14/07/31 23:07:19 INFO tez.DagUtils: Localizing resource because it does not 
> exist: 
> file:/home/gopal/tez-autobuild/dist/hive/lib/hive-exec-0.14.0-SNAPSHOT.jar to 
> dest: 
> hdfs://mac-10:8020/user/gopal/apps/2014-Jul-31/hive/hive-exec-0.14.0-SNAPSHOTde1f82f0b5561d3db9e3080dfb2897210a3bda4ca5e7b14e881e381115837fd8.
> jar
> 14/07/31 23:07:19 INFO tez.DagUtils: Looks like another thread is writing the 
> same file will wait.
> 14/07/31 23:07:19 INFO tez.DagUtils: Number of wait attempts: 5. Wait 
> interval: 5000
> 14/07/31 23:07:19 INFO tez.DagUtils: Resource modification time: 1406870512963
> 14/07/31 23:07:20 INFO tez.TezSessionState: Opening new Tez Session (id: 
> 02d6b558-44cc-4182-b2f2-6a37ffdd25d2, scratch dir: 
> hdfs://mac-10:8020/tmp/hive-gopal/_tez_session_dir/02d6b558-44cc-4182-b2f2-6a37ffdd25d2)
> {noformat}
> Traced this to a method that creates PRIVATE LocalResources - the resource is 
> marked PRIVATE even if it comes from a common install dir.
> {code}
>  public LocalResource localizeResource(Path src, Path dest, Configuration 
> conf)
> throws IOException {
> 
> return createLocalResource(destFS, dest, LocalResourceType.FILE,
> LocalResourceVisibility.PRIVATE);
>   }
> {code}
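A minimal sketch of the visibility decision hinted at in the report above. The enum, the method name `visibilityFor`, and the shared-directory heuristic are assumptions for illustration only, not the actual Tez `DagUtils` code:

```java
class VisibilityChoice {
    // Stand-in enum mirroring YARN's LocalResourceVisibility values.
    enum Visibility { PUBLIC, PRIVATE }

    // Hypothetical heuristic: if the jar already lives under a common install
    // dir readable by all users, PUBLIC would avoid re-localizing it per user;
    // anything user-specific stays PRIVATE.
    static Visibility visibilityFor(String destPath, String sharedInstallDir) {
        return destPath.startsWith(sharedInstallDir) ? Visibility.PUBLIC
                                                     : Visibility.PRIVATE;
    }

    public static void main(String[] args) {
        System.out.println(visibilityFor("/apps/hive/hive-exec.jar", "/apps/hive"));
        System.out.println(visibilityFor("/user/gopal/tmp/hive-exec.jar", "/apps/hive"));
    }
}
```

The point is only that the visibility need not be hard-coded to PRIVATE; it could be derived from where the resource actually lives.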



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8141) Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8141:
--
   Resolution: Fixed
Fix Version/s: spark-branch
   Status: Resolved  (was: Patch Available)

Patch committed to Spark branch. Thanks to Na for the contribution.

> Refactor the GraphTran code by moving union handling logic to UnionTran 
> [Spark Branch]
> --
>
> Key: HIVE-8141
> URL: https://issues.apache.org/jira/browse/HIVE-8141
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Na Yang
>Assignee: Na Yang
>  Labels: Spark-M1
> Fix For: spark-branch
>
> Attachments: HIVE-8141.1-spark.patch
>
>
> In the current hive on spark code, union logic is handled in the GraphTran 
> class. The Union logic could be moved to the UnionTran class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8160) Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)
Xuefu Zhang created HIVE-8160:
-

 Summary: Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]
 Key: HIVE-8160
 URL: https://issues.apache.org/jira/browse/HIVE-8160
 Project: Hive
  Issue Type: Task
  Components: Spark
Reporter: Xuefu Zhang
Priority: Minor


Hive on Spark needs SPARK-2978, which is now available in latest Spark main 
branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-2075) error in function count(distinct col)

2014-09-17 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-2075.

Resolution: Cannot Reproduce

> error in function  count(distinct col)
> --
>
> Key: HIVE-2075
> URL: https://issues.apache.org/jira/browse/HIVE-2075
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Serializers/Deserializers
>Affects Versions: 0.7.0
>Reporter: Alexey Diomin
>
> select 
>  N.cl1, 
>  count(distinct N.cl2) as cl2count,
>  N.cl3
> from raw N
> group by 
>  N.cl1,
>  N.cl2,
>  N.cl3
> does not work on 0.7 (it worked correctly on 0.6), but count(N.cl2) works correctly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-1992) NCDFE: o.a.commons/configuration/Configuration with Hadoop 0.20.100

2014-09-17 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-1992.

Resolution: Invalid

> NCDFE: o.a.commons/configuration/Configuration with Hadoop 0.20.100
> ---
>
> Key: HIVE-1992
> URL: https://issues.apache.org/jira/browse/HIVE-1992
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 0.6.0
> Environment: Red Hat 2.6.18
>Reporter: Joep Rottinghuis
> Attachments: commons-configuration-1.6.jar, hive-1992.patch
>
>
> When compiling Hive against Hadoop 0.20.100 all tests and CLI commands fail 
> with NoClassDefFoundError: org/apache/commons/configuration/Configuration 
> (see log output example below):
> Hadoop classes depend on commons.configuration which is not available on 
> classpath with Hive 0.6.
> Sample test case failure (one of many):
> {code}
> test:
> [junit] Running org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore
> [junit] Unable to open the metastore
> [junit] java.lang.NoClassDefFoundError: 
> org/apache/commons/configuration/Configuration
> [junit]   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:37)
> [junit]   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.(DefaultMetricsSystem.java:34)
> [junit]   at 
> org.apache.hadoop.security.UgiInstrumentation.create(UgiInstrumentation.java:51)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:196)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:159)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:216)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:409)
> [junit]   at 
> org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:395)
> [junit]   at 
> org.apache.hadoop.fs.FileSystem$Cache$Key.(FileSystem.java:1418)
> [junit]   at 
> org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1319)
> [junit]   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:226)
> [junit]   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
> [junit]   at 
> org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:72)
> [junit]   at 
> org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:104)
> [junit]   at 
> org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:119)
> {code}
> As with HIVE-1990, I first applied HIVE-1264 and tweaked the build.properties 
> to compile Hive against locally built Hadoop 0.20.100 (see HADOOP-7108).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-2014) fix a few eclipse warnings

2014-09-17 Thread Lars Francke (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Francke resolved HIVE-2014.

Resolution: Won't Fix

> fix a few eclipse warnings
> --
>
> Key: HIVE-2014
> URL: https://issues.apache.org/jira/browse/HIVE-2014
> Project: Hive
>  Issue Type: Improvement
>Reporter: Jon Stevens
>Priority: Trivial
> Attachments: hive.diff
>
>
> I started to fix a couple of Eclipse warnings found via the ecj compiler.
> I'm not a fan of having warnings in code. Mostly because it tends to hide 
> potential real problems.
> I fixed a few of these in an effort to get the ball rolling.
> Seems jira can't attach a file during new issue creation. Will attach diff 
> after this is submitted.
> This is based off a master checkout from github.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25595: HIVE-8083: Authorization DDLs should not enforce hive identifier syntax for user or group names

2014-09-17 Thread Prasad Mujumdar


> On Sept. 16, 2014, 11:43 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g, line 1541
> > 
> >
> > Is this supposed to be just identifier itself?

yes, it should be just identifier.


- Prasad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25595/#review53624
---


On Sept. 16, 2014, 6:29 p.m., Prasad Mujumdar wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25595/
> ---
> 
> (Updated Sept. 16, 2014, 6:29 p.m.)
> 
> 
> Review request for hive and Brock Noland.
> 
> 
> Bugs: HIVE-8083
> https://issues.apache.org/jira/browse/HIVE-8083
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The compiler expects principals (user, group and role) as hive identifiers 
> for authorization DDLs. The user and group are entities that belong to an 
> external namespace, and we can't expect those to follow hive identifier syntax 
> rules. For example, a userid or group can contain '-', which is not allowed by 
> the compiler.
> The patch is to allow string literals for user and group names.
> Quoted-identifier support could perhaps be made to work with this. However, 
> IMO this syntax should be supported regardless of quoted-identifier support 
> (which is an optional configuration).
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 25cd3a5 
>   ql/src/test/queries/clientpositive/authorization_non_id.q PRE-CREATION 
>   ql/src/test/results/clientpositive/authorization_non_id.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/25595/diff/
> 
> 
> Testing
> ---
> 
> Added test case to verify various auth DDLs with new syntax.
> 
> 
> Thanks,
> 
> Prasad Mujumdar
> 
>



moving Hive to git

2014-09-17 Thread E.L. Leverenz
This is the rest of the message I meant to send on the "moving Hive to git" 
thread, but then did an accidental send.  Apache rejected several attempts as 
spam, so I'm sending this from a different email account.

This list summarizes the previous discussion, with my questions/comments: 
1. "... git is more powerful and easy to use (once you go past the 
learning curve!)" [Thejas] -- that learning curve still intimidates me, which 
suggests it might also be daunting for newcomers.
2. "Switching to git from svn seems to be a proposal slightly different 
from that of switching to pull request from the head of the thread. Personally 
I'm +1 to git, but I think patches are very portable and widely adopted in 
Hadoop ecosystem and we should keep the practice." [Xuefu] -- could someone 
explain the patch issue?
3. "We need to keep patches in Jira  ... having a patch in the jira is 
critical I feel. We must at least have a perma link to the changes." [Edward] 
-- again, how are patches different in git?
4. "In my read of the Apache git - github integration blog post we 
cannot use pull requests as patches. Just that we'll be notified of them and 
could perhaps use them as code review." [Brock] -- okay, perhaps this answers 
my patch question.
5. "One additional item I think we should investigate is disabling 
merge commits on trunk and feature branches." -- uh oh, I'm slipping backwards 
on the learning curve.
6. "I do not think we want Pull Requests coming at us. Better way is 
let someone open a git branch for the changes, then we review and merge the 
branch." [Edward] -- okay, creeping back up the learning curve.
7. "I'm +1 on switching to git, but only if we can find a way to 
disable merge commits to trunk and feature branches. I'm -1 on switching to 
Github since, as far as I know, it only supports merge based workflows." [Carl]
8. "Agree with Carl about git merge commits, they make the changes hard 
to follow. But it should be OK, if there is no way to disable it in the main 
git repo, it is a small set of active committers, we can make a policy and 
expect people to follow it. But we should certainly disable 'git push -f' (and 
anything as disruptive)." [Thejas] -- that small set of committers is growing 
larger all the time.
-- Lefty

Re: Review Request 25595: HIVE-8083: Authorization DDLs should not enforce hive identifier syntax for user or group names

2014-09-17 Thread Prasad Mujumdar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25595/
---

(Updated Sept. 17, 2014, 5:08 p.m.)


Review request for hive and Brock Noland.


Changes
---

Addressed review feedback
Used QuotedIdentifier rule directly. This ensures that the back quotes are 
removed from the user or group name.
Updated test case.


Bugs: HIVE-8083
https://issues.apache.org/jira/browse/HIVE-8083


Repository: hive-git


Description
---

The compiler expects principals (user, group and role) as hive identifiers for 
authorization DDLs. The user and group are entities that belong to an external 
namespace, and we can't expect those to follow hive identifier syntax rules. For 
example, a userid or group can contain '-', which is not allowed by the compiler.
The patch is to allow string literals for user and group names.
Quoted-identifier support could perhaps be made to work with this. However, 
IMO this syntax should be supported regardless of quoted-identifier support 
(which is an optional configuration).


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g 25cd3a5 
  ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g 34d2dfc 
  ql/src/test/queries/clientpositive/authorization_non_id.q PRE-CREATION 
  ql/src/test/results/clientpositive/authorization_non_id.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/25595/diff/


Testing
---

Added test case to verify various auth DDLs with new syntax.


Thanks,

Prasad Mujumdar



[jira] [Updated] (HIVE-8083) Authorization DDLs should not enforce hive identifier syntax for user or group

2014-09-17 Thread Prasad Mujumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasad Mujumdar updated HIVE-8083:
--
Attachment: HIVE-8083.3.patch

Updated patch. Addresses review feedback.

> Authorization DDLs should not enforce hive identifier syntax for user or group
> --
>
> Key: HIVE-8083
> URL: https://issues.apache.org/jira/browse/HIVE-8083
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, SQLStandardAuthorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-8083.1.patch, HIVE-8083.2.patch, HIVE-8083.3.patch
>
>
> The compiler expects principals (user, group and role) as hive identifiers 
> for authorization DDLs. The user and group are entities that belong to an 
> external namespace, and we can't expect those to follow hive identifier syntax 
> rules. For example, a userid or group can contain '-', which is not allowed by 
> the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: moving Hive to git

2014-09-17 Thread Rajat Ratewal
Can we look at how this is done in HBase or Hadoop? If these projects
migrated to git, then I am sure they must have faced similar issues.
On 17 Sep 2014 22:37, "E.L. Leverenz"  wrote:

> This is the rest of the message I meant to send on the "moving Hive to
> git" thread, but then did an accidental send.  Apache rejected several
> attempts as spam, so I'm sending this from a different email account.
>
> This list summarizes the previous discussion, with my questions/comments:
> 1. "... git is more powerful and easy to use (once you go past the
> learning curve!)" [Thejas] -- that learning curve still intimidates me,
> which suggests it might also be daunting for newcomers.
> 2. "Switching to git from svn seems to be a proposal slightly
> different from that of switching to pull request from the head of the
> thread. Personally I'm +1 to git, but I think patches are very portable and
> widely adopted in Hadoop ecosystem and we should keep the practice."
> [Xuefu] -- could someone explain the patch issue?
> 3. "We need to keep patches in Jira  ... having a patch in the
> jira is critical I feel. We must at least have a perma link to the
> changes." [Edward] -- again, how are patches different in git?
> 4. "In my read of the Apache git - github integration blog post we
> cannot use pull requests as patches. Just that we'll be notified of them
> and could perhaps use them as code review." [Brock] -- okay, perhaps this
> answers my patch question.
> 5. "One additional item I think we should investigate is disabling
> merge commits on trunk and feature branches." -- uh oh, I'm slipping
> backwards on the learning curve.
> 6. "I do not think we want Pull Requests coming at us. Better way
> is let someone open a git branch for the changes, then we review and merge
> the branch." [Edward] -- okay, creeping back up the learning curve.
> 7. "I'm +1 on switching to git, but only if we can find a way to
> disable merge commits to trunk and feature branches. I'm -1 on switching to
> Github since, as far as I know, it only supports merge based workflows."
> [Carl]
> 8. "Agree with Carl about git merge commits, they make the changes
> hard to follow. But it should be OK, if there is no way to disable it in
> the main git repo, it is a small set of active committers, we can make a
> policy and expect people to follow it. But we should certainly disable 'git
> push -f' (and anything as disruptive)." [Thejas] -- that small set of
> committers is growing larger all the time.
> -- Lefty


[jira] [Commented] (HIVE-8083) Authorization DDLs should not enforce hive identifier syntax for user or group

2014-09-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137581#comment-14137581
 ] 

Xuefu Zhang commented on HIVE-8083:
---

+1 pending on test result.

> Authorization DDLs should not enforce hive identifier syntax for user or group
> --
>
> Key: HIVE-8083
> URL: https://issues.apache.org/jira/browse/HIVE-8083
> Project: Hive
>  Issue Type: Bug
>  Components: SQL, SQLStandardAuthorization
>Affects Versions: 0.13.0, 0.13.1
>Reporter: Prasad Mujumdar
>Assignee: Prasad Mujumdar
> Attachments: HIVE-8083.1.patch, HIVE-8083.2.patch, HIVE-8083.3.patch
>
>
> The compiler expects principals (user, group and role) as hive identifiers 
> for authorization DDLs. The user and group are entities that belong to an 
> external namespace, and we can't expect those to follow hive identifier syntax 
> rules. For example, a userid or group can contain '-', which is not allowed by 
> the compiler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8137) Empty ORC file handling

2014-09-17 Thread Pankit Thapar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137591#comment-14137591
 ] 

Pankit Thapar commented on HIVE-8137:
-

I think Tez works in this case because in the Tez-related code flow, Hive 
creates files for empty tables.
I don't know if that would be the right approach for OrcInputFormat.
Also, one way to avoid creating the split would be to list file statuses in 
CombineHiveInputFormat.getSplits() and filter out zero-length files, then pass 
that list on to Hadoop. But going this way, we add an O(n) overhead of getting 
file statuses.
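The filtering idea above can be sketched in plain Java. This is a minimal stand-in for illustration only: the nested FileStatus class here mimics Hadoop's real FileStatus, and the actual code path would list files via FileSystem.listStatus() inside CombineHiveInputFormat.getSplits().

```java
import java.util.ArrayList;
import java.util.List;

public class SplitFilter {
    // Minimal stand-in for Hadoop's FileStatus: just a path and a length.
    static final class FileStatus {
        final String path;
        final long len;
        FileStatus(String path, long len) { this.path = path; this.len = len; }
    }

    // Drop zero-length files before split generation; this is the O(n)
    // listing pass the comment mentions.
    static List<FileStatus> filterEmptyFiles(List<FileStatus> statuses) {
        List<FileStatus> nonEmpty = new ArrayList<>();
        for (FileStatus fs : statuses) {
            if (fs.len > 0) {
                nonEmpty.add(fs);
            }
        }
        return nonEmpty;
    }
}
```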


> Empty ORC file handling
> ---
>
> Key: HIVE-8137
> URL: https://issues.apache.org/jira/browse/HIVE-8137
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats
>Affects Versions: 0.13.1
>Reporter: Pankit Thapar
> Fix For: 0.14.0
>
>
> Hive 13 does not handle reading a zero-size ORC file properly. An ORC file 
> is supposed to have a PostScript,
> which the ReaderImpl class tries to read and initialize the footer with. 
> But in case the file is empty 
> or of zero size, it runs into an IndexOutOfBoundsException because 
> ReaderImpl tries to read it in its constructor.
> Code snippet: 
> //get length of PostScript
> int psLen = buffer.get(readSize - 1) & 0xff; 
> In the above code, readSize for an empty file is zero.
> I see that the ensureOrcFooter() method performs some sanity checks for the 
> footer, so either we can move the above code snippet to ensureOrcFooter() 
> and throw a "Malformed ORC file" exception, or we can create a dummy Reader 
> that does not initialize the footer and basically has hasNext() return false 
> on the first call.
> Basically, I would like to know what might be the correct way to handle an 
> empty ORC file in a mapred job:
> should we ignore it and not throw an exception, or should we throw an 
> exception that the ORC file is malformed?
> Please let me know your thoughts on this.
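A minimal sketch of the first option described above: guard the PostScript-length read so an empty file fails with a clear error instead of an IndexOutOfBoundsException. The helper name and exception type are illustrative, not Hive's actual ReaderImpl API.

```java
import java.nio.ByteBuffer;

public class OrcFooterCheck {
    // Reject empty files before reading the one-byte PostScript length,
    // instead of letting buffer.get(readSize - 1) fail when readSize == 0.
    static int readPostScriptLength(ByteBuffer buffer, int readSize) {
        if (readSize <= 0) {
            throw new IllegalArgumentException("Malformed ORC file: file is empty");
        }
        // The last byte of the tail buffer holds the PostScript length.
        return buffer.get(readSize - 1) & 0xff;
    }
}
```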



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8142) Add merge operators to queryplan.thrift instead of generated source file

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137606#comment-14137606
 ] 

Hive QA commented on HIVE-8142:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669178/HIVE-8142.1.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6279 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/844/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/844/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-844/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669178

> Add merge operators to queryplan.thrift instead of generated source file
> 
>
> Key: HIVE-8142
> URL: https://issues.apache.org/jira/browse/HIVE-8142
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Minor
> Attachments: HIVE-8142.1.patch
>
>
> HIVE-7704 added two new operators for fast file merging to OperatorType.java 
> which is a generated source file. Instead the operators should be added to 
> queryplan.thrift



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8113) Derby server fails to start on windows

2014-09-17 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137617#comment-14137617
 ] 

Sushanth Sowmyan commented on HIVE-8113:


Actually, I was thinking about this further, and as I was getting ready to 
commit this patch, I saw this comment in metastore/pom.xml:

{noformat}
<dependency>
  <groupId>org.apache.derby</groupId>
  <artifactId>derby</artifactId>
  <version>${derby.version}</version>
</dependency>
{noformat}

and that made me wonder why we use that dependency in that pom.xml without 
specifying the scope as test. It also made me think again about including 
derbynet as a core dependency, which I don't think we should. I think the way 
to proceed with this would be the following:

a) Create a new profile (maybe called "windows"?). There already seems to be a 
profile called windows-test (maybe combine with that?).
b) Add the derbynet dependency only if that profile is active, and add it in a 
packaging scope, not in compile or test. Adding it in the compile scope will 
guarantee that it's available in lib/, but the problem with that is that we'll 
pass those additional dependencies on to any other modules that include this 
one even as a compile-time dependency.
c) Additionally, we should probably limit that derby dependency to test scope 
as well.
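A sketch of what steps (a) and (b) might look like in metastore/pom.xml. The profile id and the runtime scope are assumptions here: Maven has no literal "packaging" scope, so runtime is the closest standard fit for a jar that must ship in lib/ without becoming a compile-time dependency.

```xml
<profiles>
  <profile>
    <id>windows</id>
    <dependencies>
      <dependency>
        <groupId>org.apache.derby</groupId>
        <artifactId>derbynet</artifactId>
        <version>${derby.version}</version>
        <scope>runtime</scope>
      </dependency>
    </dependencies>
  </profile>
</profiles>
```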



> Derby server fails to start on windows
> --
>
> Key: HIVE-8113
> URL: https://issues.apache.org/jira/browse/HIVE-8113
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-8113.1.patch
>
>
> %HIVE_HOME%\lib\derby-10.10.1.1.jar
> doesn't contain the main class 
> org.apache.derby.drda.NetworkServerControl
> referenced in
> %HIVE_HOME%\bin\derbyserver.cmd



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8043) Support merging small files [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137618#comment-14137618
 ] 

Xuefu Zhang commented on HIVE-8043:
---

[~lirui] Thanks for your detailed analysis. I think we need to verify the 
following:

1. File merging (either through DDL or Hive settings) needs to work for all 
data formats regardless of execution engine type. That includes RC, ORC, and 
other formats. Please verify that file merging works with Spark. If not, 
check MR.

2. The improvement made in HIVE-7704 might be Tez-only. If this is the case, 
please identify the work that needs to be done to support it, but we don't 
have to implement it now, as it's an optimization, which can be done in later 
milestones.

Thanks.

> Support merging small files [Spark Branch]
> --
>
> Key: HIVE-8043
> URL: https://issues.apache.org/jira/browse/HIVE-8043
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
>  Labels: Spark-M1
>
> Hive currently supports merging small files with MR as the execution engine. 
> There are options available for this, such as 
> {code}
> hive.merge.mapfiles
> hive.merge.mapredfiles
> {code}
> hive.merge.sparkfiles was already introduced in HIVE-7810. To make it work, 
> we might need a little more research and design on this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: moving Hive to git

2014-09-17 Thread Brock Noland
Hi,

I am generally +1 on the proposal. I'd strongly want to disable merge
commits. They are far too easy to accidentally push. If there is no option to
disable them, one option would be to do what we did in Flume.

Basically:

1) Trunk operates as normal.
2) We always have the next release branch open
3) Every commit is committed to trunk and immediately cherry-picked to the
release branch. We could use a script to automate this.

Brock


Re: [ANNOUNCE] New Hive Committer - Eugene Koifman

2014-09-17 Thread Brock Noland
Congratulations Eugene! Well deserved.


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137633#comment-14137633
 ] 

Sergey Shelukhin commented on HIVE-7950:


Can you post an RB? Mostly nits, like: if arrays are null, there's no need to copy

> StorageHandler resources aren't added to Tez Session if already Session is 
> already Open
> ---
>
> Key: HIVE-7950
> URL: https://issues.apache.org/jira/browse/HIVE-7950
> Project: Hive
>  Issue Type: Bug
>  Components: StorageHandler, Tez
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 0.14.0
>
> Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
> hive-7950-tez-WIP.diff
>
>
> Was trying to run some queries using the AccumuloStorageHandler when using 
> the Tez execution engine. It seems that classes which were added to 
> tmpjars weren't making it into the container. When a Tez Session is already 
> open, as is the normal case when simply using the `hive` command, the 
> resources aren't added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7950) StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-17 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137640#comment-14137640
 ] 

Josh Elser commented on HIVE-7950:
--

Sure thing. RB is linked.

> StorageHandler resources aren't added to Tez Session if already Session is 
> already Open
> ---
>
> Key: HIVE-7950
> URL: https://issues.apache.org/jira/browse/HIVE-7950
> Project: Hive
>  Issue Type: Bug
>  Components: StorageHandler, Tez
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 0.14.0
>
> Attachments: HIVE-7950-1.diff, HIVE-7950.2.patch, HIVE-7950.3.patch, 
> hive-7950-tez-WIP.diff
>
>
> Was trying to run some queries using the AccumuloStorageHandler when using 
> the Tez execution engine. It seems that classes which were added to 
> tmpjars weren't making it into the container. When a Tez Session is already 
> open, as is the normal case when simply using the `hive` command, the 
> resources aren't added.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 25743: StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-17 Thread Josh Elser

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25743/
---

Review request for hive.


Bugs: HIVE-7950
https://issues.apache.org/jira/browse/HIVE-7950


Repository: hive-git


Description
---

Was trying to run some queries using the AccumuloStorageHandler when using the 
Tez execution engine. It seems that classes which were added to tmpjars 
weren't making it into the container. When a Tez Session is already open, as is 
the normal case when simply using the `hive` command, the resources aren't 
added.


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
0d0ac41 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 428e0ff 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TezWork.java 456b5eb 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
ad5a6e7 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 45ab672 

Diff: https://reviews.apache.org/r/25743/diff/


Testing
---

Ran ql/ unit tests, tested AccumuloStorageHandler in local deployment after 
killing tez session.


Thanks,

Josh Elser



Re: moving Hive to git

2014-09-17 Thread Sergey Shelukhin
I can check how HBase operates without merge commits... cherry-picking
seems tedious, at least without the script - too easy to forget, and that
would arguably be more harmful than a stray merge commit.

On Wed, Sep 17, 2014 at 10:47 AM, Brock Noland  wrote:

> Hi,
>
> I am generally +1 on the proposal. I'd strongly want to disable merge
> commits. They are far too easy to accidentally push. If there is no option to
> disable them, one option would be to do what we did in flume.
>
> Basically:
>
> 1) Trunk operates as normal.
> 2) We always have the next release branch open
> 3) Every commit is committed to trunk and immediately cherry-picked to the
> release branch. We could use a script to automate this.
>
> Brock
>

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


[jira] [Updated] (HIVE-8153) Reduce the verbosity of debug logs in ORC record reader

2014-09-17 Thread Prasanth J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth J updated HIVE-8153:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. This patch just changes logging from DEBUG to TRACE; there 
is no other code change that warrants a Hive QA test run.

> Reduce the verbosity of debug logs in ORC record reader
> ---
>
> Key: HIVE-8153
> URL: https://issues.apache.org/jira/browse/HIVE-8153
> Project: Hive
>  Issue Type: Bug
>  Components: File Formats
>Affects Versions: 0.14.0
>Reporter: Prasanth J
>Assignee: Prasanth J
>Priority: Trivial
> Fix For: 0.14.0
>
> Attachments: HIVE-8153.1.patch, HIVE-8153.2.patch
>
>
> Following fields are logged for every row.
> {code}
> if (LOG.isDebugEnabled()) {
>   LOG.debug("row from " + reader.path);
>   LOG.debug("orc row = " + result);
> }
> {code}
> This should be moved to trace logging.
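The change amounts to keeping the same guard-then-log pattern at a more verbose level. As a runnable illustration using java.util.logging (Hive itself uses a different logging API; FINEST stands in for TRACE here):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RowLogging {
    private static final Logger LOG = Logger.getLogger(RowLogging.class.getName());

    // Per-row logging guarded at the most verbose level, so it costs
    // almost nothing unless that level is explicitly enabled.
    static void logRow(String path, Object result) {
        if (LOG.isLoggable(Level.FINEST)) {
            LOG.finest("row from " + path);
            LOG.finest("orc row = " + result);
        }
    }
}
```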



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 25743: StorageHandler resources aren't added to Tez Session if already Session is already Open

2014-09-17 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25743/#review53700
---



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java


please log the error;
in fact, I wonder if it's wise to ignore exceptions. Why not let it through?



ql/src/java/org/apache/hadoop/hive/ql/plan/TezWork.java


it's not necessary to copy if any of the lists are null or empty
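The comment above can be illustrated with a small helper (the name is hypothetical, not Hive's TezWork API): skip the defensive copy when the source list is null or empty.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyGuard {
    // Return the input unchanged when there is nothing to copy;
    // otherwise make a defensive copy.
    static List<String> copyIfNeeded(List<String> src) {
        if (src == null || src.isEmpty()) {
            return src;
        }
        return new ArrayList<>(src);
    }
}
```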


- Sergey Shelukhin


On Sept. 17, 2014, 5:56 p.m., Josh Elser wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/25743/
> ---
> 
> (Updated Sept. 17, 2014, 5:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-7950
> https://issues.apache.org/jira/browse/HIVE-7950
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Was trying to run some queries using the AccumuloStorageHandler when using 
> the Tez execution engine. It seems that classes which were added to 
> tmpjars weren't making it into the container. When a Tez Session is already 
> open, as is the normal case when simply using the `hive` command, the 
> resources aren't added.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 
> 0d0ac41 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 428e0ff 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TezWork.java 456b5eb 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 
> ad5a6e7 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 45ab672 
> 
> Diff: https://reviews.apache.org/r/25743/diff/
> 
> 
> Testing
> ---
> 
> Ran ql/ unit tests, tested AccumuloStorageHandler in local deployment after 
> killing tez session.
> 
> 
> Thanks,
> 
> Josh Elser
> 
>



Re: moving Hive to git

2014-09-17 Thread Sergey Shelukhin
I'd say the learning curve for git is shorter than for svn, especially for
newcomers, since git is widely adopted and used (afaik most people use the git
mirror for development even now, and only use svn to push).
svn, in my experience, makes it hard to do things as a general principle.
I've had to write bash scripts to replace single git commands (e.g. git clean),
and branching is difficult.

On Wed, Sep 17, 2014 at 9:31 AM, E.L. Leverenz  wrote:

> This is the rest of the message I meant to send on the "moving Hive to
> git" thread, but then did an accidental send.  Apache rejected several
> attempts as spam, so I'm sending this from a different email account.
>
> This list summarizes the previous discussion, with my questions/comments:
> 1. "... git is more powerful and easy to use (once you go past the
> learning curve!)" [Thejas] -- that learning curve still intimidates me,
> which suggests it might also be daunting for newcomers.
> 2. "Switching to git from svn seems to be a proposal slightly
> different from that of switching to pull request from the head of the
> thread. Personally I'm +1 to git, but I think patches are very portable and
> widely adopted in Hadoop ecosystem and we should keep the practice."
> [Xuefu] -- could someone explain the patch issue?
> 3. "We need to keep patches in Jira  ... having a patch in the
> jira is critical I feel. We must at least have a perma link to the
> changes." [Edward] -- again, how are patches different in git?
> 4. "In my read of the Apache git - github integration blog post we
> cannot use pull requests as patches. Just that we'll be notified of them
> and could perhaps use them as code review." [Brock] -- okay, perhaps this
> answers my patch question.
> 5. "One additional item I think we should investigate is disabling
> merge commits on trunk and feature branches." -- uh oh, I'm slipping
> backwards on the learning curve.
> 6. "I do not think we want Pull Requests coming at us. Better way
> is let someone open a git branch for the changes, then we review and merge
> the branch." [Edward] -- okay, creeping back up the learning curve.
> 7. "I'm +1 on switching to git, but only if we can find a way to
> disable merge commits to trunk and feature branches. I'm -1 on switching to
> Github since, as far as I know, it only supports merge based workflows."
> [Carl]
> 8. "Agree with Carl about git merge commits, they make the changes
> hard to follow. But it should be OK, if there is no way to disable it in
> the main git repo, it is a small set of active committers, we can make a
> policy and expect people to follow it. But we should certainly disable 'git
> push -f' (and anything as disruptive)." [Thejas] -- that small set of
> committers is growing larger all the time.
> -- Lefty



[jira] [Created] (HIVE-8161) [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode correctly

2014-09-17 Thread Ashutosh Chauhan (JIRA)
Ashutosh Chauhan created HIVE-8161:
--

 Summary: [CBO] Partition pruner doesnt handle unpartitioned table 
in non-strict mode correctly
 Key: HIVE-8161
 URL: https://issues.apache.org/jira/browse/HIVE-8161
 Project: Hive
  Issue Type: Bug
  Components: CBO
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 25744: [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode correctly

2014-09-17 Thread Ashutosh Chauhan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25744/
---

Review request for hive and John Pullokkaran.


Bugs: HIVE-8161
https://issues.apache.org/jira/browse/HIVE-8161


Repository: hive-git


Description
---

[CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode 
correctly


Diffs
-

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java 
8df86bb 

Diff: https://reviews.apache.org/r/25744/diff/


Testing
---

create_view_partitioned.q


Thanks,

Ashutosh Chauhan



[jira] [Updated] (HIVE-8161) [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode correctly

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8161:
---
Status: Patch Available  (was: Open)

> [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode 
> correctly
> -
>
> Key: HIVE-8161
> URL: https://issues.apache.org/jira/browse/HIVE-8161
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8161.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8161) [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode correctly

2014-09-17 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-8161:
---
Attachment: HIVE-8161.cbo.patch

> [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode 
> correctly
> -
>
> Key: HIVE-8161
> URL: https://issues.apache.org/jira/browse/HIVE-8161
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8161.cbo.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-8162) hive.optimize.sort.dynamic.partition causes RuntimeException for inserting into dynamic partitioned table when map function is used in the subquery

2014-09-17 Thread Na Yang (JIRA)
Na Yang created HIVE-8162:
-

 Summary: hive.optimize.sort.dynamic.partition causes 
RuntimeException for inserting into dynamic partitioned table when map function 
is used in the subquery 
 Key: HIVE-8162
 URL: https://issues.apache.org/jira/browse/HIVE-8162
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.13.0
Reporter: Na Yang


Exception:
Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
Hive Runtime Error: Unable to deserialize reduce input key from 
x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255
 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=+++, columns.types=int,map,int}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:462)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
at org.apache.hadoop.mapred.Child.main(Child.java:271)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
Error: Unable to deserialize reduce input key from 
x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255
 with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, 
serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
 serialization.sort.order=+++, columns.types=int,map,int}
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
... 7 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:189)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220)
... 7 more
Caused by: java.io.EOFException
at 
org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:533)
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:236)
at 
org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:185)
... 8 more


Step to reproduce the exception:
-
CREATE TABLE associateddata(creative_id int,creative_group_id int,placement_id
int,sm_campaign_id int,browser_id string, trans_type_p string,trans_time_p
string,group_name string,event_name string,order_id string,revenue
float,currency string, trans_type_ci string,trans_time_ci string,f16
map,campaign_id int,user_agent_cat string,geo_country
string,geo_city string,geo_state string,geo_zip string,geo_dma string,geo_area
string,geo_isp string,site_id int,section_id int,f16_ci map)
PARTITIONED BY(day_id int, hour_id int) ROW FORMAT DELIMITED FIELDS TERMINATED
BY '\t';

LOAD DATA LOCAL INPATH '/tmp/47rows.txt' INTO TABLE associateddata
PARTITION(day_id=20140814,hour_id=2014081417);

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict; 

CREATE  EXTERNAL TABLE IF NOT EXISTS agg_pv_associateddata_c (
 vt_tran_qty int COMMENT 'The count of view
thru transactions'
, pair_value_txt  string  COMMENT 'F16 name values
pairs'
)
PARTITIONED BY (day_id int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/prodman/agg_pv_associateddata_c';

INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
select 2 as vt_tran_qty, pair_value_txt, day_id
 from (select map( 'x_product_id',coalesce(F16['x_product_id'],'') ) as 
pair_value_txt , day_id , hour_id 
from associateddata where hour_id = 2014081417 and sm_campaign_id in
(10187171,1090942,10541943,10833443,8635630,10187170,9445296,10696334,11398585,9524211,1145211)
) a GROUP BY pair_value_txt, day_id;

The query worked fine in Hive-0.12. It starts failing in Hive-0.13. If 
hive.optimize.sort.dynamic.partition is turned off in Hive-0.13, the 
exception goes away.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8162) hive.optimize.sort.dynamic.partition causes RuntimeException for inserting into dynamic partitioned table when map function is used in the subquery

2014-09-17 Thread Na Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Na Yang updated HIVE-8162:
--
Attachment: 47rows.txt

The data file for the query is attached

> hive.optimize.sort.dynamic.partition causes RuntimeException for inserting 
> into dynamic partitioned table when map function is used in the subquery 
> 
>
> Key: HIVE-8162
> URL: https://issues.apache.org/jira/browse/HIVE-8162
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.13.0
>Reporter: Na Yang
> Attachments: 47rows.txt
>
>
> Exception:
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> Hive Runtime Error: Unable to deserialize reduce input key from 
> x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255
>  with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, 
> serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
>  serialization.sort.order=+++, columns.types=int,map,int}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:283)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:518)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:462)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1122)
>   at org.apache.hadoop.mapred.Child.main(Child.java:271)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error: Unable to deserialize reduce input key from 
> x1x129x51x83x14x1x128x0x0x2x1x1x1x120x95x112x114x111x100x117x99x116x95x105x100x0x1x0x0x255
>  with properties {columns=reducesinkkey0,reducesinkkey1,reducesinkkey2, 
> serialization.lib=org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe,
>  serialization.sort.order=+++, columns.types=int,map,int}
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:222)
>   ... 7 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:189)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:220)
>   ... 7 more
> Caused by: java.io.EOFException
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.InputByteBuffer.read(InputByteBuffer.java:54)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserializeInt(BinarySortableSerDe.java:533)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:236)
>   at 
> org.apache.hadoop.hive.serde2.binarysortable.BinarySortableSerDe.deserialize(BinarySortableSerDe.java:185)
>   ... 8 more
> Step to reproduce the exception:
> -
> CREATE TABLE associateddata(creative_id int,creative_group_id int,placement_id
> int,sm_campaign_id int,browser_id string, trans_type_p string,trans_time_p
> string,group_name string,event_name string,order_id string,revenue
> float,currency string, trans_type_ci string,trans_time_ci string,f16
> map,campaign_id int,user_agent_cat string,geo_country
> string,geo_city string,geo_state string,geo_zip string,geo_dma string,geo_area
> string,geo_isp string,site_id int,section_id int,f16_ci map)
> PARTITIONED BY(day_id int, hour_id int) ROW FORMAT DELIMITED FIELDS TERMINATED
> BY '\t';
> LOAD DATA LOCAL INPATH '/tmp/47rows.txt' INTO TABLE associateddata
> PARTITION(day_id=20140814,hour_id=2014081417);
> set hive.exec.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict; 
> CREATE  EXTERNAL TABLE IF NOT EXISTS agg_pv_associateddata_c (
>  vt_tran_qty int COMMENT 'The count of view
> thru transactions'
> , pair_value_txt  string  COMMENT 'F16 name values
> pairs'
> )
> PARTITIONED BY (day_id int)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE
> LOCATION '/user/prodman/agg_pv_associateddata_c';
> INSERT INTO TABLE agg_pv_associateddata_c PARTITION (day_id)
> select 2 as vt_tran_qty, pair_value_txt, day_id
>  from (select map( 'x_product_id',coalesce(F16['x_product_id'],'') ) as 
> pair_value_txt , day_id , hour_id 
> from associateddata where hour_id = 2014081417 and sm_campaign_id in
> (10187171,1090942,10541943,10833443,8635630,10187170,9445296,10696334,11398585,9524211,11452

[jira] [Commented] (HIVE-8143) Create root scratch dir with 733 instead of 777 perms

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137761#comment-14137761
 ] 

Hive QA commented on HIVE-8143:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669243/HIVE-8143.2.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 6279 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_schemeAuthority
org.apache.hadoop.hive.ql.parse.TestParse.testParse_union
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/845/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/845/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-845/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669243

> Create root scratch dir with 733 instead of 777 perms
> -
>
> Key: HIVE-8143
> URL: https://issues.apache.org/jira/browse/HIVE-8143
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-8143.1.patch, HIVE-8143.2.patch
>
>
> hive.exec.scratchdir, which is treated as the root scratch directory on HDFS, 
> only needs to be writable by all. We can use 733 instead of 777 for that.
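
For reference, the difference between the two permission modes can be checked with Python's standard stat module (a sketch, not Hive code):

```python
import stat

# 0o733 on a directory: owner rwx, group -wx, other -wx. Anyone may create
# files inside it (write + execute on a directory), but only the owner can
# list its contents (read bit). 0o777 would also let everyone read the
# directory listing.
print(stat.filemode(0o040733))  # directory type bit + 733 perms
print(stat.filemode(0o040777))
```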



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-7985) With CBO enabled cross product is generated when a subquery is present

2014-09-17 Thread Mostafa Mokhtar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137768#comment-14137768
 ] 

Mostafa Mokhtar commented on HIVE-7985:
---

Q56 also has the same issue. 
Query:
{code}
with ss as (
 select i_item_id,sum(ss_ext_sales_price) total_sales
 from
store_sales,
date_dim,
 customer_address,
 item
 where item.i_item_id in (select
 i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
 and ss_item_sk  = i_item_sk
 and ss_sold_date_sk = d_date_sk
 and d_year  = 2001
 and d_moy   = 1
 and ss_addr_sk  = ca_address_sk
 and ca_gmt_offset   = -6 
 group by i_item_id),
 cs as (
 select i_item_id,sum(cs_ext_sales_price) total_sales
 from
catalog_sales,
date_dim,
 customer_address,
 item
 where
 item.i_item_id   in (select
  i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
 and cs_item_sk  = i_item_sk
 and cs_sold_date_sk = d_date_sk
 and d_year  = 2001
 and d_moy   = 1
 and cs_bill_addr_sk = ca_address_sk
 and ca_gmt_offset   = -6 
 group by i_item_id),
 ws as (
 select i_item_id,sum(ws_ext_sales_price) total_sales
 from
web_sales,
date_dim,
 customer_address,
 item
 where
 item.i_item_id   in (select
  i.i_item_id
from item i
where i_color in ('purple','burlywood','indian'))
 and ws_item_sk  = i_item_sk
 and ws_sold_date_sk = d_date_sk
 and d_year  = 2001
 and d_moy   = 1
 and ws_bill_addr_sk = ca_address_sk
 and ca_gmt_offset   = -6
 group by i_item_id)
  select  i_item_id ,sum(total_sales) total_sales
 from  (select * from ss 
union all
select * from cs 
union all
select * from ws) tmp1
 group by i_item_id
 order by total_sales
 limit 100
{code}


Plan:
{code}
Warning: Map Join MAPJOIN[177][bigTable=?] in task 'Map 8' is a cross product
Warning: Map Join MAPJOIN[178][bigTable=?] in task 'Map 21' is a cross product
Warning: Map Join MAPJOIN[179][bigTable=web_sales] in task 'Map 14' is a cross 
product
Warning: Map Join MAPJOIN[180][bigTable=?] in task 'Map 20' is a cross product
Warning: Map Join MAPJOIN[181][bigTable=?] in task 'Map 22' is a cross product
Warning: Map Join MAPJOIN[182][bigTable=store_sales] in task 'Map 17' is a 
cross product
Warning: Map Join MAPJOIN[174][bigTable=?] in task 'Map 13' is a cross product
Warning: Map Join MAPJOIN[175][bigTable=?] in task 'Map 10' is a cross product
Warning: Map Join MAPJOIN[176][bigTable=catalog_sales] in task 'Map 2' is a 
cross product
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
Tez
  Edges:
Map 10 <- Map 13 (BROADCAST_EDGE)
Map 13 <- Map 16 (BROADCAST_EDGE)
Map 14 <- Map 11 (BROADCAST_EDGE), Map 21 (BROADCAST_EDGE)
Map 17 <- Map 12 (BROADCAST_EDGE), Map 22 (BROADCAST_EDGE)
Map 2 <- Map 1 (BROADCAST_EDGE), Map 10 (BROADCAST_EDGE)
Map 20 <- Map 9 (BROADCAST_EDGE)
Map 21 <- Map 8 (BROADCAST_EDGE)
Map 22 <- Map 20 (BROADCAST_EDGE)
Map 8 <- Map 19 (BROADCAST_EDGE)
Reducer 15 <- Map 14 (SIMPLE_EDGE), Union 4 (CONTAINS)
Reducer 18 <- Map 17 (SIMPLE_EDGE), Union 4 (CONTAINS)
Reducer 3 <- Map 2 (SIMPLE_EDGE), Union 4 (CONTAINS)
Reducer 5 <- Union 4 (SIMPLE_EDGE)
Reducer 6 <- Reducer 5 (SIMPLE_EDGE)
Reducer 7 <- Reducer 6 (SIMPLE_EDGE)
  DagName: mmokhtar_2014091623_107f445a-d48f-4a42-89a1-8c53eaa8dec0:1
  Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: i
  filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') 
and i_item_id is not null) (type: boolean)
  Statistics: Num rows: 48000 Data size: 68732712 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: ((i_color) IN ('purple', 'burlywood', 'indian') 
and i_item_id is not null) (type: boolean)
Statistics: Num rows: 12000 Data size: 17183178 Basic 
stats: COMPLETE Column stats: NONE
Select Operator
  expressions: i_item_id (type: string)
  outputColumnNames: _col0
  Statistics: Num rows: 12000 Data size: 17183178 Basic 
stats: COMPLETE Column stats: NONE
  Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statisti

[jira] [Commented] (HIVE-8161) [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode correctly

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137773#comment-14137773
 ] 

Sergey Shelukhin commented on HIVE-8161:


+1

> [CBO] Partition pruner doesnt handle unpartitioned table in non-strict mode 
> correctly
> -
>
> Key: HIVE-8161
> URL: https://issues.apache.org/jira/browse/HIVE-8161
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-8161.cbo.patch
>
>






Re: moving Hive to git

2014-09-17 Thread Mohit Sabharwal
Reg. disabling merge commits, if Apache is ok installing git server-side
hook scripts, setting up a pre-receive hook could be a possible option:
  
http://stackoverflow.com/questions/2039773/have-remote-git-repository-refuse-merge-commits-on-push
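
A minimal sketch of such a hook (assumptions: POSIX sh, run by git inside the bare server-side repository, stdin lines of the form "<old-sha> <new-sha> <refname>"; untested against ASF infrastructure):

```shell
# Refuse any push that introduces merge commits.
reject_merge_commits() {
  while read -r oldrev newrev refname; do
    if [ "$oldrev" = "0000000000000000000000000000000000000000" ]; then
      range="$newrev"            # new branch: inspect its whole history
    else
      range="$oldrev..$newrev"   # existing branch: only the pushed commits
    fi
    if git rev-list --merges "$range" 2>/dev/null | grep -q .; then
      echo "rejected: merge commit pushed to $refname" >&2
      return 1
    fi
  done
  return 0
}

# An empty push list is accepted:
printf '' | reject_merge_commits && echo accepted   # prints "accepted"
```

Installed as hooks/pre-receive in the server repository, a non-zero exit from this function rejects the entire push.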

On Wed, Sep 17, 2014 at 10:56 AM, Sergey Shelukhin
 wrote:
> I can check how HBase operates without merge commits... cherry-picking
> seems tedious, at least without the script - too easy to forget, and that
> would arguably be more harmful than a stray merge commit.
>
> On Wed, Sep 17, 2014 at 10:47 AM, Brock Noland  wrote:
>
>> Hi,
>>
>> I am generally +1 on the proposal. I'd strongly want to disable merge
>> commits. They are far too easy to accidentally push. If there is no option to
>> disable them, one option would be to do what we did in Flume.
>>
>> Basically:
>>
>> 1) Trunk operates as normal.
>> 2) We always have the next release branch open
>> 3) Every commit is committed to trunk and immediately cherry-picked to the
>> release branch. We could use a script to automate this.
>>
>> Brock
>>
>
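
Brock's three-step workflow could be automated along these lines (a sketch only; the helper and the branch names are hypothetical, not existing Hive tooling):

```shell
# Commit staged changes to trunk, then immediately cherry-pick the same
# commit onto the open release branch, as described in step 3 above.
commit_and_backport() {
  release_branch=$1; shift
  git checkout -q trunk &&
  git commit "$@" &&
  sha=$(git rev-parse HEAD) &&
  git checkout -q "$release_branch" &&
  git cherry-pick -x "$sha" &&   # -x records the original commit id
  git checkout -q trunk
}
```

Using -x keeps a traceable link between the trunk commit and its release-branch copy, which mitigates the "too easy to forget" concern.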


[jira] [Updated] (HIVE-8045) SQL standard auth with cli - Errors and configuration issues

2014-09-17 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-8045:

Attachment: HIVE-8045.1.patch

> SQL standard auth with cli - Errors and configuration issues
> 
>
> Key: HIVE-8045
> URL: https://issues.apache.org/jira/browse/HIVE-8045
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Jagruti Varia
>Assignee: Thejas M Nair
> Attachments: HIVE-8045.1.patch
>
>
> HIVE-7533 enabled SQL std authorization to be set in hive cli (without 
> enabling authorization checks). This updates hive configuration so that 
> create-table and create-view set permissions appropriately for the owner of 
> the table.
> HIVE-7209 added a metastore authorization provider that can be used to 
> restrict calls made to the authorization api, so that only HS2 can make 
> those calls (when HS2 uses an embedded metastore).
> Some issues were found with this:
> # Even if hive.security.authorization.enabled=false, authorization checks 
> were happening for non-SQL statements such as add/delete/dfs/compile, which 
> results in MetaStoreAuthzAPIAuthorizerEmbedOnly throwing an error.
> # Create table from hive-cli ended up making a metastore server api call 
> (getRoles) and resulted in MetaStoreAuthzAPIAuthorizerEmbedOnly throwing an 
> error.
> # Some users prefer to enable authorization using hive-site.xml for 
> hive-server2 (the hive.security.authorization.enabled param). If this file is 
> shared by hive-cli and hive-server2, the SQL std authorizer throws an error 
> because its use in hive-cli is not allowed.





[jira] [Commented] (HIVE-8162) hive.optimize.sort.dynamic.partition causes RuntimeException for inserting into dynamic partitioned table when map function is used in the subquery

2014-09-17 Thread Na Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137784#comment-14137784
 ] 

Na Yang commented on HIVE-8162:
---

The operator tree for this query is like:
TS0-FIL9-SEL2-GBY4-RS5-GBY6-SEL7-RS10-EX11-FS8.

The task graph for this query is like:
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0

STAGE PLANS:
  Stage: Stage-1
Map Reduce
  Map Operator Tree:
  TableScan
alias: associateddata
Statistics: Num rows: 25374 Data size: 101496 Basic stats: COMPLETE 
Column stats: NONE
Filter Operator
  predicate: (sm_campaign_id) IN (10187171, 1090942, 10541943, 
10833443, 8635630, 10187170, 9445296, 10696334, 11398585, 9524211, 1145211) 
(type: boolean)
  Statistics: Num rows: 12687 Data size: 50748 Basic stats: 
COMPLETE Column stats: NONE
  Select Operator
expressions: map('x_product_id':'') (type: map), 
day_id (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 12687 Data size: 50748 Basic stats: 
COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: map), _col1 (type: int)
  mode: hash
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 12687 Data size: 50748 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: map), _col1 
(type: int)
sort order: ++
Map-reduce partition columns: _col0 (type: 
map), _col1 (type: int)
Statistics: Num rows: 12687 Data size: 50748 Basic stats: 
COMPLETE Column stats: NONE
  Reduce Operator Tree:
Group By Operator
  keys: KEY._col0 (type: map), KEY._col1 (type: int)
  mode: mergepartial
  outputColumnNames: _col0, _col1
  Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE 
Column stats: NONE
  Select Operator
expressions: 2 (type: int), _col0 (type: map), _col1 
(type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE 
Column stats: NONE
File Output Operator
  compressed: false
  table:
  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
  serde: 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe

  Stage: Stage-2
Map Reduce
  Map Operator Tree:
  TableScan
Reduce Output Operator
  key expressions: _col2 (type: int), _col0 (type: 
map), _col1 (type: int)
  sort order: +++
  Map-reduce partition columns: _col2 (type: int)
  Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE 
Column stats: NONE
  value expressions: _col0 (type: int), _col1 (type: 
map), _col2 (type: int)
  Reduce Operator Tree:
Extract
  Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE 
Column stats: NONE
  File Output Operator
compressed: false
Statistics: Num rows: 6343 Data size: 25372 Basic stats: COMPLETE 
Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: default.agg_pv_associateddata_c

  Stage: Stage-0
Move Operator
  tables:
  partition:
day_id 
  replace: false
  table:
  input format: org.apache.hadoop.mapred.TextInputFormat
  output format: 
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
  name: default.agg_pv_associateddata_c

  Stage: Stage-3
Stats-Aggr Operator

The exception happens when executing task stage-2. The ReduceSinkDesc for RS10 
has key columns of type {int, map<string,string>, int}, and the intermediate 
file for this table is stored as SequenceFileInputFormat using LazyBinarySerDe. 
However, LazyBinarySerDe is not able to deserialize the non-primitive type from 
the intermediate file, which causes the exception.

Using TextInputFormat and LazySimpleSerDe for the intermediate file makes the 
exception go away. However, changing the intermediate file's InputFormat and 
SerDe is not a preferred solution.
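
Until the planner issue is fixed, one possible workaround (an inference from the report, not a verified fix) is to disable the optimization named in the title for the affected insert:

{code}
set hive.optimize.sort.dynamic.partition=false;
-- then re-run the INSERT ... PARTITION (day_id) statement
{code}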

> hive.optimize.sort.dynamic.partition causes RuntimeException for inserting 
> int

[jira] [Commented] (HIVE-2927) Allow escape character in get_json_object

2014-09-17 Thread Vlad Zhidkov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137787#comment-14137787
 ] 

Vlad Zhidkov commented on HIVE-2927:


Is anyone planning to work on this feature? Are there any known workarounds? My 
specific problem has to do with '$' characters in the field names.

> Allow escape character in get_json_object
> -
>
> Key: HIVE-2927
> URL: https://issues.apache.org/jira/browse/HIVE-2927
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 0.8.1, 0.9.0
>Reporter: Sean McNamara
> Attachments: HIVE-2927.1.patch.txt
>
>   Original Estimate: 0h
>  Remaining Estimate: 0h
>
> *Background:*
> get_json_object extracts json objects from a json string based on a specified 
> path.
> *Problem:*
> The current implementation of get_json_object can't see keys with a '.' in 
> them.  Our data contains '.' in the keys, so we have to filter our json keys 
> through a streaming script to replace '.' for '_'.
> *Example:*
> {{json = {"a":{"b": 1}, "c.d": 2}}}
> {{get_json_object(json, "$.a.b") returns: 1}}
> {{get_json_object(json, "$.c.d") returns: NULL}}
> In the present implementation of get_json_object, c.d is not addressable.
> *Proposal:*
> The desired behavior would be to allow the JSON path to be escape-able, like 
> so:
> {{get_json_object(json, '$.c\\\.d') would return: 2}}
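
As a workaround until escaping is supported, dotted (or '$'-containing) keys are directly addressable once the row is run through a full JSON parser, e.g. in a transform/streaming script or a custom UDF. A Python sketch of the idea (not Hive code):

```python
import json

row = '{"a": {"b": 1}, "c.d": 2}'
obj = json.loads(row)

# Equivalent of get_json_object(row, '$.a.b'):
a_b = obj["a"]["b"]   # -> 1
# The dotted key that get_json_object cannot address is plain dict access:
c_d = obj["c.d"]      # -> 2
print(a_b, c_d)       # prints "1 2"
```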





[jira] [Commented] (HIVE-8081) "drop index if exists" fails if table specified does not exist

2014-09-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137840#comment-14137840
 ] 

Thejas M Nair commented on HIVE-8081:
-

+1

> "drop index if exists" fails if table specified does not exist
> --
>
> Key: HIVE-8081
> URL: https://issues.apache.org/jira/browse/HIVE-8081
> Project: Hive
>  Issue Type: Bug
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-8081.1.patch, HIVE-8081.2.patch, HIVE-8081.3.patch
>
>
> Seems to be a regression in behavior from HIVE-7648.
> {noformat}
> FAILED: SemanticException [Error 10001]: Table not found missing_ddl_3
> 14/09/09 16:12:46 [main]: ERROR ql.Driver: FAILED: SemanticException [Error 
> 10001]: Table not found missing_ddl_3
> org.apache.hadoop.hive.ql.parse.SemanticException: Table not found 
> missing_ddl_3
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1243)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1226)
>   at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropIndex(DDLSemanticAnalyzer.java:1148)
>   at 
> org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:326)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:402)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:298)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:992)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1062)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:929)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:919)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:246)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:198)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
> {noformat}





[jira] [Created] (HIVE-8163) With dynamic partition pruning map operator that generates the partition filters is not vectorized

2014-09-17 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created HIVE-8163:
-

 Summary: With dynamic partition pruning map operator that 
generates the partition filters is not vectorized
 Key: HIVE-8163
 URL: https://issues.apache.org/jira/browse/HIVE-8163
 Project: Hive
  Issue Type: Bug
  Components: Tez
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Gunther Hagleitner
Priority: Minor
 Fix For: vectorization-branch, 0.14.0


Vertex used to generate the partition pruning filters is not vectorized.

Sample from the plan :
{code}
Vertices:
Map 1 
Map Operator Tree:
TableScan
  alias: d3
  filterExpr: ((d_quarter_name) IN ('2000Q1', '2000Q2', 
'2000Q3') and d_date_sk is not null) (type: boolean)
  Statistics: Num rows: 73049 Data size: 81741831 Basic stats: 
COMPLETE Column stats: NONE
  Filter Operator
predicate: ((d_quarter_name) IN ('2000Q1', '2000Q2', 
'2000Q3') and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 18262 Data size: 20435178 Basic 
stats: COMPLETE Column stats: NONE
Select Operator
  expressions: d_date_sk (type: int)
  outputColumnNames: _col0
  Statistics: Num rows: 18262 Data size: 20435178 Basic 
stats: COMPLETE Column stats: NONE
  Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 18262 Data size: 20435178 Basic 
stats: COMPLETE Column stats: NONE
  Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 18262 Data size: 20435178 Basic 
stats: COMPLETE Column stats: NONE
Group By Operator
  keys: _col0 (type: int)
  mode: hash
  outputColumnNames: _col0
  Statistics: Num rows: 18262 Data size: 20435178 Basic 
stats: COMPLETE Column stats: NONE
  Dynamic Partitioning Event Operator
Target Input: catalog_sales
Partition key expr: cs_sold_date_sk
Statistics: Num rows: 18262 Data size: 20435178 
Basic stats: COMPLETE Column stats: NONE
Target column: cs_sold_date_sk
Target Vertex: Map 3
{code}





[jira] [Commented] (HIVE-8104) Insert statements against ACID tables NPE when vectorization is on

2014-09-17 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137938#comment-14137938
 ] 

Hive QA commented on HIVE-8104:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12669201/HIVE-8104.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 6280 tests executed
*Failed tests:*
{noformat}
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/846/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/846/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-846/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12669201

> Insert statements against ACID tables NPE when vectorization is on
> --
>
> Key: HIVE-8104
> URL: https://issues.apache.org/jira/browse/HIVE-8104
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor, Vectorization
>Affects Versions: 0.14.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Critical
> Attachments: HIVE-8104.2.patch, HIVE-8104.patch
>
>
> Doing an insert against a table that is using ACID format with the 
> transaction manager set to DbTxnManager and vectorization turned on results 
> in an NPE.  





[jira] [Commented] (HIVE-7073) Implement Binary in ParquetSerDe

2014-09-17 Thread Pratik Khadloya (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137947#comment-14137947
 ] 

Pratik Khadloya commented on HIVE-7073:
---

Hello [~davidzchen], are you still working on this? If not, and you have ideas 
on how to do it, please note them in this JIRA.

Thanks.

> Implement Binary in ParquetSerDe
> 
>
> Key: HIVE-7073
> URL: https://issues.apache.org/jira/browse/HIVE-7073
> Project: Hive
>  Issue Type: Sub-task
>Reporter: David Chen
>Assignee: David Chen
>
> The ParquetSerDe currently does not support the BINARY data type. This ticket 
> is to implement the BINARY data type.





[jira] [Commented] (HIVE-8143) Create root scratch dir with 733 instead of 777 perms

2014-09-17 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137955#comment-14137955
 ] 

Vaibhav Gumashta commented on HIVE-8143:


These tests are failing elsewhere as well. I'll commit this shortly.

> Create root scratch dir with 733 instead of 777 perms
> -
>
> Key: HIVE-8143
> URL: https://issues.apache.org/jira/browse/HIVE-8143
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-8143.1.patch, HIVE-8143.2.patch
>
>
> hive.exec.scratchdir, which is treated as the root scratch directory on HDFS, 
> only needs to be writable by all. We can use 733 instead of 777 for that.





[jira] [Updated] (HIVE-8143) Create root scratch dir with 733 instead of 777 perms

2014-09-17 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-8143:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed. Thanks for the review [~gopalv]. 

> Create root scratch dir with 733 instead of 777 perms
> -
>
> Key: HIVE-8143
> URL: https://issues.apache.org/jira/browse/HIVE-8143
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, JDBC
>Affects Versions: 0.14.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Fix For: 0.14.0
>
> Attachments: HIVE-8143.1.patch, HIVE-8143.2.patch
>
>
> hive.exec.scratchdir, which is treated as the root scratch directory on HDFS, 
> only needs to be writable by all. We can use 733 instead of 777 for that.





[jira] [Commented] (HIVE-8089) Ordering is lost when limit is put in outer query

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137966#comment-14137966
 ] 

Sergey Shelukhin commented on HIVE-8089:


At first glance it seems by design... the ordering of a table is not preserved 
(or at least there's no guarantee about ordering) when one selects from a table 
without an order by. So why should it be preserved from a subquery?
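
This matches standard SQL, which makes no ordering guarantee for subquery results; only an ORDER BY on the outermost query is reliable. A small illustration using SQLite as a stand-in engine (an assumption, not Hive itself, but the principle is the same):

```python
import sqlite3

# Ordering applied only inside a subquery is not guaranteed to survive into
# the outer query. The portable form puts ORDER BY on the outermost query.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (key TEXT, c_int INTEGER)")
con.executemany("INSERT INTO t1 VALUES (?, ?)",
                [("1", 1), ("1", 1), ("1", 1), (None, None), (None, None)])

rows = con.execute(
    "SELECT key, c_int FROM (SELECT key, c_int FROM t1 LIMIT 5) sub "
    "ORDER BY c_int").fetchall()
# NULLs sort first under ascending order here, as in Hive's ORDER BY.
print(rows)
```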

> Ordering is lost when limit is put in outer query
> -
>
> Key: HIVE-8089
> URL: https://issues.apache.org/jira/browse/HIVE-8089
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>
> It seems like hive supports order by, limit in sub queries (compiler doesn't 
> complain). However ordering seems to be lost based on where you place the 
> limit.   I haven't debugged the issue.
> ex:
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1)t1;
> null  NULL
> null  NULL
> 1 1
> 1 1
> 1 1
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL





[jira] [Updated] (HIVE-8089) Ordering is lost when limit is put in outer query

2014-09-17 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-8089:
-
Assignee: Sergey Shelukhin

> Ordering is lost when limit is put in outer query
> -
>
> Key: HIVE-8089
> URL: https://issues.apache.org/jira/browse/HIVE-8089
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>Assignee: Sergey Shelukhin
>
> It seems like hive supports order by, limit in sub queries (compiler doesn't 
> complain). However ordering seems to be lost based on where you place the 
> limit.   I haven't debugged the issue.
> ex:
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1)t1;
> null  NULL
> null  NULL
> 1 1
> 1 1
> 1 1
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL





[jira] [Commented] (HIVE-8089) Ordering is lost when limit is put in outer query

2014-09-17 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137972#comment-14137972
 ] 

Laljo John Pullokkaran commented on HIVE-8089:
--

We support order by in a subquery when a limit is present; ordering fails only 
when a limit is specified in the outer query.

> Ordering is lost when limit is put in outer query
> -
>
> Key: HIVE-8089
> URL: https://issues.apache.org/jira/browse/HIVE-8089
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>Assignee: Sergey Shelukhin
>
> It seems like hive supports order by, limit in sub queries (compiler doesn't 
> complain). However ordering seems to be lost based on where you place the 
> limit.   I haven't debugged the issue.
> ex:
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1)t1;
> null  NULL
> null  NULL
> 1 1
> 1 1
> 1 1
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL





[jira] [Updated] (HIVE-7777) Add CSV Serde based on OpenCSV

2014-09-17 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-7777:
---
Release Note: A CSVSerde based on OpenCSV has been added. This Serde works 
for most CSV data, but does not handle embedded newlines. To use the Serde, 
specify the fully qualified class name 
org.apache.hadoop.hive.serde2.OpenCSVSerde.  (was: A CSVSerde based on OpenCSV 
has been added. This Serde works for most CSV data, but does not handle 
embedded newlines.)
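
A minimal usage sketch based on that release note (the table name and columns are hypothetical; OpenCSVSerde treats all columns as strings):

{code}
CREATE TABLE csv_example (col_a STRING, col_b STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
{code}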

> Add CSV Serde based on OpenCSV
> --
>
> Key: HIVE-7777
> URL: https://issues.apache.org/jira/browse/HIVE-7777
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Ferdinand Xu
>Assignee: Ferdinand Xu
>  Labels: TODOC14
> Fix For: 0.14.0
>
> Attachments: HIVE-7777.1.patch, HIVE-7777.2.patch, HIVE-7777.3.patch, 
> HIVE-7777.patch, csv-serde-master.zip
>
>
> There is no official support for csvSerde for hive while there is an open 
> source project in github(https://github.com/ogrodnek/csv-serde). CSV is of 
> high frequency in use as a data format.





[jira] [Updated] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO

2014-09-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8111:
---
Attachment: HIVE-8111.patch

[~ashutoshc] [~jpullokkaran]

Attaching the patch. 
While the spurious casts are retained, result types are now all correct and the 
query results are correct for decimal_udf.
Notes: 
- there's a bug on CBO branch right now, so I reverted my recent merge and 
obtained the out file before that;
- the out file includes changes for the remaining casts (which are ok)
- the out file also includes changes where extra stage is added to each query 
(same on CBO branch). I am not sure if this is ok.

Will post rb shortly

> CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
> 
>
> Key: HIVE-8111
> URL: https://issues.apache.org/jira/browse/HIVE-8111
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8111.patch
>
>
> Original test failure: the column type changes to different decimals in most 
> cases. In one case the integer part becomes too big to fit, so the result 
> apparently becomes null.
> What happens is that CBO adds casts to arithmetic expressions to make them 
> type compatible; these casts become part of the new AST, and then Hive adds 
> casts on top of these casts. This (the first part) also causes lots of out 
> file changes. It's not yet clear how best to fix this, in addition to the 
> incorrect decimal width and the occasional nulls when the width is larger 
> than Hive allows.
> Option one - don't add those for numeric ops - cannot be done if numeric op 
> is a part of compare, for which CBO needs correct types.
> Option two - unwrap casts when determining type in Hive - hard or impossible 
> to tell apart CBO-added casts and user casts. 
> Option three - don't change types in Hive if CBO has run - seems hacky and 
> hard to ensure it's applied everywhere.
> Option four - map all expressions precisely between two trees and remove 
> casts again after optimization, will be pretty difficult.
> Option five - somehow mark those casts. Not sure about how yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8158) Optimize writeValue/setValue in VectorExpressionWriterFactory (in VectorReduceSinkOperator codepath)

2014-09-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-8158:
---
Status: Patch Available  (was: Open)

> Optimize writeValue/setValue in VectorExpressionWriterFactory (in 
> VectorReduceSinkOperator codepath)
> 
>
> Key: HIVE-8158
> URL: https://issues.apache.org/jira/browse/HIVE-8158
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>  Labels: performance
> Attachments: HIVE-8158.1.patch, profiler_output.png
>
>
> VectorReduceSinkOperator --> processOp --> makeValueWritable --> 
> VectorExpressionWriterFactory --> writeValue(byte[], int, int) / setValue.
> It appears that this goes through an additional layer of Text.encode/decode, 
> causing CPU pressure (profiler output attached).
> SettableStringObjectInspector / WritableStringObjectInspector has a 
> "set(Object o, Text value)" method. It would be beneficial to use 
> set(Object, Text) directly to save CPU cycles.
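The overhead described above can be illustrated outside Hive: when the source is already UTF-8 bytes, routing it through a decode/encode round trip does redundant transcoding and allocation, whereas a Text.set(byte[], start, length)-style call just copies bytes into a reusable buffer. A hedged Python sketch of the two paths:

```python
data = "hello, vectorized world".encode("utf-8")  # already UTF-8 bytes

# Round-trip path: bytes -> str -> bytes (extra allocations and transcoding,
# analogous to going through Text.decode / Text.encode).
roundtrip = data.decode("utf-8").encode("utf-8")

# Direct path: copy the byte range into a reusable buffer, analogous to
# calling Text.set(byte[], start, length) on a cached Text object.
buf = bytearray(len(data))
buf[0:len(data)] = data

assert bytes(buf) == roundtrip == data  # same result, less work done directly
```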



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8045) SQL standard auth with cli - Errors and configuration issues

2014-09-17 Thread Jason Dere (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138004#comment-14138004
 ] 

Jason Dere commented on HIVE-8045:
--

A couple of comments on RB.
The docs will also need to be updated to reflect the supported auth 
configurations, for both HS2 and the CLI. We might want to keep them separate 
from any such configurations supported in 0.13, since it looks like there are 
some differences now.

> SQL standard auth with cli - Errors and configuration issues
> 
>
> Key: HIVE-8045
> URL: https://issues.apache.org/jira/browse/HIVE-8045
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Reporter: Jagruti Varia
>Assignee: Thejas M Nair
> Attachments: HIVE-8045.1.patch
>
>
> HIVE-7533 enabled SQL standard authorization to be set in the Hive CLI 
> (without enabling authorization checks). This updates the Hive configuration 
> so that create-table and create-view statements set permissions appropriately 
> for the owner of the table.
> HIVE-7209 added a metastore authorization provider that can be used to 
> restrict calls made to the authorization API, so that only HS2 can make 
> those calls (when HS2 uses an embedded metastore).
> Some issues were found with this.
> # Even if hive.security.authorization.enabled=false, authorization checks 
> were happening for non-SQL statements such as add/delete/dfs/compile, which 
> results in MetaStoreAuthzAPIAuthorizerEmbedOnly throwing an error.
> # Create table from the hive-cli ended up making a metastore server API call 
> (getRoles), which resulted in MetaStoreAuthzAPIAuthorizerEmbedOnly throwing 
> an error.
> # Some users prefer to enable authorization using hive-site.xml for 
> hive-server2 (the hive.security.authorization.enabled param). If this file 
> is shared by hive-cli and hive-server2, the SQL std authorizer throws an 
> error because its use in hive-cli is not allowed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8089) Ordering is lost when limit is put in outer query

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138005#comment-14138005
 ] 

Sergey Shelukhin commented on HIVE-8089:


It seems that ordering in the subquery should have no effect on any outer 
limits, and outer limits cannot have an effect on the nested query.
So if I sort numbers 1..10, then select N from (select N order by N limit 5) 
should return 1..5.
If we have select N from (select N order by N) limit 5, we can return any 5 
numbers in any order, because it's just like select N from table limit 5 - no 
order is guaranteed...
I will take a look at whether it is easy to make this behavior more 
"user-friendly"
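The distinction can be demonstrated against any SQL engine; here is a small sqlite3 example (Python standard library, not Hive) where the inner order by + limit pins down *which* rows survive, while the outer query guarantees nothing about *order*:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10, 0, -1)])

# Inner ORDER BY + LIMIT: the subquery determines WHICH five rows survive
# (the five smallest), though the outer query promises no output order.
inner = con.execute(
    "SELECT n FROM (SELECT n FROM t ORDER BY n LIMIT 5)").fetchall()
print(sorted(r[0] for r in inner))  # [1, 2, 3, 4, 5]

# Outer LIMIT over an ordered subquery: per the reasoning above, this is
# allowed to behave like "SELECT n FROM t LIMIT 5" - any five rows, any order.
outer = con.execute(
    "SELECT n FROM (SELECT n FROM t ORDER BY n) LIMIT 5").fetchall()
print(len(outer))  # 5
```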

> Ordering is lost when limit is put in outer query
> -
>
> Key: HIVE-8089
> URL: https://issues.apache.org/jira/browse/HIVE-8089
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>Assignee: Sergey Shelukhin
>
> It seems like Hive supports order by and limit in subqueries (the compiler 
> doesn't complain). However, ordering seems to be lost depending on where you 
> place the limit. I haven't debugged the issue.
> ex:
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1)t1;
> null  NULL
> null  NULL
> 1 1
> 1 1
> 1 1
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8089) Ordering is lost when limit is put in outer query

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138005#comment-14138005
 ] 

Sergey Shelukhin edited comment on HIVE-8089 at 9/17/14 9:23 PM:
-

It seems that ordering in the subquery should have no effect on any outer 
limits, and outer limits cannot have an effect on the nested query.
So if I have numbers 1..10, then select N from (select N order by N limit 5) 
should return 1..5.
If we have select N from (select N order by N) limit 5, we can return any 5 
numbers in any order, because it's just like select N from table limit 5 - no 
order is guaranteed...
I will take a look at whether it is easy to make this behavior more 
"user-friendly"


was (Author: sershe):
it seems that ordering in the subquery should have no effect on any outer 
limits, and outer limits cannot have effect on nested query.
So if I sort numbers 1..10, and so select N from (select N order by N limit 5), 
that should return 1..5.
If we have select N from (select N order by N) limit 5 we can return any 5 
numbers in any order, because it's just like select N from table limit 5 - no 
order is guaranteed...
I will take a look if this is easy to make this behavior more "user-friendly"

> Ordering is lost when limit is put in outer query
> -
>
> Key: HIVE-8089
> URL: https://issues.apache.org/jira/browse/HIVE-8089
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>Assignee: Sergey Shelukhin
>
> It seems like Hive supports order by and limit in subqueries (the compiler 
> doesn't complain). However, ordering seems to be lost depending on where you 
> place the limit. I haven't debugged the issue.
> ex:
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1)t1;
> null  NULL
> null  NULL
> 1 1
> 1 1
> 1 1
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-8089) Ordering is lost when limit is put in outer query

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138005#comment-14138005
 ] 

Sergey Shelukhin edited comment on HIVE-8089 at 9/17/14 9:24 PM:
-

It seems that ordering in the subquery should have no effect on any outer 
limits, and outer limits cannot have an effect on the nested query.
So if I have numbers 1..10, then select N from (select N order by N limit 5) 
should return 1..5.
If we have select N from (select N order by N) limit 5, we can return any 5 
numbers in any order, because it's just like insert into T select N order by N, 
then select N from T limit 5 - the table happens to be sorted, so the result 
may be 1..5, but no order is guaranteed, so strictly speaking it can be 
anything...
I will take a look at whether it is easy to make this behavior more 
"user-friendly"


was (Author: sershe):
it seems that ordering in the subquery should have no effect on any outer 
limits, and outer limits cannot have effect on nested query.
So if I have numbers 1..10, and so select N from (select N order by N limit 5), 
that should return 1..5.
If we have select N from (select N order by N) limit 5 we can return any 5 
numbers in any order, because it's just like select N from table limit 5 - no 
order is guaranteed...
I will take a look if this is easy to make this behavior more "user-friendly"

> Ordering is lost when limit is put in outer query
> -
>
> Key: HIVE-8089
> URL: https://issues.apache.org/jira/browse/HIVE-8089
> Project: Hive
>  Issue Type: Bug
>Reporter: Laljo John Pullokkaran
>Assignee: Sergey Shelukhin
>
> It seems like Hive supports order by and limit in subqueries (the compiler 
> doesn't complain). However, ordering seems to be lost depending on where you 
> place the limit. I haven't debugged the issue.
> ex:
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1)t1;
> null  NULL
> null  NULL
> 1 1
> 1 1
> 1 1
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL
> select key, c_int from (select key, c_int from (select key, c_int from t1 
> order by c_int limit 5)t1 limit 5)t1;
> 1 1
> 1 1
> 1 1
> null  NULL
> null  NULL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-8111) CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO

2014-09-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138008#comment-14138008
 ] 

Sergey Shelukhin commented on HIVE-8111:


https://reviews.apache.org/r/25754/

> CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO
> 
>
> Key: HIVE-8111
> URL: https://issues.apache.org/jira/browse/HIVE-8111
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-8111.patch
>
>
> Original test failure: the column type changes to different decimals in most 
> cases. In one case the integer part becomes too big to fit, so the result 
> apparently becomes null.
> What happens is that CBO adds casts to arithmetic expressions to make them 
> type compatible; these casts become part of the new AST, and then Hive adds 
> casts on top of these casts. This (the first part) also causes lots of out 
> file changes. It's not yet clear how best to fix this, in addition to the 
> incorrect decimal width and the occasional nulls when the width is larger 
> than Hive allows.
> Option one - don't add those for numeric ops - cannot be done if numeric op 
> is a part of compare, for which CBO needs correct types.
> Option two - unwrap casts when determining type in Hive - hard or impossible 
> to tell apart CBO-added casts and user casts. 
> Option three - don't change types in Hive if CBO has run - seems hacky and 
> hard to ensure it's applied everywhere.
> Option four - map all expressions precisely between two trees and remove 
> casts again after optimization, will be pretty difficult.
> Option five - somehow mark those casts. Not sure about how yet.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 25754: HIVE-8111 CBO trunk merge: duplicated casts for arithmetic expressions in Hive and CBO

2014-09-17 Thread Sergey Shelukhin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25754/
---

Review request for hive, Ashutosh Chauhan and John Pullokkaran.


Repository: hive-git


Description
---

see jira


Diffs
-

  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/optiq/translator/RexNodeConverter.java
 PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseNumeric.java 
6131d3d 
  ql/src/test/queries/clientpositive/decimal_udf.q 591c210 
  ql/src/test/results/clientpositive/decimal_udf.q.out c5c2031 

Diff: https://reviews.apache.org/r/25754/diff/


Testing
---


Thanks,

Sergey Shelukhin



[jira] [Commented] (HIVE-7145) Remove dependence on apache commons-lang

2014-09-17 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-7145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138023#comment-14138023
 ] 

Owen O'Malley commented on HIVE-7145:
-

This is harder than I thought because the generated Thrift code depends on 
commons lang 2. To completely remove the dependence on commons lang 2, you 
would need to fix Thrift...

> Remove dependence on apache commons-lang
> 
>
> Key: HIVE-7145
> URL: https://issues.apache.org/jira/browse/HIVE-7145
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> We currently depend on both Apache commons-lang and commons-lang3. They are 
> the same project, just at version 2.x vs 3.x. I propose that we move all of 
> the references in Hive to commons-lang3 and remove the v2 usage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8102) Partitions of type 'date' behave incorrectly with daylight saving time.

2014-09-17 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-8102:


Assignee: Jason Dere

> Partitions of type 'date' behave incorrectly with daylight saving time.
> ---
>
> Key: HIVE-8102
> URL: https://issues.apache.org/jira/browse/HIVE-8102
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema, Serializers/Deserializers
>Affects Versions: 0.13.0
>Reporter: Eli Acherkan
>Assignee: Jason Dere
> Attachments: HIVE-8102.1.patch, HIVE-8102.2.patch
>
>
> At 2 AM on March 28th, 2014, Israel went from standard time (GMT+2) to 
> daylight saving time (GMT+3).
> The server's timezone is Asia/Jerusalem. When creating a partition whose key 
> is 2014-03-28, Hive creates a partition for 2014-03-27 instead:
> hive (default)> create table test (a int) partitioned by (`b_prt` date);
> OK
> Time taken: 0.092 seconds
> hive (default)> alter table test add partition (b_prt='2014-03-28');
> OK
> Time taken: 0.187 seconds
> hive (default)> show partitions test;   
> OK
> partition
> b_prt=2014-03-27
> Time taken: 0.134 seconds, Fetched: 1 row(s)
> It seems that the root cause is the behavior of 
> DateWritable.daysToMillis/dateToDays.
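A simplified Python illustration (assuming Python 3.9+ `zoneinfo`; this is not the actual DateWritable code) of how a daysToMillis/dateToDays-style round trip can land on the previous day: parsing a date string yields epoch millis of *local* midnight, and in a UTC+2/+3 zone local midnight falls on the previous UTC day, so floor-dividing by 86,400,000 undercounts by one:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MILLIS_PER_DAY = 24 * 60 * 60 * 1000
tz = ZoneInfo("Asia/Jerusalem")  # UTC+2 standard, UTC+3 in daylight saving

# Parsing "2014-03-28" yields the epoch millis of *local* midnight
# (as java.sql.Date.valueOf does).
local_midnight = datetime(2014, 3, 28, tzinfo=tz)
millis = int(local_midnight.timestamp() * 1000)

# dateToDays-style conversion: local midnight here is 22:00 UTC of the
# previous day, so flooring the millis gives a day count one too low.
days = millis // MILLIS_PER_DAY

recovered = datetime.fromtimestamp(
    days * MILLIS_PER_DAY / 1000, tz=timezone.utc).date()
print(recovered)  # 2014-03-27, one day before the partition key
```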



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-8160) Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-8160:
-

Assignee: Xuefu Zhang

> Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]
> -
>
> Key: HIVE-8160
> URL: https://issues.apache.org/jira/browse/HIVE-8160
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Attachments: HIVE-8160.1-spark.patch
>
>
> Hive on Spark needs SPARK-2978, which is now available in the latest Spark 
> main branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8160) Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8160:
--
Attachment: HIVE-8160.1-spark.patch

> Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]
> -
>
> Key: HIVE-8160
> URL: https://issues.apache.org/jira/browse/HIVE-8160
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
> Attachments: HIVE-8160.1-spark.patch
>
>
> Hive on Spark needs SPARK-2978, which is now available in the latest Spark 
> main branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-8160) Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]

2014-09-17 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-8160:
--
Labels: Spark-M1  (was: )

> Upgrade Spark dependency to 1.2.0-SNAPSHOT [Spark Branch]
> -
>
> Key: HIVE-8160
> URL: https://issues.apache.org/jira/browse/HIVE-8160
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
>Priority: Minor
>  Labels: Spark-M1
> Attachments: HIVE-8160.1-spark.patch
>
>
> Hive on Spark needs SPARK-2978, which is now available in the latest Spark 
> main branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

