[jira] [Created] (HIVE-11498) HIVE Authorization v2 should not check permission for dummy entity

2015-08-06 Thread Dapeng Sun (JIRA)
Dapeng Sun created HIVE-11498:
-

 Summary: HIVE Authorization v2 should not check permission for 
dummy entity
 Key: HIVE-11498
 URL: https://issues.apache.org/jira/browse/HIVE-11498
 Project: Hive
  Issue Type: Bug
Reporter: Dapeng Sun
Assignee: Dapeng Sun






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Issue while storing Date data in hive table (stored as parquet) with cascading-hive

2015-08-06 Thread Santlal J Gupta
Hi,

I am a beginner with cascading-hive.
Through cascading-hive, I want to load data into a Hive table that is stored in 
Parquet format. My data contains one field that is a date. I have created the 
Hive table in Parquet format, but when I tried to load the date data into that 
table, the load failed. In the sink I have used a HiveTap, and I have mapped the 
field as binary (string), since the Date datatype is not available in 
cascading-parquet.
I have tried some sample code.

Code:

import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tap.hive.HiveTableDescriptor;
import cascading.tap.hive.HiveTap;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class ReadText_StoredIn_Parquet_Date {

    static String inpath = "parquet_input/ReadText_StoredIn_Parquet_Date.txt";

    public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, ReadText_StoredIn_Parquet_Date.class);
        AppProps.addApplicationTag(properties, "Cascading-HiveDemoPart1");

        // Source: newline-delimited text with a single "dob" field.
        Scheme sourceSch = new TextDelimited(new Fields("dob"), true, "\n");
        Tap inTapCallCenter = new Hfs(sourceSch, inpath);

        String columnFields[] = { "dob" };
        String columnType[] = { "date" };
        String databaseName = "hive_parquet";
        String tableName = "parquet_date";

        HiveTableDescriptor sinkTableDescriptor =
                new HiveTableDescriptor(databaseName, tableName, columnFields, columnType);

        // The Parquet schema declares dob as binary because cascading-parquet
        // has no date type.
        ParquetTupleScheme scheme = new ParquetTupleScheme(new Fields(columnFields),
                new Fields(columnFields),
                "message ReadText_Parquet_string_int{optional binary dob;}");

        HiveTap sinkTap = new HiveTap(sinkTableDescriptor, scheme, SinkMode.REPLACE, true);
        Pipe copyPipe = new Pipe("copyPipe");

        FlowDef def = FlowDef.flowDef()
                .addSource(copyPipe, inTapCallCenter)
                .addTailSink(copyPipe, sinkTap);
        new Hadoop2MR1FlowConnector(properties).connect(def).complete();
    }
}

This code works fine and loads data into the table (stored as Parquet format). 
But when I read the data back, it throws an exception. I have used 
ParquetTupleScheme to generate the scheme for the HiveTap.

I have used the following queries.

Query:

hive (hive_parquet)> create table parquet_date(dob date) stored as parquet;
hive (hive_parquet)> select * from parquet_date;

Failed with exception 
java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast 
to org.apache.hadoop.hive.serde2.io.DateWritable


Can you please assist me with how to store a date value in a Hive table stored 
as Parquet, either by using ParquetTupleScheme or in any other way?
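
One possible direction (a sketch only, under the assumption that Hive reads Parquet DATE columns as int32 days-since-epoch; not verified against cascading-parquet): declare the field as int32 in the Parquet schema and convert each value before writing. The helper below is illustrative.

import java.time.LocalDate;

public class DateToEpochDays {
    // Convert a "yyyy-MM-dd" string to the days-since-epoch int that a
    // Parquet int32 DATE column would carry (an assumption, not verified here).
    public static int toEpochDays(String dob) {
        return (int) LocalDate.parse(dob).toEpochDay();
    }

    public static void main(String[] args) {
        System.out.println(toEpochDays("1988-05-25"));   // prints 6719
    }
}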


Currently I am using:
hive-1.2.0
hadoop

Thanks,
Santlal J. Gupta



Review Request 37207: HIVE-5277 Fix for wrong hbase counts

2015-08-06 Thread Swarnim Kulkarni

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37207/
---

Review request for hive, Ashutosh Chauhan and Xuefu Zhang.


Repository: hive-git


Description
---

HIVE-5277 Fix for wrong hbase counts


Diffs
-

  
hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHBaseInputFormatUtil.java
 05245728ba7ad8579554d44ff6abad61db89ed16 
  hbase-handler/src/test/queries/positive/hbase_null_first_col.q PRE-CREATION 
  hbase-handler/src/test/results/positive/hbase_null_first_col.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/37207/diff/


Testing
---

Added tests as well as tested manually that it fixes the issue.


Thanks,

Swarnim Kulkarni



Re: Review Request 37150: HIVE-11375 Broken processing of queries containing NOT (x IS NOT NULL and x <> 0)

2015-08-06 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37150/#review94484
---



ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java
 (line 532)


Why can only this case not be folded? It seems like the other cases, like 
"not (key < 3 and key is not null)", would have the same issue.



ql/src/test/queries/clientpositive/folder_predicate.q (line 7)


Maybe add tests for negation as well?
e.g.:

SELECT * FROM predicate_fold_tb WHERE NOT (value IS NOT NULL AND value >= 3);


- Chao Sun


On Aug. 6, 2015, 12:01 a.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37150/
> ---
> 
> (Updated Aug. 6, 2015, 12:01 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-11375 Broken processing of queries containing NOT (x IS NOT NULL and x 
> <> 0)
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/ql/optimizer/ConstantPropagateProcFactory.java
>  410735c27e5372a0818e18e1a6dc5b07d7b986c0 
>   ql/src/test/queries/clientpositive/folder_predicate.q PRE-CREATION 
>   ql/src/test/results/clientpositive/folder_predicate.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/37150/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 37096: HIVE-11451 SemanticAnalyzer throws IndexOutOfBounds Exception

2015-08-06 Thread Chao Sun

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37096/#review94482
---

Ship it!


Ship It!

- Chao Sun


On Aug. 7, 2015, 1:02 a.m., Aihua Xu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37096/
> ---
> 
> (Updated Aug. 7, 2015, 1:02 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-11451 SemanticAnalyzer throws IndexOutOfBounds Exception
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f05407d 
>   ql/src/test/queries/clientnegative/mismatch_columns_insertion.q 
> PRE-CREATION 
>   ql/src/test/results/clientnegative/mismatch_columns_insertion.q.out 
> PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/37096/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Aihua Xu
> 
>



Re: Review Request 37096: HIVE-11451 SemanticAnalyzer throws IndexOutOfBounds Exception

2015-08-06 Thread Aihua Xu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37096/
---

(Updated Aug. 7, 2015, 1:02 a.m.)


Review request for hive.


Repository: hive-git


Description
---

HIVE-11451 SemanticAnalyzer throws IndexOutOfBounds Exception


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java f05407d 
  ql/src/test/queries/clientnegative/mismatch_columns_insertion.q PRE-CREATION 
  ql/src/test/results/clientnegative/mismatch_columns_insertion.q.out 
PRE-CREATION 

Diff: https://reviews.apache.org/r/37096/diff/


Testing
---


Thanks,

Aihua Xu



[jira] [Created] (HIVE-11497) Make sure --orcfiledump utility includes OrcRecordUpdate.AcidStats

2015-08-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11497:
-

 Summary: Make sure --orcfiledump utility includes 
OrcRecordUpdate.AcidStats
 Key: HIVE-11497
 URL: https://issues.apache.org/jira/browse/HIVE-11497
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


OrcRecordUpdater.AcidStats maintains counts of insert/update/delete events in the 
file (going back to Hive 0.14).

The current branch-1 has OrcRecordUpdater.parserAcidStats() to read it, and this 
information should be included in _orcfiledump_ output.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11496) Better tests for evaluating ORC predicate pushdown

2015-08-06 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11496:


 Summary: Better tests for evaluating ORC predicate pushdown
 Key: HIVE-11496
 URL: https://issues.apache.org/jira/browse/HIVE-11496
 Project: Hive
  Issue Type: Improvement
Affects Versions: 1.3.0, 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Prasanth Jayachandran


There were many regressions recently w.r.t. ORC predicate pushdown, and we don't 
have system tests to capture them. Currently there are only JUnit tests for the 
ORC predicate pushdown feature. Since Hive counters are not available during 
qfile test execution, there is no easy way to verify whether ORC PPD worked or 
not. This JIRA is to add a post-execution hook that prints Hive counters (esp. 
the number of input records) to the error stream so that they appear in qfile 
test output. This way we can verify ORC SARG evaluation and avoid future 
regressions.
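
A rough sketch of what such a post-execution hook could look like (the counter-extraction helper below is hypothetical; the actual patch may read the counters differently). The hook would be enabled through hive.exec.post.hooks in the qfile setup.

{code}
import org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext;
import org.apache.hadoop.hive.ql.hooks.HookContext;

// Sketch only: print a counter to stderr so it shows up in qfile output.
public class PostExecutePrintCounters implements ExecuteWithHookContext {
  @Override
  public void run(HookContext hookContext) throws Exception {
    // fetchInputRecordCount() is a hypothetical helper; the real hook would
    // pull the "records in" counter from the completed tasks.
    System.err.println("RECORDS_IN: " + fetchInputRecordCount(hookContext));
  }

  private long fetchInputRecordCount(HookContext hookContext) {
    return -1L; // placeholder
  }
}
{code}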



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11495) Add aborted reason to transaction information.

2015-08-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-11495:
-

 Summary: Add aborted reason to transaction information.
 Key: HIVE-11495
 URL: https://issues.apache.org/jira/browse/HIVE-11495
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


We should add a TXNS.COMMENT field (or something like that) so that if the system 
aborts a transaction (due to timeout, for example), we can attach a message to 
that effect to the aborted transaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11494) Some positive constant double predicates gets rounded off while negative constants are not

2015-08-06 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11494:


 Summary: Some positive constant double predicates gets rounded off 
while negative constants are not
 Key: HIVE-11494
 URL: https://issues.apache.org/jira/browse/HIVE-11494
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Critical


Check the predicates in the filter expressions for the following queries. This 
looks closely related to HIVE-11477 and HIVE-11493.
{code:title=explain select * from orc_ppd where f = -0.0799821186066;}
OK
Stage-0
   Fetch Operator
  limit:-1
  Select Operator [SEL_2]
 
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
 Filter Operator [FIL_4]
predicate:(f = -0.0799821186066) (type: boolean)
TableScan [TS_0]
   alias:orc_ppd
{code}

{code:title=explain select * from orc_ppd where f = 0.0799821186066;}
OK
Stage-0
   Fetch Operator
  limit:-1
  Select Operator [SEL_2]
 
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
 Filter Operator [FIL_4]
predicate:(f = 0.08) (type: boolean)
TableScan [TS_0]
   alias:orc_ppd
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11493) Predicate with integer column equals double evaluates to false

2015-08-06 Thread Prasanth Jayachandran (JIRA)
Prasanth Jayachandran created HIVE-11493:


 Summary: Predicate with integer column equals double evaluates to 
false
 Key: HIVE-11493
 URL: https://issues.apache.org/jira/browse/HIVE-11493
 Project: Hive
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Prasanth Jayachandran
Assignee: Pengcheng Xiong
Priority: Blocker


Filters where an integer column equals a double constant evaluate to false every 
time. Negative double constants work fine.

{code:title=explain select * from orc_ppd where t = 10.0;}
OK
Stage-0
   Fetch Operator
  limit:-1
  Select Operator [SEL_2]
 
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
 Filter Operator [FIL_1]
predicate:false (type: boolean)
TableScan [TS_0]
   alias:orc_ppd
{code}

{code:title=explain select * from orc_ppd where t = -10.0;}
OK
Stage-0
   Fetch Operator
  limit:-1
  Select Operator [SEL_2]
 
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13"]
 Filter Operator [FIL_1]
predicate:(t = (- 10.0)) (type: boolean)
TableScan [TS_0]
   alias:orc_ppd
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11492) get rid of gWorkMap

2015-08-06 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-11492:
---

 Summary: get rid of gWorkMap
 Key: HIVE-11492
 URL: https://issues.apache.org/jira/browse/HIVE-11492
 Project: Hive
  Issue Type: Bug
Reporter: Sergey Shelukhin


gWorkMap is an annoying, ugly global that causes leaks. It's not clear why it is 
needed when we already have 10 different *Context objects floating around during 
compilation; at worst we could add another one, which would still be better than 
the global map. It should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 37156: HIVE-7476 : CTAS does not work properly for s3

2015-08-06 Thread Szehon Ho

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37156/
---

(Updated Aug. 6, 2015, 7:36 p.m.)


Review request for hive and Lenni Kuff.


Changes
---

Address review comments from Lenni.


Bugs: HIVE-7476
https://issues.apache.org/jira/browse/HIVE-7476


Repository: hive-git


Description
---

Currently, CTAS is broken when the target is on S3 and the source tables are not, 
or more generally, when source and target tables are on different file systems.

Mainly, the issue was that during the Move operation (the last stage of CTAS), 
Hive used the destination FileSystem object to run operations on both the source 
and destination files, which caused errors when operating on the source. The fix 
is to use the source FileSystem for operations on the source file and the 
destination FileSystem for operations on the destination file.
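
For illustration only (not the actual patch), the gist is to resolve a FileSystem per path instead of reusing the destination's:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// Sketch: each side gets its own FileSystem, so a cross-FS move (e.g. HDFS -> S3)
// runs source operations on the source FS and dest operations on the dest FS.
public class CrossFsMoveSketch {
    public static void move(Configuration conf, Path src, Path dst) throws Exception {
        FileSystem srcFs = src.getFileSystem(conf);
        FileSystem dstFs = dst.getFileSystem(conf);
        FileUtil.copy(srcFs, src, dstFs, dst, /* deleteSource */ true, conf);
    }
}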


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 0a466e4 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5840802 

Diff: https://reviews.apache.org/r/37156/diff/


Testing
---

Manually ran CTAS to create a table on S3.


Thanks,

Szehon Ho



[jira] [Created] (HIVE-11491) Lazily call ASTNode::toStringTree() after tree modification

2015-08-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-11491:


 Summary: Lazily call ASTNode::toStringTree() after tree 
modification
 Key: HIVE-11491
 URL: https://issues.apache.org/jira/browse/HIVE-11491
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Currently, as part of HIVE-11316, we call toStringTree() every time the tree is 
modified. This is a bad approach, since we can lazily defer the recomputation to 
the point when toStringTree() is called again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11490) Lazily call ASTNode::toStringTree() after tree modification

2015-08-06 Thread Hari Sankar Sivarama Subramaniyan (JIRA)
Hari Sankar Sivarama Subramaniyan created HIVE-11490:


 Summary: Lazily call ASTNode::toStringTree() after tree 
modification
 Key: HIVE-11490
 URL: https://issues.apache.org/jira/browse/HIVE-11490
 Project: Hive
  Issue Type: Bug
Reporter: Hari Sankar Sivarama Subramaniyan
Assignee: Hari Sankar Sivarama Subramaniyan


Currently, as part of HIVE-11316, we call toStringTree() every time the tree is 
modified. This is a bad approach, since we can lazily defer the recomputation to 
the point when toStringTree() is called again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36962: CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-08-06 Thread pengcheng xiong

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36962/#review94421
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 


Sorry, but I actually have a different opinion: (1) Hive is complicated enough 
already; keeping unused code here just adds complexity and compilation overhead. 
(2) If a dev would like to use this kind of method in the future, he/she can 
implement it again. (3) This method is very specific; e.g., it will not do 
anything if there is more than one child at any level. It is not that generic.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java (line 573)


True.


- pengcheng xiong


On Aug. 3, 2015, 11:30 p.m., pengcheng xiong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36962/
> ---
> 
> (Updated Aug. 3, 2015, 11:30 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> solution is to add a SEL in between
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 0f02737 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java 
> af54286 
> 
> Diff: https://reviews.apache.org/r/36962/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>



Hive-0.14 - Build # 1035 - Still Failing

2015-08-06 Thread Apache Jenkins Server
Changes for Build #1014

Changes for Build #1015

Changes for Build #1016

Changes for Build #1017

Changes for Build #1018

Changes for Build #1019

Changes for Build #1020

Changes for Build #1021

Changes for Build #1022

Changes for Build #1023

Changes for Build #1024

Changes for Build #1025

Changes for Build #1026

Changes for Build #1027

Changes for Build #1028

Changes for Build #1029

Changes for Build #1030

Changes for Build #1031

Changes for Build #1032

Changes for Build #1033

Changes for Build #1034

Changes for Build #1035



No tests ran.

The Apache Jenkins build system has built Hive-0.14 (build #1035)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-0.14/1035/ to view 
the results.

NullPointerException in Hive GetNominalPath function?

2015-08-06 Thread Saman Biook Aghazadeh
I have developed a customized InputFormat to read NetCDF-format files in Hadoop; 
you can see the code here:

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.*;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.commons.logging.LogFactory;
import org.apache.commons.logging.Log;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapreduce.security.TokenCache;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;
import org.apache.hadoop.net.NodeBase;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.hadoop.util.StringUtils;
import org.apache.commons.lang.ArrayUtils;
import org.apache.hadoop.io.NetCDFArrayWritable;
import ucar.nc2.*;
import ucar.nc2.iosp.*;
import ucar.nc2.iosp.netcdf3.*;
import ucar.unidata.io.*;
import ucar.nc2.dataset.*;
import ucar.ma2.Array;
import ucar.ma2.ArrayFloat;
import java.util.Arrays;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.RecordReader;
public class NetCDFInputFormat extends FileInputFormat {

  private static final Log LOG
= LogFactory.getLog(NetCDFInputFormat.class.getName());


  private NetCDFInfo getNetCDFInfo(Path file, FileSystem fs, JobConf job)
  {

//traverse header and return chunk start and size arrays
NetCDFInfo result = new NetCDFInfo();//library call

NetcdfFile ncFile;
Variable v;
ncFile = null;
try {
ncFile = NetcdfDataset.openFile(file.toString(), null);

v = ncFile.findVariable("rsut");
//List vs = ncFile.getVariables();
//v = vs.get(vs.size()-1);

LOG.info("Variable is "+ v.getFullName());
result.fileSize = ncFile.vfileSize;
result.recStart = ncFile.vrecStart;
Long[] metaArray = v.reallyReadMeta().toArray(new Long[(int)(ncFile.vnumRecs)]);
result.chunkStarts =ArrayUtils.toPrimitive(metaArray);
//result.chunkSizes = nc.chunkSizes;
result.numRecs = ncFile.vnumRecs;
result.recSize = ncFile.vrecSize;
result.smallRecSize = ncFile.vsmallRecSize;
//result.shape = v.shape;

} catch (Exception e)

{
LOG.info("Bad... "+ e);
}
try { if (ncFile != null) ncFile.close(); } catch (Exception e) { LOG.info("Bad2... " + e); }

return result;
  }

  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits)
throws IOException {
FileStatus[] files = listStatus(job);


LOG.info( "[SAMAN] beginning of getSplits" );
LOG.info( "[SAMAN] " + files.length );
// Save the number of input files in the job-conf
job.setLong(NUM_INPUT_FILES, files.length);
long totalSize = 0;   // compute total size
for (FileStatus file: files) { // check we have valid files
  if (file.isDir()) {
throw new IOException("Not a file: "+ file.getPath());
  }
  LOG.info ("[net] adding "+file.getPath());
  totalSize += file.getLen();
}

long goalSize = totalSize / (numSplits == 0 ? 1 : numSplits);
//long minSize = Math.max(job.getLong("mapred.min.split.size", 1),
 //   minSplitSize);

// generate splits
ArrayList splits = new ArrayList(numSplits);
NetworkTopology clusterMap = new NetworkTopology();
for (FileStatus file: files) {
  Path path = file.getPath();
  FileSystem fs = path.getFileSystem(job);
  long length = file.getLen();
  LOG.info("get file len of "+file.getPath());
  BlockLocation[] blkLocations = fs.getFileBlockLocations(file, 0, length);
  if ((length != 0) && isSplitable(fs, pat

[jira] [Created] (HIVE-11489) Jenkins PreCommit-HIVE-SPARK-Build fails with TestCliDriver.initializationError

2015-08-06 Thread JIRA
Sergio Peña created HIVE-11489:
--

 Summary: Jenkins PreCommit-HIVE-SPARK-Build fails with 
TestCliDriver.initializationError
 Key: HIVE-11489
 URL: https://issues.apache.org/jira/browse/HIVE-11489
 Project: Hive
  Issue Type: Task
  Components: Testing Infrastructure
Reporter: Sergio Peña
Assignee: Sergio Peña


The Jenkins job {{PreCommit-HIVE-SPARK-Build}} is failing due to many 
{{TestCliDriver.initializationError}} test results.

{noformat}
Error Message

Unexpected exception java.io.FileNotFoundException: 
/data/hive-ptest/working/apache-git-source-source/itests/qtest/target/generated-test-sources/java/org/apache/hadoop/hive/cli/TestCliDriverQFileNames.txt
 (No such file or directory)
 at java.io.FileInputStream.open(Native Method)
 at java.io.FileInputStream.(FileInputStream.java:146)
 at java.io.FileReader.(FileReader.java:72)
 at 
org.apache.hadoop.hive.ql.QTestUtil.addTestsToSuiteFromQfileNames(QTestUtil.java:2019)
 at org.apache.hadoop.hive.cli.TestCliDriver.suite(TestCliDriver.java:120)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:35)
 at org.junit.internal.runners.SuiteMethod.(SuiteMethod.java:24)
 at 
org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:11)
 at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
 at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
 at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
 at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:26)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
 at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
 at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
 at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
 at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
Stacktrace

junit.framework.AssertionFailedError: Unexpected exception 
java.io.FileNotFoundException: 
/data/hive-ptest/working/apache-git-source-source/itests/qtest/target/generated-test-sources/java/org/apache/hadoop/hive/cli/TestCliDriverQFileNames.txt
 (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.(FileInputStream.java:146)
at java.io.FileReader.(FileReader.java:72)
at 
org.apache.hadoop.hive.ql.QTestUtil.addTestsToSuiteFromQfileNames(QTestUtil.java:2019)
at 
org.apache.hadoop.hive.cli.TestCliDriver.suite(TestCliDriver.java:120)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.junit.internal.runners.SuiteMethod.testFromSuiteMethod(SuiteMethod.java:35)
at org.junit.internal.runners.SuiteMethod.(SuiteMethod.java:24)
at 
org.junit.internal.builders.SuiteMethodBuilder.runnerForClass(SuiteMethodBuilder.java:11)
at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at 
org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
at 
org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at 
org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:26)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:262)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

at junit.framework.Assert.fail(Assert.java:57)
at 
org.apache.hadoop.hive.ql.QTestUtil.addTestsToSuiteFromQfileNames(QTestUtil.java:2045)
at 
org.apache.hadoop.hive.cli.TestCliDriver.suite

Re: issue while reading parquet file in hive

2015-08-06 Thread Sergio Pena
I haven't used Parquet with Cascading. But basically an INT96 type is just a
12-byte binary. If you can have an int96 type in the schema, then you can
write the 12 bytes using binary methods. Hive tries to match the table
schema (timestamp) with the parquet schema (int96). If both are correct, it
then reads the data as binary.
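
(For reference, a minimal sketch of packing such a 12-byte value, assuming the common Hive/Impala int96 layout of little-endian nanos-of-day followed by a little-endian Julian day; not verified against parquet-cascading.)

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Int96Sketch {
    // Pack a timestamp into the assumed 12-byte int96 layout:
    // 8 bytes nanos-of-day + 4 bytes Julian day, both little-endian.
    public static byte[] pack(long nanosOfDay, int julianDay) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(nanosOfDay)
                .putInt(julianDay)
                .array();
    }
}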

- Sergio

On Wed, Aug 5, 2015 at 11:04 PM, Santlal J Gupta <
santlal.gu...@bitwiseglobal.com> wrote:

> Hi,
>
> Int96 is not supported in the cascading parquet. It supports Int32 and
> Int64. So that's why I have used binary instead of Int96.
>
> Thanks,
> Santlal J. Gupta
>
> -Original Message-
> From: Sergio Pena [mailto:sergio.p...@cloudera.com]
> Sent: Wednesday, August 5, 2015 11:00 PM
> To: dev@hive.apache.org
> Subject: Re: issue while reading parquet file in hive
>
> Hi Santlal,
>
> Hive uses parquet int96 type to write and read timestamps. Probably the
> error is because of that. You can try with int96 instead of binary.
>
> - Sergio
>
> On Tue, Jul 21, 2015 at 1:54 AM, Santlal J Gupta <
> santlal.gu...@bitwiseglobal.com> wrote:
>
> > Hello,
> >
> >
> >
> > I have following issue.
> >
> >
> >
> > I have created parquet file through cascading parquet  and want to
> > load into the hive table.
> >
> > My datafile contain data of type timestamp.
> >
> > Cascading parquet does not  support  timestamp data type , so while
> > creating parquet file I have given as binary type. After generating
> > parquet file , this  Parquet file is loaded successfully in the hive .
> >
> >
> >
> > While creating hive table I have given the column type as timestamp.
> >
> >
> >
> > Code :
> >
> >
> >
> > package com.parquet.TimestampTest;
> >
> > import cascading.flow.FlowDef;
> > import cascading.flow.hadoop.HadoopFlowConnector;
> > import cascading.pipe.Pipe;
> > import cascading.scheme.Scheme;
> > import cascading.scheme.hadoop.TextDelimited;
> > import cascading.tap.SinkMode;
> > import cascading.tap.Tap;
> > import cascading.tap.hadoop.Hfs;
> > import cascading.tuple.Fields;
> > import parquet.cascading.ParquetTupleScheme;
> >
> > public class GenrateTimeStampParquetFile {
> >
> > static String inputPath = "target/input/timestampInputFile1";
> > static String outputPath = "target/parquetOutput/TimestampOutput";
> >
> > public static void main(String[] args) {
> > write();
> > }
> >
> > private static void write() {
> > // TODO Auto-generated method stub
> > Fields field = new Fields("timestampField").applyTypes(String.class);
> > Scheme sourceSch = new TextDelimited(field, false, "\n");
> >
> > Fields outputField = new Fields("timestampField");
> >
> > Scheme sinkSch = new ParquetTupleScheme(field, outputField,
> > "message TimeStampTest{optional binary timestampField ;}");
> >
> > Tap source = new Hfs(sourceSch, inputPath);
> > Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
> >
> > Pipe pipe = new Pipe("Hive timestamp");
> >
> > FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
> >
> > new HadoopFlowConnector().connect(fd).complete();
> > }
> > }
> >
> >
> >
> > Input file:
> >
> >
> >
> > timestampInputFile1
> >
> >
> >
> > timestampField
> >
> > 1988-05-25 15:15:15.254
> >
> > 1987-05-06 14:14:25.362
> >
> >
> >
> > After running the code following files are generated.
> >
> > Output :
> >
> > 1. part-0-m-0.parquet
> >
> > 2. _SUCCESS
> >
> > 3. _metadata
> >
> > 4. _common_metadata
> >
> >
> >
> > I have created the table in hive to load the
> > part-0-m-0.parquet file.
> >
> >
> >
> > I have written following query in the hive.
> >
> > Query :
> >
> >
> >
> > hive> create table test3(timestampField timestamp) stored as parquet;
> >
> > hive> load data local inpath
> > '/home/hduser/parquet_testing/part-0-m-0.parquet' into table
> > test3;
> >
> > hive> select  * from test3;
> >
> >
> >
> > After running above command I got following as output.
> >
> >
> >
> > Output :
> >
> >
> >
> > OK
> >
> > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> >
> > SLF4J: Defaulting to no-operation (NOP) logger implementation
> >
> > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for
> > further details.
> >
> > Failed with exception
> > java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> > java.lang.ClassCastException: org.apac

Re: Review Request 37156: HIVE-7476 : CTAS does not work properly for s3

2015-08-06 Thread Sergio Pena

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/37156/#review94403
---

Ship it!


Ship It!

- Sergio Pena


On Aug. 6, 2015, 1:32 a.m., Szehon Ho wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/37156/
> ---
> 
> (Updated Aug. 6, 2015, 1:32 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-7476
> https://issues.apache.org/jira/browse/HIVE-7476
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Currently, CTAS is broken when target is on S3 and source tables are not, or 
> more generally, where source and target tables are on different file systems. 
>  
> 
> Mainly the issues was that during the Move operation (last stage of CTAS), it 
> was using the destination FileSystem object to run the operations on both the 
> source/dest files, thus error when running on a source.  The fix is to use 
> the source FileSystem to run operations on the source file, and the dest 
> FileSystem to run operations on the dest File.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java 0a466e4 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 5840802 
> 
> Diff: https://reviews.apache.org/r/37156/diff/
> 
> 
> Testing
> ---
> 
> Manually ran CTAS to create a table on S3.
> 
> 
> Thanks,
> 
> Szehon Ho
> 
>



[jira] [Created] (HIVE-11488) Add sessionId info to HS2 log

2015-08-06 Thread Aihua Xu (JIRA)
Aihua Xu created HIVE-11488:
---

 Summary: Add sessionId info to HS2 log
 Key: HIVE-11488
 URL: https://issues.apache.org/jira/browse/HIVE-11488
 Project: Hive
  Issue Type: New Feature
  Components: Logging
Affects Versions: 2.0.0
Reporter: Aihua Xu


Session is critical for a multi-user system like Hive. Currently Hive doesn't 
log the sessionId to the log file, which sometimes makes debugging and analysis 
difficult when multiple activities are going on at the same time and the logs 
from different sessions are mixed together.

Currently, Hive already has a sessionId saved in SessionState, and there is 
another sessionId in SessionHandle (it seems unused; I'm still looking to 
understand it). Generally we should have one sessionId from the beginning on 
both the client side and the server side, so it seems we have some work to do 
on that side first.

The sessionId can then be added to log4j's mapped diagnostic context (MDC) and 
can be configured to appear in the log file through the log4j properties. MDC 
is per thread, so we need to add the sessionId to the HS2 main thread and then 
it will be inherited by the child threads.
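
A minimal sketch of the MDC idea (shown with the log4j 1.x API as an example; Log4j 2's ThreadContext is the equivalent). A layout pattern such as %X{sessionId} would then emit it on every log line.

{code}
import org.apache.log4j.MDC;

// Sketch: tag the HS2 thread handling a session; child threads inherit the MDC.
public class SessionIdLogging {
  public static void tagThread(String sessionId) {
    MDC.put("sessionId", sessionId);
  }

  public static void untagThread() {
    MDC.remove("sessionId");
  }
}
{code}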



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11487) Add getNumPartitionsByFilter api in metastore api

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11487:
--

 Summary: Add getNumPartitionsByFilter api in metastore api
 Key: HIVE-11487
 URL: https://issues.apache.org/jira/browse/HIVE-11487
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Amareshwari Sriramadasu


Adding an API that returns the number of partitions matching a filter will be 
more optimal when we are only interested in the count. getAllPartitions 
constructs all the partition objects, which can be time-consuming and is not 
required.

Here is a commit we pushed in a forked repo in our organization - 
https://github.com/inmobi/hive/commit/68b3534d3e6c4d978132043cec668798ed53e444.
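
To illustrate the intent (the proposed method follows the JIRA summary and is not part of the current API; only listPartitionsByFilter exists today):

{code}
import java.util.List;
import org.apache.hadoop.hive.metastore.IMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Partition;

public class PartitionCountSketch {
  // Today: every Partition object is materialized just to get a count.
  static int countMatchingPartitions(IMetaStoreClient client, String db, String tbl,
      String filter) throws Exception {
    List<Partition> parts = client.listPartitionsByFilter(db, tbl, filter, (short) -1);
    return parts.size();
    // Proposed (per this JIRA): return client.getNumPartitionsByFilter(db, tbl, filter);
  }
}
{code}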



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11486) Hive should log exceptions for better debuggability with full trace

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11486:
--

 Summary: Hive should log exceptions for better debuggability with 
full trace
 Key: HIVE-11486
 URL: https://issues.apache.org/jira/browse/HIVE-11486
 Project: Hive
  Issue Type: Improvement
  Components: Diagnosability
Reporter: Amareshwari Sriramadasu


For example:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L2638
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java#L315
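
For illustration, the difference in question (commons-logging used as an example API):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class ExceptionLoggingSketch {
  private static final Log LOG = LogFactory.getLog(ExceptionLoggingSketch.class);

  void handle(Exception e) {
    LOG.error("Task failed: " + e.getMessage()); // message only, stack trace lost
    LOG.error("Task failed", e);                 // full stack trace preserved
  }
}
{code}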



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11485) Session close should not close async SQL operations

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11485:
--

 Summary: Session close should not close async SQL operations
 Key: HIVE-11485
 URL: https://issues.apache.org/jira/browse/HIVE-11485
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu


Right now, closing a session on HiveServer2 closes all of its operations. But 
running queries are actually available across sessions and are not tied to a 
session (except for the launch, which requires configuration and resources), 
and it is possible to get the status of a query across sessions.

But closing the session on which an operation was launched closes all the 
operations as well.

So, we should avoid closing all operations upon closing a session.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11484) Fix ObjectInspector for Char and VarChar

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11484:
--

 Summary: Fix ObjectInspector for Char and VarChar
 Key: HIVE-11484
 URL: https://issues.apache.org/jira/browse/HIVE-11484
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Reporter: Amareshwari Sriramadasu


The creation of HiveChar and HiveVarchar is not happening through an 
ObjectInspector.

Here is the fix we pushed internally: 
https://github.com/InMobi/hive/commit/fe95c7850e7130448209141155f28b25d3504216



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11483) Add encoding and decoding for query string config

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11483:
--

 Summary: Add encoding and decoding for query string config
 Key: HIVE-11483
 URL: https://issues.apache.org/jira/browse/HIVE-11483
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu


We have seen some queries in production where some of the literals passed in 
the query contain control characters, which results in an exception when the 
query string is set in the job XML.

Proposing a solution to encode the query string in the configuration and 
provide getters that return the decoded string.

Here is a commit in a forked repo : 
https://github.com/InMobi/hive/commit/2faf5761191fa3103a0d779fde584d494ed75bf5

Suggestions are welcome on the solution.
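
One way the encoding could work (a sketch only, not the forked commit; hive.query.string is the standard config key):

{code}
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import org.apache.hadoop.conf.Configuration;

// Sketch: Base64-encode the query string so control characters survive the job
// XML, and decode it on the way out.
public class QueryStringCodec {
  static void setQueryString(Configuration conf, String query) {
    conf.set("hive.query.string",
        Base64.getEncoder().encodeToString(query.getBytes(StandardCharsets.UTF_8)));
  }

  static String getQueryString(Configuration conf) {
    String encoded = conf.get("hive.query.string", "");
    return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
  }
}
{code}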



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-11482) Add retrying thrift client for HiveServer2

2015-08-06 Thread Amareshwari Sriramadasu (JIRA)
Amareshwari Sriramadasu created HIVE-11482:
--

 Summary: Add retrying thrift client for HiveServer2
 Key: HIVE-11482
 URL: https://issues.apache.org/jira/browse/HIVE-11482
 Project: Hive
  Issue Type: Improvement
  Components: HiveServer2
Reporter: Amareshwari Sriramadasu


Similar to 
https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java,
this improvement request is to add a retrying Thrift client for HiveServer2 
that retries upon Thrift exceptions.

Here are few commits done on a forked branch that can be picked - 
https://github.com/InMobi/hive/commit/7fb957fb9c2b6000d37c53294e256460010cb6b7
https://github.com/InMobi/hive/commit/11e4b330f051c3f58927a276d562446761c9cd6d
https://github.com/InMobi/hive/commit/241386fd870373a9253dca0bcbdd4ea7e665406c
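
The same dynamic-proxy pattern used by RetryingMetaStoreClient could look roughly like this (a sketch only; the client interface, reconnect logic, and backoff are omitted):

{code}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import org.apache.thrift.TException;

// Sketch: wrap a Thrift client interface in a proxy that retries on TException.
public class RetryingClient implements InvocationHandler {
  private final Object client;
  private final int maxRetries;

  private RetryingClient(Object client, int maxRetries) {
    this.client = client;
    this.maxRetries = maxRetries;
  }

  @SuppressWarnings("unchecked")
  public static <T> T wrap(Class<T> iface, T client, int maxRetries) {
    return (T) Proxy.newProxyInstance(iface.getClassLoader(),
        new Class<?>[] { iface }, new RetryingClient(client, maxRetries));
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    for (int attempt = 0; ; attempt++) {
      try {
        return method.invoke(client, args);
      } catch (InvocationTargetException e) {
        if (e.getCause() instanceof TException && attempt < maxRetries) {
          continue; // retry Thrift-level failures
        }
        throw e.getCause();
      }
    }
  }
}
{code}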



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 36962: CBO: Calcite Operator To Hive Operator (Calcite Return Path): Groupby Optimizer assumes the schema can match after removing RS and GBY

2015-08-06 Thread Jesús Camacho Rodríguez

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36962/#review94382
---



ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 


I wouldn't remove this method; although it is not used right now, it is 
generic enough to be used in the future.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java (line 573)


The mismatch that can exist is only due to different column names, am I 
right?
Then could we add an assert here: parent.getSchema().getSignature().size() == curr.getSchema().getSignature().size()?


- Jesús Camacho Rodríguez


On Aug. 3, 2015, 11:30 p.m., pengcheng xiong wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36962/
> ---
> 
> (Updated Aug. 3, 2015, 11:30 p.m.)
> 
> 
> Review request for hive and Jesús Camacho Rodríguez.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> solution is to add a SEL in between
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 0f02737 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GroupByOptimizer.java 
> af54286 
> 
> Diff: https://reviews.apache.org/r/36962/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>