RE: Welcome Rui Li to Hive PMC
Congratulations Rui!

-- Santlal

On Thu, May 25, 2017 at 12:40 AM, Chao Sun wrote:
> Congratulations Rui!!
>
> On Wed, May 24, 2017 at 9:19 PM, Xuefu Zhang wrote:
> >
> > Hi all,
> >
> > It's an honor to announce that the Apache Hive PMC has recently voted to invite
> > Rui Li as a new Hive PMC member. Rui is a long-time Hive contributor and
> > committer, and has made significant contributions to Hive, especially Hive
> > on Spark. Please join me in congratulating him and looking forward to the
> > bigger role he will play in the Apache Hive project.
> >
> > Thanks,
> > Xuefu
Not able to use IN clause with multiple partition columns
Hi,

I am trying to use multiple partition columns in an IN clause via struct(), and I am getting a StandardStructObjectInspector exception. I have tried the queries below.

hive> create table partm(f1 int, f2 int) partitioned by (f3 int, f4 int) row format delimited fields terminated by ',' stored as textfile location '/user/ojasd/warehouse/partm';
hive> insert into table partm partition (f3=2, f4=4) values(1,2);
hive> insert into table partm partition (f3=1, f4=3) values(2,4);
hive> insert into table partm partition (f3=6, f4=7) values(8,9);

hive> select * from partm;
OK
partm.f1  partm.f2  partm.f3  partm.f4
2         4         1         3
1         2         2         4
8         9         6         7

hive> select * from partm where f3=1 and f4=3;
OK
partm.f1  partm.f2  partm.f3  partm.f4
2         4         1         3

hive> select * from partm where struct(f3,f4) IN (struct(1,3),struct(2,4));
FAILED: ClassCastException org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector cannot be cast to org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector

Can you point out whether there is anything wrong in this query? I am using Hive 1.1.0-cdh5.5.4.

Thanks
Santlal J Gupta
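If the struct-based IN keeps failing on this Hive version, one workaround (my own sketch, not something suggested in the thread) is to expand the struct comparison into an OR of plain column comparisons. This avoids the struct object inspector entirely and should still allow partition pruning:

-- Equivalent rewrite of the failing struct(...) IN (...) predicate,
-- using only primitive partition-column comparisons.
select *
from partm
where (f3 = 1 and f4 = 3)
   or (f3 = 2 and f4 = 4);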
RE: More than one table created at the same location
hi,

I also found that more than one table can be created pointing to the same location. But @Naveen, I found somewhat different behavior in the case of count(*). Below is what I tried.

hive> create table test1(f1 int, f2 int) stored as orc location '/user/xyz/warehouse/test1';
hive> insert into test1 values(3,4);
hive> insert into test1 values(1,2);
hive> select * from test1;
OK
3 4
1 2
hive> select count(*) from test1;
Query ID = xyz_20160901102727_df97784e-6166-4044-8ebd-dc76661bb78c
Total jobs = 1
OK
2
hive> create table test2(f1 int, f2 int) stored as orc location '/user/xyz/warehouse/test1';
hive> insert into test2 values(5,6);
hive> select * from test2;
OK
3 4
1 2
5 6
hive> select count(*) from test2;
Query ID = xyz_20160901102929_ba7736e0-a9bf-4f43-ace1-27c9ba0a709c
Total jobs = 1
.
OK
3
hive> select count(*) from test1;
Query ID = xyz_20160901103131_58be44e2-fae5-4c44-88b3-40922cd55066
Total jobs = 1
.
OK
3

hadoop fs -ls /user/xyz/warehouse/test1
Found 3 items
-rwxr-xr-x   2 xyz xyz        246 2016-09-01 10:26 /user/xyz/warehouse/test1/00_0
-rwxr-xr-x   2 xyz xyz        246 2016-09-01 10:26 /user/xyz/warehouse/test1/00_0_copy_1
-rwxr-xr-x   2 xyz xyz        246 2016-09-01 10:28 /user/xyz/warehouse/test1/00_0_copy_2

So from this, it is confirmed that table test2 is reading data written through table test1. Is this expected behavior in Hive, or is it a bug?

Thanks
Santlal J Gupta

-Original Message-
From: naveen mahadevuni [mailto:nmahadev...@gmail.com]
Sent: Wednesday, August 31, 2016 7:42 PM
To: dev@hive.apache.org
Subject: Re: More than one table created at the same location

Hi,

I created an external table, copied data files to that location, and then count returns 4. It is ambiguous; can it be documented?

hive> CREATE EXTERNAL TABLE test_ext (col1 INT, col2 INT)
    > stored as orc
    > LOCATION '/apps/hive/warehouse/ext';
OK
Time taken: 9.875 seconds
hive> select count(*) from test_ext;
Query ID = root_20160831094725_14753b28-68bb-4106-89b7-45052e0cf9a1
Total jobs = 1
Launching Job 1 out of 1
OK
4
Time taken: 30.366 seconds, Fetched: 1 row(s)
hive> select * from test_ext;
OK
1 2
3 4
1 2
3 4
Time taken: 6.478 seconds, Fetched: 4 row(s)

On Wed, Aug 31, 2016 at 2:27 AM, Thejas Nair wrote:
> Naveen,
> Can you please verify that if you create these tables as external tables
> the results are correct?
> In the case of managed tables, the assumption is that there is a 1:1
> mapping between tables and their locations, and that all updates to the
> table go through Hive. With that assumption, Hive relies on stats to
> return results for queries like count(*).
>
> On Tue, Aug 30, 2016 at 4:18 AM, Abhishek Somani <abhisheksoman...@gmail.com> wrote:
> > For the 2nd table (after both inserts are over), isn't the returned
> > count expected to be 4? In that case, isn't the bug that the count was
> > returned wrong (maybe from the stats as mentioned), rather than the fact
> > that another table was allowed to be created at the same location?
> >
> > I might be very wrong, so pardon my ignorance.
> >
> > On Tue, Aug 30, 2016 at 3:06 AM, Alan Gates wrote:
> > > Note that Hive doesn’t track individual files, just which directory a
> > > table stores its files in. So we wouldn’t expect this to work. The bug
> > > is more that Hive doesn’t detect that two tables are trying to use the
> > > same directory.
> > > I’m not sure we’re anxious to fix this, since it would mean that when
> > > creating a table Hive would need to search all existing tables to make
> > > sure none of them are using the directory the new table wants to use.
> > >
> > > Alan.
> > >
> > > > On Aug 30, 2016, at 04:17, Sergey Shelukhin wrote:
> > > >
> > > > This is a bug, or rather an unexpected usage. I suspect the correct
> > > > count value is coming from statistics.
> > > > Can you file a JIRA?
> > > >
> > > > On 16/8/29, 00:51, "naveen mahadevuni" wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> Is the following behavior a bug? I believe at least one part of it is a
> > > >> bug. I created two Hive tables at the same location and inserted rows in
> > > >> two tables. count(*) returns the correct count for each indiv
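As a side note (mine, not from the thread): the stale counts in the session above are consistent with the answers coming from table statistics rather than an actual scan. A sketch of two ways to get a real count, assuming the stock Hive stats setting and ANALYZE syntax:

-- Force count(*) to scan the files instead of answering from stored stats.
set hive.compute.query.using.stats=false;
select count(*) from test1;

-- Or refresh the basic stats after the directory contents changed underneath the table.
analyze table test1 compute statistics;
select count(*) from test1;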
RE: [ANNOUNCE] New Hive Committer - Siddharth Seth
Congratulations!!

-Original Message-
From: Chetna C [mailto:chetna@gmail.com]
Sent: Thursday, October 22, 2015 8:58 AM
To: dev@hive.apache.org
Cc: Siddharth Seth
Subject: Re: [ANNOUNCE] New Hive Committer - Siddharth Seth

Congratulations !!

On Oct 22, 2015 5:13 AM, "Pengcheng Xiong" wrote:
> Congrats Sid!
>
> On Wed, Oct 21, 2015 at 2:14 PM, Sergey Shelukhin wrote:
> >
> > The Apache Hive PMC has voted to make Siddharth Seth a committer on
> > the Apache Hive Project.
> >
> > Please join me in congratulating Sid!
> >
> > Thanks,
> > Sergey.
Issue while storing Date data in hive table (stored as parquet) with cascading-hive
Hi,

I am a beginner with cascading-hive. Through cascading-hive, I want to load data into a Hive table stored as Parquet. My data contains one field which is a date. I created the Hive table in Parquet format, but when I tried to load date data into it, it failed. In the sink I have used HiveTap, and I mapped the field as Binary (string), since a Date datatype is not available in cascading-parquet.

I have tried some sample code.

Code:

// Note: the original message omitted its import list; these import paths are
// assumed from the Cascading / cascading-hive / parquet-cascading packages.
import java.util.Properties;

import cascading.flow.FlowDef;
import cascading.flow.hadoop2.Hadoop2MR1FlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tap.hive.HiveTableDescriptor;
import cascading.tap.hive.HiveTap;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class ReadText_StoredIn_Parquet_Date {

    static String inpath = "parquet_input/ReadText_StoredIn_Parquet_Date.txt";

    public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, TestExample.class);
        AppProps.addApplicationTag(properties, "Cascading-HiveDemoPart1");

        // Source: delimited text file with a single "dob" field.
        Scheme sourceSch = new TextDelimited(new Fields("dob"), true, "\n");
        Tap inTapCallCenter = new Hfs(sourceSch, inpath);

        String columnFields[] = { "dob" };
        String columnType[] = { "date" };
        String databaseName = "hive_parquet";
        String tableName = "parquet_date";

        HiveTableDescriptor sinkTableDescriptor = new HiveTableDescriptor(databaseName, tableName, columnFields, columnType);

        // Sink: Parquet schema declares dob as binary, since Date is unavailable.
        ParquetTupleScheme scheme = new ParquetTupleScheme(new Fields(columnFields), new Fields(columnFields),
                "message ReadText_Parquet_string_int{optional Binary dob; }");

        HiveTap sinkTap = new HiveTap(sinkTableDescriptor, scheme, SinkMode.REPLACE, true);

        Pipe copyPipe = new Pipe("copyPipe");
        FlowDef def = FlowDef.flowDef().addSource(copyPipe, inTapCallCenter).addTailSink(copyPipe, sinkTap);

        new Hadoop2MR1FlowConnector(properties).connect(def).complete();
    }
}

This code works fine: it loads data into the table (stored as Parquet format). But when I read the data back, I get an exception. I have used ParquetTupleScheme to generate the schema for HiveTap.

I have used the following query.

Query:

hive (hive_parquet)> create table parquet_date(dob date) stored as parquet;
hive (hive_parquet)> select * from parquet_date;
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hive.serde2.io.DateWritable

Can you please advise how to store date values into the Hive table (stored as Parquet) using ParquetTupleScheme, or in any other way?

Currently I am using:
hive-1.2.0
hadoop

Thanks,
Santlal J. Gupta
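One possible workaround (my own sketch, not from the thread), assuming the Parquet SerDe in this Hive version reads a plain binary column as STRING: declare the column as string on the Hive side to match what the file actually contains, convert to DATE at query time, and optionally let Hive rewrite the data into a properly typed table. The location below is a hypothetical path, not one from the original message.

-- Treat the Cascading-written binary value as a string and cast when reading.
create external table parquet_date_raw(dob string)
stored as parquet
location '/user/hive/warehouse/hive_parquet.db/parquet_date';  -- hypothetical path

select cast(dob as date) from parquet_date_raw;

-- Or materialize a real DATE column by letting Hive itself write the Parquet data.
create table parquet_date_typed(dob date) stored as parquet;
insert into table parquet_date_typed select cast(dob as date) from parquet_date_raw;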
RE: issue while reading parquet file in hive
Hi,

Int96 is not supported in Cascading Parquet; it supports only Int32 and Int64. That is why I used binary instead of Int96.

Thanks,
Santlal J. Gupta

-Original Message-
From: Sergio Pena [mailto:sergio.p...@cloudera.com]
Sent: Wednesday, August 5, 2015 11:00 PM
To: dev@hive.apache.org
Subject: Re: issue while reading parquet file in hive

Hi Santlal,

Hive uses the parquet int96 type to write and read timestamps. Probably the error is because of that. You can try with int96 instead of binary.

- Sergio

On Tue, Jul 21, 2015 at 1:54 AM, Santlal J Gupta <santlal.gu...@bitwiseglobal.com> wrote:

> Hello,
>
> I have the following issue.
>
> I have created a parquet file through Cascading Parquet and want to load
> it into a Hive table. My data file contains data of type timestamp.
> Cascading Parquet does not support the timestamp data type, so while
> creating the parquet file I declared the field as binary. After generating
> the parquet file, it is loaded successfully into Hive.
>
> While creating the Hive table I gave the column type as timestamp.
>
> Code:
>
> package com.parquet.TimestampTest;
>
> import cascading.flow.FlowDef;
> import cascading.flow.hadoop.HadoopFlowConnector;
> import cascading.pipe.Pipe;
> import cascading.scheme.Scheme;
> import cascading.scheme.hadoop.TextDelimited;
> import cascading.tap.SinkMode;
> import cascading.tap.Tap;
> import cascading.tap.hadoop.Hfs;
> import cascading.tuple.Fields;
> import parquet.cascading.ParquetTupleScheme;
>
> public class GenrateTimeStampParquetFile {
>     static String inputPath = "target/input/timestampInputFile1";
>     static String outputPath = "target/parquetOutput/TimestampOutput";
>
>     public static void main(String[] args) {
>         write();
>     }
>
>     private static void write() {
>         Fields field = new Fields("timestampField").applyTypes(String.class);
>         Scheme sourceSch = new TextDelimited(field, false, "\n");
>
>         Fields outputField = new Fields("timestampField");
>         Scheme sinkSch = new ParquetTupleScheme(field, outputField,
>                 "message TimeStampTest{optional binary timestampField ;}");
>
>         Tap source = new Hfs(sourceSch, inputPath);
>         Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);
>
>         Pipe pipe = new Pipe("Hive timestamp");
>
>         FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);
>
>         new HadoopFlowConnector().connect(fd).complete();
>     }
> }
>
> Input file:
>
> timestampInputFile1
>
> timestampField
> 1988-05-25 15:15:15.254
> 1987-05-06 14:14:25.362
>
> After running the code, the following files are generated.
>
> Output:
> 1. part-0-m-0.parquet
> 2. _SUCCESS
> 3. _metadata
> 4. _common_metadata
>
> I created a table in Hive to load the part-0-m-0.parquet file, using the
> following queries.
>
> Query:
>
> hive> create table test3(timestampField timestamp) stored as parquet;
> hive> load data local inpath '/home/hduser/parquet_testing/part-0-m-0.parquet' into table test3;
> hive> select * from test3;
>
> After running the above commands I got the following output.
>
> Output:
>
> OK
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> Failed with exception
> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
> java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable
> cannot be cast to org.apache.had
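A write-side alternative (my own sketch, not proposed in the thread): since Cascading Parquet cannot emit int96 timestamps, stage the values as strings and let Hive itself produce the Parquet file, so the timestamps are written in the int96 form Hive expects. The staging table names here are hypothetical.

-- Text-backed staging table holding the raw 'yyyy-MM-dd HH:mm:ss.SSS' strings.
create table ts_staging(timestampField string) stored as textfile;

-- Parquet table with a real TIMESTAMP column, written by Hive.
create table ts_parquet(timestampField timestamp) stored as parquet;

insert into table ts_parquet
select cast(timestampField as timestamp) from ts_staging;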
Issue while working with parquet hive
Hi,

I want to use the Date and BigDecimal datatypes with Parquet in Hive. Currently I am using Hive 0.12.0-cdh5.1.0. When I run the following query I get the error shown below.

Query:

hive (primitive_db)> create table big_date_test( dob DATE , salary BIGDECIMAL ) stored as parquet;
NoViableAltException(26@[])
        at org.apache.hadoop.hive.ql.parse.HiveParser.type(HiveParser.java:31711)
        at org.apache.hadoop.hive.ql.parse.HiveParser.colType(HiveParser.java:31476)
        at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameType(HiveParser.java:31176)
        at org.apache.hadoop.hive.ql.parse.HiveParser.columnNameTypeList(HiveParser.java:29401)
        at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4439)
        at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2084)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1344)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:983)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:190)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:434)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:352)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:995)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1038)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:921)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:422)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:790)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:684)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:623)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:46 cannot recognize input near 'bigdecimal' ')' 'stored' in column type

Please guide me on which version I should use so that I am able to use these datatypes.

Thanks
Santlal
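For what it's worth (my own note, not from the thread): the parse error comes from the type name itself. Hive has no BIGDECIMAL type; the type is DECIMAL, with DECIMAL(precision, scale) syntax available from Hive 0.13 onward. As far as I know, storing DATE and DECIMAL columns in Parquet also needs a release newer than 0.12 (around Hive 1.2). A sketch of the DDL on such a release:

-- DECIMAL (not BIGDECIMAL), with explicit precision and scale.
create table big_date_test(
  dob    date,
  salary decimal(18,2)
) stored as parquet;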
Issue while working with Date
Hi,

I am a beginner working with Hive, and I have one issue. I have looked at the source code of Hive master. In it there is a class named DateWritable, which has get and set methods; one of the set methods takes an int argument:

public DateWritable(int d) {
  set(d);
}

/**
 * Set the DateWritable based on the days since epoch date.
 * @param d integer value representing days since epoch date
 */
public void set(int d) {
  daysSinceEpoch = d;
}

/**
 * Set the DateWritable based on the year/month/day of the date in the local timezone.
 * @param d Date value
 */
public void set(Date d) {
  if (d == null) {
    daysSinceEpoch = 0;
    return;
  }
  set(dateToDays(d));
}

public void set(DateWritable d) {
  set(d.daysSinceEpoch);
}

And there is a get method which returns daysSinceEpoch:

public int getDays() {
  return daysSinceEpoch;
}

So my question is: if I pass daysSinceEpoch as the value for a date column, should it accept it? To check, I have run the following queries.

Query:

hive (primitive_type)> create table dateDemo(data Date);
OK
Time taken: 0.296 seconds
hive (primitive_type)> insert into dateDemo values(50);
OK
Time taken: 24.529 seconds
hive (primitive_type)> select * from dateDemo;
OK
NULL
Time taken: 0.168 seconds, Fetched: 1 row(s)

So can I insert epoch days into a date column, such that when I read the column back I get the actual date? Please guide me on this issue.

Thanks,
Santlal J. Gupta
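A note from the side (my own sketch, not from the thread): DateWritable's int setter is an internal storage detail. At the SQL layer a DATE value is parsed from a 'YYYY-MM-DD' string, so an integer like 50 is not interpreted as days since the epoch and ends up NULL. If you really have day counts, convert them explicitly; this assumes date_add and cast behave as in recent Hive releases:

-- Convert days-since-epoch (here 50) to a real DATE value before inserting.
insert into table dateDemo
select cast(date_add('1970-01-01', 50) as date) from (select 1 as x) t;

-- Or simply insert the date string directly.
insert into dateDemo values('1970-02-20');

select * from dateDemo;   -- 1970-02-20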
RE: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan
Congrats Sushanth.

-Original Message-
From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
Sent: Friday, July 24, 2015 2:43 AM
To: dev@hive.apache.org
Cc: Sushanth Sowmyan
Subject: Re: [ANNOUNCE] New Hive PMC Member - Sushanth Sowmyan

Congrats Sushanth!

On Thu, Jul 23, 2015 at 3:40 PM, Eugene Koifman wrote:
> Congratulations!
>
> On 7/22/15, 9:45 AM, "Carl Steinbach" wrote:
>
> >I am pleased to announce that Sushanth Sowmyan has been elected to
> >the Hive Project Management Committee. Please join me in
> >congratulating Sushanth!
> >
> >Thanks.
> >
> >- Carl

--
Swarnim
issue while reading parquet file in hive
Hello,

I have the following issue.

I have created a parquet file through Cascading Parquet and want to load it into a Hive table. My data file contains data of type timestamp. Cascading Parquet does not support the timestamp data type, so while creating the parquet file I declared the field as binary. After generating the parquet file, it is loaded successfully into Hive.

While creating the Hive table I gave the column type as timestamp.

Code:

package com.parquet.TimestampTest;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.Scheme;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
import cascading.tuple.Fields;
import parquet.cascading.ParquetTupleScheme;

public class GenrateTimeStampParquetFile {

    static String inputPath = "target/input/timestampInputFile1";
    static String outputPath = "target/parquetOutput/TimestampOutput";

    public static void main(String[] args) {
        write();
    }

    private static void write() {
        // Source: delimited text with a single string field.
        Fields field = new Fields("timestampField").applyTypes(String.class);
        Scheme sourceSch = new TextDelimited(field, false, "\n");

        // Sink: Parquet schema declares the field as binary, since timestamp is unsupported.
        Fields outputField = new Fields("timestampField");
        Scheme sinkSch = new ParquetTupleScheme(field, outputField,
                "message TimeStampTest{optional binary timestampField ;}");

        Tap source = new Hfs(sourceSch, inputPath);
        Tap sink = new Hfs(sinkSch, outputPath, SinkMode.REPLACE);

        Pipe pipe = new Pipe("Hive timestamp");

        FlowDef fd = FlowDef.flowDef().addSource(pipe, source).addTailSink(pipe, sink);

        new HadoopFlowConnector().connect(fd).complete();
    }
}

Input file:

timestampInputFile1

timestampField
1988-05-25 15:15:15.254
1987-05-06 14:14:25.362

After running the code, the following files are generated.

Output:
1. part-0-m-0.parquet
2. _SUCCESS
3. _metadata
4. _common_metadata

I created a table in Hive to load the part-0-m-0.parquet file, using the following queries.

Query:

hive> create table test3(timestampField timestamp) stored as parquet;
hive> load data local inpath '/home/hduser/parquet_testing/part-0-m-0.parquet' into table test3;
hive> select * from test3;

After running the above commands I got the following output.

Output:

OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.BytesWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritable

So I got the above exception. Please help me solve this problem.

Currently I am using:
Hive 1.1.0-cdh5.4.2
Cascading 2.5.1
parquet-format-2.2.0

Thanks
Santlal J. Gupta
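A read-side workaround (my own sketch, not part of the original message), assuming the Parquet SerDe in this Hive version reads a plain binary column as STRING: declare the Hive column as string to match what the file actually contains, then convert to TIMESTAMP at query time. The table name below is hypothetical.

-- Declare the column as string, matching the binary value written by Cascading.
create table test3_raw(timestampField string) stored as parquet;
load data local inpath '/home/hduser/parquet_testing/part-0-m-0.parquet' into table test3_raw;

-- Convert the 'yyyy-MM-dd HH:mm:ss.SSS' strings to timestamps when querying.
select cast(timestampField as timestamp) from test3_raw;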