An error occurred when writing tables to Avro files
hi

I created an Avro-format table following the wiki (https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Hive0.14andlater). An error occurred when inserting data from another table created in the previous steps. I am using hive-0.14.0/hive-1.2.0 + hadoop-2.6.0. Do you have any idea?

hive> CREATE TABLE as_avro(
          string1 STRING,
          int1 INT,
          tinyint1 TINYINT,
          smallint1 SMALLINT,
          bigint1 BIGINT,
          boolean1 BOOLEAN,
          float1 FLOAT,
          double1 DOUBLE,
          list1 ARRAY<STRING>,
          map1 MAP<STRING,INT>,
          struct1 STRUCT<sint:INT,sboolean:BOOLEAN,sstring:STRING>,
          union1 UNIONTYPE<FLOAT, BOOLEAN, STRING>,
          enum1 STRING,
          nullableint INT,
          bytes1 BINARY,
          fixed1 BINARY)
      STORED AS AVRO;
OK
Time taken: 0.11 seconds

hive> INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different 'as_avro': Cannot convert column 11 from uniontype<float,boolean,string> to uniontype<void,float,boolean,string>.

I do not understand why the column union1 looks like uniontype<void,float,boolean,string>.

Thanks
zhuweimin
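[Editor's note] The void branch is most likely the Avro "null" type surfacing in Hive: when a table is created with STORED AS AVRO, the SerDe generates the Avro schema from the DDL and makes the columns nullable, so the union gains a "null" member (shown in Hive as void). A hedged, untested sketch of one possible way around it, assuming you can define the table from an explicit schema: supplying avro.schema.literal keeps the union members exactly as written. The schema below is trimmed to two columns for brevity.

    -- Hedged sketch, not a confirmed fix: an explicit Avro schema avoids the
    -- inferred nullable union that STORED AS AVRO produces from the DDL.
    CREATE TABLE as_avro_explicit
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    TBLPROPERTIES ('avro.schema.literal'='{
      "type": "record",
      "name": "as_avro_explicit",
      "fields": [
        {"name": "string1", "type": "string"},
        {"name": "union1",  "type": ["float", "boolean", "string"]}
      ]
    }');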
cast column float
Hi,

I queried a table based on the values of two float columns:

select count(*) from u where xlong_u = 7.1578474 and xlat_u = 55.192524;
select count(*) from u where xlong_u = cast(7.1578474 as float) and xlat_u = cast(55.192524 as float);

Both queries returned 0 records, even though some records match the condition. What can be wrong? I am using Hive 0.14.

BR,
Patcharee
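[Editor's note] A possible explanation, offered as an assumption: the literals 7.1578474 and 55.192524 are parsed as doubles, and a FLOAT column widened to DOUBLE for the comparison rarely equals the double literal bit-for-bit, so exact equality finds nothing. A hedged workaround sketch, reusing the table and column names from the message:

    -- Hedged sketch: compare within a small tolerance instead of exact equality.
    -- The 1e-6 epsilon is an assumption; pick one to match the data's precision.
    select count(*) from u
    where abs(xlong_u - 7.1578474) < 1e-6
      and abs(xlat_u - 55.192524) < 1e-6;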
Re: cast column float
Could you also provide a sample dataset for these two columns?

On Wed, May 27, 2015 at 7:17 PM, patcharee <patcharee.thong...@uni.no> wrote:

> Hi,
>
> I queried a table based on the values of two float columns:
>
> select count(*) from u where xlong_u = 7.1578474 and xlat_u = 55.192524;
> select count(*) from u where xlong_u = cast(7.1578474 as float) and xlat_u = cast(55.192524 as float);
>
> Both queries returned 0 records, even though some records match the condition. What can be wrong? I am using Hive 0.14.
>
> BR,
> Patcharee
only timestamp column value of previous row gets reset
Hi,

I want to cross-check a scenario with you and make sure it's not a problem on my end. I am trying to do an HCatalog read on an edge node and I am seeing strange behavior with the timestamp data type. My Hive version is 0.13.0.2.

First, this is the way the documentation suggests the read should be done (https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter):

for (InputSplit split : readCntxt.getSplits()) {
    HCatReader reader = DataTransferFactory.getHCatReader(split, readCntxt.getConf());
    Iterator<HCatRecord> itr = reader.read();
    while (itr.hasNext()) {
        HCatRecord read = itr.next();
    }
}

I am storing the iterator "read" into a buffer for later use in main(). Later I access this iterator from the stored buffer and drain it by printing out the rows in another thread, and I see the following behavior:

"The column value of data type timestamp in a previous row gets reset to 1969-12-31 19:00:00.0 when the column value in the current row is null. Columns of other data types in the previous row are not affected by the presence of null in the current column value. Also, changing the order of columns in the source data doesn't change the behavior."

hive> describe bug;
dtcol     date
tscol     timestamp
stcol     string
Time taken: 0.058 seconds, Fetched: 3 row(s)

hive> select * from bug;
9779-11-21    2014-04-01 11:30:55    abc
9779-11-21    2014-04-04 11:30:55    def
NULL          NULL

Read in thread - 9779-11-21 2014-04-01 11:30:55.0 abc
Read in thread - 9779-11-21 1969-12-31 19:00:00.0 def
Read in thread - null null

Can this be an issue in the Hive timestamp implementation?

Regards,
Ujjwal
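[Editor's note] One thing worth ruling out, as an assumption rather than a known HCatalog bug: if the reader reuses or lazily materializes the underlying record object, buffering the live iterator and draining it in another thread later could yield stale values like the reset timestamp above. A hedged sketch of a defensive copy, in the thread's own Java; HCatRecord.getAll() and DefaultHCatRecord come from the HCatalog API, while drainEagerly and the buffer are hypothetical names:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;

// Hedged workaround sketch: copy each record's values out while iterating,
// instead of storing the live iterator for another thread to drain later.
static List<HCatRecord> drainEagerly(Iterator<HCatRecord> itr) {
    List<HCatRecord> buffer = new ArrayList<HCatRecord>();
    while (itr.hasNext()) {
        // Materialize the column values before the iterator advances, in case
        // the reader reuses the same record object across next() calls.
        buffer.add(new DefaultHCatRecord(new ArrayList<Object>(itr.next().getAll())));
    }
    return buffer;
}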
Re: Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS
I'm afraid you're in the wrong community. You might have a better chance of getting an answer in the Spark community.

Thanks,
Xuefu

On Wed, May 27, 2015 at 5:44 PM, Sanjay Subramanian <sanjaysubraman...@yahoo.com> wrote:

> hey guys
>
> On the Hive/Hadoop ecosystem we are using (Cloudera distribution CDH 5.2.x), there are about 300+ Hive tables. The data is stored as text (moving slowly to Parquet) on HDFS. I want to use SparkSQL, point it to the Hive metadata, and be able to define JOINs etc. using a programming structure like this:
>
> import org.apache.spark.sql.hive.HiveContext
> val sqlContext = new HiveContext(sc)
> val schemaRdd = sqlContext.sql("some complex SQL")
>
> Is that the way to go? Some guidance will be great.
>
> thanks
> sanjay
Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS
hey guys

On the Hive/Hadoop ecosystem we are using (Cloudera distribution CDH 5.2.x), there are about 300+ Hive tables. The data is stored as text (moving slowly to Parquet) on HDFS. I want to use SparkSQL, point it to the Hive metadata, and be able to define JOINs etc. using a programming structure like this:

import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val schemaRdd = sqlContext.sql("some complex SQL")

Is that the way to go? Some guidance will be great.

thanks
sanjay
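[Editor's note] For what it's worth, this is broadly the documented pattern for Spark 1.x, though the Spark list is the better venue for details: with hive-site.xml on Spark's classpath (for example in $SPARK_HOME/conf), HiveContext talks to the existing metastore and the existing tables become directly queryable. A hedged sketch in the thread's own Scala; the database and table names below are hypothetical:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)  // sc is the SparkContext provided by spark-shell
// Hypothetical table names; any tables registered in the Hive metastore work here.
val joined = sqlContext.sql(
  "SELECT a.key, b.value FROM db1.table_a a JOIN db1.table_b b ON a.key = b.key")
joined.take(10).foreach(println)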