An error occurred when writing tables to Avro files

2015-05-27 Thread 朱 偉民
hi

I created an Avro-format table following the wiki:
https://cwiki.apache.org/confluence/display/Hive/AvroSerDe#AvroSerDe-Hive0.14andlater

An error occurred when inserting data from another table that was created in
previous steps.
I am using hive-0.14.0/hive-1.2.0 + hadoop-2.6.0.
Do you have any idea?

hive> CREATE TABLE as_avro(string1 STRING,
  int1 INT,
  tinyint1 TINYINT,
  smallint1 SMALLINT,
  bigint1 BIGINT,
  boolean1 BOOLEAN,
  float1 FLOAT,
  double1 DOUBLE,
  list1 ARRAY<STRING>,
  map1 MAP<STRING,INT>,
  struct1 STRUCT<sint:INT,sboolean:BOOLEAN,sstring:STRING>,
  union1 UNIONTYPE<FLOAT, BOOLEAN, STRING>,
  enum1 STRING,
  nullableint INT,
  bytes1 BINARY,
  fixed1 BINARY)
 STORED AS AVRO;
OK
Time taken: 0.11 seconds

hive> INSERT OVERWRITE TABLE as_avro SELECT * FROM test_serializer;
FAILED: SemanticException [Error 10044]: Line 1:23 Cannot insert into target
table because column number/types are different 'as_avro': Cannot convert
column 11 from uniontype<float,boolean,string> to
uniontype<void,float,boolean,string>.

I do not understand why the column union1 looks like this:
   uniontype<void,float,boolean,string>

Thanks
zhuweimin
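
A likely explanation: with STORED AS AVRO, Hive derives the Avro schema from
the Hive column types, and because every Hive column is nullable, the
generated Avro union gains a "null" branch. Read back through the AvroSerDe,
that branch surfaces as void, so the target column becomes
uniontype<void,float,boolean,string> while the source column is still
uniontype<float,boolean,string>, and the INSERT fails the type check. A hedged
workaround sketch, not verified on 0.14/1.2 (the table name is illustrative):
pin the Avro schema explicitly so union1 carries no "null" branch and the
types match.

-- Hedged sketch: declare the Avro schema via avro.schema.literal so union1
-- reads back as uniontype<float,boolean,string>, matching the source column.
CREATE TABLE as_avro_union_demo
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record",
  "name": "as_avro_union_demo",
  "fields": [
    {"name": "string1", "type": "string"},
    {"name": "union1",  "type": ["float", "boolean", "string"]}
  ]
}');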



cast column float

2015-05-27 Thread patcharee

Hi,

I queried a table based on the values of two float columns:

select count(*) from u where xlong_u = 7.1578474 and xlat_u = 55.192524;
select count(*) from u where xlong_u = cast(7.1578474 as float) and 
xlat_u = cast(55.192524 as float);


Both queries returned 0 records, even though some records match the
condition. What can be wrong? I am using Hive 0.14.


BR,
Patcharee
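
A note on the likely cause: an unquoted literal such as 7.1578474 is a
DOUBLE, so the FLOAT column is promoted to DOUBLE for the comparison and
picks up extra mantissa digits (roughly 7.15784740447998), which makes the
equality test fail. The cast(... as float) form should avoid that; if it also
returns 0 rows, it may be a version-specific bug. A hedged sketch of a
tolerance-based comparison (the 1e-6 threshold is illustrative):

select count(*)
from u
where abs(xlong_u - 7.1578474) < 1e-6   -- tolerate float/double rounding
  and abs(xlat_u - 55.192524) < 1e-6;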


Re: cast column float

2015-05-27 Thread Bhagwan S. Soni
Could you also provide a sample dataset for these two columns?

On Wed, May 27, 2015 at 7:17 PM, patcharee patcharee.thong...@uni.no
wrote:

 Hi,

 I queried a table based on the values of two float columns:

 select count(*) from u where xlong_u = 7.1578474 and xlat_u = 55.192524;
 select count(*) from u where xlong_u = cast(7.1578474 as float) and xlat_u
 = cast(55.192524 as float);

 Both queries returned 0 records, even though some records match the
 condition. What can be wrong? I am using Hive 0.14.

 BR,
 Patcharee



only timestamp column value of previous row gets reset

2015-05-27 Thread Ujjwal
Hi,



I want to cross-check a scenario with you and make sure it's not a problem
on my end.


I am trying to do an HCatalog read on an edge node and I am seeing strange
behavior with the timestamp data type. My Hive version is hive 0.13.0.2.



First, this is how the documentation suggests the read should be done. (
https://cwiki.apache.org/confluence/display/Hive/HCatalog+ReaderWriter)



for (InputSplit split : readCntxt.getSplits()) {
    HCatReader reader = DataTransferFactory.getHCatReader(split,
            readCntxt.getConf());
    Iterator<HCatRecord> itr = reader.read();
    while (itr.hasNext()) {
        HCatRecord read = itr.next();
    }
}


I am storing the iterator into a buffer for later use in main(). Later I
retrieve this iterator from the stored buffer and drain it by printing out
the rows in another thread, and I see the following behavior.



“The column value of data type timestamp in a previous row gets reset to
1969-12-31 19:00:00.0 when the column value in the current row is null.
Columns of other data types in the previous row are not affected by the
presence of null in the current row. Also, changing the order of columns in
the source data doesn’t change the behavior.”




hive> describe bug;

dtcol   date

tscol   timestamp

stcol   string

Time taken: 0.058 seconds, Fetched: 3 row(s)

hive> select * from bug;
9779-11-21  2014-04-01 11:30:55 abc
9779-11-21  2014-04-04 11:30:55 def
NULL        NULL


Read in thread - 9779-11-21 2014-04-01 11:30:55.0   abc
Read in thread - 9779-11-21 1969-12-31 19:00:00.0   def
Read in thread - null   null


Can this be an issue in the Hive timestamp implementation?

Regards,
Ujjwal
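
One possible explanation (an assumption, not a confirmed diagnosis): Hive and
HCatalog readers tend to reuse mutable objects such as java.sql.Timestamp
across rows, so a record buffered for later printing can see its value
overwritten by a subsequent row. 1969-12-31 19:00:00.0 is simply epoch 0
rendered in a UTC-5 timezone, consistent with a reused Timestamp being
cleared when the next row's value is null. A hedged sketch that defensively
copies mutable fields before buffering (the class and method names are
illustrative; the reader calls match the wiki snippet above):

import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hive.hcatalog.data.DefaultHCatRecord;
import org.apache.hive.hcatalog.data.HCatRecord;
import org.apache.hive.hcatalog.data.transfer.HCatReader;

public final class BufferedHCatRead {

    // Drain the reader eagerly, cloning mutable field values so that later
    // rows (including nulls) cannot clobber already-buffered records.
    static List<HCatRecord> drainToBuffer(HCatReader reader) throws Exception {
        List<HCatRecord> buffer = new ArrayList<HCatRecord>();
        Iterator<HCatRecord> itr = reader.read();
        while (itr.hasNext()) {
            List<Object> fields = new ArrayList<Object>(itr.next().getAll());
            for (int i = 0; i < fields.size(); i++) {
                Object v = fields.get(i);
                if (v instanceof Timestamp) {
                    // Copy the Timestamp; the reader may reuse the instance.
                    fields.set(i, new Timestamp(((Timestamp) v).getTime()));
                }
            }
            buffer.add(new DefaultHCatRecord(fields));
        }
        return buffer;
    }
}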


Re: Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS

2015-05-27 Thread Xuefu Zhang
I'm afraid you're at the wrong community. You might have a better chance of
getting an answer in the Spark community.

Thanks,
Xuefu

On Wed, May 27, 2015 at 5:44 PM, Sanjay Subramanian 
sanjaysubraman...@yahoo.com wrote:

 hey guys

 On the Hive/Hadoop ecosystem we are using, Cloudera distribution CDH 5.2.x,
 there are about 300 Hive tables.
 The data is stored as text (moving slowly to Parquet) on HDFS.
 I want to use SparkSQL and point to the Hive metadata and be able to
 define JOINs etc. using a programming structure like this:

 import org.apache.spark.sql.hive.HiveContext
 val sqlContext = new HiveContext(sc)
 val schemaRdd = sqlContext.sql("some complex SQL")


 Is that the way to go? Some guidance would be great.

 thanks

 sanjay






Pointing SparkSQL to existing Hive Metadata with data file locations in HDFS

2015-05-27 Thread Sanjay Subramanian
hey guys
On the Hive/Hadoop ecosystem we are using, Cloudera distribution CDH 5.2.x,
there are about 300 Hive tables. The data is stored as text (moving slowly to
Parquet) on HDFS. I want to use SparkSQL and point to the Hive metadata and be
able to define JOINs etc. using a programming structure like this:

import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val schemaRdd = sqlContext.sql("some complex SQL")

Is that the way to go? Some guidance would be great.
thanks
sanjay
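
For reference, a hedged sketch of the pattern described above, assuming Spark
1.x (matching the HiveContext API in the question) and that hive-site.xml is
on Spark's classpath (e.g. in $SPARK_HOME/conf) so HiveContext resolves the
existing metastore; the table and column names are illustrative:

import org.apache.spark.sql.hive.HiveContext

// `sc` is an existing SparkContext; HiveContext reads hive-site.xml and
// therefore sees the same tables that Hive does.
val sqlContext = new HiveContext(sc)

// Illustrative join across two existing Hive tables (names are made up).
val joined = sqlContext.sql(
  """SELECT o.order_id, c.customer_name
    |FROM orders o
    |JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

joined.take(10).foreach(println)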