Re: Hive queries returning all NULL values.
Raymond - you were the closest. The Parquet field names contained '::', e.g. bag1::user_name.

Hope this helps anyone in the future. Thanks for all your help.

Tor

On Sun, Aug 17, 2014 at 7:50 PM, Raymond Lau wrote:
> Do your field names in your parquet files contain upper case letters by
> any chance, e.g. userName? Hive will not read the data of external tables
> if the field names are not completely lower case; it doesn't convert them
> properly in the case of external tables.
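For context, the behaviour Tor saw is consistent with name-based column resolution: count(*) never resolves individual columns, so it returns the right answer, while select * looks each declared column up by name in the Parquet file footer, and a plain name never matches a Pig-style prefixed name like bag1::user_name. A minimal HiveQL sketch of the symptom, with a hypothetical column name:

```sql
-- Sketch of the failure mode (hypothetical column 'user_name').
-- The table declares plain names:
DESCRIBE events;                                   -- user_name string, ..., dt string
-- count(*) needs no per-column resolution, so the count is correct:
SELECT count(*) FROM events WHERE dt='20140815';
-- select * resolves columns by name; 'user_name' never matches the
-- file's 'bag1::user_name', so every data column reads back as NULL
-- (only the partition column dt, which comes from the path, survives):
SELECT * FROM events WHERE dt='20140815' LIMIT 1;
```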
Re: Hive queries returning all NULL values.
Do your field names in your parquet files contain upper case letters by any chance, e.g. userName? Hive will not read the data of external tables if the field names are not completely lower case; it doesn't convert them properly in the case of external tables.

On Aug 17, 2014 8:00 AM, "hadoop hive" wrote:
> Take a small set of data, like 2-5 lines, and insert it.
>
> After that you can try inserting the first 10 columns, then the next 10,
> until you find your problematic column.
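Raymond's case-sensitivity point can be sketched the same way. This is a hypothetical illustration rather than a verified repro, assuming a Parquet file whose embedded schema declares a field named userName:

```sql
-- Hive lower-cases identifiers, so the declared column is 'username'
-- (hypothetical table and path):
CREATE EXTERNAL TABLE users_ext (username STRING)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
  INPUTFORMAT  'parquet.hive.DeprecatedParquetInputFormat'
  OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'
LOCATION '/tmp/users_parquet';
-- If the SerDe matches names case-sensitively, 'username' never finds
-- the file's 'userName' field, and the column comes back NULL:
SELECT username FROM users_ext LIMIT 1;
```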
Re: Hive queries returning all NULL values.
Take a small set of data, like 2-5 lines, and insert it.

After that you can try inserting the first 10 columns, then the next 10, until you find your problematic column.

On Aug 17, 2014 8:37 PM, "Tor Ivry" wrote:
> Is there any way to debug this?
>
> We are talking about many fields here.
> How can I see which field has the mismatch?
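The bisection approach above can be written out as explicit column lists instead of select * (hypothetical names c01..c20); halve whichever list still returns NULLs until the offending column is isolated:

```sql
-- First half of the columns:
SELECT c01, c02, c03, c04, c05, c06, c07, c08, c09, c10
FROM events WHERE dt='20140815' LIMIT 5;
-- Second half:
SELECT c11, c12, c13, c14, c15, c16, c17, c18, c19, c20
FROM events WHERE dt='20140815' LIMIT 5;
-- Repeat on whichever half comes back NULL.
```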
Re: Hive queries returning all NULL values.
Is there any way to debug this?

We are talking about many fields here. How can I see which field has the mismatch?

On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive wrote:
> Hi,
>
> Check the data types you provided while creating the external table;
> they should match the data in the files.
>
> Thanks
> Vikas Srivastava
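One way to answer Tor's question without bisecting through Hive is to compare the table definition against the schema embedded in one of the data files, e.g. with the parquet-tools `schema` command:

```sql
-- Column names/types as Hive declared them:
DESCRIBE events;
-- Then, outside Hive, dump the schema actually stored in a data file
-- (file path is whatever lies under the partition directory):
--   parquet-tools schema <one .snappy.parquet file from the partition>
-- Any field whose name or type differs between the two outputs is a
-- candidate for the all-NULL behaviour.
```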
Re: Hive queries returning all NULL values.
Hi,

Check the data types you provided while creating the external table; they should match the data in the files.

Thanks
Vikas Srivastava

On Aug 17, 2014 7:07 PM, "Tor Ivry" wrote:
> Hi
>
> I have a Hive (0.11) table with the following create syntax:
>
> CREATE EXTERNAL TABLE events(
>   ...
> )
> PARTITIONED BY (dt string)
> ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
> STORED AS
>   INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>   OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
> LOCATION '/data-events/success';
>
> The query runs fine.
>
> I add HDFS partitions (containing snappy.parquet files).
>
> When I run
>
> hive > select count(*) from events where dt="20140815"
>
> I get the correct result.
>
> *Problem:*
>
> When I run
>
> hive > select * from events where dt="20140815" limit 1;
>
> I get
>
> OK
> NULL NULL NULL NULL NULL NULL NULL 20140815
>
> *The same query in Impala returns the correct values.*
>
> Any idea what could be the issue?
>
> Thanks
> Tor
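If that comparison turns up a type mismatch rather than a name mismatch, the declared type can be brought in line with the file without rewriting any data; a sketch with a hypothetical column ts that the files store as int64:

```sql
-- Hive syntax for changing a column's declared type; for an external
-- table this is a metadata-only change (hypothetical column 'ts'):
ALTER TABLE events CHANGE ts ts BIGINT;
```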