CDH4.5 HiveServer2 InterruptedException

2014-08-17 Thread Ji ZHANG
Hi,

I'm using CDH4.5 and its built-in HiveServer2. Sometimes it throws the
following exception, and the job cannot be submitted:

2014-08-18 09:16:33,346 INFO
org.apache.hadoop.hive.ql.exec.ExecDriver: Making Temp Directory:
hdfs://nameservice1/tmp/hive-hive-hadoop/hive_2014-08-18_09-16-32_093_3323860800312087449-967/-ext-10001
2014-08-18 09:16:33,350 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send params to server
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1279)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:924)
at org.apache.hadoop.ipc.Client.call(Client.java:1211)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy14.mkdirs(Unknown Source)

I googled around and this bug comes up:

https://issues.apache.org/jira/browse/HADOOP-6762

Is it related? Or is there something else I can do to prevent this?

Thanks.


why does webhcat_server listen on port 8080

2014-08-17 Thread no...@sina.cn

hi everyone:
I installed Hive 0.13 and found this configuration in
HIVE_HOME/hcatalog/etc/webhcat/webhcat-default.xml:
    templeton.port = 50111
but when I start webhcat_server, it listens on port 8080.
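[One likely cause, assuming the standard WebHCat layout: webhcat-default.xml only documents the shipped defaults, and site-specific values are normally read from a webhcat-site.xml in the same configuration directory. If the server falls back to 8080, it may not be finding that directory at all. A minimal sketch of an explicit override (file path and value are illustrative):]

```xml
<!-- webhcat-site.xml, placed alongside webhcat-default.xml
     (e.g. HIVE_HOME/hcatalog/etc/webhcat/) -->
<configuration>
  <property>
    <name>templeton.port</name>
    <value>50111</value>
    <description>HTTP port for the WebHCat (Templeton) server</description>
  </property>
</configuration>
```

[If this file is present and the server still binds 8080, checking which configuration directory the webhcat_server startup script exports would be the next step.]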


no...@sina.cn
 




Re: New lines causing new rows

2014-08-17 Thread Andre Araujo
Hi, Charles,

What's the storage format for the raw data source?
What's the definition of your view?


On 18 August 2014 04:20, Charles Robertson 
wrote:

> HI all,
>
> I am loading some data into a Hive table, and one of the fields contains
> text which I believe contains new line characters. I have a view which
> reads data from this table, and the new line characters appear to be
> starting new rows.
>
> Doing 'select * from [mytable] limit 10;' in the hive console returns ten
> rows, on more than ten lines. Doing 'select * from [view] limit 10' in the
> console returns ten lines but fewer than ten rows.
>
> I've tried using the 'translate' function in the view definition to
> replace \r with a space character, but that seems to have just broken
> everything (it complains of a missing EOF).
>
> Can anyone suggest a better way to remove the line breaks and/or prevent
> the view treating them as new rows?
>
> Thanks,
> Charles
>



-- 
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com

Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 *OR* +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or ara...@pythian.com @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.”
— Iker Pou (rock climber)


New lines causing new rows

2014-08-17 Thread Charles Robertson
Hi all,

I am loading some data into a Hive table, and one of the fields contains
text which I believe contains new line characters. I have a view which
reads data from this table, and the new line characters appear to be
starting new rows.

Doing 'select * from [mytable] limit 10;' in the hive console returns ten
rows, on more than ten lines. Doing 'select * from [view] limit 10' in the
console returns ten lines but fewer than ten rows.

I've tried using the 'translate' function in the view definition to replace
\r with a space character, but that seems to have just broken everything
(it complains of a missing EOF).

Can anyone suggest a better way to remove the line breaks and/or prevent
the view treating them as new rows?

Thanks,
Charles
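[A hedged sketch of the view-level workaround Charles describes, assuming the "missing EOF" error came from an unescaped backslash in the pattern. Table, view, and column names here are placeholders, and regexp_replace is used in place of translate since it handles both CR and LF in one pass:]

```sql
-- Hypothetical sketch: collapse carriage returns and line feeds inside a
-- text column so each record renders as one row. Note the doubled
-- backslashes: '\r' unescaped inside a Hive string literal is a common
-- source of parse errors like the reported "missing EOF".
CREATE VIEW clean_view AS
SELECT id,
       regexp_replace(description, '\\r|\\n', ' ') AS description
FROM mytable;
```

[Note this can only help at read time. If the underlying table is a line-delimited text format, embedded newlines already split the record before any view logic runs, which is presumably why Andre asks about the storage format; a format that is not newline-delimited (e.g. a binary container) avoids the problem at the source.]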


Re: SerDe errors

2014-08-17 Thread Charles Robertson
Hi Roberto,

This got solved with the help of another user - the e-mails don't seem to
have made it to the user list. There was a problem with the JSON SerDe:
it didn't seem to handle deserialising an object nested inside the main
object. Changing to the Amazon SerDe fixed it.

Thanks,
Charles
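[For readers hitting the same ClassNotFound issue, a minimal sketch of the usual pattern: register the SerDe jar for the session (instead of copying it into /usr/lib/hive/lib) and declare the nested JSON object explicitly as a struct. The jar path, SerDe class, and column names below are placeholders - use whichever SerDe actually handles your nesting:]

```sql
-- Hypothetical sketch: session-scoped SerDe registration plus an
-- explicit struct for the nested object. ADD JAR only affects the
-- current session, so it must be repeated (or configured via
-- hive.aux.jars.path) for other clients.
ADD JAR /path/to/json-serde.jar;

CREATE EXTERNAL TABLE events_json (
  id     BIGINT,
  author STRUCT<name:STRING, city:STRING>   -- nested JSON object
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/data/events-json';
```

[Declaring the nested field as BIGINT or STRING while the JSON holds an object is exactly the mismatch Roberto suspects, and it can surface as a ClassCastException like the one reported.]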


On 14 August 2014 17:49, Roberto Congiu  wrote:

> Can you provide the CREATE statement used to create the table and a sample
> of the json that's causing the error ?
> It sounds like you have a field declared as bigint on the schema, but it's
> actually an object.
>
>
> On Wed, Aug 13, 2014 at 5:05 AM, Charles Robertson <
> charles.robert...@gmail.com> wrote:
>
>> Hi all,
>>
>> I have a Hive table which relies on a JSON SerDe to read the underlying
>> files. When I ran the create script I specified the SerDe and it all went
>> fine and the data was visible in the views above the table. When I tried to
>> query the table directly, though, I received a ClassNotFound error. I
>> solved this by putting the SerDe JAR in /usr/lib/hive/lib.
>>
>> Now, however, when I try to query the data I get:
>>
>> Failed with exception
>> java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException:
>> java.lang.ClassCastException: org.json.JSONObject cannot be cast to
>> [Ljava.lang.Object;
>>
>> (The serde is the json serde provided by Apache)
>>
>> Can anyone suggest why it was working before, but no longer is?
>>
>> Thanks,
>> Charles
>>
>
>
>
> --
> --
> Good judgement comes with experience.
> Experience comes with bad judgement.
> --
> Roberto Congiu - Data Engineer - OpenX
> tel: +1 626 466 1141
>


Re: Hive queries returning all NULL values.

2014-08-17 Thread Raymond Lau
Do the field names in your Parquet files contain upper-case letters by any
chance, e.g. userName? Hive will not read the data of external tables
unless the field names are completely lower case - it doesn't convert them
properly in the case of external tables.
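[This also answers Tor's question about finding the mismatched field without bisecting columns by hand. A hedged sketch, with placeholder column names: count() skips NULLs, so any column whose Parquet field name differs from the Hive column name (including only by case, per the point above) shows 0 while count(*) is non-zero.]

```sql
-- Hypothetical diagnostic: per-column non-NULL counts over one
-- partition. A column that maps to no Parquet field reads as all
-- NULL, so its count drops to 0 here.
SELECT count(*), count(col_a), count(col_b), count(col_c)
FROM events
WHERE dt = '20140815';
```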
On Aug 17, 2014 8:00 AM, "hadoop hive"  wrote:

> Take a small set of data, like 2-5 lines, and insert it...
>
> After that you can try inserting the first 10 columns and then the next 10
> till you find your problematic column
> On Aug 17, 2014 8:37 PM, "Tor Ivry"  wrote:
>
>> Is there any way to debug this?
>>
>> We are talking about many fields here.
>> How can I see which field has the mismatch?
>>
>>
>>
>> On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive 
>> wrote:
>>
>>> Hi,
>>>
>>> You check the data type you have provided while creating external table,
>>> it should match with data in files.
>>>
>>> Thanks
>>> Vikas Srivastava
>>> On Aug 17, 2014 7:07 PM, "Tor Ivry"  wrote:
>>>
  Hi



 I have a hive (0.11) table with the following create syntax:



 CREATE EXTERNAL TABLE events(

 …

 )

 PARTITIONED BY(dt string)

   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'

   STORED AS

 INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"

 OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"

 LOCATION '/data-events/success’;



 Query runs fine.


 I add hdfs partitions (containing snappy.parquet files).



 When I run

 hive

 > select count(*) from events where dt=“20140815”

 I get the correct result



 *Problem:*

 When I run

 hive

 > select * from events where dt=“20140815” limit 1;

 I get

 OK

 NULL NULL NULL NULL NULL NULL NULL 20140815



 *The same query in Impala returns the correct values.*



 Any idea what could be the issue?



 Thanks

 Tor

>>>
>>


Re: Hive queries returning all NULL values.

2014-08-17 Thread hadoop hive
Take a small set of data, like 2-5 lines, and insert it...

After that you can try inserting the first 10 columns and then the next 10
until you find your problematic column.
On Aug 17, 2014 8:37 PM, "Tor Ivry"  wrote:

> Is there any way to debug this?
>
> We are talking about many fields here.
> How can I see which field has the mismatch?
>
>
>
> On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive  wrote:
>
>> Hi,
>>
>> You check the data type you have provided while creating external table,
>> it should match with data in files.
>>
>> Thanks
>> Vikas Srivastava
>> On Aug 17, 2014 7:07 PM, "Tor Ivry"  wrote:
>>
>>>  Hi
>>>
>>>
>>>
>>> I have a hive (0.11) table with the following create syntax:
>>>
>>>
>>>
>>> CREATE EXTERNAL TABLE events(
>>>
>>> …
>>>
>>> )
>>>
>>> PARTITIONED BY(dt string)
>>>
>>>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>>>
>>>   STORED AS
>>>
>>> INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>>>
>>> OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>>>
>>> LOCATION '/data-events/success’;
>>>
>>>
>>>
>>> Query runs fine.
>>>
>>>
>>> I add hdfs partitions (containing snappy.parquet files).
>>>
>>>
>>>
>>> When I run
>>>
>>> hive
>>>
>>> > select count(*) from events where dt=“20140815”
>>>
>>> I get the correct result
>>>
>>>
>>>
>>> *Problem:*
>>>
>>> When I run
>>>
>>> hive
>>>
>>> > select * from events where dt=“20140815” limit 1;
>>>
>>> I get
>>>
>>> OK
>>>
>>> NULL NULL NULL NULL NULL NULL NULL 20140815
>>>
>>>
>>>
>>> *The same query in Impala returns the correct values.*
>>>
>>>
>>>
>>> Any idea what could be the issue?
>>>
>>>
>>>
>>> Thanks
>>>
>>> Tor
>>>
>>
>


Re: Hive queries returning all NULL values.

2014-08-17 Thread Tor Ivry
Is there any way to debug this?

We are talking about many fields here.
How can I see which field has the mismatch?



On Sun, Aug 17, 2014 at 4:30 PM, hadoop hive  wrote:

> Hi,
>
> You check the data type you have provided while creating external table,
> it should match with data in files.
>
> Thanks
> Vikas Srivastava
> On Aug 17, 2014 7:07 PM, "Tor Ivry"  wrote:
>
>>  Hi
>>
>>
>>
>> I have a hive (0.11) table with the following create syntax:
>>
>>
>>
>> CREATE EXTERNAL TABLE events(
>>
>> …
>>
>> )
>>
>> PARTITIONED BY(dt string)
>>
>>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>>
>>   STORED AS
>>
>> INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>>
>> OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>>
>> LOCATION '/data-events/success’;
>>
>>
>>
>> Query runs fine.
>>
>>
>> I add hdfs partitions (containing snappy.parquet files).
>>
>>
>>
>> When I run
>>
>> hive
>>
>> > select count(*) from events where dt=“20140815”
>>
>> I get the correct result
>>
>>
>>
>> *Problem:*
>>
>> When I run
>>
>> hive
>>
>> > select * from events where dt=“20140815” limit 1;
>>
>> I get
>>
>> OK
>>
>> NULL NULL NULL NULL NULL NULL NULL 20140815
>>
>>
>>
>> *The same query in Impala returns the correct values.*
>>
>>
>>
>> Any idea what could be the issue?
>>
>>
>>
>> Thanks
>>
>> Tor
>>
>


Re: Hive queries returning all NULL values.

2014-08-17 Thread hadoop hive
Hi,

Check the data types you provided while creating the external table; they
should match the data in the files.

Thanks
Vikas Srivastava
On Aug 17, 2014 7:07 PM, "Tor Ivry"  wrote:

> Hi
>
>
>
> I have a hive (0.11) table with the following create syntax:
>
>
>
> CREATE EXTERNAL TABLE events(
>
> …
>
> )
>
> PARTITIONED BY(dt string)
>
>   ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
>
>   STORED AS
>
> INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
>
> OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
>
> LOCATION '/data-events/success’;
>
>
>
> Query runs fine.
>
>
> I add hdfs partitions (containing snappy.parquet files).
>
>
>
> When I run
>
> hive
>
> > select count(*) from events where dt=“20140815”
>
> I get the correct result
>
>
>
> *Problem:*
>
> When I run
>
> hive
>
> > select * from events where dt=“20140815” limit 1;
>
> I get
>
> OK
>
> NULL NULL NULL NULL NULL NULL NULL 20140815
>
>
>
> *The same query in Impala returns the correct values.*
>
>
>
> Any idea what could be the issue?
>
>
>
> Thanks
>
> Tor
>


Hive queries returning all NULL values.

2014-08-17 Thread Tor Ivry
Hi



I have a hive (0.11) table with the following create syntax:



CREATE EXTERNAL TABLE events(

…

)

PARTITIONED BY(dt string)

  ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'

  STORED AS

INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"

OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"

LOCATION '/data-events/success';



Query runs fine.


I add hdfs partitions (containing snappy.parquet files).



When I run

hive

> select count(*) from events where dt='20140815'

I get the correct result



*Problem:*

When I run

hive

> select * from events where dt='20140815' limit 1;

I get

OK

NULL NULL NULL NULL NULL NULL NULL 20140815



*The same query in Impala returns the correct values.*



Any idea what could be the issue?



Thanks

Tor

