Using Hive Thrift SerDe

2011-06-27 Thread Ayon Sinha
Hi,

I can't seem to make Hive find the Thrift SerDe. What do I need to do?

hive> add jar /usr/lib/hive/lib/hive-serde-0.7.0-cdh3u0.jar;
Added /usr/lib/hive/lib/hive-serde-0.7.0-cdh3u0.jar to class path
Added resource: /usr/lib/hive/lib/hive-serde-0.7.0-cdh3u0.jar
hive> CREATE external TABLE scratch.sk_logs (user_id bigint, request_type int)
    >   ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.thrift.ThriftByteStreamTypedSerDe"
    >   LOCATION '/user/ayon/hiveserdetest';
FAILED: Error in metadata: Cannot validate serde: org.apache.hadoop.hive.serde2.thrift.ThriftByteStreamTypedSerDe
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
hive> 
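
[Editorial note: this SerDe generally needs more than the jar on the class path. Hive's Thrift deserializer reads the generated Thrift class and the protocol from the SerDe properties, and the jar containing that generated class has to be added as well. A sketch of the kind of DDL involved — the class name com.example.SkLog and the jar path are hypothetical, not from this thread:]

```sql
-- Hypothetical jar and Thrift-generated class; adjust to your own build
ADD JAR /path/to/sk-logs-thrift.jar;

CREATE EXTERNAL TABLE scratch.sk_logs
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftByteStreamTypedSerDe'
WITH SERDEPROPERTIES (
  'serialization.class'  = 'com.example.SkLog',
  'serialization.format' = 'org.apache.thrift.protocol.TBinaryProtocol'
)
LOCATION '/user/ayon/hiveserdetest';
```

With a Thrift SerDe the column list is taken from the generated class, so no explicit columns are declared.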


 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.


Re: wiki has moved!

2011-06-27 Thread John Sichi
On Jun 27, 2011, at 5:16 PM,  wrote:
> I don't have control over the MoinMoin server; if someone has something 
> specific they can create an INFRA request, but the page name translation is 
> not 1-to-1, so it's probably not worth the effort; the old stuff should age 
> out, and the new stuff will get crawled soon enough.


Hmm, but it looks like there's a robots.txt that blocks crawlers at

https://cwiki.apache.org/confluence

Instead, from looking at other projects such as Avro, I guess the crawlers are 
supposed to hit the generated HTML under

https://cwiki.apache.org/Hive

But the HTML pages only seem to get regenerated on edit, so most of them aren't 
there post-import; let's see if a cron job kicks in.

Also, the CSS is missing border padding, so we'll need to fix that.

JVS



loading datafiles in s3

2011-06-27 Thread Kennon Lee
Hello,
We're using Hive on Amazon Elastic MapReduce to process logs on S3, and I
had a couple of basic questions. Apologies if they've been answered already -- I
gathered most info from the Hive tutorial on Amazon
(http://aws.amazon.com/articles/2855), as well as from skimming the Hive wiki
pages, but I'm still very new to all of this. So, questions:

1) Is it possible to partition on directories that do not have the "key="
prefix? Our logs are organized like s3://bucketname/dir//MM/DD/HH/*.bz2
and so ideally we could partition on that structure instead of adding "dt="
to every directory name. I found an old thread discussing this
(http://search-hadoop.com/m/SGTqLox5Il/partition+directory/v=threaded)
but couldn't find the actual syntax.
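
[Editorial note: partitions do not have to be discovered from "key=" directory names; they can be attached to arbitrary directories one at a time with ALTER TABLE. A sketch, with hypothetical table and column names:]

```sql
-- Hypothetical table partitioned by date and hour; each partition is
-- mapped explicitly onto an existing S3 directory.
ALTER TABLE raw_logs ADD PARTITION (dt='2011-06-27', hr='14')
LOCATION 's3://bucketname/dir/2011/06/27/14/';
```

The trade-off is that every partition must be added explicitly (e.g. from a script) instead of being picked up automatically.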

2) How does hive handle tab-delimited files where rows sometimes have
different column counts? For instance, if we are parsing an event log that
contains multiple events, some of which have more columns associated with
them:

event_a user_id apple 300
event_b user_id cat

If i define my hive table to have 4 columns, how will hive react to the
event_b row?
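
[Editorial note: with the default delimited SerDe, a short row is usually padded: missing trailing columns come back as NULL, and extra fields are dropped. A sketch of the expected behavior, with hypothetical column names:]

```sql
-- Hypothetical 4-column table over tab-delimited text
CREATE TABLE events (event STRING, user_id STRING, item STRING, amount INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- A 3-field row such as "event_b user_id cat" would typically surface as:
--   event_b  user_id  cat  NULL
```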

Thanks!


Re: wiki has moved!

2011-06-27 Thread John Sichi
On Jun 27, 2011, at 4:37 PM, Time Less wrote:
> Might as well add me as editor. I've found tons of errors and problems. Not 
> the least of which the regexserde is now completely borked and nonsensical. 
> Compare "([^]*) ([^]*) ..." against "([^ ]*) ([^ ]*) ..." - I thought I was 
> going insane.

Email me your Confluence account name.

> Also, Google still points to the old documentation which doesn't exist. You 
> need to add in some 301 so Google will get the message, too: 
> http://en.wikipedia.org/wiki/HTTP_301. I believe Google isn't the only HTTP 
> client that will benefit from 301 status.

I don't have control over the MoinMoin server; if someone has something 
specific they can create an INFRA request, but the page name translation is not 
1-to-1, so it's probably not worth the effort; the old stuff should age out, 
and the new stuff will get crawled soon enough.

JVS



Re: Bizarro Hive (Hadoop?) Error

2011-06-27 Thread Sumanth V
You are hitting this bug - https://issues.apache.org/jira/browse/HIVE-1579
I consistently hit this bug for one of the Hive queries.


Sumanth



On Mon, Jun 27, 2011 at 5:08 PM, Time Less  wrote:

> Today I'm getting this error again. A Google search brought me back to...
> you guessed it... my own post. But this time no HDFS corruption. Bounced all
> services, namenode, jobtracker, datanodes, tasktrackers. Still same error.
> Here's what it looks like:
>
> *Fsck Output:*
> FSCK ended at Mon Jun 27 17:02:07 PDT 2011 in 818 milliseconds
> The filesystem under path '/' is HEALTHY
>
> *Hive Query Output:*
> -bash-3.2$ hive -e "select * from air_client_logs where loglevel = '[FATAL]'"
> Hive history file=/tmp/hdfs/hive_job_log_hdfs_201106271702_317571839.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201106271658_0002, Tracking URL =
> http://hadooptest1:50030/jobdetails.jsp?jobid=job_201106271658_0002
> Kill Command = /usr/lib/hadoop/bin/hadoop job
> -Dmapred.job.tracker=hadooptest1:54311 -kill job_201106271658_0002
> 2011-06-27 17:02:39,874 Stage-1 map = 0%,  reduce = 0%
> 2011-06-27 17:03:02,111 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201106271658_0002 with errors
>
> java.lang.RuntimeException: Error while reading from task log url
> .[snip].
> Caused by: java.io.IOException: Server returned HTTP response code: 400 for
> URL:
> http://hadooptest14:50060/tasklog?taskid=attempt_201106271658_0002_m_08_0&all=true
>
> at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
> at java.net.URL.openStream(URL.java:1010)
> at
> org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
> ... 16 more
> Ended Job = job_201106271658_0002 with exception
> 'java.lang.RuntimeException(Error while reading from task log url)'
>
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MapRedTask
>
> Again, trying to go to "
> http://hadooptest14:50060/tasklog?taskid=attempt_201106271658_0002_m_08_0&all=true";
> returns that argument attemptid error.
>
> What am I doing wrong here? I appear to keep doing it, whatever it is.
>
>
>
> On Fri, May 6, 2011 at 6:47 PM, Time Less  wrote:
>
>> My cluster went corrupt-mode. I wiped it and deleted the Hive metastore
>> and started over. In the process, I did a "yum upgrade" which probably took
>> me from CDH3b4 to CDH3u0. Now every time I submit a Hive query of complexity
>> requiring a map/reduce job*, I get this error:
>>
>> 2011-05-06 18:39:14,533 Stage-1 map = 100%,  reduce = 100%
>> Ended Job = job_201104081532_0509 with errors
>> java.lang.RuntimeException: Error while reading from task log url
>> at
>> org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
>> at
>> org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:889)
>> at
>> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:680)
>> at
>> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
>> .[snip].
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>> Caused by: java.io.IOException: Server returned HTTP response code: 400
>> for URL:
>> http://hadooptest3:50060/tasklog?taskid=attempt_201104081532_0509_m_02_2&all=true
>> at

Re: Bizarro Hive (Hadoop?) Error

2011-06-27 Thread Time Less
Today I'm getting this error again. A Google search brought me back to...
you guessed it... my own post. But this time no HDFS corruption. Bounced all
services, namenode, jobtracker, datanodes, tasktrackers. Still same error.
Here's what it looks like:

*Fsck Output:*
FSCK ended at Mon Jun 27 17:02:07 PDT 2011 in 818 milliseconds
The filesystem under path '/' is HEALTHY

*Hive Query Output:*
-bash-3.2$ hive -e "select * from air_client_logs where loglevel = '[FATAL]'"
Hive history file=/tmp/hdfs/hive_job_log_hdfs_201106271702_317571839.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201106271658_0002, Tracking URL =
http://hadooptest1:50030/jobdetails.jsp?jobid=job_201106271658_0002
Kill Command = /usr/lib/hadoop/bin/hadoop job
-Dmapred.job.tracker=hadooptest1:54311 -kill job_201106271658_0002
2011-06-27 17:02:39,874 Stage-1 map = 0%,  reduce = 0%
2011-06-27 17:03:02,111 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201106271658_0002 with errors
java.lang.RuntimeException: Error while reading from task log url
at
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
at
org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:889)
at
org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:680)
at
org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
at
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for
URL:
http://hadooptest14:50060/tasklog?taskid=attempt_201106271658_0002_m_08_0&all=true
at
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
at java.net.URL.openStream(URL.java:1010)
at
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
... 16 more
Ended Job = job_201106271658_0002 with exception
'java.lang.RuntimeException(Error while reading from task log url)'
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.MapRedTask

Again, trying to go to "
http://hadooptest14:50060/tasklog?taskid=attempt_201106271658_0002_m_08_0&all=true";
returns that argument attemptid error.

What am I doing wrong here? I appear to keep doing it, whatever it is.


On Fri, May 6, 2011 at 6:47 PM, Time Less  wrote:

> My cluster went corrupt-mode. I wiped it and deleted the Hive metastore and
> started over. In the process, I did a "yum upgrade" which probably took me
> from CDH3b4 to CDH3u0. Now every time I submit a Hive query of complexity
> requiring a map/reduce job*, I get this error:
>
> 2011-05-06 18:39:14,533 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201104081532_0509 with errors
> java.lang.RuntimeException: Error while reading from task log url
> at
> org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.showJobFailDebugInfo(ExecDriver.java:889)
> at
> org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:680)
> at
> org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:123)
> .[snip].
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: java.io.IOException: Server returned HTTP response code: 400 for
> URL:
> http://hadooptest3:50060/tasklog?taskid=attempt_201104081532_0509_m_02_2&all=true
> at
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
> at java.net.URL.openStream(URL.java:1010)
> at
> org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
> ... 16 more
> Ended Job = job_201104081532_0509 with exception
> 'java.lang.RuntimeException(Error while reading from task

Re: wiki has moved!

2011-06-27 Thread Time Less
> We need your help (or at least tolerance) to deal with some of the
> imperfections in the migration process:
>
> https://cwiki.apache.org/confluence/display/Hive/AboutThisWiki
>
> If you were already an editor on the old wiki, or if you would like to help with
> fixing/editing now, contact me for write access to the new one.  If you turn
> out to be a spammer, I will hunt you down, disembowel you, and feed your
> entrails to my dog.
>

Might as well add me as editor. I've found tons of errors and problems. Not
the least of which the regexserde is now completely borked and nonsensical.
Compare "([^]*) ([^]*) ..." against "([^ ]*) ([^ ]*) ..." - I thought I was
going insane.
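
[Editorial note: the difference is just the space inside the negated character class — "([^ ]*)" matches a run of non-space characters, while the corrupted "([^]*)" does not. In a RegexSerDe DDL it looks roughly like this, with hypothetical table and columns:]

```sql
CREATE TABLE access_log (host STRING, identd STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- Note the space inside [^ ]: each group captures one space-separated field
  'input.regex' = '([^ ]*) ([^ ]*)'
)
STORED AS TEXTFILE;
```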

Also, Google still points to the old documentation which doesn't exist. You
need to add in some 301 so Google will get the message, too:
http://en.wikipedia.org/wiki/HTTP_301. I believe Google isn't the only HTTP
client that will benefit from 301 status.

--
RiotGames::SrDataArchitect::TimEllis


Loading seq file into hive

2011-06-27 Thread Mapred Learn
Hi,
I have seq files with the key as a line number and the value as Ctrl-B delimited text.
a sample value is:
45454^B567^Brtrt^B-7.8
56577^B345^Bdrtd^B-0.9

when I create a table like:
create table temp_seq (no. int, code string, rank string, amt string)
row format delimited fields terminated by '\002' lines terminated by '\n'
stored as sequencefile;

It creates the table.

When I load a file as:

load data inpath '/tmp/test' into table temp_seq;

even this succeeds.

But when I try a select *, I don't see the fields that were loaded as
delimited text. Instead the row is split at odd boundaries: some fields from
the sequence file's value text are combined in the select * output, and the
remaining fields at the end come back as NULL, as follows:

 45454567  rtrt-7.8 NULL NULL
 56577345 drtd-0.9 NULL NULL.

How can I get this data to correspond to the exact fields in the sequence
file's values?
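
[Editorial note: a couple of things in the transcript above may be worth double-checking; a hedged sketch of a cleaned-up DDL, with the first column renamed since '.' is not valid in an identifier:]

```sql
-- "no." renamed to num; '\002' is the Ctrl-B delimiter from the data
CREATE TABLE temp_seq (num INT, code STRING, rank STRING, amt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;
```

Also note that LOAD DATA only moves files and does not convert formats, so '/tmp/test' must already be a SequenceFile whose values are \002-delimited; a plain text file loaded into a SEQUENCEFILE table can produce exactly this kind of garbled/NULL output.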

Thanks in advance,
-JJ


RESOLVED: Re: URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread Ayon Sinha
Thanks everyone.
 
-Ayon




From: John Sichi 
To: "" ; Ayon Sinha 

Sent: Monday, June 27, 2011 2:39 PM
Subject: Re: URGENT: I need the Hive Server setup Wiki 

It's not empty, but the links on it were broken; I just fixed them.

jVS

On Jun 27, 2011, at 2:31 PM, Ayon Sinha wrote:

> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
>  is empty
> and the old link is gone.
>  
> -Ayon
> See My Photos on Flickr
> Also check out my Blog for answers to commonly asked questions.

Re: URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread John Sichi
It's not empty, but the links on it were broken; I just fixed them.

jVS

On Jun 27, 2011, at 2:31 PM, Ayon Sinha wrote:

> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
>  is empty
> and the old link is gone.
>  
> -Ayon
> See My Photos on Flickr
> Also check out my Blog for answers to commonly asked questions.



Re: URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread Ted Yu
There're 3 links on that page.

On Mon, Jun 27, 2011 at 2:39 PM, Ted Yu  wrote:

> wiki has moved.
> See
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
>
>
> On Mon, Jun 27, 2011 at 2:31 PM, Ayon Sinha  wrote:
>
>>
>> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
>> is empty
>> and the old link is gone.
>>
>> -Ayon
>> See My Photos on Flickr 
>> Also check out my Blog for answers to commonly asked 
>> questions.
>>
>
>


Re: URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread Ted Yu
wiki has moved.
See
https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer

On Mon, Jun 27, 2011 at 2:31 PM, Ayon Sinha  wrote:

>
> https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
> is empty
> and the old link is gone.
>
> -Ayon
> See My Photos on Flickr 
> Also check out my Blog for answers to commonly asked 
> questions.
>


URGENT: I need the Hive Server setup Wiki

2011-06-27 Thread Ayon Sinha
https://cwiki.apache.org/confluence/display/Hive/AdminManual+SettingUpHiveServer
 is empty

and the old link is gone.
 
-Ayon


updated location for Apache Hive Contributor Day

2011-06-27 Thread John Sichi
The location details are on the event page:

http://hivecontribday2011.eventbrite.com

There are still plenty of seats left for anyone interested.

JVS



Re: Resend -> how to load sequence file with decimal data

2011-06-27 Thread Mapred Learn
Hi Steven,
With LOAD DATA you also give some info about the data, as in Tom White's book:
create external table external_table(dummy string)
location
load data

Now dummy string is a field in this data. Similarly, what I have is a decimal
field. How do I specify it in the create command?
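
[Editorial note: since Hive of this vintage has no DECIMAL type, the usual workaround is to declare the field as STRING (or DOUBLE, accepting binary floating-point rounding) and cast at query time. A sketch with hypothetical names:]

```sql
-- decimal(16,6) values kept as STRING to preserve the exact digits
CREATE TABLE rates (id STRING, code STRING, amt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\002'
STORED AS SEQUENCEFILE;

-- Cast on read when arithmetic is needed
SELECT id, CAST(amt AS DOUBLE) AS amt_num FROM rates;
```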



On Fri, Jun 24, 2011 at 5:12 PM, Steven Wong  wrote:

>  Not sure if this is what you’re asking for: Hive has a LOAD DATA command.
> There is no decimal data type.
>
> From: Mapred Learn [mailto:mapred.le...@gmail.com]
> Sent: Thursday, June 23, 2011 7:25 AM
> To: user@hive.apache.org; mapreduce-u...@hadoop.apache.org;
> cdh-u...@cloudera.org
> Subject: Resend -> how to load sequence file with decimal data
>
>  Hi,
> I have a sequence file where the value is text with delimited data and some
> fields are decimal fields.
> For example, decimal(16,6); a sample value is 123.456735.
> How do I upload such a sequence file into Hive, and what should I give in
> the table definition for decimal values as above?
>
> Thanks in advance !
>
>
> Sent from my iPhone
>