Re: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

2010-10-13 Thread Narendra
The error happened again! The logs said Hadoop was unable to create the temp
files required for MapReduce jobs. We freed up some space in the home directory and
it started working again.
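When this failure mode shows up, a quick first check is whether the filesystem holding Hadoop's local scratch space is full. A rough sketch in Python; the directory list here is an assumption for illustration (the real location is whatever hadoop.tmp.dir / mapred.local.dir point at in your configuration):

```python
import shutil

# Directories where Hadoop may write intermediate MapReduce data.  These
# paths are examples only -- check hadoop.tmp.dir / mapred.local.dir in
# your own config for the real locations.
CANDIDATE_DIRS = ["/tmp"]

def free_space_mb(path):
    """Return free disk space, in MB, on the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.free // (1024 * 1024)

for d in CANDIDATE_DIRS:
    print(d, free_space_mb(d), "MB free")
```

If any of these filesystems is near zero, task attempts will fail exactly as described above even though the query itself is fine.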


On Wed, Oct 13, 2010 at 7:43 AM, Adarsh Sharma wrote:

> Narendra wrote:
>
>> I am facing the same issue as well. In fact, select * works, but if you add a
>> where clause, it fails.
>>  Regards,
>> Narendra
>>
>>  On Mon, Oct 11, 2010 at 1:40 PM, Guru Prasad <guru.pra...@ibibogroup.com> wrote:
>>
>>Hi all,
>>When i am running "select count(1) from searchlogs", I am getting
>>following errors.
>>While "select * from searchlogs" is working properly.
>>
>>
>>  
>> --
>>2010-10-11 03:52:54,371 Stage-1 map = 0%,  reduce = 0%
>>2010-10-11 03:53:00,467 Stage-1 map = 4%,  reduce = 0%
>>2010-10-11 03:53:02,542 Stage-1 map = 5%,  reduce = 0%
>>2010-10-11 03:53:03,566 Stage-1 map = 9%,  reduce = 0%
>>2010-10-11 03:53:05,640 Stage-1 map = 13%,  reduce = 0%
>>2010-10-11 03:53:06,644 Stage-1 map = 16%,  reduce = 0%
>>2010-10-11 03:53:08,671 Stage-1 map = 20%,  reduce = 0%
>>2010-10-11 03:53:09,720 Stage-1 map = 24%,  reduce = 0%
>>2010-10-11 03:53:11,824 Stage-1 map = 29%,  reduce = 0%
>>2010-10-11 03:53:14,883 Stage-1 map = 36%,  reduce = 0%
>>2010-10-11 03:53:18,000 Stage-1 map = 44%,  reduce = 0%
>>2010-10-11 03:53:21,037 Stage-1 map = 51%,  reduce = 0%
>>2010-10-11 03:53:24,085 Stage-1 map = 56%,  reduce = 0%
>>2010-10-11 03:53:27,203 Stage-1 map = 62%,  reduce = 0%
>>2010-10-11 03:53:30,294 Stage-1 map = 69%,  reduce = 0%
>>2010-10-11 03:53:33,519 Stage-1 map = 75%,  reduce = 0%
>>2010-10-11 03:53:36,599 Stage-1 map = 82%,  reduce = 0%
>>2010-10-11 03:53:39,866 Stage-1 map = 89%,  reduce = 0%
>>2010-10-11 03:53:41,931 Stage-1 map = 91%,  reduce = 0%
>>2010-10-11 03:53:42,942 Stage-1 map = 95%,  reduce = 0%
>>2010-10-11 03:53:45,031 Stage-1 map = 98%,  reduce = 0%
>>2010-10-11 03:53:46,042 Stage-1 map = 100%,  reduce = 0%
>>2010-10-11 03:53:52,083 Stage-1 map = 100%,  reduce = 1%
>>*2010-10-11 03:54:21,328 Stage-1 map = 100%,  reduce = 0%*
>>2010-10-11 03:54:33,424 Stage-1 map = 100%,  reduce = 2%
>>2010-10-11 03:55:12,690 Stage-1 map = 100%,  reduce = 3%
>>*2010-10-11 03:55:42,905 Stage-1 map = 100%,  reduce = 0%*
>>2010-10-11 03:55:55,010 Stage-1 map = 100%,  reduce = 4%
>>2010-10-11 03:56:10,106 Stage-1 map = 100%,  reduce = 5%
>>2010-10-11 03:56:49,384 Stage-1 map = 100%,  reduce = 7%
>>*2010-10-11 03:57:04,478 Stage-1 map = 100%,  reduce = 0%*
>>2010-10-11 03:57:16,558 Stage-1 map = 100%,  reduce = 7%
>>2010-10-11 03:57:31,679 Stage-1 map = 100%,  reduce = 8%
>>2010-10-11 03:58:08,901 Stage-1 map = 100%,  reduce = 9%
>>*2010-10-11 03:58:27,026 Stage-1 map = 100%,  reduce = 0%*
>>2010-10-11 03:58:33,067 Stage-1 map = 100%,  reduce = 100%
>>Ended Job = job_201009280549_0050 with errors
>>
>>Failed tasks with most(4) failures :
>>Task URL:
>>
>> http://xx.xx.xx.xxx:50030/taskdetails.jsp?jobid=job_201009280549_0050&tipid=task_201009280549_0050_r_00
>>
>>FAILED: Execution Error, return code 2 from
>>org.apache.hadoop.hive.ql.exec.ExecDriver
>>
>>  
>> ---
>>
>>Please help me out.
>>
>>Thanks & Regards
>>Guru Prasad
>>~guru
>>
>>
>>
> For resolving this issue, check this URL in the JobTracker web interface:
>
> Failed tasks with most(4) failures :
> Task URL:
> http://xx.xx.xx.xxx:50030/taskdetails.jsp?jobid=job_201009280549_0050&tipid=task_201009280549_0050_r_00
>
> There is a log for this job ID where detailed information about the error is
> given.
>
> Regards
> Adarsh Sharma
>
>


RE: Exception in hive startup

2010-10-13 Thread Steven Wong
Before mucking with the code, it should be worth a try to clarify the wiki 
first. It's easy to miss the point that  is build/dist for most 
people; I guess people tend to mistake  for . I would 
replace all references to  by build/dist and just mention once 
that, for advanced users, you can change it to something else via blah blah 
config.


-Original Message-
From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Wednesday, October 13, 2010 9:59 PM
To: user@hive.apache.org
Subject: Re: Exception in hive startup

On Thu, Oct 14, 2010 at 12:49 AM, Ted Yu  wrote:
> This should be documented in README.txt
>
> On Wed, Oct 13, 2010 at 6:14 PM, Steven Wong  wrote:
>>
>> You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive.
>>
>>
>>
>>
>>
>> From: hdev ml [mailto:hde...@gmail.com]
>> Sent: Wednesday, October 13, 2010 2:18 PM
>> To: hive-u...@hadoop.apache.org
>> Subject: Exception in hive startup
>>
>>
>>
>> Hi all,
>>
>> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>>
>> I followed all the instructions on Hive's getting started page for setting
>> up environment variables like HADOOP_HOME
>>
>> When I run from command prompt in the hive installation folder as
>> "bin/hive" it gives me following exception
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hive/conf/HiveConf
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:247)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hive.conf.HiveConf
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>     ... 3 more
>>
>> Please note that my Hadoop installation is working fine.
>>
>> What could be the cause of this? Anybody has any idea?
>>
>> Thanks
>> Harshad
>

It actually is.

Downloading and building


- svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive_trunk
- cd hive_trunk
- hive_trunk> ant package -Dtarget.dir= -Dhadoop.version=0.17.2.1

If you do not specify a value for target.dir it will use the default
value build/dist. Then you can run the unit tests if you want to make
sure it is correctly built:

...

Running Hive


Hive uses Hadoop, which means:
- you must have hadoop in your path OR
- export HADOOP=/bin/hadoop

To use hive command line interface (cli) from the shell:
$ cd 
$ bin/hive

but for whatever reason people seem to be drawn to the bin/hive
right off the trunk. I think if we camouflaged that directory somehow
we might help stop people from wandering onto it.



Re: Exception in hive startup

2010-10-13 Thread Edward Capriolo
On Thu, Oct 14, 2010 at 12:49 AM, Ted Yu  wrote:
> This should be documented in README.txt
>
> On Wed, Oct 13, 2010 at 6:14 PM, Steven Wong  wrote:
>>
>> You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive.
>>
>>
>>
>>
>>
>> From: hdev ml [mailto:hde...@gmail.com]
>> Sent: Wednesday, October 13, 2010 2:18 PM
>> To: hive-u...@hadoop.apache.org
>> Subject: Exception in hive startup
>>
>>
>>
>> Hi all,
>>
>> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>>
>> I followed all the instructions on Hive's getting started page for setting
>> up environment variables like HADOOP_HOME
>>
>> When I run from command prompt in the hive installation folder as
>> "bin/hive" it gives me following exception
>>
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/hadoop/hive/conf/HiveConf
>>     at java.lang.Class.forName0(Native Method)
>>     at java.lang.Class.forName(Class.java:247)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hive.conf.HiveConf
>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>     ... 3 more
>>
>> Please note that my Hadoop installation is working fine.
>>
>> What could be the cause of this? Anybody has any idea?
>>
>> Thanks
>> Harshad
>

It actually is.

Downloading and building


- svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive_trunk
- cd hive_trunk
- hive_trunk> ant package -Dtarget.dir= -Dhadoop.version=0.17.2.1

If you do not specify a value for target.dir it will use the default
value build/dist. Then you can run the unit tests if you want to make
sure it is correctly built:

...

Running Hive


Hive uses Hadoop, which means:
- you must have hadoop in your path OR
- export HADOOP=/bin/hadoop

To use hive command line interface (cli) from the shell:
$ cd 
$ bin/hive

but for whatever reason people seem to be drawn to the bin/hive
right off the trunk. I think if we camouflaged that directory somehow
we might help stop people from wandering onto it.


Re: Exception in hive startup

2010-10-13 Thread Ted Yu
This should be documented in README.txt

On Wed, Oct 13, 2010 at 6:14 PM, Steven Wong  wrote:

>  You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive.
>
>
>
>
>
> *From:* hdev ml [mailto:hde...@gmail.com]
> *Sent:* Wednesday, October 13, 2010 2:18 PM
> *To:* hive-u...@hadoop.apache.org
> *Subject:* Exception in hive startup
>
>
>
> Hi all,
>
> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>
> I followed all the instructions on Hive's getting started page for setting
> up environment variables like HADOOP_HOME
>
> When I run from command prompt in the hive installation folder as
> "bin/hive" it gives me following exception
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/hive/conf/HiveConf
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.conf.HiveConf
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> ... 3 more
>
> Please note that my Hadoop installation is working fine.
>
> What could be the cause of this? Anybody has any idea?
>
> Thanks
> Harshad
>


Re: Exception in hive startup

2010-10-13 Thread Edward Capriolo
We really need to hide /bin/hive somewhere that people cannot
find it so easily.

Edward

On Wed, Oct 13, 2010 at 9:14 PM, Steven Wong  wrote:
> You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive.
>
>
>
>
>
> From: hdev ml [mailto:hde...@gmail.com]
> Sent: Wednesday, October 13, 2010 2:18 PM
> To: hive-u...@hadoop.apache.org
> Subject: Exception in hive startup
>
>
>
> Hi all,
>
> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>
> I followed all the instructions on Hive's getting started page for setting
> up environment variables like HADOOP_HOME
>
> When I run from command prompt in the hive installation folder as "bin/hive"
> it gives me following exception
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/hive/conf/HiveConf
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.conf.HiveConf
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>     ... 3 more
>
> Please note that my Hadoop installation is working fine.
>
> What could be the cause of this? Anybody has any idea?
>
> Thanks
> Harshad


RE: Exception in hive startup

2010-10-13 Thread Steven Wong
You need to run hive_root/build/dist/bin/hive, not hive_root/bin/hive.


From: hdev ml [mailto:hde...@gmail.com]
Sent: Wednesday, October 13, 2010 2:18 PM
To: hive-u...@hadoop.apache.org
Subject: Exception in hive startup

Hi all,

I installed Hadoop 0.20.2 and installed hive 0.5.0.

I followed all the instructions on Hive's getting started page for setting up 
environment variables like HADOOP_HOME

When I run from command prompt in the hive installation folder as "bin/hive" it 
gives me following exception

Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 3 more

Please note that my Hadoop installation is working fine.

What could be the cause of this? Anybody has any idea?

Thanks
Harshad


Re: HBase as input AND output?

2010-10-13 Thread John Sichi
If your query only accesses HBase tables, then yes, Hive does not access any 
source data directly from HDFS (although of course it may put intermediate 
results in HDFS, e.g. for the result of a join).

However, if your query does something like join an HBase table with a native 
Hive table, then it will read data from both HBase and HDFS.

Likewise, on the write side, it depends on whether your INSERT targets an HBase 
table or a native Hive table.

The read and write sides are independent.

JVS

On Oct 13, 2010, at 2:24 PM, Otis Gospodnetic wrote:

> Thanks Tim.
> (and sorry for the duplicate email - need to fix my Hive email filter)
> 
> 
> Just to clarify one bit, though.
> When using Hive without HBase one has data stored in the appropriate 
> directories 
> on HDFS and runs MR jobs against those data.
> 
> But, when using Hive *with* HBase, does Hive require any such data to be 
> present 
> in the HDFS?
> In other words, when using Hive with HBase, one really uses only Hive's 
> ability 
> to translate a Hive QL statement to a set of MR jobs (and read from/write to 
> HBase) and execute them against only data stored in HBase.  Is this correct?
> 
> Thanks,
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 
> - Original Message 
>> From: Tim Robertson 
>> To: user@hive.apache.org
>> Sent: Wed, October 13, 2010 4:45:31 PM
>> Subject: Re: HBase as input AND output?
>> 
>> That's right.  Hive can use an HBase table as an input format to the
>> hive query regardless of output format, and can also write the output
>> to an HBase table regardless of the input format.  You can also
>> supposedly do a join in Hive that uses 1 side of the join from an
>> HBase table, and the other side a text file, which is very powerful.
>> I haven't done it myself, but intend to shortly.
>> 
>> HTH,
>> Tim
>> 
>> 
>> On  Wed, Oct 13, 2010 at 10:07 PM, Otis Gospodnetic
>>   wrote:
>>> Hi,
>>> 
>>> I was wondering how I can query data stored  in HBase and remembered Hive's 
>> HBase
>>> integration:
>>> http://wiki.apache.org/hadoop/Hive/HBaseIntegration
>>> 
>>> After  watching John Sichi's video
>>> 
>> (http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/
>> 
>>>  ) I have a better idea about what functionality this integration provides, 
>>>  
>> but
>>> I still have some questions.
>>> 
>>> Would it be correct to  say that Hive-HBase integration makes the following 
>> data
>>> flow  possible:
>>> 
>>> 0) Hive or Files => Custom HQL statement that  aggregates data  ==> HBase
>>> 1) HBase ==> Custom HQL statement that  aggregates data  ==> HBase
>>> 2) HBase ==> Custom HQL statement that  aggregates data  ==> output 
>> (console?)
>>> 
>>> Of the above, 1) is  what I'm wondering the most about right now.
>>> 
>>> In other words, it  seems to me that Hive may be able to look at *just* data
>>> stored in HBase  *without* the typical data/files in HDFS that Hive 
>>> normally 
>> runs
>>> its MR  jobs against.
>>> 
>>> Is this correct?
>>> 
>>> Thanks,
>>> Otis
>>> 
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Hadoop  ecosystem search :: http://search-hadoop.com/
>>> 
>>> 
>> 



RE: USING .. AS column names

2010-10-13 Thread Paul Yang
For insert overwrite, the column names don't matter - the order of the columns 
dictates how they are inserted into the table, so the behavior is not specific to 
the transform clause.

Also, when you use AS with transform, you're just assigning column aliases to 
the output of the transform. For example,

from t select transform('foo', 'bar', 'baz') USING '/bin/cat' AS (b, a, c) 
limit 1

will assign the alias b to the first column, a to the second column, etc. Then 
you can do something like this to select the contents of foo:

select b from (from t select transform('foo', 'bar', 'baz') USING '/bin/cat' AS 
(b, a, c) limit 1) subq;
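The positional behavior Paul describes can be seen outside of Hive as well. The sketch below is plain Python, not Hive code; the function name and column layout are made up for illustration:

```python
def insert_overwrite(table_columns, transform_output, aliases):
    """Mimic how Hive's INSERT OVERWRITE maps TRANSFORM output into a
    target table: strictly by position.  The AS aliases only name the
    output columns for later SELECTs; they play no role in how values
    line up with the target table's schema."""
    assert len(table_columns) == len(transform_output) == len(aliases)
    return dict(zip(table_columns, transform_output))

# Table test2(a, b, c); transform emits ('foo', 'bar', 'baz') AS (b, a, c).
row = insert_overwrite(["a", "b", "c"], ["foo", "bar", "baz"], ["b", "a", "c"])
print(row)  # values land by position: a='foo', b='bar', c='baz'
```

That is exactly the surprise from the original question: reordering the alias list does not reorder the inserted values.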

From: Dave Brondsema [mailto:dbronds...@geek.net]
Sent: Wednesday, October 13, 2010 3:01 PM
To: hive-u...@hadoop.apache.org
Subject: USING .. AS column names

What are the "AS" columns used for in TRANSFORM USING?  All I can find is 
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform#Schema-less_Map-reduce_Scripts
but that only mentions what happens when it isn't there. It seems like the 
column names don't really matter. More than once I've used table column names 
in a different order than the table structure, but the values went into the 
table in table-structure order, not the AS column order. That confused me a lot.

In this example, table "t" is used just to get some literals selected:

create table test2(a string, b string, c string)

# AS column names in a different order
from t insert overwrite table test2 select transform('foo', 'bar', 'baz') USING 
'/bin/cat' AS (b, a, c) limit 1
# I expect 'bar', 'foo', 'baz'
select * from test2
> ['foo', 'bar', 'baz']
# but I get this, and am confused
# (even more so when type conversion happens with non-string columns)

# AS column names don't matter at all actually
from t insert overwrite table test2 select transform('foo', 'bar', 'baz') USING 
'/bin/cat' AS (x, y, z) limit 1
select * from test2
> ['foo', 'bar', 'baz']


I'd recommend that Hive either support column reordering with the AS statement, 
or make it completely optional (although this may be backwards-incompatible 
with the docs at the link above).

--
Dave Brondsema
Software Engineer
Geeknet

www.geek.net


USING .. AS column names

2010-10-13 Thread Dave Brondsema
What are the "AS" columns used for in TRANSFORM USING?  All I can find is
http://wiki.apache.org/hadoop/Hive/LanguageManual/Transform#Schema-less_Map-reduce_Scripts
but that only mentions what happens when it isn't there. It seems like the
column names don't really matter. More than once I've used table column names
in a different order than the table structure, but the values went into the
table in table-structure order, not the AS column order. That confused me a lot.

In this example, table "t" is used just to get some literals selected:

create table test2(a string, b string, c string)

# AS column names in a different order
from t insert overwrite table test2 select transform('foo', 'bar', 'baz')
USING '/bin/cat' AS (b, a, c) limit 1
# I expect 'bar', 'foo', 'baz'
select * from test2
> ['foo', 'bar', 'baz']
# but I get this, and am confused
# (even more so when type conversion happens with non-string columns)

# AS column names don't matter at all actually
from t insert overwrite table test2 select transform('foo', 'bar', 'baz')
USING '/bin/cat' AS (x, y, z) limit 1
select * from test2
> ['foo', 'bar', 'baz']


I'd recommend that Hive either support column reordering with the AS
statement, or make it completely optional (although this may be
backwards-incompatible with the docs at the link above).

-- 
Dave Brondsema
Software Engineer
Geeknet

www.geek.net


Re: Exception in hive startup

2010-10-13 Thread Ted Yu
Make sure hive-exec-0.7.0.jar is under the lib/ directory.

Found: HiveConf
Class: org.apache.hadoop.hive.conf.HiveConf
Package: org.apache.hadoop.hive.conf
Library Name: hive-exec-0.7.0.jar
Library Path: /Users/tyu/hive/lib/hive-exec-0.7.0.jar

On Wed, Oct 13, 2010 at 2:17 PM, hdev ml  wrote:

> Hi all,
>
> I installed Hadoop 0.20.2 and installed hive 0.5.0.
>
> I followed all the instructions on Hive's getting started page for setting
> up environment variables like HADOOP_HOME
>
> When I run from command prompt in the hive installation folder as
> "bin/hive" it gives me following exception
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/hadoop/hive/conf/HiveConf
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:247)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.hadoop.hive.conf.HiveConf
> at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
> at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> ... 3 more
>
> Please note that my Hadoop installation is working fine.
>
> What could be the cause of this? Anybody has any idea?
>
> Thanks
> Harshad
>


Re: HBase as input AND output?

2010-10-13 Thread Otis Gospodnetic
Thanks Tim.
(and sorry for the duplicate email - need to fix my Hive email filter)


Just to clarify one bit, though.
When using Hive without HBase one has data stored in the appropriate 
directories 
on HDFS and runs MR jobs against those data.

But, when using Hive *with* HBase, does Hive require any such data to be 
present 
in the HDFS?
In other words, when using Hive with HBase, one really uses only Hive's ability 
to translate a Hive QL statement to a set of MR jobs (and read from/write to 
HBase) and execute them against only data stored in HBase.  Is this correct?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 
> From: Tim Robertson 
> To: user@hive.apache.org
> Sent: Wed, October 13, 2010 4:45:31 PM
> Subject: Re: HBase as input AND output?
> 
> That's right.  Hive can use an HBase table as an input format to the
> hive query regardless of output format, and can also write the output
> to an HBase table regardless of the input format.  You can also
> supposedly do a join in Hive that uses 1 side of the join from an
> HBase table, and the other side a text file, which is very powerful.
> I haven't done it myself, but intend to shortly.
> 
> HTH,
> Tim
> 
> 
> On  Wed, Oct 13, 2010 at 10:07 PM, Otis Gospodnetic
>   wrote:
> > Hi,
> >
> > I was wondering how I can query data stored  in HBase and remembered Hive's 
>HBase
> > integration:
> > http://wiki.apache.org/hadoop/Hive/HBaseIntegration
> >
> > After  watching John Sichi's video
> > 
>(http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/
>
> >   ) I have a better idea about what functionality this integration 
> > provides,  
>but
> > I still have some questions.
> >
> > Would it be correct to  say that Hive-HBase integration makes the following 
>data
> > flow  possible:
> >
> > 0) Hive or Files => Custom HQL statement that  aggregates data  ==> HBase
> > 1) HBase ==> Custom HQL statement that  aggregates data  ==> HBase
> > 2) HBase ==> Custom HQL statement that  aggregates data  ==> output 
>(console?)
> >
> > Of the above, 1) is  what I'm wondering the most about right now.
> >
> > In other words, it  seems to me that Hive may be able to look at *just* data
> > stored in HBase  *without* the typical data/files in HDFS that Hive 
> > normally 
>runs
> > its MR  jobs against.
> >
> > Is this correct?
> >
> > Thanks,
> >  Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Hadoop  ecosystem search :: http://search-hadoop.com/
> >
> >
> 


Exception in hive startup

2010-10-13 Thread hdev ml
Hi all,

I installed Hadoop 0.20.2 and installed hive 0.5.0.

I followed all the instructions on Hive's getting started page for setting
up environment variables like HADOOP_HOME

When I run from command prompt in the hive installation folder as "bin/hive"
it gives me following exception

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
... 3 more

Please note that my Hadoop installation is working fine.

What could be the cause of this? Anybody has any idea?

Thanks
Harshad


HBase as input AND output

2010-10-13 Thread Otis Gospodnetic
Hi,

I was wondering how I can query data stored in HBase and remembered Hive's 
HBase 

integration:
http://wiki.apache.org/hadoop/Hive/HBaseIntegration

After watching John Sichi's video 
(http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/

) I have a better idea about what functionality this integration provides, but 
I still have some questions.

Would it be correct to say that Hive-HBase integration makes the following data 
flow possible:

0) Hive or Files => Custom HQL statement that aggregates data  ==> HBase
1) HBase ==> Custom HQL statement that aggregates data  ==> HBase
2) HBase ==> Custom HQL statement that aggregates data  ==> output (console?)

Of the above, 1) is what I'm wondering the most about right now.

In other words, it seems to me that Hive may be able to look at *just* data 
stored in HBase *without* the typical data/files in HDFS that Hive normally 
runs 

its MR jobs against.

Is this correct?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



Re: HBase as input AND output?

2010-10-13 Thread Tim Robertson
That's right.  Hive can use an HBase table as an input format to the
hive query regardless of output format, and can also write the output
to an HBase table regardless of the input format.  You can also
supposedly do a join in Hive that uses 1 side of the join from an
HBase table, and the other side a text file, which is very powerful.
I haven't done it myself, but intend to shortly.

HTH,
Tim


On Wed, Oct 13, 2010 at 10:07 PM, Otis Gospodnetic
 wrote:
> Hi,
>
> I was wondering how I can query data stored in HBase and remembered Hive's 
> HBase
> integration:
> http://wiki.apache.org/hadoop/Hive/HBaseIntegration
>
> After watching John Sichi's video
> (http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/
>  ) I have a better idea about what functionality this integration provides, 
> but
> I still have some questions.
>
> Would it be correct to say that Hive-HBase integration makes the following 
> data
> flow possible:
>
> 0) Hive or Files => Custom HQL statement that aggregates data  ==> HBase
> 1) HBase ==> Custom HQL statement that aggregates data  ==> HBase
> 2) HBase ==> Custom HQL statement that aggregates data  ==> output (console?)
>
> Of the above, 1) is what I'm wondering the most about right now.
>
> In other words, it seems to me that Hive may be able to look at *just* data
> stored in HBase *without* the typical data/files in HDFS that Hive normally 
> runs
> its MR jobs against.
>
> Is this correct?
>
> Thanks,
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>


HBase as input AND output?

2010-10-13 Thread Otis Gospodnetic
Hi,

I was wondering how I can query data stored in HBase and remembered Hive's 
HBase 
integration:
http://wiki.apache.org/hadoop/Hive/HBaseIntegration

After watching John Sichi's video 
(http://developer.yahoo.com/blogs/hadoop/posts/2010/04/hundreds_of_hadoop_fans_at_the/
 ) I have a better idea about what functionality this integration provides, but 
I still have some questions.

Would it be correct to say that Hive-HBase integration makes the following data 
flow possible:

0) Hive or Files => Custom HQL statement that aggregates data  ==> HBase
1) HBase ==> Custom HQL statement that aggregates data  ==> HBase
2) HBase ==> Custom HQL statement that aggregates data  ==> output (console?)

Of the above, 1) is what I'm wondering the most about right now.

In other words, it seems to me that Hive may be able to look at *just* data 
stored in HBase *without* the typical data/files in HDFS that Hive normally 
runs 
its MR jobs against.

Is this correct?

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



Re: Hive 0.6.0 Release

2010-10-13 Thread David Lary
Thanks

On Oct 13, 2010, at 2:52 PM, Otis Gospodnetic wrote:

> David,
> 
> Looks like the release was cut either last Friday or will be cut this coming 
> Friday:
> 
>  http://search-hadoop.com/?q=Hive+0.6.0
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
>> From: David Lary 
>> To: hive-u...@hadoop.apache.org; hive-...@hadoop.apache.org
>> Cc: David Lary 
>> Sent: Tue, October 12, 2010 9:46:34 AM
>> Subject: Hive 0.6.0 Release
>> 
>> Was Hive 0.6.0 Released? If so where is it available for download  please?
>> 
>> Thanks
>> 
>> David
>> 



Re: Hive 0.6.0 Release

2010-10-13 Thread Otis Gospodnetic
David,

Looks like the release was cut either last Friday or will be cut this coming 
Friday:

  http://search-hadoop.com/?q=Hive+0.6.0

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: David Lary 
> To: hive-u...@hadoop.apache.org; hive-...@hadoop.apache.org
> Cc: David Lary 
> Sent: Tue, October 12, 2010 9:46:34 AM
> Subject: Hive 0.6.0 Release
> 
> Was Hive 0.6.0 Released? If so where is it available for download  please?
> 
> Thanks
> 
> David
> 


Re: boolean types thru a transform script

2010-10-13 Thread Dave Brondsema
Transform scripts only output text, so Hive has to convert from string to
the column's data type (boolean in this case).  So if you send an empty
string "", that will be converted to boolean FALSE.

FYI, on the way in to a transform script, booleans come through as strings
"true" and "false".
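A minimal transform-script sketch along these lines; the column layout is hypothetical (modeled loosely on the reduce.py in the question, not taken from it). The key point is to emit the literal strings 'true'/'false' rather than an empty string or 0/1:

```python
import sys

def to_hive_bool(value):
    """Serialize a truth value as the literal 'true'/'false' that Hive's
    string-to-BOOLEAN conversion understands.  Emitting '' or '0' would
    not round-trip through a BOOLEAN column."""
    return "true" if value in (True, "true") else "false"

def reduce_rows(lines):
    # Hypothetical columns: (file, os, country, folder, dt, project);
    # 'folder' arrives from Hive as the string 'true' or 'false'.
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        cols[3] = to_hive_bool(cols[3])
        yield "\t".join(cols)

if __name__ == "__main__":
    # Hive pipes rows in on stdin and reads transformed rows from stdout.
    for row in reduce_rows(sys.stdin):
        print(row)
```

With the FALSE written out as the string 'false', Hive converts it back to boolean FALSE on insert instead of misinterpreting the value.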

On Tue, Oct 12, 2010 at 12:17 PM, Luke Crouch  wrote:

> I'm trying to pass a FALSE value thru a custom transform script to another
> table, like so:
>
> FROM (
> FROM downloads
> SELECT project, file, os, FALSE as folder, country, dt
> WHERE dt='2010-05-14'
> DISTRIBUTE BY project
> SORT BY project asc, file asc
> ) b
> INSERT OVERWRITE TABLE dl_day PARTITION (dt='2010-05-14', project)
> SELECT TRANSFORM(file, os, country, folder, dt, project) USING
> 'transformwrap reduce.py  --verbose' AS (file, downloads, os, folder,
> country, project)
>
> > describe dl_day
> ['file', 'string', '']
> ['downloads', 'int', '']
> ['os', 'string', '']
> ['country', 'string', '']
> ['folder', 'boolean', '']
> ['dt', 'string', '']
> ['project', 'string', '']
>
> When I log the 'folder' value from inside reduce.py, it shows:
>
> 2010-10-12 15:32:10,914 - dstat - INFO - reduce to stdout, h[folder]:
>
> i.e., an empty string. But when the INSERT executes, it seems to treat the
> value as TRUE (or string 'true')?
>
> > select folder from dl_day
> ['true']
> ['true']
> ['true']
> ['true']
> ...
>
> How can I preserve the FALSE value thru the transform script?
>
> Thanks,
> -L
>



-- 
Dave Brondsema
Software Engineer
Geeknet

www.geek.net