Re: Need rank()

2013-04-02 Thread Keith Wiley
On Apr 2, 2013, at 12:59 , Edward Capriolo wrote:

> I feel your pain I do not have time to look at the project ATM. However 
> trying to do Java development these days without understanding git, or maven 
> is fairly impossible, most projects depend on N others that depend on M 
> others. So trying to even compile the code outside of maven requires a lot of 
> manually fetching jar etc. Your better off taking a day and getting good with 
> maven, IMHO. You will be dealing with it for the foreseeable future. 


I agree.  I'll cozy up with it first chance I get...but sometimes you're on a 
deadline and have to go for the low-hanging fruit.  I fully understand the 
motivation behind your approach.  I'll circle back around it when I get some 
free time.

Cheers!


Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
   --  Galileo Galilei




Re: Need rank()

2013-04-02 Thread Edward Capriolo
git clone https://github.com/edwardcapriolo/hive_test.git
cd hive_test/
mvn -Dmaven.test.skip=true install
cd ..
[edward@jackintosh java]$ git clone
https://github.com/edwardcapriolo/hive-rank.git
Cloning into 'hive-rank'...
remote: Counting objects: 74, done.
remote: Compressing objects: 100% (35/35), done.
remote: Total 74 (delta 12), reused 70 (delta 8)
Unpacking objects: 100% (74/74), done.
[edward@jackintosh java]$ cd hive-rank/
[edward@jackintosh hive-rank]$ mvn install -Dmaven.test.skip=true
/usr/java/jdk1.7.0_13
...
[INFO] Installing
/home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar
to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar
[INFO] Installing /home/edward/Documents/java/hive-rank/pom.xml to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.pom
[INFO]

[INFO] BUILD SUCCESS
[INFO]

[INFO] Total time: 2.986s
[INFO] Finished at: Tue Apr 02 16:11:41 EDT 2013
[INFO] Final Memory: 17M/210M
[INFO]


Then copy:
cp
/home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar
to
/home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar
to your hadoop lib.

add jar 
. etc etc


On Tue, Apr 2, 2013 at 3:51 PM, Keith Wiley  wrote:

> Yep, the original article is definitely erroneous in this regard.  I
> figured out that eventually.  I'm not sure how much I can trust that
> resource now.  I may have to look elsewhere.  I agree that Edward's
> description is pretty good, but as I said earlier, I can't actually use his
> code, so I'm trying to cobble a workable solution together from the various
> resources available.  Ritesh's article, despite the error in the Hive
> syntax, is still useful in that it enables one to quickly compile a simple
> rank jar without relying on git, maven, or other project dependencies --
> problems which have plagued me with Edward's approach.  So, if I can use
> Ritesh's method to write a simple rank function, and Edward's accurate
> description of how to construct the query, then I can put all the pieces
> together into a workable solution.
>
> I'll let you know if I get it.
>
> On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote:
>
> > You are getting the error because you are ORDERing BY rank but rank is
> not in the top SELECT
> >
> > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a
> subquery:
> > SELECT ..., rank(user)
> > FROM (SELECT ... DISTRIBUTE BY ... SORT BY)
> >
> > igor
> > decide.com
> >
> >
> > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley 
> wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article?  I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way).  Here's
> what I get when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > userstring
> > categorystring
> > value   int
> > Time taken: 0.082 seconds
> > ==
> > hive> select * from test;
> > OK
> > user1   cat11
> > user1   cat12
> > user1   cat13
> > user1   cat210
> > user1   cat220
> > user1   cat230
> > user2   cat111
> > user2   cat121
> > user2   cat131
> > user2   cat25
> > user2   cat26
> > user2   cat27
> > Time taken: 0.202 seconds
> > ==
> > hive> SELECT user, category, value
> > > FROM (
> > > SELECT user, category, rank(user) as rank, value
> > > FROM test
> > > DISTRIBUTE BY user
> > > SORT BY user, value desc
> > > ) a
> > > WHERE rank > 1
> > > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> >
> 
> > Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> > when I entered."
> >--  Keith Wiley
> >
> __

Re: Need rank()

2013-04-02 Thread Edward Capriolo
I feel your pain I do not have time to look at the project ATM. However
trying to do Java development these days without understanding git, or
maven is fairly impossible, most projects depend on N others that depend on
M others. So trying to even compile the code outside of maven requires a
lot of manually fetching jar etc. Your better off taking a day and getting
good with maven, IMHO. You will be dealing with it for the foreseeable
future.


Re: Need rank()

2013-04-02 Thread Keith Wiley
Yep, the original article is definitely erroneous in this regard.  I figured 
out that eventually.  I'm not sure how much I can trust that resource now.  I 
may have to look elsewhere.  I agree that Edward's description is pretty good, 
but as I said earlier, I can't actually use his code, so I'm trying to cobble a 
workable solution together from the various resources available.  Ritesh's 
article, despite the error in the Hive syntax, is still useful in that it 
enables one to quickly compile a simple rank jar without relying on git, maven, 
or other project dependencies -- problems which have plagued me with Edward's 
approach.  So, if I can use Ritesh's method to write a simple rank function, 
and Edward's accurate description of how to construct the query, then I can put 
all the pieces together into a workable solution.

I'll let you know if I get it.

On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote:

> You are getting the error because you are ORDERing BY rank but rank is not in 
> the top SELECT
> 
> Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a 
> subquery:
> SELECT ..., rank(user)
> FROM (SELECT ... DISTRIBUTE BY ... SORT BY)
> 
> igor
> decide.com
> 
> 
> On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley  wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> 
> > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> 
> Is there any possibility there is a bug in Ritesh Agrawal's query statement 
> from that article?  I created a test table with the exact column names from 
> the example in the article and used a minimally altered version of the 
> command (I removed the where clause to simplify things a bit) and got an 
> error which suggests there is something slightly wrong with the command (or 
> perhaps the table has to be configured a special way).  Here's what I get 
> when I almost perfectly duplicate that example:
> 
> hive> describe test;
> OK
> userstring
> categorystring
> value   int
> Time taken: 0.082 seconds
> ==
> hive> select * from test;
> OK
> user1   cat11
> user1   cat12
> user1   cat13
> user1   cat210
> user1   cat220
> user1   cat230
> user2   cat111
> user2   cat121
> user2   cat131
> user2   cat25
> user2   cat26
> user2   cat27
> Time taken: 0.202 seconds
> ==
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column 
> reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
> 
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
> 
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio 
> than
> when I entered."
>--  Keith Wiley
> 
> 
> 



Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Re: Need rank()

2013-04-02 Thread Keith Wiley
I'm pretty sure there are at least a few bugs in that article.  The problem 
described below goes away if "rank" is added to the outer select statement.  If 
there's one error I suppose there may be others.  Although I have gotten the 
query to run, I'm not getting properly ranked results yet.  I'm still working 
on it.

On Apr 2, 2013, at 10:03 , Keith Wiley wrote:

> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> 
>> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> 
> Is there any possibility there is a bug in Ritesh Agrawal's query statement 
> from that article?  I created a test table with the exact column names from 
> the example in the article and used a minimally altered version of the 
> command (I removed the where clause to simplify things a bit) and got an 
> error which suggests there is something slightly wrong with the command (or 
> perhaps the table has to be configured a special way).  Here's what I get 
> when I almost perfectly duplicate that example:
> 
> hive> describe test;
> OK
> user  string  
> category  string  
> value int 
> Time taken: 0.082 seconds
> ==
> hive> select * from test;
> OK
> user1 cat11
> user1 cat12
> user1 cat13
> user1 cat210
> user1 cat220
> user1 cat230
> user2 cat111
> user2 cat121
> user2 cat131
> user2 cat25
> user2 cat26
> user2 cat27
> Time taken: 0.202 seconds
> ==
> hive> SELECT user, category, value
>> FROM (
>> SELECT user, category, rank(user) as rank, value
>> FROM test
>> DISTRIBUTE BY user
>> SORT BY user, value desc
>> ) a
>> WHERE rank > 1
>> ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column 
> reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive> 
> 
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
> 
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio 
> than
> when I entered."
>   --  Keith Wiley
> 
> 



Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
   --  Edwin A. Abbott, Flatland




Re: Need rank()

2013-04-02 Thread Keith Wiley
I suppose.  Like I said, I was just following the example on that website and 
it didn't seem to have any comments on it remarking on errors in the article.  
Thus my confusion about the whole thing.  It is presented as being accurate and 
is old enough that any errors would have been caught and labeled in the 
comments long ago.  I also thought it was a little odd to name the function the 
same thing as the alias assigned to the result.  I'll try disambiguating that.

On Apr 2, 2013, at 10:22 , Nitin Pawar wrote:

> I am not sure about this 
> but you defined a rank function and then aliased the column as rank itself. 
> Can this cause the issue? 
> 
> From the error it looks like it could not figure out which column is rank in 
> the where clause "WHERE rank > 1" 
> 
> from Edward's blog the correct query looks similar  but his function is named 
> as p_rank 
> 
> SELECT
>  category,country,product,sales,rank
> FROM (
>  SELECT
>category,country,product,sales,
>p_rank(category, country) rank
>  FROM (
>SELECT
>  category,country,product,
>  sales
>FROM p_rank_demo
>DISTRIBUTE BY
>  category,country
>SORT BY
>  category,country,sales desc) t1) t2
> WHERE rank <= 3
> 
> 
> On Tue, Apr 2, 2013 at 10:45 PM, Keith Wiley  wrote:
> I did all that, I just didn't paste it into the email.  That isn't the 
> problem.  Sorry for the confusion.
> 
> hive> add jar Rank.jar;
> hive> create temporary function rank as 'com.example.hive.udf.Rank';
> 
> BTW, small typo, the condition at the end uses less-then, not 
> greater-then...obviously...but that isn't the problem either.
> 
> On Apr 2, 2013, at 10:06 , Nitin Pawar wrote:
> 
> > I dont see you added rank in the classpath and defined rank function
> >
> > can you try add jar and define the function and try again
> >
> >
> >
> > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley  wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query statement 
> > from that article?  I created a test table with the exact column names from 
> > the example in the article and used a minimally altered version of the 
> > command (I removed the where clause to simplify things a bit) and got an 
> > error which suggests there is something slightly wrong with the command (or 
> > perhaps the table has to be configured a special way).  Here's what I get 
> > when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > userstring
> > categorystring
> > value   int
> > Time taken: 0.082 seconds
> > ==
> > hive> select * from test;
> > OK
> > user1   cat11
> > user1   cat12
> > user1   cat13
> > user1   cat210
> > user1   cat220
> > user1   cat230
> > user2   cat111
> > user2   cat121
> > user2   cat131
> > user2   cat25
> > user2   cat26
> > user2   cat27
> > Time taken: 0.202 seconds
> > ==
> > hive> SELECT user, category, value
> > > FROM (
> > > SELECT user, category, rank(user) as rank, value
> > > FROM test
> > > DISTRIBUTE BY user
> > > SORT BY user, value desc
> > > ) a
> > > WHERE rank > 1
> > > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column 
> > reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> > 
> > Keith Wiley kwi...@keithwiley.com keithwiley.com
> > music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge ratio 
> > than
> > when I entered."
> >--  Keith Wiley
> > 
> >
> >
> >
> >
> > --
> > Nitin Pawar
> 
> 
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
> 
> "It's a fine line between meticulous and obsessive-compulsive and a slippery
> rope between obsessive-compulsive and debilitatingly slow."
>--  Keith Wiley
> 
> 
> 
> 
> 
> -- 
> Nitin Pawar



Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies:

Re: Need rank()

2013-04-02 Thread Igor Tatarinov
You are getting the error because you are ORDERing BY rank but rank is not
in the top SELECT

Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a
subquery:
SELECT ..., rank(user)
FROM (SELECT ... DISTRIBUTE BY ... SORT BY)

igor
decide.com


On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley  wrote:

> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
> >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article?  I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way).  Here's
> what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> userstring
> categorystring
> value   int
> Time taken: 0.082 seconds
> ==
> hive> select * from test;
> OK
> user1   cat11
> user1   cat12
> user1   cat13
> user1   cat210
> user1   cat220
> user1   cat230
> user2   cat111
> user2   cat121
> user2   cat131
> user2   cat25
> user2   cat26
> user2   cat27
> Time taken: 0.202 seconds
> ==
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> when I entered."
>--  Keith Wiley
>
> 
>
>


Re: Need rank()

2013-04-02 Thread Nitin Pawar
I am not sure about this
but you defined a rank function and then aliased the column as rank itself.
Can this cause the issue?

>From the error it looks like it could not figure out which column is rank
in the where clause "WHERE rank > 1"

from Edward's blog the correct query looks similar  but his function is
named as p_rank

SELECT
 category,country,product,sales,rank
FROM (
 SELECT
   category,country,product,sales,
   p_rank(category, country) rank
 FROM (
   SELECT
 category,country,product,
 sales
   FROM p_rank_demo
   DISTRIBUTE BY
 category,country
   SORT BY
 category,country,sales desc) t1) t2
WHERE rank <= 3


On Tue, Apr 2, 2013 at 10:45 PM, Keith Wiley  wrote:

> I did all that, I just didn't paste it into the email.  That isn't the
> problem.  Sorry for the confusion.
>
> hive> add jar Rank.jar;
> hive> create temporary function rank as 'com.example.hive.udf.Rank';
>
> BTW, small typo, the condition at the end uses less-then, not
> greater-then...obviously...but that isn't the problem either.
>
> On Apr 2, 2013, at 10:06 , Nitin Pawar wrote:
>
> > I dont see you added rank in the classpath and defined rank function
> >
> > can you try add jar and define the function and try again
> >
> >
> >
> > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley 
> wrote:
> > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> >
> > >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> >
> > Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article?  I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way).  Here's
> what I get when I almost perfectly duplicate that example:
> >
> > hive> describe test;
> > OK
> > userstring
> > categorystring
> > value   int
> > Time taken: 0.082 seconds
> > ==
> > hive> select * from test;
> > OK
> > user1   cat11
> > user1   cat12
> > user1   cat13
> > user1   cat210
> > user1   cat220
> > user1   cat230
> > user2   cat111
> > user2   cat121
> > user2   cat131
> > user2   cat25
> > user2   cat26
> > user2   cat27
> > Time taken: 0.202 seconds
> > ==
> > hive> SELECT user, category, value
> > > FROM (
> > > SELECT user, category, rank(user) as rank, value
> > > FROM test
> > > DISTRIBUTE BY user
> > > SORT BY user, value desc
> > > ) a
> > > WHERE rank > 1
> > > ORDER BY user, rank;
> > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> > hive>
> >
> >
> 
> > Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
> >
> > "What I primarily learned in grad school is how much I *don't* know.
> > Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> > when I entered."
> >--  Keith Wiley
> >
> 
> >
> >
> >
> >
> > --
> > Nitin Pawar
>
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "It's a fine line between meticulous and obsessive-compulsive and a
> slippery
> rope between obsessive-compulsive and debilitatingly slow."
>--  Keith Wiley
>
> 
>
>


-- 
Nitin Pawar


Re: Need rank()

2013-04-02 Thread Keith Wiley
I did all that, I just didn't paste it into the email.  That isn't the problem. 
 Sorry for the confusion.

hive> add jar Rank.jar;
hive> create temporary function rank as 'com.example.hive.udf.Rank';

BTW, small typo, the condition at the end uses less-then, not 
greater-then...obviously...but that isn't the problem either.

On Apr 2, 2013, at 10:06 , Nitin Pawar wrote:

> I dont see you added rank in the classpath and defined rank function 
> 
> can you try add jar and define the function and try again 
> 
> 
> 
> On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley  wrote:
> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
> 
> > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
> 
> Is there any possibility there is a bug in Ritesh Agrawal's query statement 
> from that article?  I created a test table with the exact column names from 
> the example in the article and used a minimally altered version of the 
> command (I removed the where clause to simplify things a bit) and got an 
> error which suggests there is something slightly wrong with the command (or 
> perhaps the table has to be configured a special way).  Here's what I get 
> when I almost perfectly duplicate that example:
> 
> hive> describe test;
> OK
> userstring
> categorystring
> value   int
> Time taken: 0.082 seconds
> ==
> hive> select * from test;
> OK
> user1   cat11
> user1   cat12
> user1   cat13
> user1   cat210
> user1   cat220
> user1   cat230
> user2   cat111
> user2   cat121
> user2   cat131
> user2   cat25
> user2   cat26
> user2   cat27
> Time taken: 0.202 seconds
> ==
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column 
> reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
> 
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
> 
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge ratio 
> than
> when I entered."
>--  Keith Wiley
> 
> 
> 
> 
> 
> -- 
> Nitin Pawar



Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
   --  Keith Wiley




Re: Need rank()

2013-04-02 Thread Nitin Pawar
I dont see you added rank in the classpath and defined rank function

can you try add jar and define the function and try again



On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley  wrote:

> On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:
>
> >
> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/
>
> Is there any possibility there is a bug in Ritesh Agrawal's query
> statement from that article?  I created a test table with the exact column
> names from the example in the article and used a minimally altered version
> of the command (I removed the where clause to simplify things a bit) and
> got an error which suggests there is something slightly wrong with the
> command (or perhaps the table has to be configured a special way).  Here's
> what I get when I almost perfectly duplicate that example:
>
> hive> describe test;
> OK
> userstring
> categorystring
> value   int
> Time taken: 0.082 seconds
> ==
> hive> select * from test;
> OK
> user1   cat11
> user1   cat12
> user1   cat13
> user1   cat210
> user1   cat220
> user1   cat230
> user2   cat111
> user2   cat121
> user2   cat131
> user2   cat25
> user2   cat26
> user2   cat27
> Time taken: 0.202 seconds
> ==
> hive> SELECT user, category, value
> > FROM (
> > SELECT user, category, rank(user) as rank, value
> > FROM test
> > DISTRIBUTE BY user
> > SORT BY user, value desc
> > ) a
> > WHERE rank > 1
> > ORDER BY user, rank;
> FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or
> column reference 'rank': (possible column names are: _col0, _col1, _col2)
> hive>
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "What I primarily learned in grad school is how much I *don't* know.
> Consequently, I left grad school with a higher ignorance to knowledge
> ratio than
> when I entered."
>--  Keith Wiley
>
> 
>
>


-- 
Nitin Pawar


Re: Need rank()

2013-04-02 Thread Keith Wiley
On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:

> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/

Is there any possibility there is a bug in Ritesh Agrawal's query statement 
from that article?  I created a test table with the exact column names from the 
example in the article and used a minimally altered version of the command (I 
removed the where clause to simplify things a bit) and got an error which 
suggests there is something slightly wrong with the command (or perhaps the 
table has to be configured a special way).  Here's what I get when I almost 
perfectly duplicate that example:

hive> describe test;
OK
userstring  
categorystring  
value   int 
Time taken: 0.082 seconds
==
hive> select * from test;
OK
user1   cat11
user1   cat12
user1   cat13
user1   cat210
user1   cat220
user1   cat230
user2   cat111
user2   cat121
user2   cat131
user2   cat25
user2   cat26
user2   cat27
Time taken: 0.202 seconds
==
hive> SELECT user, category, value
> FROM (
> SELECT user, category, rank(user) as rank, value
> FROM test
> DISTRIBUTE BY user
> SORT BY user, value desc
> ) a
> WHERE rank > 1
> ORDER BY user, rank;
FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column 
reference 'rank': (possible column names are: _col0, _col1, _col2)
hive> 


Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
   --  Keith Wiley




Re: Need rank(), can't build m6d's version

2013-04-02 Thread Keith Wiley
Thank you Edward.  I appreciate you trying to help out.  I'll see if I can get 
that to work.

On Apr 1, 2013, at 17:09 , Edward Capriolo wrote:

> approx steps
> git clone hive-test 
> mvn install
> git clone hive-rang
> mvn install
> 
> or just
> git clone hive-test
> mvn -Dmaven.test.skip=true install


Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
   --  Mark Twain




Re: Need rank(), can't build m6d's version

2013-04-01 Thread Edward Capriolo
approx steps
git clone hive-test
mvn install
git clone hive-rang
mvn install

or just
git clone hive-test
mvn -Dmaven.test.skip=true install



On Mon, Apr 1, 2013 at 6:45 PM, Keith Wiley  wrote:

> I need rank() in Hive.  I have't had much luck with Edward Capriolo's on
> git and it comes with no documentation.  It depends on hive-test (also by
> Edward) and I can't get maven to build the tests for hive-test which
> prevents me from building the hive-test package (the jar)...which is all
> pretty tangential to the real goal, which is rank of course.
>
> Is there any place to download the hive-rank jar pre-built instead of
> having to build it out of git?  If not, has anyone built it recently, or
> namely has anyone built the hive-test project it depends on?  The following
> is just one example of the sort of error I see.  Remember this error is on
> hive-test, which frankly I don't even care about.  :-)
>
> Feeling a little desperate...thanks for any help.
>
>
> ---
> Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest
>
> ---
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec
> <<< FAILURE!
> testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest)  Time elapsed:
> 7.823 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<9>
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.failNotEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:126)
> at org.junit.Assert.assertEquals(Assert.java:470)
> at org.junit.Assert.assertEquals(Assert.java:454)
> at
> com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
> at $Proxy0.invoke(Unknown Source)
> at
> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
> at
> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
> at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "I do not feel obliged to believe that the same God who has endowed us with
> sense, reason, and intellect has intended us to forgo their use."
>--  Galileo Galilei
>
> 
>
>


Re: Need rank(), can't build m6d's version

2013-04-01 Thread Keith Wiley
Thanks!  That looks like it has serious potential.  I'll study up on it.

On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote:

> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/



Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
   --  Homer Simpson




Re: Need rank(), can't build m6d's version

2013-04-01 Thread Alexander Pivovarov
http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/


On Mon, Apr 1, 2013 at 3:45 PM, Keith Wiley  wrote:

> I need rank() in Hive.  I have't had much luck with Edward Capriolo's on
> git and it comes with no documentation.  It depends on hive-test (also by
> Edward) and I can't get maven to build the tests for hive-test which
> prevents me from building the hive-test package (the jar)...which is all
> pretty tangential to the real goal, which is rank of course.
>
> Is there any place to download the hive-rank jar pre-built instead of
> having to build it out of git?  If not, has anyone built it recently, or
> namely has anyone built the hive-test project it depends on?  The following
> is just one example of the sort of error I see.  Remember this error is on
> hive-test, which frankly I don't even care about.  :-)
>
> Feeling a little desperate...thanks for any help.
>
>
> ---
> Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest
>
> ---
> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec
> <<< FAILURE!
> testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest)  Time elapsed:
> 7.823 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<0> but was:<9>
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.failNotEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:126)
> at org.junit.Assert.assertEquals(Assert.java:470)
> at org.junit.Assert.assertEquals(Assert.java:454)
> at
> com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
> at $Proxy0.invoke(Unknown Source)
> at
> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
> at
> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
> at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "I do not feel obliged to believe that the same God who has endowed us with
> sense, reason, and intellect has intended us to forgo their use."
>--  Galileo Galilei
>
> 
>
>


Need rank(), can't build m6d's version

2013-04-01 Thread Keith Wiley
I need rank() in Hive.  I have't had much luck with Edward Capriolo's on git 
and it comes with no documentation.  It depends on hive-test (also by Edward) 
and I can't get maven to build the tests for hive-test which prevents me from 
building the hive-test package (the jar)...which is all pretty tangential to 
the real goal, which is rank of course.

Is there any place to download the hive-rank jar pre-built instead of having to 
build it out of git?  If not, has anyone built it recently, or namely has 
anyone built the hive-test project it depends on?  The following is just one 
example of the sort of error I see.  Remember this error is on hive-test, which 
frankly I don't even care about.  :-)

Feeling a little desperate...thanks for any help.

---
Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest
---
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec <<< 
FAILURE!
testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest)  Time elapsed: 7.823 
sec  <<< FAILURE!
java.lang.AssertionError: expected:<0> but was:<9>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at 
com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103)
at $Proxy0.invoke(Unknown Source)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69)


Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
   --  Galileo Galilei