Re: Need rank()
On Apr 2, 2013, at 12:59 , Edward Capriolo wrote: > I feel your pain I do not have time to look at the project ATM. However > trying to do Java development these days without understanding git, or maven > is fairly impossible, most projects depend on N others that depend on M > others. So trying to even compile the code outside of maven requires a lot of > manually fetching jar etc. Your better off taking a day and getting good with > maven, IMHO. You will be dealing with it for the foreseeable future. I agree. I'll cozy up with it first chance I get...but sometimes you're on a deadline and have to go for the low-hanging fruit. I fully understand the motivation behind your approach. I'll circle back around it when I get some free time. Cheers! Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei
Re: Need rank()
git clone https://github.com/edwardcapriolo/hive_test.git cd hive_test/ mvn -Dmaven.test.skip=true install cd .. [edward@jackintosh java]$ git clone https://github.com/edwardcapriolo/hive-rank.git Cloning into 'hive-rank'... remote: Counting objects: 74, done. remote: Compressing objects: 100% (35/35), done. remote: Total 74 (delta 12), reused 70 (delta 8) Unpacking objects: 100% (74/74), done. [edward@jackintosh java]$ cd hive-rank/ [edward@jackintosh hive-rank]$ mvn install -Dmaven.test.skip=true /usr/java/jdk1.7.0_13 ... [INFO] Installing /home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar [INFO] Installing /home/edward/Documents/java/hive-rank/pom.xml to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.pom [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 2.986s [INFO] Finished at: Tue Apr 02 16:11:41 EDT 2013 [INFO] Final Memory: 17M/210M [INFO] Then copy: cp /home/edward/Documents/java/hive-rank/target/hive-rank-1.0.0-SNAPSHOT.jar to /home/edward/.m2/repository/com/m6d/hive-rank/1.0.0-SNAPSHOT/hive-rank-1.0.0-SNAPSHOT.jar to your hadoop lib. add jar . etc etc On Tue, Apr 2, 2013 at 3:51 PM, Keith Wiley wrote: > Yep, the original article is definitely erroneous in this regard. I > figured out that eventually. I'm not sure how much I can trust that > resource now. I may have to look elsewhere. I agree that Edward's > description is pretty good, but as I said earlier, I can't actually use his > code, so I'm trying to cobble a workable solution together from the various > resources available. Ritesh's article, despite the error in the Hive > syntax, is still useful in that it enables one to quickly compile a simple > rank jar without relying on git, maven, or other project dependencies -- > problems which have plagued me with Edward's approach. So, if I can use > Ritesh's method to write a simple rank function, and Edward's accurate > description of how to construct the query, then I can put all the pieces > together into a workable solution. > > I'll let you know if I get it. > > On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote: > > > You are getting the error because you are ORDERing BY rank but rank is > not in the top SELECT > > > > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a > subquery: > > SELECT ..., rank(user) > > FROM (SELECT ... DISTRIBUTE BY ... SORT BY) > > > > igor > > decide.com > > > > > > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley > wrote: > > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > > > Is there any possibility there is a bug in Ritesh Agrawal's query > statement from that article? I created a test table with the exact column > names from the example in the article and used a minimally altered version > of the command (I removed the where clause to simplify things a bit) and > got an error which suggests there is something slightly wrong with the > command (or perhaps the table has to be configured a special way). Here's > what I get when I almost perfectly duplicate that example: > > > > hive> describe test; > > OK > > userstring > > categorystring > > value int > > Time taken: 0.082 seconds > > == > > hive> select * from test; > > OK > > user1 cat11 > > user1 cat12 > > user1 cat13 > > user1 cat210 > > user1 cat220 > > user1 cat230 > > user2 cat111 > > user2 cat121 > > user2 cat131 > > user2 cat25 > > user2 cat26 > > user2 cat27 > > Time taken: 0.202 seconds > > == > > hive> SELECT user, category, value > > > FROM ( > > > SELECT user, category, rank(user) as rank, value > > > FROM test > > > DISTRIBUTE BY user > > > SORT BY user, value desc > > > ) a > > > WHERE rank > 1 > > > ORDER BY user, rank; > > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or > column reference 'rank': (possible column names are: _col0, _col1, _col2) > > hive> > > > > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > > > "What I primarily learned in grad school is how much I *don't* know. > > Consequently, I left grad school with a higher ignorance to knowledge > ratio than > > when I entered." > >-- Keith Wiley > > > __
Re: Need rank()
I feel your pain I do not have time to look at the project ATM. However trying to do Java development these days without understanding git, or maven is fairly impossible, most projects depend on N others that depend on M others. So trying to even compile the code outside of maven requires a lot of manually fetching jar etc. Your better off taking a day and getting good with maven, IMHO. You will be dealing with it for the foreseeable future.
Re: Need rank()
Yep, the original article is definitely erroneous in this regard. I figured out that eventually. I'm not sure how much I can trust that resource now. I may have to look elsewhere. I agree that Edward's description is pretty good, but as I said earlier, I can't actually use his code, so I'm trying to cobble a workable solution together from the various resources available. Ritesh's article, despite the error in the Hive syntax, is still useful in that it enables one to quickly compile a simple rank jar without relying on git, maven, or other project dependencies -- problems which have plagued me with Edward's approach. So, if I can use Ritesh's method to write a simple rank function, and Edward's accurate description of how to construct the query, then I can put all the pieces together into a workable solution. I'll let you know if I get it. On Apr 2, 2013, at 10:56 , Igor Tatarinov wrote: > You are getting the error because you are ORDERing BY rank but rank is not in > the top SELECT > > Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a > subquery: > SELECT ..., rank(user) > FROM (SELECT ... DISTRIBUTE BY ... SORT BY) > > igor > decide.com > > > On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > Is there any possibility there is a bug in Ritesh Agrawal's query statement > from that article? I created a test table with the exact column names from > the example in the article and used a minimally altered version of the > command (I removed the where clause to simplify things a bit) and got an > error which suggests there is something slightly wrong with the command (or > perhaps the table has to be configured a special way). Here's what I get > when I almost perfectly duplicate that example: > > hive> describe test; > OK > userstring > categorystring > value int > Time taken: 0.082 seconds > == > hive> select * from test; > OK > user1 cat11 > user1 cat12 > user1 cat13 > user1 cat210 > user1 cat220 > user1 cat230 > user2 cat111 > user2 cat121 > user2 cat131 > user2 cat25 > user2 cat26 > user2 cat27 > Time taken: 0.202 seconds > == > hive> SELECT user, category, value > > FROM ( > > SELECT user, category, rank(user) as rank, value > > FROM test > > DISTRIBUTE BY user > > SORT BY user, value desc > > ) a > > WHERE rank > 1 > > ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column > reference 'rank': (possible column names are: _col0, _col1, _col2) > hive> > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge ratio > than > when I entered." >-- Keith Wiley > > > Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "Luminous beings are we, not this crude matter." -- Yoda
Re: Need rank()
I'm pretty sure there are at least a few bugs in that article. The problem described below goes away if "rank" is added to the outer select statement. If there's one error I suppose there may be others. Although I have gotten the query to run, I'm not getting properly ranked results yet. I'm still working on it. On Apr 2, 2013, at 10:03 , Keith Wiley wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > >> http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > Is there any possibility there is a bug in Ritesh Agrawal's query statement > from that article? I created a test table with the exact column names from > the example in the article and used a minimally altered version of the > command (I removed the where clause to simplify things a bit) and got an > error which suggests there is something slightly wrong with the command (or > perhaps the table has to be configured a special way). Here's what I get > when I almost perfectly duplicate that example: > > hive> describe test; > OK > user string > category string > value int > Time taken: 0.082 seconds > == > hive> select * from test; > OK > user1 cat11 > user1 cat12 > user1 cat13 > user1 cat210 > user1 cat220 > user1 cat230 > user2 cat111 > user2 cat121 > user2 cat131 > user2 cat25 > user2 cat26 > user2 cat27 > Time taken: 0.202 seconds > == > hive> SELECT user, category, value >> FROM ( >> SELECT user, category, rank(user) as rank, value >> FROM test >> DISTRIBUTE BY user >> SORT BY user, value desc >> ) a >> WHERE rank > 1 >> ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column > reference 'rank': (possible column names are: _col0, _col1, _col2) > hive> > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge ratio > than > when I entered." > -- Keith Wiley > > Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "Yet mark his perfect self-contentment, and hence learn his lesson, that to be self-contented is to be vile and ignorant, and that to aspire is better than to be blindly and impotently happy." -- Edwin A. Abbott, Flatland
Re: Need rank()
I suppose. Like I said, I was just following the example on that website and it didn't seem to have any comments on it remarking on errors in the article. Thus my confusion about the whole thing. It is presented as being accurate and is old enough that any errors would have been caught and labeled in the comments long ago. I also thought it was a little odd to name the function the same thing as the alias assigned to the result. I'll try disambiguating that. On Apr 2, 2013, at 10:22 , Nitin Pawar wrote: > I am not sure about this > but you defined a rank function and then aliased the column as rank itself. > Can this cause the issue? > > From the error it looks like it could not figure out which column is rank in > the where clause "WHERE rank > 1" > > from Edward's blog the correct query looks similar but his function is named > as p_rank > > SELECT > category,country,product,sales,rank > FROM ( > SELECT >category,country,product,sales, >p_rank(category, country) rank > FROM ( >SELECT > category,country,product, > sales >FROM p_rank_demo >DISTRIBUTE BY > category,country >SORT BY > category,country,sales desc) t1) t2 > WHERE rank <= 3 > > > On Tue, Apr 2, 2013 at 10:45 PM, Keith Wiley wrote: > I did all that, I just didn't paste it into the email. That isn't the > problem. Sorry for the confusion. > > hive> add jar Rank.jar; > hive> create temporary function rank as 'com.example.hive.udf.Rank'; > > BTW, small typo, the condition at the end uses less-then, not > greater-then...obviously...but that isn't the problem either. > > On Apr 2, 2013, at 10:06 , Nitin Pawar wrote: > > > I dont see you added rank in the classpath and defined rank function > > > > can you try add jar and define the function and try again > > > > > > > > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley wrote: > > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > > > Is there any possibility there is a bug in Ritesh Agrawal's query statement > > from that article? I created a test table with the exact column names from > > the example in the article and used a minimally altered version of the > > command (I removed the where clause to simplify things a bit) and got an > > error which suggests there is something slightly wrong with the command (or > > perhaps the table has to be configured a special way). Here's what I get > > when I almost perfectly duplicate that example: > > > > hive> describe test; > > OK > > userstring > > categorystring > > value int > > Time taken: 0.082 seconds > > == > > hive> select * from test; > > OK > > user1 cat11 > > user1 cat12 > > user1 cat13 > > user1 cat210 > > user1 cat220 > > user1 cat230 > > user2 cat111 > > user2 cat121 > > user2 cat131 > > user2 cat25 > > user2 cat26 > > user2 cat27 > > Time taken: 0.202 seconds > > == > > hive> SELECT user, category, value > > > FROM ( > > > SELECT user, category, rank(user) as rank, value > > > FROM test > > > DISTRIBUTE BY user > > > SORT BY user, value desc > > > ) a > > > WHERE rank > 1 > > > ORDER BY user, rank; > > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column > > reference 'rank': (possible column names are: _col0, _col1, _col2) > > hive> > > > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > > music.keithwiley.com > > > > "What I primarily learned in grad school is how much I *don't* know. > > Consequently, I left grad school with a higher ignorance to knowledge ratio > > than > > when I entered." > >-- Keith Wiley > > > > > > > > > > > > -- > > Nitin Pawar > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "It's a fine line between meticulous and obsessive-compulsive and a slippery > rope between obsessive-compulsive and debilitatingly slow." >-- Keith Wiley > > > > > > -- > Nitin Pawar Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't scratch. All together this implies:
Re: Need rank()
You are getting the error because you are ORDERing BY rank but rank is not in the top SELECT Also, DISTRIBUTE BY/SORT BY are done after SELECT so you have to use a subquery: SELECT ..., rank(user) FROM (SELECT ... DISTRIBUTE BY ... SORT BY) igor decide.com On Tue, Apr 2, 2013 at 10:03 AM, Keith Wiley wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > Is there any possibility there is a bug in Ritesh Agrawal's query > statement from that article? I created a test table with the exact column > names from the example in the article and used a minimally altered version > of the command (I removed the where clause to simplify things a bit) and > got an error which suggests there is something slightly wrong with the > command (or perhaps the table has to be configured a special way). Here's > what I get when I almost perfectly duplicate that example: > > hive> describe test; > OK > userstring > categorystring > value int > Time taken: 0.082 seconds > == > hive> select * from test; > OK > user1 cat11 > user1 cat12 > user1 cat13 > user1 cat210 > user1 cat220 > user1 cat230 > user2 cat111 > user2 cat121 > user2 cat131 > user2 cat25 > user2 cat26 > user2 cat27 > Time taken: 0.202 seconds > == > hive> SELECT user, category, value > > FROM ( > > SELECT user, category, rank(user) as rank, value > > FROM test > > DISTRIBUTE BY user > > SORT BY user, value desc > > ) a > > WHERE rank > 1 > > ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or > column reference 'rank': (possible column names are: _col0, _col1, _col2) > hive> > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge > ratio than > when I entered." >-- Keith Wiley > > > >
Re: Need rank()
I am not sure about this but you defined a rank function and then aliased the column as rank itself. Can this cause the issue? >From the error it looks like it could not figure out which column is rank in the where clause "WHERE rank > 1" from Edward's blog the correct query looks similar but his function is named as p_rank SELECT category,country,product,sales,rank FROM ( SELECT category,country,product,sales, p_rank(category, country) rank FROM ( SELECT category,country,product, sales FROM p_rank_demo DISTRIBUTE BY category,country SORT BY category,country,sales desc) t1) t2 WHERE rank <= 3 On Tue, Apr 2, 2013 at 10:45 PM, Keith Wiley wrote: > I did all that, I just didn't paste it into the email. That isn't the > problem. Sorry for the confusion. > > hive> add jar Rank.jar; > hive> create temporary function rank as 'com.example.hive.udf.Rank'; > > BTW, small typo, the condition at the end uses less-then, not > greater-then...obviously...but that isn't the problem either. > > On Apr 2, 2013, at 10:06 , Nitin Pawar wrote: > > > I dont see you added rank in the classpath and defined rank function > > > > can you try add jar and define the function and try again > > > > > > > > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley > wrote: > > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > > > Is there any possibility there is a bug in Ritesh Agrawal's query > statement from that article? I created a test table with the exact column > names from the example in the article and used a minimally altered version > of the command (I removed the where clause to simplify things a bit) and > got an error which suggests there is something slightly wrong with the > command (or perhaps the table has to be configured a special way). Here's > what I get when I almost perfectly duplicate that example: > > > > hive> describe test; > > OK > > userstring > > categorystring > > value int > > Time taken: 0.082 seconds > > == > > hive> select * from test; > > OK > > user1 cat11 > > user1 cat12 > > user1 cat13 > > user1 cat210 > > user1 cat220 > > user1 cat230 > > user2 cat111 > > user2 cat121 > > user2 cat131 > > user2 cat25 > > user2 cat26 > > user2 cat27 > > Time taken: 0.202 seconds > > == > > hive> SELECT user, category, value > > > FROM ( > > > SELECT user, category, rank(user) as rank, value > > > FROM test > > > DISTRIBUTE BY user > > > SORT BY user, value desc > > > ) a > > > WHERE rank > 1 > > > ORDER BY user, rank; > > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or > column reference 'rank': (possible column names are: _col0, _col1, _col2) > > hive> > > > > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > > > "What I primarily learned in grad school is how much I *don't* know. > > Consequently, I left grad school with a higher ignorance to knowledge > ratio than > > when I entered." > >-- Keith Wiley > > > > > > > > > > > > > -- > > Nitin Pawar > > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "It's a fine line between meticulous and obsessive-compulsive and a > slippery > rope between obsessive-compulsive and debilitatingly slow." >-- Keith Wiley > > > > -- Nitin Pawar
Re: Need rank()
I did all that, I just didn't paste it into the email. That isn't the problem. Sorry for the confusion. hive> add jar Rank.jar; hive> create temporary function rank as 'com.example.hive.udf.Rank'; BTW, small typo, the condition at the end uses less-then, not greater-then...obviously...but that isn't the problem either. On Apr 2, 2013, at 10:06 , Nitin Pawar wrote: > I dont see you added rank in the classpath and defined rank function > > can you try add jar and define the function and try again > > > > On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > Is there any possibility there is a bug in Ritesh Agrawal's query statement > from that article? I created a test table with the exact column names from > the example in the article and used a minimally altered version of the > command (I removed the where clause to simplify things a bit) and got an > error which suggests there is something slightly wrong with the command (or > perhaps the table has to be configured a special way). Here's what I get > when I almost perfectly duplicate that example: > > hive> describe test; > OK > userstring > categorystring > value int > Time taken: 0.082 seconds > == > hive> select * from test; > OK > user1 cat11 > user1 cat12 > user1 cat13 > user1 cat210 > user1 cat220 > user1 cat230 > user2 cat111 > user2 cat121 > user2 cat131 > user2 cat25 > user2 cat26 > user2 cat27 > Time taken: 0.202 seconds > == > hive> SELECT user, category, value > > FROM ( > > SELECT user, category, rank(user) as rank, value > > FROM test > > DISTRIBUTE BY user > > SORT BY user, value desc > > ) a > > WHERE rank > 1 > > ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column > reference 'rank': (possible column names are: _col0, _col1, _col2) > hive> > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge ratio > than > when I entered." >-- Keith Wiley > > > > > > -- > Nitin Pawar Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "It's a fine line between meticulous and obsessive-compulsive and a slippery rope between obsessive-compulsive and debilitatingly slow." -- Keith Wiley
Re: Need rank()
I dont see you added rank in the classpath and defined rank function can you try add jar and define the function and try again On Tue, Apr 2, 2013 at 10:33 PM, Keith Wiley wrote: > On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > > > > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ > > Is there any possibility there is a bug in Ritesh Agrawal's query > statement from that article? I created a test table with the exact column > names from the example in the article and used a minimally altered version > of the command (I removed the where clause to simplify things a bit) and > got an error which suggests there is something slightly wrong with the > command (or perhaps the table has to be configured a special way). Here's > what I get when I almost perfectly duplicate that example: > > hive> describe test; > OK > userstring > categorystring > value int > Time taken: 0.082 seconds > == > hive> select * from test; > OK > user1 cat11 > user1 cat12 > user1 cat13 > user1 cat210 > user1 cat220 > user1 cat230 > user2 cat111 > user2 cat121 > user2 cat131 > user2 cat25 > user2 cat26 > user2 cat27 > Time taken: 0.202 seconds > == > hive> SELECT user, category, value > > FROM ( > > SELECT user, category, rank(user) as rank, value > > FROM test > > DISTRIBUTE BY user > > SORT BY user, value desc > > ) a > > WHERE rank > 1 > > ORDER BY user, rank; > FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or > column reference 'rank': (possible column names are: _col0, _col1, _col2) > hive> > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "What I primarily learned in grad school is how much I *don't* know. > Consequently, I left grad school with a higher ignorance to knowledge > ratio than > when I entered." >-- Keith Wiley > > > > -- Nitin Pawar
Re: Need rank()
On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ Is there any possibility there is a bug in Ritesh Agrawal's query statement from that article? I created a test table with the exact column names from the example in the article and used a minimally altered version of the command (I removed the where clause to simplify things a bit) and got an error which suggests there is something slightly wrong with the command (or perhaps the table has to be configured a special way). Here's what I get when I almost perfectly duplicate that example: hive> describe test; OK userstring categorystring value int Time taken: 0.082 seconds == hive> select * from test; OK user1 cat11 user1 cat12 user1 cat13 user1 cat210 user1 cat220 user1 cat230 user2 cat111 user2 cat121 user2 cat131 user2 cat25 user2 cat26 user2 cat27 Time taken: 0.202 seconds == hive> SELECT user, category, value > FROM ( > SELECT user, category, rank(user) as rank, value > FROM test > DISTRIBUTE BY user > SORT BY user, value desc > ) a > WHERE rank > 1 > ORDER BY user, rank; FAILED: Error in semantic analysis: Line 9:15 Invalid table alias or column reference 'rank': (possible column names are: _col0, _col1, _col2) hive> Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "What I primarily learned in grad school is how much I *don't* know. Consequently, I left grad school with a higher ignorance to knowledge ratio than when I entered." -- Keith Wiley
Re: Need rank(), can't build m6d's version
Thank you Edward. I appreciate you trying to help out. I'll see if I can get that to work. On Apr 1, 2013, at 17:09 , Edward Capriolo wrote: > approx steps > git clone hive-test > mvn install > git clone hive-rang > mvn install > > or just > git clone hive-test > mvn -Dmaven.test.skip=true install Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "The easy confidence with which I know another man's religion is folly teaches me to suspect that my own is also." -- Mark Twain
Re: Need rank(), can't build m6d's version
approx steps git clone hive-test mvn install git clone hive-rang mvn install or just git clone hive-test mvn -Dmaven.test.skip=true install On Mon, Apr 1, 2013 at 6:45 PM, Keith Wiley wrote: > I need rank() in Hive. I have't had much luck with Edward Capriolo's on > git and it comes with no documentation. It depends on hive-test (also by > Edward) and I can't get maven to build the tests for hive-test which > prevents me from building the hive-test package (the jar)...which is all > pretty tangential to the real goal, which is rank of course. > > Is there any place to download the hive-rank jar pre-built instead of > having to build it out of git? If not, has anyone built it recently, or > namely has anyone built the hive-test project it depends on? The following > is just one example of the sort of error I see. Remember this error is on > hive-test, which frankly I don't even care about. :-) > > Feeling a little desperate...thanks for any help. > > > --- > Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest > > --- > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec > <<< FAILURE! > testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest) Time elapsed: > 7.823 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<9> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at org.junit.Assert.assertEquals(Assert.java:454) > at > com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) > at $Proxy0.invoke(Unknown Source) > at > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150) > at > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "I do not feel obliged to believe that the same God who has endowed us with > sense, reason, and intellect has intended us to forgo their use." >-- Galileo Galilei > > > >
Re: Need rank(), can't build m6d's version
Thanks! That looks like it has serious potential. I'll study up on it. On Apr 1, 2013, at 16:12 , Alexander Pivovarov wrote: > http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "And what if we picked the wrong religion? Every week, we're just making God madder and madder!" -- Homer Simpson
Re: Need rank(), can't build m6d's version
http://ragrawal.wordpress.com/2011/11/18/extract-top-n-records-in-each-group-in-hadoophive/ On Mon, Apr 1, 2013 at 3:45 PM, Keith Wiley wrote: > I need rank() in Hive. I have't had much luck with Edward Capriolo's on > git and it comes with no documentation. It depends on hive-test (also by > Edward) and I can't get maven to build the tests for hive-test which > prevents me from building the hive-test package (the jar)...which is all > pretty tangential to the real goal, which is rank of course. > > Is there any place to download the hive-rank jar pre-built instead of > having to build it out of git? If not, has anyone built it recently, or > namely has anyone built the hive-test project it depends on? The following > is just one example of the sort of error I see. Remember this error is on > hive-test, which frankly I don't even care about. :-) > > Feeling a little desperate...thanks for any help. > > > --- > Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest > > --- > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec > <<< FAILURE! > testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest) Time elapsed: > 7.823 sec <<< FAILURE! > java.lang.AssertionError: expected:<0> but was:<9> > at org.junit.Assert.fail(Assert.java:91) > at org.junit.Assert.failNotEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:126) > at org.junit.Assert.assertEquals(Assert.java:470) > at org.junit.Assert.assertEquals(Assert.java:454) > at > com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) > at $Proxy0.invoke(Unknown Source) > at > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150) > at > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) > > > > Keith Wiley kwi...@keithwiley.com keithwiley.com > music.keithwiley.com > > "I do not feel obliged to believe that the same God who has endowed us with > sense, reason, and intellect has intended us to forgo their use." >-- Galileo Galilei > > > >
Need rank(), can't build m6d's version
I need rank() in Hive. I have't had much luck with Edward Capriolo's on git and it comes with no documentation. It depends on hive-test (also by Edward) and I can't get maven to build the tests for hive-test which prevents me from building the hive-test package (the jar)...which is all pretty tangential to the real goal, which is rank of course. Is there any place to download the hive-rank jar pre-built instead of having to build it out of git? If not, has anyone built it recently, or namely has anyone built the hive-test project it depends on? The following is just one example of the sort of error I see. Remember this error is on hive-test, which frankly I don't even care about. :-) Feeling a little desperate...thanks for any help. --- Test set: com.jointhegrid.hive_test.EmbeddedHiveExampleTest --- Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.878 sec <<< FAILURE! testA(com.jointhegrid.hive_test.EmbeddedHiveExampleTest) Time elapsed: 7.823 sec <<< FAILURE! java.lang.AssertionError: expected:<0> but was:<9> at org.junit.Assert.fail(Assert.java:91) at org.junit.Assert.failNotEquals(Assert.java:645) at org.junit.Assert.assertEquals(Assert.java:126) at org.junit.Assert.assertEquals(Assert.java:470) at org.junit.Assert.assertEquals(Assert.java:454) at com.jointhegrid.hive_test.EmbeddedHiveExampleTest.testA(EmbeddedHiveExampleTest.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:35) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:115) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:97) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) at $Proxy0.invoke(Unknown Source) at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) Keith Wiley kwi...@keithwiley.com keithwiley.commusic.keithwiley.com "I do not feel obliged to believe that the same God who has endowed us with sense, reason, and intellect has intended us to forgo their use." -- Galileo Galilei