ROW_NUMBER() equivalent in Hive
Hi, this is Kumar, and this is my first question in this group. I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where partitioning happens on multiple columns along with ordering on multiple columns. It can easily be implemented in Hadoop MR, but I have to do it in Hive. With a UDF I can assign the rank within each grouping key, considering the dataset is small, but the ordering needs to be done in a prior step. Can we do this in a much simpler way? Thanks in advance. Regards, Kumar
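To make the two-step approach described above concrete (order first, then number rows within each grouping key), here is a minimal Python sketch of the ROW_NUMBER() semantics. It is not Hive code and not a Hive UDF; the function name `row_number`, the `row_num` output column, and the sample column names are all hypothetical, and only ascending order is handled.

```python
from itertools import groupby
from operator import itemgetter


def row_number(rows, partition_keys, order_keys):
    """Emulate ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ... ASC).

    rows is a list of dicts; partition_keys and order_keys are lists of
    column names (hypothetical names, chosen only for illustration).
    """
    partition_of = itemgetter(*partition_keys)
    # Step 1 (the "prior step"): sort by partition columns, then order columns.
    ordered = sorted(rows, key=itemgetter(*partition_keys, *order_keys))
    # Step 2: restart the counter at 1 whenever the partition key changes.
    numbered = []
    for _, group in groupby(ordered, key=partition_of):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, "row_num": n})
    return numbered


rows = [
    {"region": "east", "dept": "a", "sales": 30, "name": "x"},
    {"region": "east", "dept": "a", "sales": 10, "name": "y"},
    {"region": "west", "dept": "b", "sales": 20, "name": "z"},
    {"region": "east", "dept": "a", "sales": 10, "name": "a"},
]
result = row_number(rows, ["region", "dept"], ["sales", "name"])
```

In a distributed setting the sort in step 1 is the expensive part, which is why the post treats it as a separate prior step.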
Re: bucketing on a column with millions of unique IDs
Hi Li,

The major consideration should be the size of each bucket. One bucket corresponds to one file in HDFS, and you should ensure that every bucket is at least one block in size or, in the worst case, that at least the majority of the buckets are. So you should derive the number of buckets from the data size rather than from the number of rows/records.

Regards,
Bejoy KS

-Original Message-
From: Echo Li
Date: Wed, 20 Feb 2013 16:19:43
To:
Reply-To: user@hive.apache.org
Subject: bucketing on a column with millions of unique IDs

Hi guys, I plan to bucket a table by "userid" as I'm going to do intense calculation using "group by userid". There are about 110 million rows, with 7 million unique userids, so my question is: what is a good number of buckets for this scenario, and how do I determine the number of buckets? Any input is appreciated :) Echo
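The sizing rule above can be sketched as a small calculation. This is only an illustration of the arithmetic, not a Hive setting: the function name `suggest_bucket_count` is hypothetical, the 128 MiB block size is an assumption (check `dfs.blocksize` on your cluster), and the 100 bytes per row used in the example is invented because the thread does not state the table size.

```python
def suggest_bucket_count(table_bytes, block_bytes=128 * 1024 * 1024):
    """Return the largest bucket count whose average bucket fills one block.

    block_bytes defaults to 128 MiB, which is an assumption; use the
    block size actually configured on the cluster.
    """
    if table_bytes <= 0 or block_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer division keeps the average bucket at or above one block.
    return max(1, table_bytes // block_bytes)


# Hypothetical example: 110 million rows at an assumed 100 bytes per row.
assumed_table_bytes = 110_000_000 * 100
buckets = suggest_bucket_count(assumed_table_bytes)
```

Because userid values rarely hash perfectly evenly, choosing somewhat fewer buckets than this upper bound leaves headroom for the smaller buckets to still reach a block.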
bucketing on a column with millions of unique IDs
Hi guys, I plan to bucket a table by "userid" as I'm going to do intense calculation using "group by userid". There are about 110 million rows, with 7 million unique userids, so my question is: what is a good number of buckets for this scenario, and how do I determine the number of buckets? Any input is appreciated :) Echo
Re: Need tab separated output file and put limit on number of lines in a output file
Hi Mark,

We mostly do insert overwrite into a local directory; multiple files containing the output of the query are created at that location, and we use these files for our analysis. So we want these files to be tab separated.

Limiting the number of records means limiting the length of each file, not limiting the overall output. For example, suppose my query's output has 10000 lines and I want to limit the length of a file to 1000 lines, so Hive should give me 10 files of 1000 lines each. Is there any configuration for this?

Thanks,
Chunky.

On Wed, Feb 20, 2013 at 10:50 PM, Mark Grover wrote:
> Chunky,
> There may be another way to do this, but to get tab separated output I usually create an external table that's tab separated and insert overwrite into that table.
>
> For limiting the number of records in the output, you can use the limit clause in your query.
>
> Mark
>
> On Tue, Feb 19, 2013 at 10:53 PM, Chunky Gupta wrote:
> > Hi,
> >
> > Currently the output file columns of my query are separated by "^A"; I need my output to be separated by tabs. Can anybody help me in setting this?
> >
> > One more doubt: I want to limit the number of lines in output files. For example, I do not want any of my output files to be more than 1000 lines. Can I set this in configuration?
> >
> > Thanks,
> > Chunky.
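The requirement described above (tab separated files of at most 1000 lines each) can also be met by post-processing the files Hive writes. The following is a minimal Python sketch of that idea, not a Hive configuration option; the function names `to_tsv_chunks` and `write_chunks` and the `part-NNNNN.tsv` file naming are hypothetical. It assumes the input fields are separated by "^A" (`\x01`) and that the data itself contains no tab characters.

```python
import os


def to_tsv_chunks(lines, max_lines=1000, in_sep="\x01"):
    """Yield lists of at most max_lines tab separated lines."""
    chunk = []
    for line in lines:
        # Replace the ^A field separator with a tab.
        chunk.append(line.rstrip("\n").replace(in_sep, "\t"))
        if len(chunk) == max_lines:
            yield chunk
            chunk = []
    if chunk:
        # Final, possibly shorter, chunk.
        yield chunk


def write_chunks(chunks, out_dir):
    """Write each chunk to its own numbered file and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for n, chunk in enumerate(chunks):
        path = os.path.join(out_dir, f"part-{n:05d}.tsv")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(chunk) + "\n")
        paths.append(path)
    return paths


# A 10,000-line result splits into 10 chunks of 1000 lines each.
sample = [f"{i}\x01value{i}\n" for i in range(10000)]
chunks = list(to_tsv_chunks(sample))
```

Calling `write_chunks(chunks, some_directory)` would then produce one file per chunk; the directory is whatever location suits the analysis.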
Re: Need tab separated output file and put limit on number of lines in a output file
Chunky,

There may be another way to do this, but to get tab separated output I usually create an external table that's tab separated and insert overwrite into that table.

For limiting the number of records in the output, you can use the limit clause in your query.

Mark

On Tue, Feb 19, 2013 at 10:53 PM, Chunky Gupta wrote:
> Hi,
>
> Currently the output file columns of my query are separated by "^A"; I need my output to be separated by tabs. Can anybody help me in setting this?
>
> One more doubt: I want to limit the number of lines in output files. For example, I do not want any of my output files to be more than 1000 lines. Can I set this in configuration?
>
> Thanks,
> Chunky.
owner of hive table
Hi folks, I have a quick question. Suppose I create a table in Hive as one user, and after some time I want to change the owner of the table. 1- How can I change the owner of the table? 2- Do I also need to change the owner of the directory? 3- Or what is a feasible way to do that? Thanks, hadoophive