ROW_NUMBER() equivalent in Hive

2013-02-20 Thread kumar mr
Hi,


This is Kumar, and this is my first question in this group.


I have a requirement to implement Teradata's ROW_NUMBER() in Hive, where
partitioning is on multiple columns and ordering is also on multiple columns.
It can be easily implemented in Hadoop MR, but I have to do it in Hive. A UDF
can assign the rank within each grouping key, provided the dataset is small,
but the ordering needs to be done in a prior step.
Can we do this in a much simpler way?
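
To make the requirement concrete, here is a minimal Python sketch of the semantics being asked for: number the rows within each partition, restarting at 1, after sorting on several columns. It is not Hive code and not a suggested implementation; the function name, the "row_num" field and the sample columns (region, dept, year, sales) are made up for illustration.

```python
from itertools import groupby


def row_number(rows, partition_by, order_by):
    """Return copies of rows with a 'row_num' field that restarts at 1
    for every distinct combination of the partition_by columns."""

    def part_key(row):
        return tuple(row[c] for c in partition_by)

    def sort_key(row):
        # Sort by the partition columns first, then by the ordering columns.
        return part_key(row) + tuple(row[c] for c in order_by)

    numbered = []
    for _, group in groupby(sorted(rows, key=sort_key), key=part_key):
        for n, row in enumerate(group, start=1):
            numbered.append({**row, "row_num": n})
    return numbered


# Hypothetical sample data: partition on (region, dept), order by (year, sales).
rows = [
    {"region": "east", "dept": "a", "year": 2012, "sales": 5},
    {"region": "east", "dept": "a", "year": 2011, "sales": 9},
    {"region": "east", "dept": "b", "year": 2012, "sales": 7},
    {"region": "west", "dept": "a", "year": 2012, "sales": 3},
    {"region": "west", "dept": "a", "year": 2012, "sales": 1},
]
result = row_number(rows, ["region", "dept"], ["year", "sales"])
```

The sort step corresponds to the "ordering in a prior step" mentioned above, and the numbering loop is the part a UDF would have to do per grouping key.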


Thanks in advance.


Regards,
Kumar


Re: bucketing on a column with millions of unique IDs

2013-02-20 Thread bejoy_ks
Hi Li

The major consideration is the size of each bucket. One bucket corresponds to
a file in HDFS, and you should ensure that every bucket is at least one block
in size or, in the worst case, that at least the majority of the buckets are.

So you should derive the number of buckets from the data size rather than
from the number of rows/records.
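
As a rough illustration of that rule of thumb, the Python sketch below picks the largest bucket count for which an evenly filled bucket is still at least one block. The function name, the 128 MB block size and the 100 bytes per row figure are assumptions for the example, not values from this thread; check the block size configured on your cluster and the real size of the table.

```python
def suggest_bucket_count(total_bytes, block_bytes=128 * 1024 * 1024):
    """Largest bucket count such that total_bytes / count >= block_bytes,
    assuming rows hash evenly across buckets. Never returns less than 1."""
    return max(1, total_bytes // block_bytes)


# Hypothetical sizing: 110 million rows at an assumed 100 bytes per row.
table_bytes = 110_000_000 * 100
buckets = suggest_bucket_count(table_bytes)
```

If the userid values are skewed, some buckets will come out smaller than a block, so in practice a somewhat lower count than the computed upper bound may be the safer choice.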

Regards 
Bejoy KS

-----Original Message-----
From: Echo Li 
Date: Wed, 20 Feb 2013 16:19:43 
To: 
Reply-To: user@hive.apache.org
Subject: bucketing on a column with millions of unique IDs

Hi guys,

I plan to bucket a table by "userid" as I'm going to do intense calculation
using "group by userid". There are about 110 million rows, with 7 million
unique userids, so my question is: what is a good number of buckets for this
scenario, and how do I determine the number of buckets?

Any input is appreciated :)

Echo



bucketing on a column with millions of unique IDs

2013-02-20 Thread Echo Li
Hi guys,

I plan to bucket a table by "userid" as I'm going to do intense calculation
using "group by userid". There are about 110 million rows, with 7 million
unique userids, so my question is: what is a good number of buckets for this
scenario, and how do I determine the number of buckets?

Any input is appreciated :)

Echo


Re: Need tab separated output file and put limit on number of lines in a output file

2013-02-20 Thread Chunky Gupta
Hi Mark,

We mostly do insert overwrite into a local directory; multiple files with the
output of the query are created at that location, and we use these files for
our analysis. So we want these files to be tab separated.

By limiting the number of records I mean limiting the length of each file, not
limiting the overall output. For example, suppose my query's output has
10000 lines and I want to limit each file to 1000 lines; Hive should then
give me 10 files of 1000 lines each. Is there any configuration for this?
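
To show exactly what output is wanted, here is a small Python sketch of the same thing done as a post-processing step outside Hive: replace the "^A" (Ctrl-A, "\x01") delimiter with a tab and cut the stream into chunks of at most 1000 lines. This is only an illustration of the desired result, not a Hive setting; the function name and the sample data are made up.

```python
def split_lines(lines, max_lines=1000):
    """Yield lists of at most max_lines lines, with ^A replaced by a tab."""
    chunk = []
    for line in lines:
        chunk.append(line.replace("\x01", "\t"))
        if len(chunk) == max_lines:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


# Hypothetical query output: 10000 two-column rows separated by ^A.
sample = ("user%d\x01%d" % (i, i * 2) for i in range(10000))
chunks = list(split_lines(sample, max_lines=1000))
```

Each chunk could then be written to its own file, which would give 10 files of 1000 tab-separated lines for the example above.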

Thanks,
Chunky.

On Wed, Feb 20, 2013 at 10:50 PM, Mark Grover
wrote:

> Chunky,
> There may be another way to do this but to get tab separated output, I
> usually create an external table that's tab separated and insert
> overwrite into that table.
>
> For limiting the number of records in the output, you can use the
> limit clause in your query.
>
> Mark
>
> On Tue, Feb 19, 2013 at 10:53 PM, Chunky Gupta 
> wrote:
> > Hi,
> >
> > Currently the output file columns of my query are separated by "^A"; I
> > need my output to be separated by tab. Can anybody help me in setting this?
> >
> > One more question: I want to limit the number of lines in output files. For
> > example, I do not want any of my output files to be more than 1000 lines;
> > can I set this in configuration?
> >
> > Thanks,
> > Chunky.
>


Re: Need tab separated output file and put limit on number of lines in a output file

2013-02-20 Thread Mark Grover
Chunky,
There may be another way to do this but to get tab separated output, I
usually create an external table that's tab separated and insert
overwrite into that table.

For limiting the number of records in the output, you can use the
limit clause in your query.

Mark

On Tue, Feb 19, 2013 at 10:53 PM, Chunky Gupta  wrote:
> Hi,
>
> Currently the output file columns of my query are separated by "^A"; I need
> my output to be separated by tab. Can anybody help me in setting this?
>
> One more question: I want to limit the number of lines in output files. For
> example, I do not want any of my output files to be more than 1000 lines;
> can I set this in configuration?
>
> Thanks,
> Chunky.


owner of hive table

2013-02-20 Thread hadoop hive
hi folks,

I have a quick question: suppose I create a table in Hive as one user and
after some time I want to change the owner of the table.

1- How can I change the owner of the table?
2- Do I also need to change the directory owner?
3- Or what is the feasible way to do that?

thanks
hadoophive