Re: Change in serdeproperties does not update existing partitions

2011-09-14 Thread Ashutosh Chauhan
Hey Maxime,

Looks like there is some confusion here. You do not need to recreate
partitions every time you update something about the table. If, for example,
you are adding new columns, you can just do alter table add columns and then
alter table add partition. You do not need to do anything about existing
partitions in those cases, and things will work fine. What I suggested was a
workaround for the missing ability to change the serdeproperties of an
existing partition. Ideally that should be possible, but currently the
feature is not there.
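
As a rough sketch of the add-column case, assuming the test_table from the
original question (new_field and the day value are hypothetical names used
only for illustration):

```sql
-- new_field is a hypothetical column; STRING is used because the contrib
-- RegexSerDe works with string columns.
ALTER TABLE test_table ADD COLUMNS (new_field STRING);

-- Partitions added from now on pick up the table definition as it is at
-- this point. The day value is illustrative.
ALTER TABLE test_table ADD PARTITION (day='2011-09-03');
```

With RegexSerDe, the table's input.regex would presumably also need a
capture group for the new column, so that the number of groups matches the
number of columns.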

Hope it helps,
Ashutosh



Re: Change in serdeproperties does not update existing partitions

2011-09-13 Thread Maxime Brugidou
Thanks Ashutosh for your answer. I actually use external tables so that i
don't drop my partitions data.

This is still an odd behavior to me and I don't get why someone would expect
it. Whenever I need to add a column to a table (my table here represent a
log, and it is common to add fields to logs), I need to drop all partitions
and recreate them. How do people do in general?

Do you have a use case where people want to alter a table and not update
existing partitions? Is it so that if your file format evolves you don't
have to convert the whole history?

Best,
Maxime



Re: Change in serdeproperties does not update existing partitions

2011-09-13 Thread Ashutosh Chauhan
Hey Maxime,

Yeah, that's the intended behavior. After you alter a table, all subsequent
actions on the table and its partitions will inherit from it. If you want to
modify the properties of already existing partitions, you should be able to
do something like

alter table test_table partition (day='2011-09-02') set
serdeproperties ('input.regex' = '(.*)')

Unfortunately this is not supported currently. Feel free to file a bug for
that.

A workaround (applicable only because you are using an external table) is to
drop the partitions and then add them again. When you drop a partition from
an external table, only the metadata is wiped out; the data is not deleted.
So when you add the partition again, it will inherit the table's serde
properties and you will get what you are looking for. Use this workaround
with care: you don't want to lose your data while recreating partitions.
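
A minimal sketch of that workaround, assuming the external test_table from
the original question and a partition that was added without an explicit
location (the path in the commented-out variant is hypothetical):

```sql
-- Only safe for EXTERNAL tables: dropping the partition removes metadata,
-- not the underlying files. On a managed table the data would be deleted.
ALTER TABLE test_table DROP PARTITION (day='2011-09-01');

-- Re-adding the partition creates fresh metadata that inherits the table's
-- current serde properties.
ALTER TABLE test_table ADD PARTITION (day='2011-09-01');

-- If the partition was originally added with an explicit LOCATION, the same
-- location has to be given again (hypothetical path shown):
-- ALTER TABLE test_table ADD PARTITION (day='2011-09-01')
--   LOCATION '/hypothetical/logs/day=2011-09-01';
```

It is probably worth trying this on a single partition first and checking
that queries still return rows before repeating it for the rest.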

Hope it helps,
Ashutosh



Change in serdeproperties does not update existing partitions

2011-09-13 Thread Maxime Brugidou
Hello,

I am using Hive 0.7 from cloudera cdh3u0 and I encounter a strange behavior
when I update the serdeproperties of a table (for example for the
RegexSerDe).

If you have a simple partitioned table like

-- The contrib RegexSerDe class lives under org.apache.hadoop.hive.contrib
-- and only accepts string columns.
create external table test_table (
  id string)
partitioned by (day string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties (
  'input.regex' = '.* ([^ ]*)'
);

alter table test_table add partition (day='2011-09-01');

alter table test_table set serdeproperties (
  'input.regex' = '(.*)'
);

alter table test_table add partition (day='2011-09-02');


The first partition will still use the older regex and the new one will use
the new regex. Is this intended behavior? Why?
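
One way to check this, sketched here on the assumption that DESCRIBE
EXTENDED accepts a partition spec in the Hive version in use, is to compare
the serde parameters stored for the table and for each partition:

```sql
-- Table-level definition: should show the updated input.regex.
DESCRIBE EXTENDED test_table;

-- Partition added before the ALTER: expected to still carry the old regex.
DESCRIBE EXTENDED test_table PARTITION (day='2011-09-01');

-- Partition added after the ALTER: expected to carry the new regex.
DESCRIBE EXTENDED test_table PARTITION (day='2011-09-02');
```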

Thanks for your help,
Maxime