Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.

2017-04-27 Thread Todd Lipcon
On Mon, Apr 24, 2017 at 8:06 PM, Jason Heo  wrote:

> Thanks David
>
> Hi Mike. I'm using Kudu 1.3.0 bundled in "Cloudera Express 5.10.0 (#85
> built by jenkins on 20170120-1037 git: aa0b5cd5eceaefe2f971c13ab65702
> 0d96bb842a)"
>
> My concern is that something is not freed up cleanly and resources are
> being wasted. e.g. I dropped a 30TB table, but there are still 3TB of files
> in tablet_data. And the output of "lsof" shows that the tserver has almost
> 50M files open. So I emailed to ask how to remove the unnecessary files.
>

The leftover space usage could come from a couple of different root causes.
For 1.4 we're working on tools (including the below-mentioned fs-check) to
detect and repair the "orphaned" data usage.


>
> It seems I can't use "kudu fs check" though.
>
> $ kudu fs check
> Invalid argument: unknown command 'check'
> Usage:
> /path/to/cloudera/parcels/KUDU-1.3.0-1.cdh5.11.0.p0.12/bin/../lib/kudu/bin/kudu
> fs <command> [<args>]
>
> <command> can be one of the following:
>  dump     Dump a Kudu filesystem
>  format   Format a new Kudu filesystem
>
> Then I'll try "kudu fs check" when it becomes available in Cloudera Manager.
>

Sorry, 'fs check' is coming in 1.4. You can build the 'kudu' tool from
source, though, and run it against a 1.3 cluster.

-Todd


>
> Thanks
>
> 2017-04-25 3:54 GMT+09:00 Mike Percy :
>
>> Hi Jason,
>> I would strongly recommend upgrading to Kudu 1.3.1 as 1.3.0 has a serious
>> data-loss bug related to re-replication. Please see
>> https://kudu.apache.org/releases/1.3.1/docs/release_notes.html (if you
>> are using the Cloudera version of 1.3.0, no need to worry because it
>> includes the fix for that bug).
>>
>> In 1.3.0 and 1.3.1 you should be able to use the "kudu fs check" tool to
>> see if you have orphaned blocks. If you do, you could use the --repair
>> argument to that tool to repair them after taking your tablet server offline.
>>
>> That said, Kudu uses hole punching to delete data, so the same container
>> files may remain open even after data is removed. After dropping tables, you
>> should nonetheless see disk usage drop at the file system level.
>>
>> I'm not sure that I've answered all your questions. If you have specific
>> concerns, please let us know what you are worried about.
>>
>> Mike
>>
>> On Sun, Apr 23, 2017 at 11:43 PM, Jason Heo 
>> wrote:
>>
>>> Hi.
>>>
>>> Before dropping, there were about 30 tables and 27,000 files in the
>>> tablet_data directory.
>>> I dropped most tables, and there is ONLY one table, with 400 tablets,
>>> left in my test Kudu cluster.
>>> After dropping, there are still 27,000 files in the tablet_data directory,
>>> and the output of /sbin/lsof is the same as before dropping. (The kudu
>>> tserver has almost 50M files open.)
>>>
>>> I'm curious whether this can be resolved using "kudu fs check", which is
>>> available in Kudu 1.4.
>>>
>>> I used Kudu 1.2 when executing `DROP TABLE` and am currently using Kudu
>>> 1.3.0.
>>>
>>> Regards,
>>>
>>> Jason
>>>
>>>
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera


Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.

2017-04-24 Thread Jason Heo
Thanks David

Hi Mike. I'm using Kudu 1.3.0 bundled in "Cloudera Express 5.10.0 (#85
built by jenkins on 20170120-1037 git:
aa0b5cd5eceaefe2f971c13ab657020d96bb842a)"

My concern is that something is not freed up cleanly and resources are being
wasted. e.g. I dropped a 30TB table, but there are still 3TB of files in
tablet_data. And the output of "lsof" shows that the tserver has almost 50M
files open. So I emailed to ask how to remove the unnecessary files.
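As a quick way to track this, the open-descriptor count can be read straight
from /proc instead of scanning the whole system with lsof. This is a small
sketch; the process name "kudu-tserver" is an assumption that may differ per
packaging, so it falls back to the current shell for illustration:

```shell
# Count a process's open file descriptors via /proc (Linux only).
# Assumption: the tablet server process is named "kudu-tserver";
# if it is not running, fall back to the current shell's pid.
pid=$(pgrep -o kudu-tserver 2>/dev/null || echo $$)
echo "open fds for pid $pid: $(ls /proc/"$pid"/fd | wc -l)"
# The per-process limit to compare against:
grep 'Max open files' /proc/"$pid"/limits
```

Comparing that count against the "Max open files" limit shows how close the
tserver is to exhausting its descriptors.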

It seems I can't use "kudu fs check" though.

$ kudu fs check
Invalid argument: unknown command 'check'
Usage:
/path/to/cloudera/parcels/KUDU-1.3.0-1.cdh5.11.0.p0.12/bin/../lib/kudu/bin/kudu
fs <command> [<args>]

<command> can be one of the following:
 dump     Dump a Kudu filesystem
 format   Format a new Kudu filesystem

Then I'll try "kudu fs check" when it becomes available in Cloudera Manager.

Thanks

2017-04-25 3:54 GMT+09:00 Mike Percy :

> Hi Jason,
> I would strongly recommend upgrading to Kudu 1.3.1 as 1.3.0 has a serious
> data-loss bug related to re-replication. Please see
> https://kudu.apache.org/releases/1.3.1/docs/release_notes.html (if you
> are using the Cloudera version of 1.3.0, no need to worry because it
> includes the fix for that bug).
>
> In 1.3.0 and 1.3.1 you should be able to use the "kudu fs check" tool to
> see if you have orphaned blocks. If you do, you could use the --repair
> argument to that tool to repair them after taking your tablet server offline.
>
> That said, Kudu uses hole punching to delete data, so the same container
> files may remain open even after data is removed. After dropping tables, you
> should nonetheless see disk usage drop at the file system level.
>
> I'm not sure that I've answered all your questions. If you have specific
> concerns, please let us know what you are worried about.
>
> Mike
>
> On Sun, Apr 23, 2017 at 11:43 PM, Jason Heo 
> wrote:
>
>> Hi.
>>
>> Before dropping, there were about 30 tables and 27,000 files in the
>> tablet_data directory.
>> I dropped most tables, and there is ONLY one table, with 400 tablets,
>> left in my test Kudu cluster.
>> After dropping, there are still 27,000 files in the tablet_data directory,
>> and the output of /sbin/lsof is the same as before dropping. (The kudu
>> tserver has almost 50M files open.)
>>
>> I'm curious whether this can be resolved using "kudu fs check", which is
>> available in Kudu 1.4.
>>
>> I used Kudu 1.2 when executing `DROP TABLE` and am currently using Kudu 1.3.0.
>>
>> Regards,
>>
>> Jason
>>
>>
>


Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.

2017-04-24 Thread Mike Percy
Hi Jason,
I would strongly recommend upgrading to Kudu 1.3.1 as 1.3.0 has a serious
data-loss bug related to re-replication. Please see https://kudu.apache.org/
releases/1.3.1/docs/release_notes.html (if you are using the Cloudera
version of 1.3.0, no need to worry because it includes the fix for that
bug).

In 1.3.0 and 1.3.1 you should be able to use the "kudu fs check" tool to
see if you have orphaned blocks. If you do, you could use the --repair
argument to that tool to repair them after taking your tablet server offline.

That said, Kudu uses hole punching to delete data, so the same container
files may remain open even after data is removed. After dropping tables, you
should nonetheless see disk usage drop at the file system level.
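A minimal shell sketch of that behavior (illustrative only, not Kudu's actual
code; assumes util-linux's `fallocate` on a filesystem that supports hole
punching, such as ext4 or xfs):

```shell
# Demonstrate hole punching: the file's apparent size (and any open file
# descriptor on it) stays the same, but the underlying blocks are released.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=4096 count=256 2>/dev/null   # 1 MiB of real data
echo "before: apparent size $(stat -c %s "$f") bytes, $(du -k "$f" | cut -f1) KiB on disk"
# Punch a hole over the whole file, as fallocate(2) with
# FALLOC_FL_PUNCH_HOLE does.
fallocate --punch-hole --offset 0 --length $((1024 * 1024)) "$f"
echo "after:  apparent size $(stat -c %s "$f") bytes, $(du -k "$f" | cut -f1) KiB on disk"
rm -f "$f"
```

This mirrors what lsof and `ls -l` show on a tablet server: file counts and
apparent sizes stay put while actual disk usage (what `du` reports) shrinks.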

I'm not sure that I've answered all your questions. If you have specific
concerns, please let us know what you are worried about.

Mike

On Sun, Apr 23, 2017 at 11:43 PM, Jason Heo  wrote:

> Hi.
>
> Before dropping, there were about 30 tables and 27,000 files in the
> tablet_data directory.
> I dropped most tables, and there is ONLY one table, with 400 tablets, left
> in my test Kudu cluster.
> After dropping, there are still 27,000 files in the tablet_data directory,
> and the output of /sbin/lsof is the same as before dropping. (The kudu
> tserver has almost 50M files open.)
>
> I'm curious whether this can be resolved using "kudu fs check", which is
> available in Kudu 1.4.
>
> I used Kudu 1.2 when executing `DROP TABLE` and am currently using Kudu 1.3.0.
>
> Regards,
>
> Jason
>
>