Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.
On Mon, Apr 24, 2017 at 8:06 PM, Jason Heo wrote:
> Thanks David
>
> Hi Mike. I'm using Kudu 1.3.0 bundled in "Cloudera Express 5.10.0 (#85
> built by jenkins on 20170120-1037 git: aa0b5cd5eceaefe2f971c13ab657020d96bb842a)"
>
> My concern is that something does not free up cleanly and wastes my
> resources. For example, I dropped a 30TB table, but there are still 3TB
> of files in tablet_data, and the output of "lsof" shows that the tserver
> has 50M files open. So I emailed to ask how to remove the unnecessary
> files.

The leftover space usage could come from a couple of different root causes.
For 1.4 we're working on tools (including the below-mentioned fs check) to
detect and repair the "orphaned" data usage.

> It seems I can't use "kudu fs check" though.
>
> $ kudu fs check
> Invalid argument: unknown command 'check'
> Usage:
> /path/to/cloudera/parcels/KUDU-1.3.0-1.cdh5.11.0.p0.12/bin/../lib/kudu/bin/kudu
> fs <command> [<args>]
>
> <command> can be one of the following:
>    dump     Dump a Kudu filesystem
>    format   Format a new Kudu filesystem
>
> Then I'll try "kudu fs check" when it becomes available in Cloudera
> Manager.

Sorry, 'fs check' is coming in 1.4. You can build the 'kudu' tool from
source, though, and run it against a 1.3 cluster.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera
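Once a 1.4 `kudu` binary is available (whether built from source as Todd suggests, or from a later release), a repair run might look like the sketch below. The directory paths and service name are placeholders, not values from this thread; per Mike's note earlier in the thread, the tablet server should be offline before `--repair` is used.

```shell
# Stop the tablet server first; --repair modifies on-disk data.
# (Service name is a placeholder; adjust for your install.)
sudo service kudu-tserver stop

# Point --fs_wal_dir / --fs_data_dirs at this tserver's actual directories.
kudu fs check --fs_wal_dir=/var/lib/kudu/tserver \
              --fs_data_dirs=/var/lib/kudu/tserver \
              --repair

sudo service kudu-tserver start
```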
Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.
Thanks David

Hi Mike. I'm using Kudu 1.3.0 bundled in "Cloudera Express 5.10.0 (#85
built by jenkins on 20170120-1037 git: aa0b5cd5eceaefe2f971c13ab657020d96bb842a)"

My concern is that something does not free up cleanly and wastes my
resources. For example, I dropped a 30TB table, but there are still 3TB of
files in tablet_data, and the output of "lsof" shows that the tserver has
50M files open. So I emailed to ask how to remove the unnecessary files.

It seems I can't use "kudu fs check" though.

$ kudu fs check
Invalid argument: unknown command 'check'
Usage:
/path/to/cloudera/parcels/KUDU-1.3.0-1.cdh5.11.0.p0.12/bin/../lib/kudu/bin/kudu
fs <command> [<args>]

<command> can be one of the following:
   dump     Dump a Kudu filesystem
   format   Format a new Kudu filesystem

Then I'll try "kudu fs check" when it becomes available in Cloudera Manager.

Thanks

2017-04-25 3:54 GMT+09:00 Mike Percy:
> [...]
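The open-file-descriptor count Jason measured with lsof can also be read directly from /proc, which avoids lsof's per-line overhead on a process with tens of thousands of descriptors. A minimal sketch, assuming the process name is `kudu-tserver` (adjust for your install):

```shell
# Count file descriptors currently held by the tablet server.
pid=$(pgrep -f kudu-tserver | head -n1)
ls "/proc/${pid}/fd" | wc -l
```

Each entry in /proc/PID/fd is a symlink to the open file, so `ls -l` on the same directory also shows *which* container files are being held open.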
Re: Number of data files and opened file descriptors are not decreasing after DROP TABLE.
Hi Jason,

I would strongly recommend upgrading to Kudu 1.3.1, as 1.3.0 has a serious
data-loss bug related to re-replication. Please see
https://kudu.apache.org/releases/1.3.1/docs/release_notes.html (if you are
using the Cloudera version of 1.3.0, no need to worry, because it includes
the fix for that bug).

In 1.3.0 and 1.3.1 you should be able to use the "kudu fs check" tool to
see if you have orphaned blocks. If you do, you could use the --repair
argument to that tool to repair it if you bring your tablet server offline.

That said, Kudu uses hole punching to delete data, and the same container
files may remain open even after removing data. After dropping tables, you
should see disk usage at the file system level drop.

I'm not sure that I've answered all your questions. If you have specific
concerns, please let us know what you are worried about.

Mike

On Sun, Apr 23, 2017 at 11:43 PM, Jason Heo wrote:
> Hi.
>
> Before dropping, there were about 30 tables and 27,000 files in the
> tablet_data directory. I dropped most tables, and there is now ONLY one
> table, with 400 tablets, in my test Kudu cluster. After dropping, there
> are still 27,000 files in the tablet_data directory, and the output of
> /sbin/lsof is the same as before dropping (the kudu tserver has almost
> 50M files open).
>
> I'm curious whether this can be resolved using "kudu fs check", which is
> available in Kudu 1.4.
>
> I used Kudu 1.2 when executing `DROP TABLE` and am currently using Kudu
> 1.3.0.
>
> Regards,
>
> Jason
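The hole-punching behavior Mike describes — blocks deallocated while the file's logical size and open descriptor stay put — can be seen with the util-linux `fallocate` tool, independent of Kudu. A minimal demonstration (the file path is arbitrary, and the filesystem must support hole punching, e.g. ext4 or xfs):

```shell
# Create an 8 MiB file full of data.
dd if=/dev/zero of=/tmp/container.dat bs=1M count=8 2>/dev/null

du -k /tmp/container.dat   # actual blocks on disk before punching

# Punch a hole over the first 4 MiB: frees the blocks but keeps the
# logical file size (and any open descriptors) unchanged.
fallocate --punch-hole --keep-size --offset 0 --length 4M /tmp/container.dat

du -k /tmp/container.dat   # block usage drops...
ls -l /tmp/container.dat   # ...while the size still reads 8 MiB
```

This is why `ls` totals and lsof counts in tablet_data can stay flat after a DROP TABLE even though `df`/`du` show the space coming back.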