[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-10 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 13:

Wow, ASAN fails again.


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 13
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 10 Jan 2019 08:40:13 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-09 Thread helifu (Code Review)
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#13).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [V]

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 236 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/13
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 13
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-09 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 12:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager-test.cc@311
PS12, Line 311: 0, 0, 1, 0, 0, 0, 0
> nit: can you add a comment to note what each metric means. e.g., /*METRIC_l
Sorry, that's not a good solution i think. My motivation for the changes here 
is to eliminate and simplify redundant code, so i would rather rollback the 
modification than add comments. What do you think? :p


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager-test.cc@1696
PS12, Line 1696: TestDeleteDeadContainersByDeletionTransaction
> Can you also add a test for two log blocks refer to one log block container
Done


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@372
PS12, Line 372: ~LogBlockContainer();
> Add a comment to describe what happens in this destructor.
Done


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@760
PS12, Line 760: bool metadata_missing = false;
  : bool data_missing = false;
> What was the motivation for the changes here? Doesn't seem like the behavio
The motivation here is to support deleting incomplete files directly, otherwise 
opening this type of container will fail and need manual intervention. And i 
believe this will happen occasionally when rebooting cluster. As was said in 
last comments, it's not friendly for kudu newbee.
Well, i rollback this modification. :(


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@1342
PS12, Line 1342: // It's possible for multiple deletion transactions to end up 
here. For
   :   // example, if one transaction deletes the second to 
last block in a
   :   // container and the second deletes its last block, the 
destructors of
   :   // both transactions could run at a time that 
container's internal
   :   // bookkeeping reflects the deadness of the container.
> Can you add one more sentence to explain this situation is guarded by TrySe
Done


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@2366
PS12, Line 2366: // TODO(adar): this should be reported as an inconsistency 
once dead
   :   // container deletion is also done in real time. Until 
then, it would be
   :   // confusing to report it as such since it'll be a 
natural event at startup.
> Should this be updated?
Done



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 12
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 10 Jan 2019 07:29:57 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-09 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 12:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12075/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12075/12//COMMIT_MSG@9
PS12, Line 9: LBM supports deleting
Can you rephrase the sentence to say this patch intends to support deleting 
dead containers? This sounds like it is already supported today.


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager-test.cc@311
PS12, Line 311: 0, 0, 1, 0, 0, 0, 0
nit: can you add a comment to note what each metric means. e.g., 
/*METRIC_log_block_manager_containers = */ 1. Same below.


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager-test.cc@1696
PS12, Line 1696: TestDeleteDeadContainersByDeletionTransaction
Can you also add a test for two log blocks refer to one log block container and 
concurrently delete both and verify the metric looks right?


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@372
PS12, Line 372: ~LogBlockContainer();
Add a comment to describe what happens in this destructor.


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@1342
PS12, Line 1342: // It's possible for multiple deletion transactions to end up 
here. For
   :   // example, if one transaction deletes the second to 
last block in a
   :   // container and the second deletes its last block, the 
destructors of
   :   // both transactions could run at a time that 
container's internal
   :   // bookkeeping reflects the deadness of the container.
Can you add one more sentence to explain this situation is guarded by 
TrySetDead()?


http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@2366
PS12, Line 2366: // TODO(adar): this should be reported as an inconsistency 
once dead
   :   // container deletion is also done in real time. Until 
then, it would be
   :   // confusing to report it as such since it'll be a 
natural event at startup.
Should this be updated?



-- 
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 12
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 09 Jan 2019 23:18:10 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-09 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/12/src/kudu/fs/log_block_manager.cc@760
PS12, Line 760: bool metadata_missing = false;
  : bool data_missing = false;
What was the motivation for the changes here? Doesn't seem like the behavior 
has changed, except now there's more state to track (i.e. these 'missing' 
booleans).



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 12
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 09 Jan 2019 17:22:08 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-09 Thread helifu (Code Review)
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#12).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [V]

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 224 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/12
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 12
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-09 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 11:

All right, calling 'SyncDir()' has been removed.


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 11
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 09 Jan 2019 10:37:07 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 11:

(3 comments)

> And here are another three comments, may i fix them?

Go ahead. I agree with your first comment, disagree with the second, and don't 
care either way on the third.

http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc@679
PS11, Line 679: 
CONTAINER_DISK_FAILURE(block_manager_->env_->SyncDir(data_dir_->dir()),
> Still, i think it wastes IO to call 'SyncDir' after file deletion. The rogu
Now that we're calling SyncDir for each deleted container instead of just once 
after Repair(), I agree.


http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc@2452
PS11, Line 2452: string data_filename = StrCat(container->ToString(), 
kContainerDataFileSuffix);
   :   uint64_t reported_size;
   :   s = env_->GetFileSizeOnDisk(data_filename, 
&reported_size);
   :   if (!s.ok()) {
   : HANDLE_DISK_FAILURE(s, 
error_manager_->RunErrorNotificationCb(
   : ErrorHandlerType::DISK_ERROR, dir));
   : *result_status = s.CloneAndPrepend(Substitute(
   : "Could not get on-disk file size of container $0", 
container->ToString()));
   : return;
   :   }
   :   int64_t cleanup_threshold_size = 
container->live_bytes_aligned() *
   :   (1 + 
FLAGS_log_container_excess_space_before_cleanup_fraction);
   :   if (reported_size > cleanup_threshold_size) {
   : 
local_report.full_container_space_check->entries.emplace_back(
   : container->ToString(), reported_size - 
container->live_bytes_aligned());
   :
   : // If the container is to be deleted outright, don't 
bother repunching
   : // its blocks. The report entry remains, however, so 
it's clear that
   : // there was a space discrepancy.
   : if (container->live_blocks()) {
   :   need_repunching.insert(need_repunching.end(),
   :  dead_blocks.begin(), 
dead_blocks.end());
   : }
   :   }
> It's not necessary to clean up threshold size for a dead container, though
True, although the FsReport we generate tries to account for all 
inconsistencies even if they're cleaned up by deleting containers. Plus 
GetFileSizeOnDisk isn't particularly expensive.


http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc@2514
PS11, Line 2514:   AddNewContainerUnlocked(container);
   :   MakeContainerAvailableUnlocked(std::move(container));
> It's a little bit strange to add the dead containers here and then remove t
Yeah but it's simpler in that all containers are treated the same way.

If you like, you could change the plumbing and avoid adding dead containers; 
when I updated your patch it felt a little bit out of scope for me so I didn't 
do it.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 11
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 09 Jan 2019 05:32:39 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-08 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 11:

(3 comments)

Thank you very much :)
And here are another three comments, may i fix them?

http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc@679
PS11, Line 679: 
CONTAINER_DISK_FAILURE(block_manager_->env_->SyncDir(data_dir_->dir()),
Still, i think it wastes IO to call 'SyncDir' after file deletion. The rogue 
admin should be responsible for his operations. Besides, for the kudu newbee, 
it will bring barriers while rebooting the kudu cluster. Anyway, that's only my 
opinion.


http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc@2452
PS11, Line 2452: string data_filename = StrCat(container->ToString(), 
kContainerDataFileSuffix);
   :   uint64_t reported_size;
   :   s = env_->GetFileSizeOnDisk(data_filename, 
&reported_size);
   :   if (!s.ok()) {
   : HANDLE_DISK_FAILURE(s, 
error_manager_->RunErrorNotificationCb(
   : ErrorHandlerType::DISK_ERROR, dir));
   : *result_status = s.CloneAndPrepend(Substitute(
   : "Could not get on-disk file size of container $0", 
container->ToString()));
   : return;
   :   }
   :   int64_t cleanup_threshold_size = 
container->live_bytes_aligned() *
   :   (1 + 
FLAGS_log_container_excess_space_before_cleanup_fraction);
   :   if (reported_size > cleanup_threshold_size) {
   : 
local_report.full_container_space_check->entries.emplace_back(
   : container->ToString(), reported_size - 
container->live_bytes_aligned());
   :
   : // If the container is to be deleted outright, don't 
bother repunching
   : // its blocks. The report entry remains, however, so 
it's clear that
   : // there was a space discrepancy.
   : if (container->live_blocks()) {
   :   need_repunching.insert(need_repunching.end(),
   :  dead_blocks.begin(), 
dead_blocks.end());
   : }
   :   }
It's not necessary to clean up threshold size for a dead container, though the 
number of the dead containers is very limited(we know that the dead containers 
will be deleted automatically at the runtime).


http://gerrit.cloudera.org:8080/#/c/12075/11/src/kudu/fs/log_block_manager.cc@2514
PS11, Line 2514:   AddNewContainerUnlocked(container);
   :   MakeContainerAvailableUnlocked(std::move(container));
It's a little bit strange to add the dead containers here and then remove them 
in the 'Repair()'. Maybe we could do something at Line2386.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 11
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 09 Jan 2019 03:22:22 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 11:

I pushed a new revision with a few cleanups:
- Properly rebased on top of the "wrapped up containers in scoped_refptr" 
change.
- The few remaining DOS line endings converted to UNIX line endings.
- A few formatting/style fixes.
- The only real change was to get read_only handling right: in that case, we 
should not delete dead containers at startup. To do this I moved the call to  
TrySetDead into Repair(), and changed the container vector plumbing from 
OpenDataDir() into Repair() to pass containers by scoped_refptr instead of by 
name.

Please take a look and let me know if anything seems wrong.


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 11
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 08 Jan 2019 20:36:23 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-08 Thread Adar Dembo (Code Review)
Adar Dembo has uploaded a new patch set (#11) to the change originally created 
by helifu. ( http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [V]

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 235 insertions(+), 121 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/11
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 11
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-07 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 10:

> Something's not quite right with this patch; when I look at the full diff, I 
> see both the "wrapping up containers in scoped_refptr" patch as well as the 
> "deletion of dead/full containers" patch mixed together.

This is still the case as of PS10. Here's a workflow I like to use to resolve 
situations like these:
1) 'git checkout' a new branch based on master.
2) 'git cherry-pick' the scoped_refptr commit into this new branch.
3) 'git cherry-pick' this commit onto the new branch.
4) Step 3 will yield a lot of conflicts; manually resolve them.

When you're done, you should have a new branch with both patches in their 
desired state, and you can push the entirety of the new branch to gerrit.


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 10
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 07 Jan 2019 22:31:31 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-07 Thread helifu (Code Review)
Hello Tidy Bot, Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#10).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [V]

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 344 insertions(+), 255 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/10
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 10
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-05 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 8:

Something's not quite right with this patch; when I look at the full diff, I 
see both the "wrapping up containers in scoped_refptr" patch as well as the 
"deletion of dead/full containers" patch mixed together.


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 8
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Sun, 06 Jan 2019 05:06:26 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-05 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#8).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [V]

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 344 insertions(+), 255 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/8
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 8
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2019-01-02 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12075/7//COMMIT_MSG@15
PS7, Line 15: Step1: wrapping up containers in scoped_refptr. [V]
: Step2: supporting dead containers removal.  [X]
> Could you do step 1 in a separate gerrit change rather than overwriting thi
Thanks. Now that you've created https://gerrit.cloudera.org/c/12121, you should 
rebase this change on top of that one.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 7
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 02 Jan 2019 23:19:14 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-22 Thread helifu (Code Review)
helifu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12121


Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [X]

Change-Id: I3c5c620014782b3d57dcbe047d0df73c949590c7
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 131 insertions(+), 130 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/12121/1
--
To view, visit http://gerrit.cloudera.org:8080/12121
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3c5c620014782b3d57dcbe047d0df73c949590c7
Gerrit-Change-Number: 12121
Gerrit-PatchSet: 1
Gerrit-Owner: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-21 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12075/7//COMMIT_MSG@15
PS7, Line 15: Step1: wrapping up containers in scoped_refptr. [V]
: Step2: supporting dead containers removal.  [X]
Could you do step 1 in a separate gerrit change rather than overwriting this 
one? That way this change will continue to reflect the work for KUDU-2636 and 
the interdiff will be relatively clean and easy to review.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 7
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 21 Dec 2018 17:39:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-21 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#7).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Step1: wrapping up containers in scoped_refptr. [V]
Step2: supporting dead containers removal.  [X]

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 131 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/7
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 7
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-19 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 6:

> maybe that is not enough. please take a look at this case: two remaining 
> LogBlocks are deleted in different LogBlockDeletionTransaction concurrently, 
> then one will trigger hole punching and the other will trigger dead container 
> removal. but it is not safe to access the container for hole punching when 
> removing dead container is executed first.
> i think the key point is that the container point in 
> 'all_containers_by_name_' is a raw point which will never be deleted after 
> startup. in other words, the container point is not safe.
> i am going to use 'std::shared_ptr' and 'std::weak_ptr' to wrap it.
> what do you think?

That's a very good point.

It's hard to come up with a robust solution that doesn't involve shared 
ownership for containers via std::shared_ptr. I don't think you need weak_ptr 
though. You could take mimic the FileCache behavior and perform dead container 
deletion in the container's destructor. So the sequence of events could be 
something like this:
1. A bunch of deletion transactions run in different threads, all affecting the 
same container.
2. All but one of them trigger hole punching on the container. The hole punch 
closures capture a shared ref to the container.
3. One triggers container deletion, which actually just marks the container as 
deleted and removes it from the global map.
4. The hole punch closures run, and punch holes (unnecessary, but safe).
5. When all the closures are done, the container's ref count is now 0. Its 
destructor runs, and the container is safely deleted from disk.

Since wrapping up containers in std::shared_ptr may be an extensive change, 
would you mind creating a separate change for it, and then rebasing this change 
on top of it?


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 6
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 20 Dec 2018 02:06:39 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-19 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 6:

maybe that is not enough. please take a look at this case: two remaining 
LogBlocks are deleted in different LogBlockDeletionTransaction concurrently, 
then one will trigger hole punching and the other will trigger dead container 
removal. but it is not safe to access the container for hole punching when 
removing dead container is executed first.
i think the key point is that the container point in 'all_containers_by_name_' 
is a raw point which will never be deleted after startup. in other words, the 
container point is not safe.
i am going to use 'std::shared_ptr' and 'std::weak_ptr' to wrap it.
what do you think?


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 6
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 20 Dec 2018 01:24:06 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-19 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 6:

> There are more than one LogBlock which will reference the same 
> LogBlockContainer. If these LogBlocks are deleted in different 
> LogBlockDeletionTransaction concurrently, then the LogBlockContainer will be 
> deleted repeatedly.
> I am trying to fix it.

I see, this has to do with the non-atomicity of decrementing 
container->live_blocks_, then later fetching container->live_blocks() to 
compare it with 0. Yeah that is frustrating.

One idea is to modify BlockDeleted() to return whether or not the container 
become dead when this block was deleted, and then plumb that information up the 
various delete paths to use it within CommitDeletedBlocks() to determine 
whether the container should be deleted (rather than looking at live_blocks()).


--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 6
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 19 Dec 2018 19:20:18 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-19 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 6:

(11 comments)

There are more than one LogBlock which will reference the same 
LogBlockContainer. If these LogBlocks are deleted in different 
LogBlockDeletionTransaction concurrently, then the LogBlockContainer will be 
deleted repeatedly.
I am trying to fix it.

http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.h@276
PS6, Line 276:   // Removes dead containers from disk.
 :   //
 :   // Must call RemoveFullContainersUnlocked() first.
> This is a good place to expand about the interaction with FileCache. You co
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1237
PS6, Line 1237:   // Due to the shared ownership of 
LogBlockDeletionTransaction, the destructor is
  :   // responsible for destroying all registered blocks. This 
includes:
  :   // 1. Punching holes in deleted blocks, and
  :   // 2. Deleting dead containers outright.
> The formatting issue persists. Take a look at the diff in gerrit; you'll se
done.
a new line character '\r\n' in MS-DOS causes this problem.


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1256
PS6, Line 1256:   std::unordered_map> deleted_interval_map_;
> Nit: while you're here, could you drop the unnecessary std:: prefixes here
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1260
PS6, Line 1260:   unordered_map> dead_containers_by_dir_;
> The value would probably be more efficient as an unordered_set rather than
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1631
PS6, Line 1631: // Access to container_ must be before calling 
log_block_.reset().
  : //
  : // Because log_block_.reset() may cause the LogBlock() to 
lose its last reference
  : // and be destroyed. This may cause a registered 
LogBlockDeletionTransaction to lose
  : // its last reference and be destroyed, which will trigger 
the asynchronous code that
  : // will eventually destroy the container which is dead. 
Thus, the ordering of operations
  : // here is very important, otherwise it is unsafe.
> Nit: reword a bit:
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1965
PS6, Line 1965: // The Descriptor() in the FileCache() will be destroyed
  : // along with LogBlockContainer's destructor.
> Nit: I would drop this comment. It's not really necessary to improve the un
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1980
PS6, Line 1980: DataDir* dir, const std::vector& names) {
> Don't need std:: prefixes.
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1982
PS6, Line 1982:   for (const auto& name : names) {
> There are some extra line formatting issues here.
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@2008
PS6, Line 2008:   // Remove from the memory.
> Nit: "Remove the containers from memory."
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@2014
PS6, Line 2014:   // Remove from the disk.
> Nit: "Remove the containers from disk."
Done


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@2580
PS6, Line 2580:   // After the deletions, the data directory is sync'ed to 
reduce the chance
  :   // of a data file existing without its corresponding metadata 
file (or vice
  :   // versa) in the event of a crash. The block manager would 
treat such a case
  :   // as corruption and require manual intervention.
  :   //
  :   // TODO(adar) the above is not fool-proof; a crash could 
manifest in between
  :   // any pair of deletions. That said, the odds of it happening 
are incredibly
  :   // rare, and manual resolution isn't hard (just delete the 
existing file).
> Move this into the body of RemoveDeadContainersFromDisk, just above the cal
Done



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 6
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerri

[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 6:

(11 comments)

The dense_node-itest failure is interesting. Here's some useful bits from the 
log:

  I1219 04:37:17.664212   817 log_block_manager.cc:2002] Deleted 1 dead 
containers (1256 metadata bytes)
  ...
  I1219 04:37:19.328910   817 log_block_manager.cc:2002] Deleted 2 dead 
containers (2513 metadata bytes)
  F1219 04:37:19.336031   817 log_block_manager.cc:1969] Check failed: to_delete
  *** Check failure stack trace: ***
@ 0x7fc8c8bd162d  google::LogMessage::Fail() at ??:0
@ 0x7fc8c8bd364c  google::LogMessage::SendToLog() at ??:0
@ 0x7fc8c8bd1189  google::LogMessage::Flush() at ??:0
@ 0x7fc8c8bd3fdf  google::LogMessageFatal::~LogMessageFatal() at ??:0
@ 0x7fc8c68bf7eb  
kudu::fs::LogBlockManager::RemoveFullContainersUnlocked() at ??:0
@ 0x7fc8c68bf8d3  kudu::fs::LogBlockManager::RemoveDeadContainers() at 
??:0
@ 0x7fc8c938065e  kudu::ThreadPool::DispatchThread() at ??:0
@ 0x7fc8c937626e  kudu::Thread::SuperviseThread() at ??:0
@ 0x7fc8c6d95184  start_thread at ??:0
@ 0x7fc8c81dbffd  clone at ??:0

FYI, the test runs with only one data directory. But it seems like it's trying 
to delete the same set of containers maybe? Some more logging in 
log_block_manager.cc may help. You may also be able to reproduce the failure 
locally.

http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.h
File src/kudu/fs/log_block_manager.h:

http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.h@276
PS6, Line 276:   // Removes dead containers from disk.
 :   //
 :   // Must call RemoveFullContainersUnlocked() first.
This is a good place to expand about the interaction with FileCache. You could 
say:

  // Removes dead containers from disk.
  //
  // Must first call RemoveFullContainersUnlocked() to remove the containers 
from
  // memory. This guarantees that the containers' FileCache descriptors are 
expired.
  // Thus, the FileCache deletions done here will delete the files straight 
away,
  // which allows the syncing of the data dir to actually have some effect.


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1237
PS6, Line 1237:   // Due to the shared ownership of 
LogBlockDeletionTransaction, the destructor is
  :   // responsible for destroying all registered blocks. This 
includes:
  :   // 1. Punching holes in deleted blocks, and
  :   // 2. Deleting dead containers outright.
The formatting issue persists. Take a look at the diff in gerrit; you'll see 
that each line is offset from the next with an extra blank line.


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1256
PS6, Line 1256:   std::unordered_map> deleted_interval_map_;
Nit: while you're here, could you drop the unnecessary std:: prefixes here and 
on L1264?


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1260
PS6, Line 1260:   unordered_map> dead_containers_by_dir_;
The value would probably be more efficient as an unordered_set rather than a 
set.


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1631
PS6, Line 1631: // Access to container_ must be before calling 
log_block_.reset().
  : //
  : // Because log_block_.reset() may cause the LogBlock() to 
lose its last reference
  : // and be destroyed. This may cause a registered 
LogBlockDeletionTransaction to lose
  : // its last reference and be destroyed, which will trigger 
the asynchronous code that
  : // will eventually destroy the container which is dead. 
Thus, the ordering of operations
  : // here is very important, otherwise it is unsafe.
Nit: reword a bit:

  // We may not safely access container_ after calling log_block_.reset().
  //
  // log_block_.reset() may cause the LogBlock() to lose its last reference and 
be destroyed.
  // This may cause a registered LogBlockDeletionTransaction to lose its last 
reference and
  // be destroyed, which will trigger the asynchronous code in its destructor 
that will
  // eventually destroy the (now dead) container.


http://gerrit.cloudera.org:8080/#/c/12075/6/src/kudu/fs/log_block_manager.cc@1965
PS6, Line 1965: // The Descriptor() in the FileCache() will be destroyed
  : // along with LogBlockContainer's destructor.
Nit: I would drop this comment. It's not really necessary to improve the 
understanding of RemoveFullContainersUnlocked; it's more useful for 
RemoveDeadC

[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#6).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 211 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/6
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 6
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1979
PS4, Line 1979:   for (auto& name : names) {
> Maybe define a helper function like this?
done.
this comment was previously blocked by the following question, but now it is ok.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1980
PS4, Line 1980: string data_file_name = StrCat(name, 
kContainerDataFileSuffix);
> Looks like you missed this.
Done


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1983
PS4, Line 1983: file_cache_.Invalidate(metadata_file_name);
  : WARN_NOT_OK_LBM_DISK_FAILURE(env_->DeleteFi
> That's true, but the call to RemoveFullContainersUnlocked on L1975 will hav
ah,i can catch up with you at last.
yes, you are correct! i made a mistake earlier.the key point is std::weak_ptr. 
the file_cache_.DeleteFile is safe here. but the logic is a little complicated 
indeed :(


http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc@1237
PS5, Line 1237:   // Due to the shared ownership of 
LogBlockDeletionTransaction, the
  :   // destructor is responsible for destroying all registered 
blocks.
  :   // This includes:
  :   // 1. Punching holes in deleted blocks, and
  :   // 2. Deleting dead containers outright.
> There's something wrong here with the formatting; looks like there's an ext
Done


http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc@1632
PS5, Line 1632: log_block_.reset();
  : if (container_->metrics()) {
  :   
container_->metrics()->generic_metrics.blocks_open_reading->Decrement();
  : }
> I think this may be the source of the TSAN data race.
Great job! Done.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 5
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Wed, 19 Dec 2018 04:15:40 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc@1632
PS5, Line 1632: log_block_.reset();
  : if (container_->metrics()) {
  :   
container_->metrics()->generic_metrics.blocks_open_reading->Decrement();
  : }
I think this may be the source of the TSAN data race.

Given the ordering of these operations, log_block_.reset() may cause the 
LogBlock() to lose its last reference and be destroyed. This may cause a 
registered LogBlockDeletionTransaction to lose its last reference and be 
destroyed, which will trigger the asynchronous code that will eventually 
destroy the container. Thus, this access to container_ is unsafe.

It should be fixable by just swapping the order of the two operations here. 
Please also leave a comment with this example that explains how, once 
log_block_.reset() is called, it's no longer safe to access container_. In fact 
we may want to null out container_ to further emphasize this (if someone makes 
this mistake in the future, better to get a SIGSEGV on a 0x0 memory access than 
a data race).



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 5
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 18 Dec 2018 22:13:49 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 5:

(4 comments)

Looks like you have a TSAN data race to figure out too.

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1979
PS4, Line 1979:   for (auto& name : names) {
> i have no idea :(
Maybe define a helper function like this?

  void LogBlockManager::DeleteDeadContainersFromDisk(DataDir* dir, const 
vector& names) {
int64_t deleted_metadata_bytes = 0;
for (const auto& n : names) {
  string data_file_name = StrCat(n, kContainerDataFileSuffix);
  string metadata_file_name = StrCat(n, kContainerMetadataFileSuffix);

  uint64_t metadata_size;
  Status s = env_->GetFileSize(metadata_file_name, &metadata_size);
  if (s.ok()) {
deleted_metadata_bytes += metadata_size;
  } else {
WARN_NOT_OK_LBM_DISK_FAILURE(s,
"Could not get size of dead container metadata file " + 
metadata_file_name);
  }

  WARN_NOT_OK_LBM_DISK_FAILURE(file_cache_.DeleteFile(data_file_name),
  "Could not delete dead container data file " + 
data_file_name);
  WARN_NOT_OK_LBM_DISK_FAILURE(file_cache_.DeleteFile(metadata_file_name),
  "Could not delete dead container metadata file " + 
metadata_file_name);
}
if (!names.empty()) {
  WARN_NOT_OK_LBM_DISK_FAILURE(env_->SyncDir(dir->dir()), "Could not sync 
data directory");
  LOG(INFO) << Substitute("Deleted $0 dead containers ($1 metadata bytes)",
  dead_containers.size(), deleted_metadata_bytes);
}
  }

You could call it here on L1979 and it could replace the code on L2553-2586. 
Make sure to preserve the useful comments though.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1980
PS4, Line 1980: string data_file_name = StrCat(name, 
kContainerDataFileSuffix);
> done.
Looks like you missed this.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1983
PS4, Line 1983: file_cache_.Invalidate(metadata_file_name);
  : WARN_NOT_OK_LBM_DISK_FAILURE(env_->DeleteFi
> The function 'FileCache::DeleteFile' does not delete file directl
That's true, but the call to RemoveFullContainersUnlocked on L1975 will have 
deleted the LogBlockContainer instances from memory, which means the associated 
descriptors from the FileCache will have been destroyed. So if we were to call 
FileCache::DeleteFile here to delete the container's files, I think we'll end 
up in the "There is no outstanding descriptor" case, evict the fd from the 
cache, and delete the file. Does that sound right to you? It's been a while 
since I've looked at the FileCache code so maybe I missed something.

Anyway, if you agree, we don't need the Invalidate calls and 
file_cache_.DeleteFile should be safe.


http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/5/src/kudu/fs/log_block_manager.cc@1237
PS5, Line 1237:   // Due to the shared ownership of 
LogBlockDeletionTransaction, the
  :   // destructor is responsible for destroying all registered 
blocks.
  :   // This includes:
  :   // 1. Punching holes in deleted blocks, and
  :   // 2. Deleting dead containers outright.
There's something wrong here with the formatting; looks like there's an extra 
empty line for each line of text.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 5
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 18 Dec 2018 22:02:49 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#5).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk pressure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether the container
is dead or live.

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 180 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/5
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 5
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12075/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12075/4//COMMIT_MSG@10
PS4, Line 10: presure
> pressure
done


http://gerrit.cloudera.org:8080/#/c/12075/4//COMMIT_MSG@12
PS4, Line 12: it
> Nit: "the container"
done.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 4
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 18 Dec 2018 11:02:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-18 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager-test.cc@1771
PS4, Line 1771:   CheckMetrics({0, 0, 0, 0, 0, 0, 0});
> I think you need to wrap calls to CheckMetrics in NO_FATALS also.
done.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 4
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 18 Dec 2018 10:59:04 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-17 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1237
PS4, Line 1237:   // Given the shared ownership of LogBlockDeletionTransaction, 
at this point
  :   // all registered blocks should be destructed. Thus, 
1)coalescing deletions
  :   // for blocks that are contiguous in a container and 
schedules hole punching;
  :   // 2)schedules containers removal by data dir.
> How about:
done.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1288
PS4, Line 1288: DataDir* dir = entry.first;
  : vector names(entry.second.begin(), 
entry.second.end());
> Nit: you can inline this into the Bind() call, the way it was done on L1280
done.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1979
PS4, Line 1979:   // Invalidate the cache and remove from the disk.
> Could you consolidate this with the equivalent code in Repair()?
i have no idea :(


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1980
PS4, Line 1980:   for (auto& name : names) {
> 'name' can be a cref.
done.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1983
PS4, Line 1983: file_cache_.Invalidate(data_file_name);
  : file_cache_.Invalidate(metadata_file_name);
> Why do we need to call Invalidate in this code path? We're not renaming any
The function 'FileCache::DeleteFile' does not delete file directly, 
because there is a case to mark the 'flags_' of class BaseDescriptor to be 
deleted only. Then the 'descriptor_expiry_thread_' of class FileCache erases 
the descriptor and the file is deleted in the destructor of class 
BaseDescriptor at last. The default of 'file_cache_expiry_period_ms' is 60 
seconds. Thus the 'env_->SyncDir()' later will be meaningless.

So, i think it is better to use 'file_cache_.Invalidate'+'env_->DeleteFile'.
What do you think?


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1985
PS4, Line 1985: 
WARN_NOT_OK_LBM_DISK_FAILURE(env_->DeleteFile(data_file_name),
  : "Could not delete dead container data file " + 
data_file_name);
  : 
WARN_NOT_OK_LBM_DISK_FAILURE(env_->DeleteFile(metadata_file_name),
  : "Could not delete dead container metadata file " + 
metadata_file_name);
> Shouldn't we delete through the file cache? I.e. file_cache_.DeleteFile?
the answer is listed above.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 4
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 18 Dec 2018 01:27:01 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-17 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12075/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12075/4//COMMIT_MSG@10
PS4, Line 10: presure
pressure


http://gerrit.cloudera.org:8080/#/c/12075/4//COMMIT_MSG@12
PS4, Line 12: it
Nit: "the container"


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager-test.cc@1771
PS4, Line 1771:   CheckMetrics({0, 0, 0, 0, 0, 0, 0});
I think you need to wrap calls to CheckMetrics in NO_FATALS also.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1237
PS4, Line 1237:   // Given the shared ownership of LogBlockDeletionTransaction, 
at this point
  :   // all registered blocks should be destructed. Thus, 
1)coalescing deletions
  :   // for blocks that are contiguous in a container and 
schedules hole punching;
  :   // 2)schedules containers removal by data dir.
How about:

  // Due to the shared ownership of LogBlockDeletionTransaction, the destructor 
is
  // responsible for destroying all registered blocks. This includes:
  // 1. Punching holes in deleted blocks, and
  // 2. Deleting dead containers outright.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1288
PS4, Line 1288: DataDir* dir = entry.first;
  : vector names(entry.second.begin(), 
entry.second.end());
Nit: you can inline this into the Bind() call, the way it was done on L1280.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1979
PS4, Line 1979:   // Invalidate the cache and remove from the disk.
Could you consolidate this with the equivalent code in Repair()?


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1980
PS4, Line 1980:   for (auto& name : names) {
'name' can be a cref.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1983
PS4, Line 1983: file_cache_.Invalidate(data_file_name);
  : file_cache_.Invalidate(metadata_file_name);
Why do we need to call Invalidate in this code path? We're not renaming 
anything.


http://gerrit.cloudera.org:8080/#/c/12075/4/src/kudu/fs/log_block_manager.cc@1985
PS4, Line 1985: 
WARN_NOT_OK_LBM_DISK_FAILURE(env_->DeleteFile(data_file_name),
  : "Could not delete dead container data file " + 
data_file_name);
  : 
WARN_NOT_OK_LBM_DISK_FAILURE(env_->DeleteFile(metadata_file_name),
  : "Could not delete dead container metadata file " + 
metadata_file_name);
Shouldn't we delete through the file cache? I.e. file_cache_.DeleteFile?



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 4
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 17 Dec 2018 19:21:37 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-17 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#4).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full containers which are dead
before hole punching. It can help to ease disk presure,
save tserver startup time and etc. It uses the container
size and live blocks to determine whether it is dead or live.

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 180 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/4
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 4
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-17 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 3:

(11 comments)

thanks for your detailed comments.

http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc@1754
PS3, Line 1754: TestDeleteDeadContainersAtPunchHole
> Probably "TestDeleteDeadContainersByDeletionTransaction" is more appropriat
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc@1796
PS3, Line 1796:   }
> Can you add a variant of this test that opens the block for reading before
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc@1798
PS3, Line 1798: removing
> Nit: removal
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@141
PS3, Line 141:   "Number of Dead Block Containers",
> "Number of Dead Block Containers Deleted"
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@143
PS3, Line 143:   "Number of dead block containers that were 
deleted");
> "Number of full (but dead) block containers that were deleted"
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1237
PS3, Line 1237:   // Given the shared ownership of LogBlockDeletionTransaction, 
at this point
  :   // all registered blocks should be destructed. Thus, 
coalescing deletions
  :   // for blocks that are contiguous in a container and 
schedules hole punching.
> Can you update this comment to reflect the changes to the LogBlockDeletionT
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1269
PS3, Line 1269:   std::unordered_map> 
containers_by_dir;
> Don't need std:: prefixes for the new code; we're "using ..." that stuff at
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1273
PS3, Line 1273: // Filter out the containers that are full and dead.
> FWIW, I think it'd be cleaner to identify full and dead containers in Commi
can not agree any more ;)


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1946
PS3, Line 1946: void LogBlockManager::RemoveFullContainerUnlocked(const string& 
container_name) {
> Can you convert this into RemoveFullContainersUnlocked? Looks like both cal
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1968
PS3, Line 1968:   // Invalidate the cache and remove from the disk.
> Please consolidate this with the equivalent code in Repair(). In particular
done.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1976
PS3, Line 1976: if (metrics()) {
  :   metrics()->dead_containers_deleted->Increment();
  : }
> You can pull this out of the loop and increment just once by names.size().
done.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 3
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Mon, 17 Dec 2018 12:20:53 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 3:

(11 comments)

Looks good! Hao did some recent work in the log block manager; Hao, could you 
also take a look?

http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc
File src/kudu/fs/log_block_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc@1754
PS3, Line 1754: TestDeleteDeadContainersAtPunchHole
Probably "TestDeleteDeadContainersByDeletionTransaction" is more appropriate 
now.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc@1796
PS3, Line 1796:   }
Can you add a variant of this test that opens the block for reading before 
deleting it, tests that after the BlockDeletionTransaction goes out of scope, 
the block (and its container) is still alive, and only when the opened block 
goes out of scope is the block (and the container) deleted?

You should be able to reuse pretty much all of this code with some 
parameterization.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager-test.cc@1798
PS3, Line 1798: removing
Nit: removal


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@141
PS3, Line 141:   "Number of Dead Block Containers",
"Number of Dead Block Containers Deleted"


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@143
PS3, Line 143:   "Number of dead block containers that were 
deleted");
"Number of full (but dead) block containers that were deleted"


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1237
PS3, Line 1237:   // Given the shared ownership of LogBlockDeletionTransaction, 
at this point
  :   // all registered blocks should be destructed. Thus, 
coalescing deletions
  :   // for blocks that are contiguous in a container and 
schedules hole punching.
Can you update this comment to reflect the changes to the 
LogBlockDeletionTransaction destructor?


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1269
PS3, Line 1269:   std::unordered_map> 
containers_by_dir;
Don't need std:: prefixes for the new code; we're "using ..." that stuff at the 
top of the file (and if we're not for a particular class, add a "using::" 
definition for it).


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1273
PS3, Line 1273: // Filter out the containers that are full and dead.
FWIW, I think it'd be cleaner to identify full and dead containers in 
CommitDeletedBlocks (after the call to RemoveLogBlocks). Then, don't even call 
AddBlock() on them; just throw them into a new std::unordered_map of container 
names keyed by data dirs. Finally, in the destructor, you process the 
deleted_interval_map_ as before, then process the new unordered_map separately.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1946
PS3, Line 1946: void LogBlockManager::RemoveFullContainerUnlocked(const string& 
container_name) {
Can you convert this into RemoveFullContainersUnlocked? Looks like both callers 
would benefit from that right now: it'd let you do one metrics decrement 
instead of a per-container decrement.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1968
PS3, Line 1968:   // Invalidate the cache and remove from the disk.
Please consolidate this with the equivalent code in Repair(). In particular:
- We should WARN when a deletion fails.
- We should trigger the disk failure handling path when a deletion fails.


http://gerrit.cloudera.org:8080/#/c/12075/3/src/kudu/fs/log_block_manager.cc@1976
PS3, Line 1976: if (metrics()) {
  :   metrics()->dead_containers_deleted->Increment();
  : }
You can pull this out of the loop and increment just once by names.size().



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 3
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Sat, 15 Dec 2018 20:49:25 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-13 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#3).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full container which is dead
after hole punching. It can help to save startup time.
It uses the file's physical size to determine whether
the container is dead or live.

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 145 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/3
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 3
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc@1179
PS1, Line 1179:
  : ///
  : // LogBlockCreationTransaction
  : 
> I did miss the question :(
It's certainly possible to automatically delete one of a given file pair 
(xxx.data and xxx.metadata) if the other of the pair is missing. I'm not sure 
that this wise, though, given that we don't know at startup _why_ the files are 
missing. If a rogue admin accidentally deleted all of the .metadata files, do 
you want Kudu to delete the .data files as a result? Seems like failing to 
start and forcing the admin to investigate is the safer option.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 14 Dec 2018 01:09:59 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-13 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc@1179
PS1, Line 1179:
  : ///
  : // LogBlockCreationTransaction
  : 
> Point 2 is spelled out in more detail in Repair(). I'll copy the relevant c
I did miss the question :(
And, i have another idea. Is it possible to check the integrity of file 
pairs(xxx.data and xxx.metadata) at startup after a crash? In other words, the 
LBM needs to support the corruption, for example, delete the incomplete file 
pairs directly.What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Fri, 14 Dec 2018 00:57:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-12 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc@1179
PS1, Line 1179:
  : ///
  : // LogBlockCreationTransaction
  : 
> Agree with point 1 and 3.
Point 2 is spelled out in more detail in Repair(). I'll copy the relevant 
comment:

  // Delete all dead containers.
  //
  // After the deletions, the data directory is sync'ed to reduce the chance
  // of a data file existing without its corresponding metadata file (or vice
  // versa) in the event of a crash. The block manager would treat such a case
  // as corruption and require manual intervention.
  //
  // TODO(adar) the above is not fool-proof; a crash could manifest in between
  // any pair of deletions. That said, the odds of it happening are incredibly
  // rare, and manual resolution isn't hard (just delete the existing file).

To go into more detail, the concern here is that once we start deleting files, 
a machine crash could lead to any subset of the deleted files coming back to 
life, regardless of the order in which they were deleted. Without a SyncDir() 
call, this vulnerability window is as long as the filesystem's journal commit 
period. On ext4 that's given by the 'commit' mount option, which defaults to 5s.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 13 Dec 2018 05:24:21 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-12 Thread helifu (Code Review)
Hello Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12075

to look at the new patch set (#2).

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full container which is dead
after hole punching. It can help to save startup time.
It uses the file's physical size to determine whether
the container is dead or live.

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
3 files changed, 136 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/2
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-12 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 1:

(3 comments)

Thanks for your comments.@adar

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc@1179
PS1, Line 1179:   // Though the result of deleting a data block is not the same 
as
  :   // before when there are empty data blocks in the same 
container
  :   // which is FULL, especially in the unit test, this design is 
still
  :   // acceptable.
> I don't agree with this assessment. This approach suffers from the followin
Agree with point 1 and 3.
But to point 2, i have a question, is it necessary to call SyncDir() after 
deleting the container? In my opinion it's ok that let the file system flush 
dirty data to disk.Then, the 'batch' containers deletion is unnecessary.


http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env.h
File src/kudu/util/env.h:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env.h@648
PS1, Line 648:   virtual Status SizeOnDisk(uint64_t* size) const = 0;
> Could you add a brief unit test to env-test to cover this new method?
rollback.


http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env_posix.cc@1299
PS1, Line 1299:   // From stat(2):
  :   //
  :   //   The st_blocks field indicates the number of blocks 
allocated to
  :   //   the file, 512-byte units. (This may be smaller than 
st_size/512
  :   //   when the file has holes.)
  :   *size = sbuf.st_blocks * 512;
> Can you reuse this block of code for RWFile::SizeOnDisk, using a decomposed
rollback.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 1
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Thu, 13 Dec 2018 03:40:11 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-12 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12075 )

Change subject: KUDU-2636: LBM supports deleting dead and full containers
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc
File src/kudu/fs/log_block_manager.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc@1179
PS1, Line 1179:   // Though the result of deleting a data block is not the same 
as
  :   // before when there are empty data blocks in the same 
container
  :   // which is FULL, especially in the unit test, this design is 
still
  :   // acceptable.
I don't agree with this assessment. This approach suffers from the following 
problems:
1. It goes to disk unnecessarily to figure out if a container is dead. If we 
can safely call full(), why can't we call live_blocks() and compare it to 0?
2. By chaining to ContainerDeletionAsync, we can't "batch" container deletion. 
LogBlockDeletionTransaction allows for the client to specify a group of blocks 
to be deleted together (i.e. deleting an entire tablet). The LBM uses this to 
optimize deletion where it can. In the case of deleting containers that are no 
longer necessary, we can delete all of the container files, then do one 
SyncDir() call per parent directories, instead of a SyncDir() per container.
3. Furthermore, if a LogBlockDeletionTransaction is going to delete all of the 
blocks in a container, it'd be cheaper to recognize that and, rather than 
punching any holes, delete the container files outright.


http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/fs/log_block_manager.cc@1966
PS1, Line 1966: void LogBlockManager::RemoveDeadFullContainer(const 
std::string& container_name) {
Would be good to reuse/consolidate the file deletion code here with the 
equivalent in Repair().


http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env.h
File src/kudu/util/env.h:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env.h@648
PS1, Line 648:   virtual Status SizeOnDisk(uint64_t* size) const = 0;
Could you add a brief unit test to env-test to cover this new method?


http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

http://gerrit.cloudera.org:8080/#/c/12075/1/src/kudu/util/env_posix.cc@1299
PS1, Line 1299:   // From stat(2):
  :   //
  :   //   The st_blocks field indicates the number of blocks 
allocated to
  :   //   the file, 512-byte units. (This may be smaller than 
st_size/512
  :   //   when the file has holes.)
  :   *size = sbuf.st_blocks * 512;
Can you reuse this block of code for RWFile::SizeOnDisk, using a decomposed 
static helper? The motivation here is understandability: it'd be good for 
readers to see that comment.



--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 1
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Wed, 12 Dec 2018 19:09:53 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2636: LBM supports deleting dead and full containers

2018-12-12 Thread helifu (Code Review)
helifu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12075


Change subject: KUDU-2636: LBM supports deleting dead and full containers
..

KUDU-2636: LBM supports deleting dead and full containers

LBM supports deleting the full container which is dead
after hole punching. It can help to save startup time.
It uses the file's physical size to determine whether
the container is dead or live.

Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
---
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/util/env.h
M src/kudu/util/env_posix.cc
M src/kudu/util/file_cache.cc
6 files changed, 170 insertions(+), 24 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12075/1
--
To view, visit http://gerrit.cloudera.org:8080/12075
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7d3672ae3c3dd9acd489120d653c44a641537f10
Gerrit-Change-Number: 12075
Gerrit-PatchSet: 1
Gerrit-Owner: helifu