[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-13 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: HBASE-5270-92v11.patch

Patch v11 for 0.92.

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.2
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, HBASE-5270-92v11.patch, HBASE-5270v11.patch, 
> hbase-5270.patch, hbase-5270v10.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, 
> hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt
>
>
> This JIRA continues the effort from HBASE-5179. Starting with Stack's 
> comments about patches for 0.92 and TRUNK:
> Reviewing 0.92v17
> isDeadServerInProgress is a new public method in ServerManager but it does 
> not seem to be used anywhere.
> Does isDeadRootServerInProgress need to be public? Ditto for meta version.
> These method param names are not right, e.g. 'definitiveRootServer'; what is 
> meant by definitive? Do they need this qualifier?
> Is there anything in place to stop us expiring a server twice if it's carrying 
> root and meta?
> What is the difference between asking assignment manager isCarryingRoot and 
> this variable that is passed in? Should be doc'd at least. Ditto for meta.
> I think I've asked for this a few times - onlineServers needs to be 
> explained... either in javadoc or in comment. This is the param passed into 
> joinCluster. How does it arise? I think I know but am unsure. God love the 
> poor noob that comes awandering this code trying to make sense of it all.
> It looks like we get the list by trawling zk for regionserver znodes that 
> have not checked in. Don't we do this operation earlier in master setup? Are 
> we doing it again here?
> Though distributed split log is configured, we will do in-master, 
> single-process splitting under some conditions with this patch. It's not 
> explained in code why we would do this. Why do we think master log splitting 
> is 'high priority' when it could very well be slower? Should we only go this 
> route if distributed splitting is not going on? Do we know if concurrent 
> distributed log splitting and master splitting work?
> Why would we have dead servers in progress here in master startup? Because a 
> servershutdownhandler fired?
> This patch is different to the patch for 0.90. Should go into trunk first 
> with tests, then 0.92. Should it be in this issue? This issue is really hard 
> to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
> this trunk patch?
> This patch needs to have the v18 differences applied.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-12 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5270:

Status: Open  (was: Patch Available)

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.2
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, 
> hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, 
> hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, 
> hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-12 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: HBASE-5270v11.patch

Minor items addressed in patch v11:
https://reviews.apache.org/r/4021/

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.2
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, HBASE-5270v11.patch, hbase-5270.patch, 
> hbase-5270v10.patch, hbase-5270v2.patch, hbase-5270v4.patch, 
> hbase-5270v5.patch, hbase-5270v6.patch, hbase-5270v7.patch, 
> hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-05 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5270:

Status: Patch Available  (was: Open)

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.2
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v10.patch, 
> hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, 
> hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, 
> hbase-5270v9.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-04 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v10.patch

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.2
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v10.patch, 
> hbase-5270v2.patch, hbase-5270v4.patch, hbase-5270v5.patch, 
> hbase-5270v6.patch, hbase-5270v7.patch, hbase-5270v8.patch, 
> hbase-5270v9.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-03-01 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v9.patch

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, 
> hbase-5270v7.patch, hbase-5270v8.patch, hbase-5270v9.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-27 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v8.patch

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, 
> hbase-5270v7.patch, hbase-5270v8.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-27 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v7.patch

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, 
> hbase-5270v7.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-23 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v6.patch

I'm sorry for the mistake with ConcurrentHashSet.
Thanks, Ted.

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, hbase-5270v6.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-23 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v5.patch

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, hbase-5270v5.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-22 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5270:
--

Comment: was deleted

(was: @Chunhui:
Since Review Board isn't being used, do you mind highlighting the new changes in 
hbase-5270v4.patch?

Thanks)

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Assignee: chunhui shen
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-22 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: hbase-5270v4.patch

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.92.1, 0.94.0
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, 
> hbase-5270v4.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-21 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: 5270-90v3.patch

5270-90v3.patch incorporates Stack's comments.

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-90v3.patch, 5270-testcase.patch, 
> 5270-testcasev2.patch, hbase-5270.patch, hbase-5270v2.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-16 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: 5270-90v2.patch
5270-90-testcasev2.patch

Test case and patch for the 0.90 branch.

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90-testcasev2.patch, 
> 5270-90.patch, 5270-90v2.patch, 5270-testcase.patch, 5270-testcasev2.patch, 
> hbase-5270.patch, hbase-5270v2.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-16 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: 5270-testcasev2.patch
hbase-5270v2.patch

Optimized the test case along the lines of Stack's sample.
hbase-5270v2 is a patch for trunk that fixes the issue and includes the test case.

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90.patch, 
> 5270-testcase.patch, 5270-testcasev2.patch, hbase-5270.patch, 
> hbase-5270v2.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-15 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5270:
-

Attachment: sampletest.txt

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90.patch, 
> 5270-testcase.patch, hbase-5270.patch, sampletest.txt




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-15 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5270:
--

Attachment: (was: mapreduce-3583-trunk-v2.txt)

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
>  Issue Type: Sub-task
>  Components: master
>Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90.patch, 
> 5270-testcase.patch, hbase-5270.patch




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-15 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5270:
--

Attachment: mapreduce-3583-trunk-v2.txt

Reattaching patch v2 for TRUNK with --no-prefix

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90.patch, 
> 5270-testcase.patch, hbase-5270.patch




[jira] [Updated] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-02-15 Thread chunhui shen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chunhui shen updated HBASE-5270:


Attachment: 5270-90-testcase.patch
5270-90.patch
5270-testcase.patch
hbase-5270.patch

I have written a test case for this issue, in 5270-testcase.patch, that 
demonstrates the existing problem.

hbase-5270.patch combines the test case with the latest patch from HBASE-5179.

5270-90.patch and 5270-90-testcase.patch are for the 0.90 branch.

For now, the test case may not cover every situation that causes problems, and 
it runs slowly.

I will optimize the test case later.

> Handle potential data loss due to concurrent processing of processFaileOver 
> and ServerShutdownHandler
> -
>
> Key: HBASE-5270
> URL: https://issues.apache.org/jira/browse/HBASE-5270
> Project: HBase
> Issue Type: Sub-task
> Components: master
> Reporter: Zhihong Yu
> Fix For: 0.94.0, 0.92.1
>
> Attachments: 5270-90-testcase.patch, 5270-90.patch, 
> 5270-testcase.patch, hbase-5270.patch