[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3359:
-----------------------------------------
    Issue Type: Improvement  (was: Bug)

> Parallel log-recovery of un-flushed segments on startup
> -------------------------------------------------------
>
>                 Key: KAFKA-3359
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3359
>             Project: Kafka
>          Issue Type: Improvement
>          Components: log
>    Affects Versions: 0.8.2.2, 0.9.0.1
>            Reporter: Vamsi Subhash Achanta
>            Assignee: Jay Kreps
>            Priority: Minor
>
> On startup, the log segments within a logDir are currently loaded
> sequentially after an unclean shutdown. Loading takes a long time because
> logSegment.recover(..) is called for every segment; on brokers with many
> partitions, the total time is very high (we have observed ~40 minutes for
> 2k partitions).
> https://github.com/apache/kafka/pull/1035
> This pull request makes the log-segment load parallel, with two
> configurable properties: "log.recovery.threads" and
> "log.recovery.max.interval.ms".
> Logic:
> 1. Create a thread pool of fixed size (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add the
> returned future to a job list.
> 3. Wait until all the jobs are done, within the configured time
> (log.recovery.max.interval.ms - default set to Long.Max).
> 4. If all jobs are done and every future returned null (meaning the jobs
> completed successfully), recovery is considered done.
> 5. If any of the recovery jobs failed, the failure is logged and
> LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
> The logic is backward compatible with the current sequential implementation,
> as the default thread count is set to 1.
> PS: I am new to Scala and the code might look Java-ish, but I will be happy
> to make the changes suggested in code review.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
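The six steps above can be sketched with a standard JDK fixed thread pool. This is an illustrative sketch, not the actual Scala patch in the pull request: recoverSegment is a hypothetical stand-in for logSegment.recover(..), and segments are reduced to plain ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ParallelRecoverySketch {
    /** Thrown when a segment fails to recover or the overall timeout is hit. */
    static class LogRecoveryFailedException extends RuntimeException {
        LogRecoveryFailedException(String msg, Throwable cause) { super(msg, cause); }
    }

    /** Hypothetical stand-in for logSegment.recover(..). */
    static void recoverSegment(int segmentId) {
        // a real implementation would rebuild the segment's index and truncate
        // any partial messages written before the unclean shutdown
    }

    /**
     * Recover all segments with a pool of `threads` workers (cf.
     * "log.recovery.threads"), bounded by an overall timeout in milliseconds
     * (cf. "log.recovery.max.interval.ms", default Long.MAX_VALUE).
     */
    static void recoverAll(List<Integer> segments, int threads, long maxIntervalMs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);      // step 1
        try {
            List<Future<?>> jobs = new ArrayList<>();
            for (int seg : segments) {
                jobs.add(pool.submit(() -> recoverSegment(seg)));          // step 2
            }
            // saturating add so the Long.MAX_VALUE default does not overflow
            long now = System.currentTimeMillis();
            long deadline = maxIntervalMs > Long.MAX_VALUE - now
                    ? Long.MAX_VALUE : now + maxIntervalMs;
            for (Future<?> job : jobs) {                                   // step 3
                long remaining = deadline - System.currentTimeMillis();
                try {
                    // get() returns null for a Runnable that completed normally (step 4)
                    job.get(Math.max(remaining, 0), TimeUnit.MILLISECONDS);
                } catch (ExecutionException e) {                           // step 5
                    throw new LogRecoveryFailedException("segment recovery failed", e.getCause());
                } catch (TimeoutException e) {                             // step 6
                    throw new LogRecoveryFailedException("recovery timed out", e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new LogRecoveryFailedException("recovery interrupted", e);
                }
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // threads = 1 reproduces the sequential behaviour the patch is
        // backward compatible with; larger values recover segments in parallel
        recoverAll(List.of(1, 2, 3, 4), 2, Long.MAX_VALUE);
    }
}
```

Waiting on each future in turn keeps the failure handling simple: the first failed or timed-out job aborts startup, matching the fail-fast behaviour the description outlines.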
[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3359:
-----------------------------------------
          Reviewer: Grant Henke
     Fix Version/s: 0.10.0.0
            Status: Patch Available  (was: Open)

https://github.com/apache/kafka/pull/1035

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vamsi Subhash Achanta updated KAFKA-3359:
-----------------------------------------
    Priority: Major  (was: Minor)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-3359) Parallel log-recovery of un-flushed segments on startup
[ https://issues.apache.org/jira/browse/KAFKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson updated KAFKA-3359:
-----------------------------------
    Fix Version/s: 0.10.2.0  (was: 0.10.1.0)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)