Hi,

I would like to get this into 0.10.0.0. Could someone please take a look and review it?
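
For reviewers, here is a rough sketch of the flow described in the quoted mail below. It is illustrative Java and not the code from the pull request: the class name ParallelRecoverySketch, the method recoverAll, and the definition of LogRecoveryFailedException are assumptions made for this example, and each Callable stands in for one logSegment.recover(..) call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ParallelRecoverySketch {

    // Hypothetical definition; only the exception name appears in the mail.
    public static class LogRecoveryFailedException extends RuntimeException {
        public LogRecoveryFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // threads       ~ log.recovery.threads
    // maxIntervalMs ~ log.recovery.max.interval.ms (total budget for all jobs)
    public static void recoverAll(List<Callable<Void>> recoveryJobs,
                                  int threads,
                                  long maxIntervalMs) {
        // Step 1: fixed-size thread pool.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Step 2: submit each recovery job and keep its future.
            List<Future<Void>> futures = new ArrayList<>();
            for (Callable<Void> job : recoveryJobs) {
                futures.add(pool.submit(job));
            }

            // Step 3: wait for all jobs within one shared time budget.
            // toNanos saturates at Long.MAX_VALUE, so a "no limit" default is safe.
            long budgetNanos = TimeUnit.MILLISECONDS.toNanos(maxIntervalMs);
            long start = System.nanoTime();
            for (Future<Void> future : futures) {
                long remaining = budgetNanos - (System.nanoTime() - start);
                try {
                    // Step 4: a normal return means this job completed.
                    future.get(Math.max(remaining, 0L), TimeUnit.NANOSECONDS);
                } catch (ExecutionException e) {
                    // Step 5: a job failed.
                    throw new LogRecoveryFailedException("Log recovery job failed", e.getCause());
                } catch (TimeoutException e) {
                    // Step 6: the overall timeout was reached.
                    throw new LogRecoveryFailedException("Log recovery timed out", e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new LogRecoveryFailedException("Interrupted during log recovery", e);
                }
            }
        } finally {
            // Stop the workers whether recovery succeeded or not.
            pool.shutdownNow();
        }
    }
}
```

With threads set to 1 the jobs run one after another in submission order, which is why a default of 1 matches the current sequential behaviour.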

On Wed, Mar 9, 2016 at 10:29 PM, Achanta Vamsi Subhash <achanta.va...@flipkart.com> wrote:

> Hi all,
>
> https://github.com/apache/kafka/pull/1035
> This pull request makes log-segment loading parallel, with two
> configurable properties: "log.recovery.threads" and
> "log.recovery.max.interval.ms".
>
> Currently, after an unclean shutdown, the log segments within a logDir
> are loaded sequentially on startup. This takes a long time because
> logSegment.recover(..) is called for every segment, and for brokers
> with many partitions the total time is very high (we have seen
> ~40 mins for 2k partitions).
>
> Logic:
> 1. Define a fixed-size thread pool (log.recovery.threads).
> 2. Submit each logSegment recovery as a job to the thread pool and add
>    the returned future to a job list.
> 3. Wait until all the jobs are done within the required time
>    (log.recovery.max.interval.ms - default set to Long.Max).
> 4. If they are done and the futures are all null (meaning that the
>    jobs completed successfully), recovery is considered done.
> 5. If any of the recovery jobs failed, the failure is logged and
>    LogRecoveryFailedException is thrown.
> 6. If the timeout is reached, LogRecoveryFailedException is thrown.
>
> The logic is backward compatible with the current sequential
> implementation, as the default thread count is set to 1.
>
> JIRA link is here:
> https://issues.apache.org/jira/browse/KAFKA-3359
>
> Please review and give me suggestions. I will incorporate them and
> contribute.
>
> Thanks.
>
> On Wed, Mar 9, 2016 at 7:57 PM, vamsi-subhash <g...@git.apache.org> wrote:
>
>> GitHub user vamsi-subhash opened a pull request:
>>
>>     https://github.com/apache/kafka/pull/1035
>>
>>     Parallel log-recovery of un-flushed segments on startup
>>
>>     Did not find any tests for the method. Will be adding them
>>
>> You can merge this pull request into a Git repository by running:
>>
>>     $ git pull https://github.com/vamsi-subhash/kafka trunk
>>
>> Alternatively you can review and apply these changes as the patch at:
>>
>>     https://github.com/apache/kafka/pull/1035.patch
>>
>> To close this pull request, make a commit to your master/trunk branch
>> with (at least) the following in the commit message:
>>
>>     This closes #1035
>>
>> ----
>> commit ecab815203a2b6396703660d5a2f9d9bb00efcf3
>> Author: Vamsi Subhash Achanta <vamsi...@gmail.com>
>> Date:   2016-03-09T14:24:37Z
>>
>>     Made log-recovery parallel
>>
>> ----
>>
>> ---
>> If your project is set up for it, you can reply to this email and have
>> your reply appear on GitHub as well. If your project does not have this
>> feature enabled and wishes so, or if the feature is enabled but not
>> working, please contact infrastructure at infrastruct...@apache.org or
>> file a JIRA ticket with INFRA.
>> ---
>
> --
> Regards
> Vamsi Subhash

--
Regards
Vamsi Subhash