Hi, Igor,
Thanks for this great work, left some questions,

1. We have a version field in meta.properties, currently it’s 1, and we can set 
it to 2 in this KIP, and we can give an example of server.properties and it’s 
corresponding meta.properties generated by the storage command tool.

2. When using storage format tool we should specify cluster-id, it will be an 
arduous work if we should manually specify directory-ids for all log dirs, I 
think you can make it more clear about this change that the directory-ids are 
generated automatically.

3. When controller place a replica, it will select a broker as well as a log 
directory, the latter is currently accomplished in the broker side, so this 
will be a big change?

4. When handling log directory failures we will update Leader and ISR using the 
existing replica state machine, what is this state machine referring to, do you 
mean the AlterPartition RPC?

--
Best,
Ziming


> On Aug 9, 2022, at 12:10 AM, Jason Gustafson <ja...@confluent.io.INVALID> 
> wrote:
> 
> Hi Igor,
> 
> Thanks for the KIP. It looks like it's on a good track. I have a few
> suggestions to throw into the mix:
> 
> 1. (nit): Instead of "storage id," maybe we could call it "directory id"?
> It seems a little clear since each log dir gets a unique id.
> 2. Rather than introducing a new RPC to communicate offline directories,
> would it be reasonable to add it to BrokerHeartbeat? My thinking is that we
> can let broker registration include the complete list of log dirs. The
> heartbeat can then be used to communicate online/offline status of each log
> dir.
> 3. I think we have an opportunity to consolidate normal partition
> reassignment and log dir reassignment here. Since we are modifying the
> `Assignment` to consist of both replica id and storage id, couldn't we make
> a similar change to the `AlterPartitionReassignment` API? Basically that
> would let us treat all reassignments as moving from one log dir to another.
> The case handled by `AlterReplicaLogDir` currently would just become a
> special case where the replica id happens to be the same. This might also
> let us get rid of all the custom logic surrounding JBOD, such as the
> additional fetcher, which has proven difficult to maintain.
> 
> Thanks,
> Jason
> 
> On Tue, Aug 2, 2022 at 2:24 AM Igor Soarez <i...@soarez.me> wrote:
> 
>> Hi José,
>> 
>> Thanks for having a look at this KIP and thanks for pointing this out,
>> I've had a look at KIP-856.
>> 
>> It's good to see there's some overlap in our proposals. we're both
>> proposing:
>> 
>> - Identifying log directories with a UUID
>> - Extending the storage tool to ensure each log directory has a UUID
>> assigned
>> - Expanding the topic partition identity to include the log directory UUID
>> 
>> There were differences in our proposals as to how the UUID is to be
>> persisted, but I've changed my proposal to match yours — I think adding
>> storage.id to meta.properties makes sense.
>> 
>> --
>> Igor
>> 
>> 
>> On Wed, Jul 27, 2022, at 4:42 PM, José Armando García Sancio wrote:
>>> Hi Igor,
>>> 
>>> Thanks for the KIP. Looking forward to this improvement. I'll review
>> your KIP.
>>> 
>>> I should mention that I started a discussion thread on KIP-856: KRaft
>>> Disk Failure Recovery at
>>> https://lists.apache.org/thread/ytv0t18cplwwwqcp77h6vry7on378jzj
>>> 
>>> Both keep introducing similar concepts. For example both KIP introduce
>>> a storage uuid that is stored in meta.properties. At first glance
>>> there are some minor differences. I suggest that we review each
>>> other's KIP so that we can remove these minor differences. What do you
>>> think?
>>> 
>>> Thanks!
>>> --
>>> -José
>> 

Reply via email to