Hi Tom, I said this in the voting thread, but can the authors include a section > about new ACLs if there are going to be ACLs for TransactionalId. It's > mentioned in the google doc, but I think new ACLs should be in a KIP > directly.
We've updated the wiki. Can you take a look and let us know if you have additional concerns? Thanks, Jason On Fri, Feb 3, 2017 at 1:52 PM, Rajini Sivaram <rajinisiva...@gmail.com> wrote: > Hi Jason, > > Thank you for the responses. Agree that authorizing transactional.id in > the > producer requests will be good enough for version 1. And making it tighter > in future based on delegation tokens sounds good too. > > Regards, > > Rajini > > > On Fri, Feb 3, 2017 at 8:04 PM, Jason Gustafson <ja...@confluent.io> > wrote: > > > Hey Rajini, > > > > Thanks for the questions. Responses below: > > > > > > > 1. Will the transaction coordinator check topic ACLs based on the > > > requesting client's credentials? Access to transaction logs, topics > being > > > added for transaction etc? > > > > > > Good question. I think it makes sense to check topic Write permission > when > > adding partitions to the transaction. I'll add this to the document. > > Perhaps authorization to the transaction log itself, however, can be > > assumed from having access to the ProducerTransactionalId resource? This > > would be similar to how access to __consumer_offsets is assumed if the > > client has access to the Group resource. > > > > 2. If I create a transactional produce request (by hand, not using the > > > producer API) with a random PID (random, hence unlikely to be in use), > > will > > > the broker append a transactional message to the logs, preventing LSO > > from > > > moving forward? What validation will broker do for PIDs? > > > > > > Yes, that is correct. Validation of the TransactionalId to PID binding > is a > > known gap in the current proposal, and is discussed in the design > document. > > Now that I'm thinking about it a bit more, I think there is a good case > for > > including the TransactionalId in the ProduceRequest (I think Jun > suggested > > this previously). Verifying it does not ensure that the included PID is > > correct, but it does ensure that the client is authorized to use > > transactions. If the client wanted to do an "endless transaction attack," > > having Write access to the topic and an authorized transactionalID is all > > they would need anyway even if we could authorize the PID itself. This > > seems like a worthwhile improvement. > > > > For future work, my half-baked idea to authorize the PID binding is to > > leverage the delegation work in KIP-48. When the PID is generated, we can > > give the producer a token which is then used in produce requests (say an > > hmac covering the TransactionalId and PID). > > > > > > > 3. Will every broker check that a client sending transactional produce > > > requests at least has write access to transaction log topic since it is > > not > > > validating transactional.id (for every produce request)? > > > > 4. I understand that brokers cannot authorize the transactional id for > > each > > > produce request since requests contain only the PID. But since there > is a > > > one-to-one mapping between PID and transactional.id, and a connection > is > > > never expected to change its transactional.id, perhaps it is feasible > to > > > add authorization and cache the results in the Session? Perhaps not for > > > version 1, but feels like it will be good to close the security gap > here. > > > Obviously it would be simpler if transactional.id was in the produce > > > request if the overhead was acceptable. > > > > > > I think my response above addresses both of these. We should include the > > TransactionalId in the ProduceRequest. Of course it need not be included > in > > the message format, so I'm not too concerned about the additional > overhead > > it adds. > > > > Thanks, > > Jason > > > > > > On Fri, Feb 3, 2017 at 6:52 AM, Ismael Juma <ism...@juma.me.uk> wrote: > > > > > Comments inline. > > > > > > On Thu, Feb 2, 2017 at 6:28 PM, Jason Gustafson <ja...@confluent.io> > > > wrote: > > > > > > > Took me a while to remember why we didn't do this. The timestamp that > > is > > > > included at the message set level is the max timestamp of all > messages > > in > > > > the message set as is the case in the current message format (I will > > > update > > > > the document to make this explicit). We could make the message > > timestamps > > > > relative to the max timestamp, but that makes serialization a bit > > awkward > > > > since the timestamps are not assumed to be increasing sequentially or > > > > monotonically. Once the messages in the message set had been > > determined, > > > we > > > > would need to go back and adjust the relative timestamps. > > > > > > > > > > Yes, I thought this would be a bit tricky and hence why I mentioned the > > > option of adding a new field at the message set level for the first > > > timestamp even though that's not ideal either. > > > > > > Here's one idea. We let the timestamps in the messages be varints, but > we > > > > make their values be relative to the timestamp of the previous > message, > > > > with the timestamp of the first message being absolute. For example, > if > > > we > > > > had timestamps 500, 501, 499, then we would write 500 for the first > > > > message, 1 for the next, and -2 for the final message. Would that > work? > > > Let > > > > me think a bit about it and see if there are any problems. > > > > > > > > > > It's an interesting idea. Comparing to the option of having the first > > > timestamp in the message set, It's a little more space efficient as we > > > don't have both a full timestamp in the message set _and_ a varint in > the > > > first message (which would always be 0, so we avoid the extra byte) and > > > also the deltas could be a little smaller in the common case. The main > > > downside is that it introduces a semantics inconsistency between the > > first > > > message and the rest. Not ideal, but maybe we can live with that. > > > > > > Ismael > > > > > >