[jira] [Created] (KAFKA-8571) Not complete delayed produce requests when processing StopReplicaRequest causing high produce latency for acks=all

2019-06-19 Thread Zhanxiang (Patrick) Huang (JIRA)
Zhanxiang (Patrick) Huang created KAFKA-8571:


 Summary: Not complete delayed produce requests when processing 
StopReplicaRequest causing high produce latency for acks=all
 Key: KAFKA-8571
 URL: https://issues.apache.org/jira/browse/KAFKA-8571
 Project: Kafka
  Issue Type: Bug
Reporter: Zhanxiang (Patrick) Huang
Assignee: Zhanxiang (Patrick) Huang


Currently a broker only attempts to complete delayed requests upon high 
watermark changes and upon receiving a LeaderAndIsrRequest. When a broker 
receives a StopReplicaRequest, it does not try to complete delayed operations, 
including delayed produce requests for acks=all. This can cause the producer to 
time out, even though it could have retried against the new leader much sooner 
if a NotLeaderForPartition error had been returned.

This can happen during partition reassignment when the controller is trying to 
kick the previous leader out of the replica set. In this case, the controller 
will only send a StopReplicaRequest (not a LeaderAndIsrRequest) to the previous 
leader during the replica set shrink phase. Here is an example:
{noformat}
Reassigning the replica set of partition A from [B1, B2] to [B2, B3]:
t0: Controller expands the replica set to [B1, B2, B3].

t1: B1 receives produce request PR on partition A with acks=all and timeout T. 
B1 puts PR into the DelayedProducePurgatory with timeout T.

t2: Controller elects B2 as the new leader and shrinks the replica set to 
[B2, B3]. LeaderAndIsrRequests are sent to B2 and B3. StopReplicaRequest is 
sent to B1.

t3: B1 receives StopReplicaRequest but doesn't try to complete PR.

If PR cannot be fulfilled by t3, and t1 + T > t3, PR will eventually time out 
in the purgatory and the producer will eventually time out the produce 
request.{noformat}
Since it is possible for the leader to receive only a StopReplicaRequest 
(without receiving any LeaderAndIsrRequest) when it leaves the replica set, a 
fix for this issue is to also try to complete delayed operations when 
processing a StopReplicaRequest.
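
To illustrate the shape of the fix, here is a minimal, self-contained Java 
sketch of the purgatory idea. This is not Kafka's actual broker code (the real 
implementation is the Scala DelayedOperationPurgatory); all class and method 
names below are hypothetical placeholders.
{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy "purgatory" keyed by topic-partition, used only to illustrate the fix.
public class ToyProducePurgatory {
    private final Map<String, List<CompletableFuture<String>>> delayed =
            new ConcurrentHashMap<>();

    // Called when an acks=all produce request has to wait for replication.
    public CompletableFuture<String> park(String topicPartition) {
        CompletableFuture<String> pending = new CompletableFuture<>();
        delayed.computeIfAbsent(topicPartition, tp -> new CopyOnWriteArrayList<>())
               .add(pending);
        return pending;
    }

    // Called on high watermark movement; per the proposed fix it should also be
    // called when a StopReplicaRequest removes this broker from the replica set.
    public void checkAndComplete(String topicPartition, String result) {
        List<CompletableFuture<String>> pending = delayed.remove(topicPartition);
        if (pending != null) {
            pending.forEach(f -> f.complete(result));
        }
    }

    // Sketch of the fix: complete parked produce requests with an error right
    // away instead of letting them sit in the purgatory until their timeout.
    public void onStopReplica(String topicPartition) {
        checkAndComplete(topicPartition, "NOT_LEADER_FOR_PARTITION");
    }
}
{code}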

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Partition Reassignment in Cloud

2019-06-19 Thread Varun Kumar
Hi

I have been trying a small experiment with partition reassignment in the 
cloud, where instead of copying data between brokers over the network, I moved 
the disk between the two brokers and ran the partition reassignment. This 
actually increased the speed of partition reassignment significantly, as it 
only had to catch up on the data produced during the downtime.


I tried this experiment with Kafka 2.2.1 and it worked. I validated the 
data consistency using the "kafka-replica-verification.sh" script.

A few more details of the experiment:

  *   Both the brokers from and to which the partitions are moving had to be 
shut down.
  *   All the partitions on the disk are moved at once to the new broker.
  *   Had to update the broker.id property in the meta.properties file for the 
moved log directory before the broker restart.
  *   Had to re-balance leaders after the brokers restart.

Can you please let me know if this approach will work in production? Is there 
any scenario where it might truncate/delete all data on the moved disk and copy 
the complete partition over the network?

Thanks
Varun




Partition Reassignment in Cloud

2019-06-19 Thread Varun Kumar
Hi

I have been trying a small experiment with partition reassignment in the 
cloud, where instead of copying data between brokers over the network, I moved 
the disk between the two brokers and ran the partition reassignment. This 
actually increased the speed of partition reassignment significantly, as it 
only had to catch up on the data produced during the downtime.


I tried this experiment with Kafka 2.2.1 and it worked. I validated the 
data consistency using the "kafka-replica-verification.sh" script and also by 
comparing the md5 hashes of the log and index files.

A few more details of the experiment:

  *   Both the brokers from and to which the partitions are moving had to be 
shut down.
  *   All the partitions on the disk are moved at once to the new broker.
  *   Had to update the broker.id property in the meta.properties file for the 
moved log directory before the broker restart (see the sketch after this list).
  *   Had to re-balance leaders after the brokers restart.
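
For reference, here is a minimal Java sketch of that meta.properties update. 
The path and target broker id below are placeholders, and it assumes only the 
standard broker.id key needs to change.

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Rewrites broker.id in a moved log directory's meta.properties so the new
// broker accepts the disk on restart. The path and id are placeholders.
public class UpdateBrokerId {
    public static void main(String[] args) throws Exception {
        Path metaProps = Paths.get("/mnt/moved-disk/kafka-logs/meta.properties");
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(metaProps)) {
            props.load(in);
        }
        props.setProperty("broker.id", "3"); // id of the broker the disk moved to
        try (OutputStream out = Files.newOutputStream(metaProps)) {
            props.store(out, "updated broker.id after moving the disk");
        }
    }
}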

Can you please let me know if this approach will work in production? Is there 
any scenario where it might truncate/delete all log files on the moved disk and 
fetch the complete data from the leader partition?

Thanks
Varun


Jenkins build is back to normal : kafka-2.3-jdk8 #54

2019-06-19 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-8570) Downconversion could fail when log contains out of order message formats

2019-06-19 Thread Dhruvil Shah (JIRA)
Dhruvil Shah created KAFKA-8570:
---

 Summary: Downconversion could fail when log contains out of order 
message formats
 Key: KAFKA-8570
 URL: https://issues.apache.org/jira/browse/KAFKA-8570
 Project: Kafka
  Issue Type: Bug
Reporter: Dhruvil Shah
Assignee: Dhruvil Shah


When the log contains out-of-order message formats (for example, a v2 message 
followed by a v1 message), it is possible for down-conversion to fail in 
certain scenarios where batches are compressed and greater than 1 kB in size. 
Down-conversion fails with a stack trace like the following:

java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:275)
at 
org.apache.kafka.common.record.FileLogInputStream$FileChannelRecordBatch.writeTo(FileLogInputStream.java:176)
at 
org.apache.kafka.common.record.AbstractRecords.downConvert(AbstractRecords.java:107)
at org.apache.kafka.common.record.FileRecords.downConvert(FileRecords.java:242)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-2.2-jdk8 #140

2019-06-19 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8564; Fix NPE on deleted partition dir when no segments remain

--
[...truncated 2.75 MB...]

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionEnabled 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionEnabled 
PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrNotLive 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrNotLive 
PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionDisabled 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionDisabled 
PASSED

kafka.controller.PartitionStateMachineTest > 
testNonexistentPartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNonexistentPartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionErrorCodeFromCreateStates STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionErrorCodeFromCreateStates PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOfflineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOfflineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > testUpdatingOfflinePartitionsCount 
STARTED

kafka.controller.PartitionStateMachineTest > testUpdatingOfflinePartitionsCount 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testUpdatingOfflinePartitionsCountDuringTopicDeletion STARTED

kafka.controller.PartitionStateMachineTest > 
testUpdatingOfflinePartitionsCountDuringTopicDeletion PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
PASSED

kafka.controller.PartitionStateMachineTest > 
testNoOfflinePartitionsChangeForTopicsBeingDeleted STARTED

kafka.controller.PartitionStateMachineTest > 
test

Build failed in Jenkins: kafka-trunk-jdk11 #645

2019-06-19 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8564; Fix NPE on deleted partition dir when no segments remain

--
[...truncated 2.68 MB...]
org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaBooleanTrue PASSED

org.apache.kafka.connect.transforms.CastTest > castWholeRecordDefaultValue 
STARTED

org.apache.kafka.connect.transforms.CastTest > castWholeRecordDefaultValue 
PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt8 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt8 PASSED

org.apache.kafka.connect.transforms.CastTest > castWholeRecordKeyWithSchema 
STARTED

org.apache.kafka.connect.transforms.CastTest > castWholeRecordKeyWithSchema 
PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt8 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt8 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessBooleanFalse STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessBooleanFalse PASSED

org.apache.kafka.connect.transforms.CastTest > testConfigInvalidSchemaType 
STARTED

org.apache.kafka.connect.transforms.CastTest > testConfigInvalidSchemaType 
PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessString STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessString PASSED

org.apache.kafka.connect.transforms.CastTest > castFieldsWithSchema STARTED

org.apache.kafka.connect.transforms.CastTest > castFieldsWithSchema PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessFieldConversion STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessFieldConversion PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testConfigInvalidTargetType STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testConfigInvalidTargetType PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessStringToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessStringToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > testKey STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > testKey PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessDateToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessDateToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToString STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToString PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimeToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimeToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimestampToDate STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimestampToDate PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimestampToTime STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimestampToTime PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimestampToUnix STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaTimestampToUnix PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaUnixToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaUnixToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessIdentity STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessIdentity PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaFieldConversion STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaFieldConversion PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaDateToTimestamp STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaDateToTimestamp PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaIdentity STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testWithSchemaIdentity PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToDate STARTED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToDate PASSED

org.apache.kafka.connect.transforms.TimestampConverterTest > 
testSchemalessTimestampToTime ST

Build failed in Jenkins: kafka-trunk-jdk8 #3737

2019-06-19 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8564; Fix NPE on deleted partition dir when no segments remain

--
[...truncated 2.51 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestampWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
PASSED

org.apache.kafka.stre

Jenkins build is back to normal : kafka-2.1-jdk8 #206

2019-06-19 Thread Apache Jenkins Server
See 




[VOTE] 2.3.0 RC3

2019-06-19 Thread Colin McCabe
Hi all,

We discovered some problems with the second release candidate (RC2) of 2.3.0.  
Specifically, KAFKA-8564.  I've created a new RC which includes the fix for 
this issue.

Check out the release notes for the 2.3.0 release here:
https://home.apache.org/~cmccabe/kafka-2.3.0-rc3/RELEASE_NOTES.html

The vote will go until Saturday, June 22nd, or until we create another RC.

* Kafka's KEYS file containing PGP keys we use to sign the release can be found 
here:
https://kafka.apache.org/KEYS

* The release artifacts to be voted upon (source and binary) are here:
https://home.apache.org/~cmccabe/kafka-2.3.0-rc3/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~cmccabe/kafka-2.3.0-rc3/javadoc/

* The tag to be voted upon (off the 2.3 branch) is the 2.3.0 tag:
https://github.com/apache/kafka/releases/tag/2.3.0-rc3

best,
Colin

C.


Build failed in Jenkins: kafka-trunk-jdk8 #3736

2019-06-19 Thread Apache Jenkins Server
See 


Changes:

[bbejeck] KAFKA-8452: Compressed BufferValue review follow-up (#6940)

--
[...truncated 4.67 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnName 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnName 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.KeyVa

Re: [VOTE] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Patrik Kleindl
+1 (non-binding)
Thanks!
Best regards 
Patrik 

> Am 19.06.2019 um 21:55 schrieb Bill Bejeck :
> 
> +1 (binding)
> 
> Thanks,
> Bill
> 
>> On Wed, Jun 19, 2019 at 1:19 PM John Roesler  wrote:
>> 
>> I'm +1 (nonbinding)
>> 
>> Thanks!
>> -John
>> 
>>> On Tue, Jun 18, 2019 at 7:48 AM Bruno Cadonna  wrote:
>>> 
>>> Hi,
>>> 
>>> I would like to start the voting on KIP-471:
>>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams
>>> 
>>> You can find the discussion here:
>>> 
>> https://lists.apache.org/thread.html/125bdd984fe0667962018da6ce10bce6d5895c5103955a8e4c730fef@%3Cdev.kafka.apache.org%3E
>>> 
>>> Best,
>>> Bruno
>> 


[jira] [Resolved] (KAFKA-7857) Add AbstractCoordinatorConfig class to consolidate consumer coordinator configs between Consumer and Connect

2019-06-19 Thread Boyang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-7857.

Resolution: Fixed

[https://github.com/apache/kafka/pull/6854] fixes it

> Add AbstractCoordinatorConfig class to consolidate consumer coordinator 
> configs between Consumer and Connect
> 
>
> Key: KAFKA-7857
> URL: https://issues.apache.org/jira/browse/KAFKA-7857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Major
>
> Right now there is a lot of duplicated configuration concerning the client 
> coordinator shared across ConsumerConfig and DistributedConfig (the Connect 
> config). It makes sense to extract all coordinator-related configs into a 
> separate config class to reduce code redundancy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk11 #644

2019-06-19 Thread Apache Jenkins Server
See 


Changes:

[bbejeck] KAFKA-8452: Compressed BufferValue review follow-up (#6940)

--
[...truncated 2.52 MB...]
org.apache.kafka.connect.transforms.MaskFieldTest > schemaless PASSED

org.apache.kafka.connect.transforms.ExtractFieldTest > withSchema STARTED

org.apache.kafka.connect.transforms.ExtractFieldTest > withSchema PASSED

org.apache.kafka.connect.transforms.ExtractFieldTest > testNullWithSchema 
STARTED

org.apache.kafka.connect.transforms.ExtractFieldTest > testNullWithSchema PASSED

org.apache.kafka.connect.transforms.ExtractFieldTest > schemaless STARTED

org.apache.kafka.connect.transforms.ExtractFieldTest > schemaless PASSED

org.apache.kafka.connect.transforms.ExtractFieldTest > testNullSchemaless 
STARTED

org.apache.kafka.connect.transforms.ExtractFieldTest > testNullSchemaless PASSED

org.apache.kafka.connect.transforms.TimestampRouterTest > defaultConfiguration 
STARTED

org.apache.kafka.connect.transforms.TimestampRouterTest > defaultConfiguration 
PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeDateRecordValueWithSchemaString STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeDateRecordValueWithSchemaString PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessBooleanTrue STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessBooleanTrue PASSED

org.apache.kafka.connect.transforms.CastTest > testConfigInvalidTargetType 
STARTED

org.apache.kafka.connect.transforms.CastTest > testConfigInvalidTargetType 
PASSED

org.apache.kafka.connect.transforms.CastTest > 
testConfigMixWholeAndFieldTransformation STARTED

org.apache.kafka.connect.transforms.CastTest > 
testConfigMixWholeAndFieldTransformation PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessFloat32 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessFloat32 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessFloat64 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessFloat64 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessUnsupportedType STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessUnsupportedType PASSED

org.apache.kafka.connect.transforms.CastTest > castWholeRecordKeySchemaless 
STARTED

org.apache.kafka.connect.transforms.CastTest > castWholeRecordKeySchemaless 
PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaFloat32 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaFloat32 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaFloat64 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaFloat64 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaString STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaString PASSED

org.apache.kafka.connect.transforms.CastTest > testConfigEmpty STARTED

org.apache.kafka.connect.transforms.CastTest > testConfigEmpty PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt16 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt16 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt32 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt32 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt64 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueSchemalessInt64 PASSED

org.apache.kafka.connect.transforms.CastTest > castFieldsSchemaless STARTED

org.apache.kafka.connect.transforms.CastTest > castFieldsSchemaless PASSED

org.apache.kafka.connect.transforms.CastTest > testUnsupportedTargetType STARTED

org.apache.kafka.connect.transforms.CastTest > testUnsupportedTargetType PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt16 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt16 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt32 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt32 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt64 STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeRecordValueWithSchemaInt64 PASSED

org.apache.kafka.connect.transforms.CastTest > 
castWholeBigDecimalRecordValueWithSchemaString STARTED

org.apache.kafka.connect.transforms.CastTest > 
castWholeBigDecimalRecordValueWithSchemaString PASSED

org.apache.kafka.connect.trans

[jira] [Created] (KAFKA-8569) Integrate the poll timeout warning with leave group call

2019-06-19 Thread Boyang Chen (JIRA)
Boyang Chen created KAFKA-8569:
--

 Summary: Integrate the poll timeout warning with leave group call
 Key: KAFKA-8569
 URL: https://issues.apache.org/jira/browse/KAFKA-8569
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen
Assignee: Boyang Chen


Under static membership, we may be polluting the log with a bunch of 
consecutive warning messages upon poll timeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8500) member.id should always update upon static member rejoin despite of group state

2019-06-19 Thread Boyang Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boyang Chen resolved KAFKA-8500.

Resolution: Fixed

> member.id should always update upon static member rejoin despite of group 
> state
> ---
>
> Key: KAFKA-8500
> URL: https://issues.apache.org/jira/browse/KAFKA-8500
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, streams
>Affects Versions: 2.3.0
>Reporter: Boyang Chen
>Assignee: Boyang Chen
>Priority: Blocker
>
> A blocking bug was detected by [~guozhang]: the `member.id` wasn't getting 
> updated upon static member rejoin when the group is not in a stable state. 
> This could make duplicate member fencing harder and potentially yield 
> incorrect processing outputs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bill Bejeck
+1 (binding)

Thanks,
Bill

On Wed, Jun 19, 2019 at 1:19 PM John Roesler  wrote:

> I'm +1 (nonbinding)
>
> Thanks!
> -John
>
> On Tue, Jun 18, 2019 at 7:48 AM Bruno Cadonna  wrote:
> >
> > Hi,
> >
> > I would like to start the voting on KIP-471:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams
> >
> > You can find the discussion here:
> >
> https://lists.apache.org/thread.html/125bdd984fe0667962018da6ce10bce6d5895c5103955a8e4c730fef@%3Cdev.kafka.apache.org%3E
> >
> > Best,
> > Bruno
>


Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bill Bejeck
Hi Bruno,

Just getting caught up on this KIP thread.  Looks good to me, and I don't
have any additional comments beyond what's already been presented.

Thanks,
Bill

On Wed, Jun 19, 2019 at 1:42 PM Bruno Cadonna  wrote:

> John and Guozhang,
>
> thank you for your comments.
>
> @Guozhang could you please also vote on the voting thread so that we
> have all votes in one place.
>
> @John, the only situation I can think of where a non-uniform
> configuration of segments would make sense is to account for
> seasonality. But this would be a really advanced configuration IMO.
>
> Best,
> Bruno
>
> On Wed, Jun 19, 2019 at 7:18 PM John Roesler  wrote:
> >
> > One last thought. I think it makes sense what you propose for merging
> > the metrics when a logical store is composed of multiple physical
> > stores.
> >
> > The basic standard for these metrics is that they should be relevant
> > to performance, and they should be controllable via configurations,
> > specifically via RocksDBConfigSetter. The config setter takes as input
> > the store name. For Key/Value stores, this name is visible in the tags
> > as "rocksdb-state-id", so the workflow is that I notice (e.g.) the
> > metric rocksdb-state-id=STORE_X is showing a low block cache hit
> > ratio, so I add a condition in my config setter to increase the block
> > cache size for stores named STORE_X.
> >
> > For this case where the logical store has multiple physical stores,
> > we're really talking about segmented stores, window stores and
> > segmented stores. In these stores, every segment has a different name
> > (logicalStoreName + "." + segmentId * segmentInterval), so the config
> > setter would need to use a prefix match with respect to the metric tag
> > (e.g.) rocksdb-window-state-id. But at the same time, this is totally
> > doable.
> >
> > It's also perfectly reasonable, since segments get rotated out all the
> > time, it's implausible that you'd ever want a non-uniform
> > configuration over the segments in a store. For this reason,
> > specifically, it makes more sense just to summarize the metrics than
> > to present them individually. Might be worth documenting the
> > relationship, though.
> >
> > Thanks again, I'll vote now,
> > -John
> >
> > On Wed, Jun 19, 2019 at 12:03 PM John Roesler  wrote:
> > >
> > > Just taking a look over the metrics again, I had one thought...
> > >
> > > Stuff that happens in a background thread (like compaction metrics)
> > > can't directly identify compactions as a bottleneck from Streams'
> > > perspective. I.e., a DB might do a lot of compactions, but if those
> > > compactions never delay a write or read, then they cannot be a
> > > bottleneck.
> > >
> > > Thus, the "stall" metric should be the starting point for bottleneck
> > > identification, and then the flush/compaction metrics can be used to
> > > secondarily identify what to do to relieve the bottleneck.
> > >
> > > This doesn't affect the metrics you proposed, but I'd suggest saying
> > > something to this effect in whatever documentation or descriptions we
> > > provide.
> > >
> > > Thanks,
> > > -John
> > >
> > > On Wed, Jun 19, 2019 at 11:25 AM John Roesler 
> wrote:
> > > >
> > > > Thanks for the updates.
> > > >
> > > > Personally, I'd be in favor of not going out on a limb with
> > > > unsupported metrics APIs. We should take care to make sure that what
> > > > we add in KIP-471 is stable and well supported, even if it's not the
> > > > complete picture. We can always do follow-on work to tackle complex
> > > > metrics as an isolated design exercise.
> > > >
> > > > Just my two cents.
> > > > Thanks,
> > > > -John
> > > >
> > > > On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna 
> wrote:
> > > > >
> > > > > Hi Guozhang,
> > > > >
> > > > > Regarding your comments about the wiki page:
> > > > >
> > > > > 1) Exactly, I rephrased the paragraph to make it more clear.
> > > > >
> > > > > 2) Yes, I used the wrong term. All hit related metrics are ratios.
> I
> > > > > corrected the names of the affected metrics.
> > > > >
> > > > > Regarding your meta comments:
> > > > >
> > > > > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > > > > formulas compute ratios. Regarding your question about a metric to
> > > > > know from where a read is served when it is not in the memtable,
> there
> > > > > are metrics in RocksDB that give you the number of get() queries
> that
> > > > > are served from L0, L1, and L2_AND_UP. I could not find any metric
> > > > > that give you information about whether a query was served from
> disk
> > > > > vs. OS cache. One metric that could be used to indirectly measure
> > > > > whether disk or OS cache is accessed seems to be
> READ_BLOCK_GET_MICROS
> > > > > that gives you the time for an IO read of a block. If it is high,
> it
> > > > > was read from disk, otherwise from the OS cache. A similar
> strategy to
> > > > > monitor the performance is described in [1]. DISCLAIMER:
> > > > > READ_BLOCK_GET_MICROS is not d

Re: [DISCUSS] KIP-478 Strongly Typed Processor API

2019-06-19 Thread John Roesler
Hi all,

In response to the feedback so far, I changed the package name from
`processor2` to `processor.generic`.

Thanks,
-John

On Mon, Jun 17, 2019 at 4:49 PM John Roesler  wrote:
>
> Thanks for the feedback, Sophie!
>
> I actually felt a little uneasy when I wrote that remark, because it's
> not restricted at all in the API, it's just available to you if you
> choose to give your stores and context the same parameters. So, I
> think your use case is valid, and also perfectly permissable under the
> current KIP. Sorry for sowing confusion on my own discussion thread!
>
> I'm not crazy about the package name, either. I went with it only
> because there's seemingly nothing special about the new package except
> that it can't have the same name as the old one. Otherwise, the
> existing "processor" and "Processor" names for the package and class
> are perfectly satisfying. Rather than pile on additional semantics, it
> seemed cleaner to just add a number to the package name.
>
> This wouldn't be the first project to do something like this... Apache
> Commons, for example, has added a "2" to the end of some of their
> packages for exactly the same reason.
>
> I'm open to any suggestions. For example, we could do something like
> org.apache.kafka.streams.typedprocessor.Processor or
> org.apache.kafka.streams.processor.typed.Processor, which would have
> just about the same effect. One microscopic thought is that, if
> there's another interface in the "processor" package that we wish to
> do the same thing to, we _could_ pile it into "processor2", but we
> couldn't do the same if we use a package that has "typed" in the name,
> unless that change is _also_ related to types in some way. But this
> seems like a very minor concern.
>
> What's your preference?
> -John
>
> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman  
> wrote:
> >
> > Hey John, thanks for writing this up! I like the proposal but there's one
> > point that I think may be too restrictive:
> >
> > "A processor that happens to use a typed store is actually emitting the
> > same types that it is storing."
> >
> > I can imagine someone could want to leverage this new type safety without
> > also limiting how they can interact with/use their store. As an (admittedly
> > contrived) example, say you have an input stream of purchases of a certain
> > type (entertainment, food, etc), and on seeing a new record you want to
> > output how many types of purchase a shopper has made more than 5 purchases
> > of in the last month. Your state store will probably be holding some more
> > complicated PurchaseHistory object (keyed by user), but your output is just
> > a 
> >
> > I'm also not crazy about "processor2" as the package name ... not sure what
> > a better one would be though (something with "typed"?)
> >
> > On Mon, Jun 17, 2019 at 12:47 PM John Roesler  wrote:
> >
> > > Hi all,
> > >
> > > I'd like to propose KIP-478 (https://cwiki.apache.org/confluence/x/2SkLBw
> > > ).
> > >
> > > This proposal would add output type bounds to the Processor interface
> > > in Kafka Streams, which enables static checking of a number of useful
> > > properties:
> > > * A processor B that consumes the output of processor A is actually
> > > expecting the same types that processor A produces.
> > > * A processor that happens to use a typed store is actually emitting
> > > the same types that it is storing.
> > > * A processor is simply forwarding the expected types in all code paths.
> > > * Processors added via the Streams DSL, which are not permitted to
> > > forward results at all, are statically prevented from doing so by the
> > > compiler
> > >
> > > Internally, we can use the above properties to achieve a much higher
> > > level of confidence in the Streams DSL implementation's correctness.
> > > Actually, while doing the POC, I found a few bugs and mistakes, which
> > > become structurally impossible with KIP-478.
> > >
> > > Additionally, the stronger types dramatically improve the
> > > self-documentation of our Streams internal implementations, which
> > > makes it much easier for new contributors to ramp up with confidence.
> > >
> > > Thanks so much for your consideration!
> > > -John
> > >
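
As a rough illustration of the output type bounds described in the quoted 
proposal, here is a hypothetical Java sketch (placeholder names only, not the 
interface actually proposed in KIP-478): the output type parameters constrain 
what a processor may forward downstream, so mismatches fail at compile time.

// Hypothetical interfaces; not part of the Kafka Streams API.
interface TypedProcessorContext<KOut, VOut> {
    void forward(KOut key, VOut value);
}

interface TypedProcessor<KIn, VIn, KOut, VOut> {
    void init(TypedProcessorContext<KOut, VOut> context);
    void process(KIn key, VIn value);
}

// Example: counts values per key and may only forward <String, Long>.
class CountingProcessor implements TypedProcessor<String, String, String, Long> {
    private TypedProcessorContext<String, Long> context;
    private final java.util.Map<String, Long> counts = new java.util.HashMap<>();

    @Override
    public void init(TypedProcessorContext<String, Long> context) {
        this.context = context;
    }

    @Override
    public void process(String key, String value) {
        long newCount = counts.merge(key, 1L, Long::sum);
        // Forwarding anything other than <String, Long> would not compile.
        context.forward(key, newCount);
    }
}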


Re: [DISCUSS] KIP-479: Add Materialized to Join

2019-06-19 Thread John Roesler
Hi Bill,

Thanks for the KIP! Awesome job catching this unexpected consequence
of the prior KIPs before it was released.

The proposal looks good to me. On top of just fixing the problem, it
seems to address two other pain points:
* that naming a state store automatically causes it to become queriable.
* that there's currently no way to configure the bytes store for join windows.

It's awesome that we can fix this issue and two others with one feature.

I'm wondering about a missing quadrant from the truth table involving
whether a Materialized is stored or not and querying is
enabled/disabled... What should be the behavior if there is no store
configured (e.g., if Materialized with only serdes) and querying is
enabled?

It seems we have two choices:
1. we can force creation of a state store in this case, so the store
can be used to serve the queries
2. we can provide just a queriable view, basically letting IQ query
into the "KTableValueGetter", which would transparently construct the
query response by applying the operator logic to the upstream state if
the operator state isn't already stored.

Offhand, it seems like the second is actually a pretty awesome
capability. But it might have an awkward interaction with the current
semantics. Presently, if I provide a Materialized.withName, it implies
that querying should be enabled AND that the view should actually be
stored in a state store. Under option 2 above, this behavior would
change to NOT provision a state store and instead just consult the
ValueGetter. To get back to the current behavior, users would have to
add a "bytes store supplier" to the Materialized to indicate that,
yes, they really want a state store there.

Behavior changes are always kind of scary, but I think in this case,
it might actually be preferable. In the event where only the name is
provided, it means that people just wanted to make the operation
result queriable. If we automatically convert this to a non-stored
view, then simply upgrading results in the same observable behavior
and semantics, but a linear reduction in local storage requirements
and disk i/o, as well as a corresponding linear reduction in memory
usage both on and off heap.

What do you think?
-John

On Tue, Jun 18, 2019 at 9:21 PM Bill Bejeck  wrote:
>
> All,
>
> I'd like to start a discussion for adding a Materialized configuration
> object to KStream.join for naming state stores involved in joins.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-479%3A+Add+Materialized+to+Join
>
> Your comments and suggestions are welcome.
>
> Thanks,
> Bill


Re: Become a contributor

2019-06-19 Thread Jun Rao
Hi, Garvit,

Thanks for your interest. Added you to the contributor list.

Jun

On Wed, Jun 19, 2019 at 10:18 AM Garvit Sharma  wrote:

> Hi,
>
> I would like to contribute to Apache Kafka, Could you please give me
> contributor permission?
> JIRA ID: garvitlnmiit
> Github ID: garvitlnmiit
> Github Email: garvit...@gmail.com
>
> Thanks,
>
> --
>
> Garvit Sharma
> github.com/garvitlnmiit/
>
> No Body is a Scholar by birth, its only hard work and strong determination
> that makes him master.
>


[jira] [Resolved] (KAFKA-1889) Refactor shell wrapper scripts

2019-06-19 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques resolved KAFKA-1889.
---
Resolution: Won't Fix

> Refactor shell wrapper scripts
> --
>
> Key: KAFKA-1889
> URL: https://issues.apache.org/jira/browse/KAFKA-1889
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Minor
> Attachments: refactor-scripts-v1.patch, refactor-scripts-v2.patch
>
>
> Shell scripts in bin/ need love.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bruno Cadonna
John and Guozhang,

thank you for your comments.

@Guozhang could you please also vote on the voting thread so that we
have all votes in one place.

@John, the only situation I can think of where a non-uniform
configuration of segments would make sense is to account for
seasonality. But this would be a really advanced configuration IMO.

Best,
Bruno

On Wed, Jun 19, 2019 at 7:18 PM John Roesler  wrote:
>
> One last thought. I think it makes sense what you propose for merging
> the metrics when a logical store is composed of multiple physical
> stores.
>
> The basic standard for these metrics is that they should be relevant
> to performance, and they should be controllable via configurations,
> specifically via RocksDBConfigSetter. The config setter takes as input
> the store name. For Key/Value stores, this name is visible in the tags
> as "rocksdb-state-id", so the workflow is that I notice (e.g.) the
> metric rocksdb-state-id=STORE_X is showing a low block cache hit
> ratio, so I add a condition in my config setter to increase the block
> cache size for stores named STORE_X.
>
> For this case where the logical store has multiple physical stores,
> we're really talking about segmented stores, window stores and
> segmented stores. In these stores, every segment has a different name
> (logicalStoreName + "." + segmentId * segmentInterval), so the config
> setter would need to use a prefix match with respect to the metric tag
> (e.g.) rocksdb-window-state-id. But at the same time, this is totally
> doable.
>
> It's also perfectly reasonable, since segments get rotated out all the
> time, it's implausible that you'd ever want a non-uniform
> configuration over the segments in a store. For this reason,
> specifically, it makes more sense just to summarize the metrics than
> to present them individually. Might be worth documenting the
> relationship, though.
>
> Thanks again, I'll vote now,
> -John
>
> On Wed, Jun 19, 2019 at 12:03 PM John Roesler  wrote:
> >
> > Just taking a look over the metrics again, I had one thought...
> >
> > Stuff that happens in a background thread (like compaction metrics)
> > can't directly identify compactions as a bottleneck from Streams'
> > perspective. I.e., a DB might do a lot of compactions, but if those
> > compactions never delay a write or read, then they cannot be a
> > bottleneck.
> >
> > Thus, the "stall" metric should be the starting point for bottleneck
> > identification, and then the flush/compaction metrics can be used to
> > secondarily identify what to do to relieve the bottleneck.
> >
> > This doesn't affect the metrics you proposed, but I'd suggest saying
> > something to this effect in whatever documentation or descriptions we
> > provide.
> >
> > Thanks,
> > -John
> >
> > On Wed, Jun 19, 2019 at 11:25 AM John Roesler  wrote:
> > >
> > > Thanks for the updates.
> > >
> > > Personally, I'd be in favor of not going out on a limb with
> > > unsupported metrics APIs. We should take care to make sure that what
> > > we add in KIP-471 is stable and well supported, even if it's not the
> > > complete picture. We can always do follow-on work to tackle complex
> > > metrics as an isolated design exercise.
> > >
> > > Just my two cents.
> > > Thanks,
> > > -John
> > >
> > > On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna  wrote:
> > > >
> > > > Hi Guozhang,
> > > >
> > > > Regarding your comments about the wiki page:
> > > >
> > > > 1) Exactly, I rephrased the paragraph to make it more clear.
> > > >
> > > > 2) Yes, I used the wrong term. All hit related metrics are ratios. I
> > > > corrected the names of the affected metrics.
> > > >
> > > > Regarding your meta comments:
> > > >
> > > > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > > > formulas compute ratios. Regarding your question about a metric to
> > > > know from where a read is served when it is not in the memtable, there
> > > > are metrics in RocksDB that give you the number of get() queries that
> > > > are served from L0, L1, and L2_AND_UP. I could not find any metric
> > > > that give you information about whether a query was served from disk
> > > > vs. OS cache. One metric that could be used to indirectly measure
> > > > whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS
> > > > that gives you the time for an IO read of a block. If it is high, it
> > > > was read from disk, otherwise from the OS cache. A similar strategy to
> > > > monitor the performance is described in [1]. DISCLAIMER:
> > > > READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
> > > > code to understand its meaning. I could have missed something.
> > > >
> > > > 2) There are some additional compaction statistics that contain sizes
> > > > of files on disk and numbers about write amplification that you can
> > > > get programmatically in RocksDB, but they are for debugging purposes
> > > > [2]. To get this data and publish it into a metric, one has to parse a
> > > > string. Si

Re: [VOTE] 2.3.0 RC2

2019-06-19 Thread Colin McCabe
Hi Ismael,

Good find.  Yes, I think this qualifies as a blocker.  Let's sink this RC.  
I'll create an RC3 today with the fix.

best,
Colin

On Wed, Jun 19, 2019, at 08:12, Ismael Juma wrote:
> Hi Colin,
> 
> Thanks for this. One potential blocker (recent regression in the log layer)
> was filed yesterday:
> 
> https://issues.apache.org/jira/browse/KAFKA-8564
> 
> Do you think this qualifies as a blocker?
> 
> Ismael
> 
> On Wed, Jun 12, 2019 at 3:55 PM Colin McCabe  wrote:
> 
> > Hi all,
> >
> > We discovered some problems with the first release candidate (RC1) of
> > 2.3.0.  Specifically, KAFKA-8484 and KAFKA-8500.  I have created a new
> > release candidate that includes fixes for these issues.
> >
> > Check out the release notes for the 2.3.0 release here:
> > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/RELEASE_NOTES.html
> >
> > The vote will go until Friday, June 7th, or until we create another RC.
> >
> > * Kafka's KEYS file containing PGP keys we use to sign the release can be
> > found here:
> > https://kafka.apache.org/KEYS
> >
> > * The release artifacts to be voted upon (source and binary) are here:
> > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/javadoc/
> >
> > * The tag to be voted upon (off the 2.3 branch) is the 2.3.0 tag:
> > https://github.com/apache/kafka/releases/tag/2.3.0-rc2
> >
> > best,
> > Colin
> >
>


Re: [VOTE] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
I'm +1 (nonbinding)

Thanks!
-John

On Tue, Jun 18, 2019 at 7:48 AM Bruno Cadonna  wrote:
>
> Hi,
>
> I would like to start the voting on KIP-471:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471%3A+Expose+RocksDB+Metrics+in+Kafka+Streams
>
> You can find the discussion here:
> https://lists.apache.org/thread.html/125bdd984fe0667962018da6ce10bce6d5895c5103955a8e4c730fef@%3Cdev.kafka.apache.org%3E
>
> Best,
> Bruno


Become a contributor

2019-06-19 Thread Garvit Sharma
Hi,

I would like to contribute to Apache Kafka, Could you please give me
contributor permission?
JIRA ID: garvitlnmiit
Github ID: garvitlnmiit
Github Email: garvit...@gmail.com

Thanks,

-- 

Garvit Sharma
github.com/garvitlnmiit/

No Body is a Scholar by birth, its only hard work and strong determination
that makes him master.


Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
One last thought: I think what you propose for merging the metrics when a
logical store is composed of multiple physical stores makes sense.

The basic standard for these metrics is that they should be relevant
to performance, and they should be controllable via configurations,
specifically via RocksDBConfigSetter. The config setter takes as input
the store name. For Key/Value stores, this name is visible in the tags
as "rocksdb-state-id", so the workflow is that I notice (e.g.) the
metric rocksdb-state-id=STORE_X is showing a low block cache hit
ratio, so I add a condition in my config setter to increase the block
cache size for stores named STORE_X.

For this case where the logical store has multiple physical stores,
we're really talking about segmented stores, i.e., window stores and
session stores. In these stores, every segment has a different name
(logicalStoreName + "." + segmentId * segmentInterval), so the config
setter would need to use a prefix match with respect to the metric tag
(e.g.) rocksdb-window-state-id. But at the same time, this is totally
doable.

It's also perfectly reasonable: since segments get rotated out all the
time, it's implausible that you'd ever want a non-uniform
configuration over the segments in a store. For this reason,
specifically, it makes more sense just to summarize the metrics than
to present them individually. Might be worth documenting the
relationship, though.
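
To make the prefix-match idea concrete, here is a minimal sketch of a
RocksDBConfigSetter that bumps the block cache for every physical segment of a
hypothetical logical store named STORE_X; the store name and the 64 MB cache
size are illustrative only, not something the KIP prescribes. The setter would
be registered via StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG as usual.

import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Hypothetical setter: segments of the logical store "STORE_X" are named
// "STORE_X.<segmentId * segmentInterval>", so a prefix match catches all of them.
public class StoreXConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        if (storeName.startsWith("STORE_X")) {
            final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
            // Give every physical segment of this logical store a larger block
            // cache (64 MB here is an arbitrary illustrative value).
            tableConfig.setBlockCacheSize(64 * 1024 * 1024L);
            options.setTableFormatConfig(tableConfig);
        }
    }
}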

Thanks again, I'll vote now,
-John

On Wed, Jun 19, 2019 at 12:03 PM John Roesler  wrote:
>
> Just taking a look over the metrics again, I had one thought...
>
> Stuff that happens in a background thread (like compaction metrics)
> can't directly identify compactions as a bottleneck from Streams'
> perspective. I.e., a DB might do a lot of compactions, but if those
> compactions never delay a write or read, then they cannot be a
> bottleneck.
>
> Thus, the "stall" metric should be the starting point for bottleneck
> identification, and then the flush/compaction metrics can be used to
> secondarily identify what to do to relieve the bottleneck.
>
> This doesn't affect the metrics you proposed, but I'd suggest saying
> something to this effect in whatever documentation or descriptions we
> provide.
>
> Thanks,
> -John
>
> On Wed, Jun 19, 2019 at 11:25 AM John Roesler  wrote:
> >
> > Thanks for the updates.
> >
> > Personally, I'd be in favor of not going out on a limb with
> > unsupported metrics APIs. We should take care to make sure that what
> > we add in KIP-471 is stable and well supported, even if it's not the
> > complete picture. We can always do follow-on work to tackle complex
> > metrics as an isolated design exercise.
> >
> > Just my two cents.
> > Thanks,
> > -John
> >
> > On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna  wrote:
> > >
> > > Hi Guozhang,
> > >
> > > Regarding your comments about the wiki page:
> > >
> > > 1) Exactly, I rephrased the paragraph to make it more clear.
> > >
> > > 2) Yes, I used the wrong term. All hit related metrics are ratios. I
> > > corrected the names of the affected metrics.
> > >
> > > Regarding your meta comments:
> > >
> > > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > > formulas compute ratios. Regarding your question about a metric to
> > > know from where a read is served when it is not in the memtable, there
> > > are metrics in RocksDB that give you the number of get() queries that
> > > are served from L0, L1, and L2_AND_UP. I could not find any metric
> > > that give you information about whether a query was served from disk
> > > vs. OS cache. One metric that could be used to indirectly measure
> > > whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS
> > > that gives you the time for an IO read of a block. If it is high, it
> > > was read from disk, otherwise from the OS cache. A similar strategy to
> > > monitor the performance is described in [1]. DISCLAIMER:
> > > READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
> > > code to understand its meaning. I could have missed something.
> > >
> > > 2) There are some additional compaction statistics that contain sizes
> > > of files on disk and numbers about write amplification that you can
> > > get programmatically in RocksDB, but they are for debugging purposes
> > > [2]. To get this data and publish it into a metric, one has to parse a
> > > string. Since this data is for debugging purposes, I do not know how
> > > stable the output format is. One thing, we could do, is to dump the
> > > string with the compaction statistics into our log files at DEBUG
> > > level. But that is outside of the scope of this KIP.
> > >
> > > Best,
> > > Bruno
> > >
> > > [1] 
> > > https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
> > > [2] 
> > > https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
> > >
> > > On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang  wrote:
> 

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Guozhang Wang
Bruno, thanks for the clarification. I agree that we should not rely on
parsing strings to expose metrics, since such strings are 1) not very reliable
and 2) may evolve their format / representation over time.

I think we can potentially add some documentation aligned with your
explanations above to educate users on how to predict their capacity needs for
memory and disks, rather than exposing the metrics; these would not be
required in this KIP then. Maybe we can create a JIRA ticket for adding web
docs to help users investigate their RocksDB behavior, also including
John's comments above?

As for the KIP itself, I'm +1.


Guozhang

On Wed, Jun 19, 2019 at 10:04 AM John Roesler  wrote:

> Just taking a look over the metrics again, I had one thought...
>
> Stuff that happens in a background thread (like compaction metrics)
> can't directly identify compactions as a bottleneck from Streams'
> perspective. I.e., a DB might do a lot of compactions, but if those
> compactions never delay a write or read, then they cannot be a
> bottleneck.
>
> Thus, the "stall" metric should be the starting point for bottleneck
> identification, and then the flush/compaction metrics can be used to
> secondarily identify what to do to relieve the bottleneck.
>
> This doesn't affect the metrics you proposed, but I'd suggest saying
> something to this effect in whatever documentation or descriptions we
> provide.
>
> Thanks,
> -John
>
> On Wed, Jun 19, 2019 at 11:25 AM John Roesler  wrote:
> >
> > Thanks for the updates.
> >
> > Personally, I'd be in favor of not going out on a limb with
> > unsupported metrics APIs. We should take care to make sure that what
> > we add in KIP-471 is stable and well supported, even if it's not the
> > complete picture. We can always do follow-on work to tackle complex
> > metrics as an isolated design exercise.
> >
> > Just my two cents.
> > Thanks,
> > -John
> >
> > On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna 
> wrote:
> > >
> > > Hi Guozhang,
> > >
> > > Regarding your comments about the wiki page:
> > >
> > > 1) Exactly, I rephrased the paragraph to make it more clear.
> > >
> > > 2) Yes, I used the wrong term. All hit related metrics are ratios. I
> > > corrected the names of the affected metrics.
> > >
> > > Regarding your meta comments:
> > >
> > > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > > formulas compute ratios. Regarding your question about a metric to
> > > know from where a read is served when it is not in the memtable, there
> > > are metrics in RocksDB that give you the number of get() queries that
> > > are served from L0, L1, and L2_AND_UP. I could not find any metric
> > > that give you information about whether a query was served from disk
> > > vs. OS cache. One metric that could be used to indirectly measure
> > > whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS
> > > that gives you the time for an IO read of a block. If it is high, it
> > > was read from disk, otherwise from the OS cache. A similar strategy to
> > > monitor the performance is described in [1]. DISCLAIMER:
> > > READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
> > > code to understand its meaning. I could have missed something.
> > >
> > > 2) There are some additional compaction statistics that contain sizes
> > > of files on disk and numbers about write amplification that you can
> > > get programmatically in RocksDB, but they are for debugging purposes
> > > [2]. To get this data and publish it into a metric, one has to parse a
> > > string. Since this data is for debugging purposes, I do not know how
> > > stable the output format is. One thing, we could do, is to dump the
> > > string with the compaction statistics into our log files at DEBUG
> > > level. But that is outside of the scope of this KIP.
> > >
> > > Best,
> > > Bruno
> > >
> > > [1]
> https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
> > > [2]
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
> > >
> > > On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang 
> wrote:
> > > >
> > > > Hello Bruno,
> > > >
> > > > I've read through the aggregation section and I think they look good
> to me.
> > > > There are a few minor comments about the wiki page itself:
> > > >
> > > > 1) A state store might consist of multiple state stores -> You mean a
> > > > `logical` state store can consist of multiple `physical` store
> instances?
> > > >
> > > > 2) The "Hit Rates" calculation seems to be referring to the `Hit
> Ratio`
> > > > (which is a percentage) rather than the `Hit Rate`?
> > > >
> > > > And a couple further meta comments:
> > > >
> > > > 1) For memtable / block cache, instead of the hit-rate do you think
> we
> > > > should expose the hit-ratio? I felt it is more useful for users to
> debug
> > > > what's the root cause of unexpected slow performance.
> > > >
> > > > And for block cache misses, i

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
Just taking a look over the metrics again, I had one thought...

Stuff that happens in a background thread (like compaction metrics)
can't directly identify compactions as a bottleneck from Streams'
perspective. I.e., a DB might do a lot of compactions, but if those
compactions never delay a write or read, then they cannot be a
bottleneck.

Thus, the "stall" metric should be the starting point for bottleneck
identification, and then the flush/compaction metrics can be used to
secondarily identify what to do to relieve the bottleneck.

This doesn't affect the metrics you proposed, but I'd suggest saying
something to this effect in whatever documentation or descriptions we
provide.
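
For illustration, a check like that could be wired up against
KafkaStreams#metrics(), which is how Streams metrics are generally consumed.
Note that the "write-stall" metric name below is only a placeholder for
whatever stall metric name KIP-471 finally settles on; the "rocksdb-state-id"
tag is the one mentioned earlier in this thread.

import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;

// Sketch: scan the Streams metrics registry for per-store RocksDB metrics and
// start bottleneck analysis from the write-stall metric, as suggested above.
public final class StallCheck {

    public static void printStallMetrics(final KafkaStreams streams) {
        final Map<MetricName, ? extends Metric> metrics = streams.metrics();
        for (final Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
            final MetricName name = entry.getKey();
            if (name.name().contains("write-stall")
                    && name.tags().containsKey("rocksdb-state-id")) {
                System.out.println(name.tags().get("rocksdb-state-id")
                        + " -> " + entry.getValue().metricValue());
            }
        }
    }
}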

Thanks,
-John

On Wed, Jun 19, 2019 at 11:25 AM John Roesler  wrote:
>
> Thanks for the updates.
>
> Personally, I'd be in favor of not going out on a limb with
> unsupported metrics APIs. We should take care to make sure that what
> we add in KIP-471 is stable and well supported, even if it's not the
> complete picture. We can always do follow-on work to tackle complex
> metrics as an isolated design exercise.
>
> Just my two cents.
> Thanks,
> -John
>
> On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna  wrote:
> >
> > Hi Guozhang,
> >
> > Regarding your comments about the wiki page:
> >
> > 1) Exactly, I rephrased the paragraph to make it more clear.
> >
> > 2) Yes, I used the wrong term. All hit related metrics are ratios. I
> > corrected the names of the affected metrics.
> >
> > Regarding your meta comments:
> >
> > 1) The plan is to expose the hit ratio. I used the wrong term. The
> > formulas compute ratios. Regarding your question about a metric to
> > know from where a read is served when it is not in the memtable, there
> > are metrics in RocksDB that give you the number of get() queries that
> > are served from L0, L1, and L2_AND_UP. I could not find any metric
> > that give you information about whether a query was served from disk
> > vs. OS cache. One metric that could be used to indirectly measure
> > whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS
> > that gives you the time for an IO read of a block. If it is high, it
> > was read from disk, otherwise from the OS cache. A similar strategy to
> > monitor the performance is described in [1]. DISCLAIMER:
> > READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
> > code to understand its meaning. I could have missed something.
> >
> > 2) There are some additional compaction statistics that contain sizes
> > of files on disk and numbers about write amplification that you can
> > get programmatically in RocksDB, but they are for debugging purposes
> > [2]. To get this data and publish it into a metric, one has to parse a
> > string. Since this data is for debugging purposes, I do not know how
> > stable the output format is. One thing, we could do, is to dump the
> > string with the compaction statistics into our log files at DEBUG
> > level. But that is outside of the scope of this KIP.
> >
> > Best,
> > Bruno
> >
> > [1] 
> > https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
> > [2] 
> > https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
> >
> > On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang  wrote:
> > >
> > > Hello Bruno,
> > >
> > > I've read through the aggregation section and I think they look good to 
> > > me.
> > > There are a few minor comments about the wiki page itself:
> > >
> > > 1) A state store might consist of multiple state stores -> You mean a
> > > `logical` state store can consist of multiple `physical` store
> > > instances?
> > >
> > > 2) The "Hit Rates" calculation seems to be referring to the `Hit Ratio`
> > > (which is a percentage) rather than the `Hit Rate`?
> > >
> > > And a couple further meta comments:
> > >
> > > 1) For memtable / block cache, instead of the hit-rate do you think we
> > > should expose the hit-ratio? I felt it is more useful for users to debug
> > > what's the root cause of unexpected slow performance.
> > >
> > > And for block cache misses, is it easy to provide a metric as of "target
> > > read" of where a read is served (from which level, either in OS cache or 
> > > in
> > > SST files), similar to Fig.11 in
> > > http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?
> > >
> > > 2) As @Patrik mentioned, is there a good way we can expose the total 
> > > amount
> > > of memory and disk usage for each state store as well? I think it would
> > > also be very helpful for users to understand their capacity needs and read
> > > / write amplifications.
> > >
> > >
> > > Guozhang
> > >
> > > On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna  wrote:
> > >
> > > > Hi,
> > > >
> > > > I decided to go for the option in which metrics are exposed for each
> > > > logical state store. I revisited the KIP correspondingly and added a
> > > > section on how to aggregate metrics over multiple physical RocksDB
> > > > in

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread John Roesler
Thanks for the updates.

Personally, I'd be in favor of not going out on a limb with
unsupported metrics APIs. We should take care to make sure that what
we add in KIP-471 is stable and well supported, even if it's not the
complete picture. We can always do follow-on work to tackle complex
metrics as an isolated design exercise.

Just my two cents.
Thanks,
-John

On Wed, Jun 19, 2019 at 6:02 AM Bruno Cadonna  wrote:
>
> Hi Guozhang,
>
> Regarding your comments about the wiki page:
>
> 1) Exactly, I rephrased the paragraph to make it more clear.
>
> 2) Yes, I used the wrong term. All hit related metrics are ratios. I
> corrected the names of the affected metrics.
>
> Regarding your meta comments:
>
> 1) The plan is to expose the hit ratio. I used the wrong term. The
> formulas compute ratios. Regarding your question about a metric to
> know from where a read is served when it is not in the memtable, there
> are metrics in RocksDB that give you the number of get() queries that
> are served from L0, L1, and L2_AND_UP. I could not find any metric
> that give you information about whether a query was served from disk
> vs. OS cache. One metric that could be used to indirectly measure
> whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS
> that gives you the time for an IO read of a block. If it is high, it
> was read from disk, otherwise from the OS cache. A similar strategy to
> monitor the performance is described in [1]. DISCLAIMER:
> READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
> code to understand its meaning. I could have missed something.
>
> 2) There are some additional compaction statistics that contain sizes
> of files on disk and numbers about write amplification that you can
> get programmatically in RocksDB, but they are for debugging purposes
> [2]. To get this data and publish it into a metric, one has to parse a
> string. Since this data is for debugging purposes, I do not know how
> stable the output format is. One thing, we could do, is to dump the
> string with the compaction statistics into our log files at DEBUG
> level. But that is outside of the scope of this KIP.
>
> Best,
> Bruno
>
> [1] 
> https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
> [2] 
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
>
> On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang  wrote:
> >
> > Hello Bruno,
> >
> > I've read through the aggregation section and I think they look good to me.
> > There are a few minor comments about the wiki page itself:
> >
> > 1) A state store might consist of multiple state stores -> You mean a
> > `logical` state store can consist of multiple `physical` store instances?
> >
> > 2) The "Hit Rates" calculation seems to be referring to the `Hit Ratio`
> > (which is a percentage) rather than the `Hit Rate`?
> >
> > And a couple further meta comments:
> >
> > 1) For memtable / block cache, instead of the hit-rate do you think we
> > should expose the hit-ratio? I felt it is more useful for users to debug
> > what's the root cause of unexpected slow performance.
> >
> > And for block cache misses, is it easy to provide a metric as of "target
> > read" of where a read is served (from which level, either in OS cache or in
> > SST files), similar to Fig.11 in
> > http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?
> >
> > 2) As @Patrik mentioned, is there a good way we can expose the total amount
> > of memory and disk usage for each state store as well? I think it would
> > also be very helpful for users to understand their capacity needs and read
> > / write amplifications.
> >
> >
> > Guozhang
> >
> > On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna  wrote:
> >
> > > Hi,
> > >
> > > I decided to go for the option in which metrics are exposed for each
> > > logical state store. I revisited the KIP correspondingly and added a
> > > section on how to aggregate metrics over multiple physical RocksDB
> > > instances within one logical state store. Would be great, if you could
> > > take a look and give feedback. If nobody has complaints about the
> > > chosen option I would proceed with voting on this KIP since this was
> > > the last open question.
> > >
> > > Best,
> > > Bruno
> > >
> > > On Fri, Jun 7, 2019 at 9:38 PM Patrik Kleindl  wrote:
> > > >
> > > > Hi Sophie
> > > > This will be a good change, I have been thinking about proposing
> > > something similar or even passing the properties per store.
> > > > RocksDB should probably know how much memory was reserved but maybe does
> > > not expose it.
> > > > We are limiting it already as you suggested but this is a rather crude
> > > tool.
> > > > Especially in a larger topology with mixed loads par topic it would be
> > > helpful to get more insights which store puts a lot of load on memory.
> > > > Regarding the limiting capability, I think I remember reading that those
> > > only affect some parts of the memor

Re: Github PR 5454 needs a review/merge (Scala 2.13 gradle build)

2019-06-19 Thread Ismael Juma
I added a comment.

On Wed, Jun 19, 2019 at 8:10 AM Dejan Stojadinović 
wrote:

> Skip to this comment, please:
> https://github.com/apache/kafka/pull/5454#issuecomment-501969180
>
> Regards,
> Dejan
>


Re: [VOTE] 2.3.0 RC2

2019-06-19 Thread Ismael Juma
Hi Colin,

Thanks for this. One potential blocker (recent regression in the log layer)
was filed yesterday:

https://issues.apache.org/jira/browse/KAFKA-8564

Do you think this qualifies as a blocker?

Ismael

On Wed, Jun 12, 2019 at 3:55 PM Colin McCabe  wrote:

> Hi all,
>
> We discovered some problems with the first release candidate (RC1) of
> 2.3.0.  Specifically, KAFKA-8484 and KAFKA-8500.  I have created a new
> release candidate that includes fixes for these issues.
>
> Check out the release notes for the 2.3.0 release here:
> https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/RELEASE_NOTES.html
>
> The vote will go until Friday, June 7th, or until we create another RC.
>
> * Kafka's KEYS file containing PGP keys we use to sign the release can be
> found here:
> https://kafka.apache.org/KEYS
>
> * The release artifacts to be voted upon (source and binary) are here:
> https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~cmccabe/kafka-2.3.0-rc2/javadoc/
>
> * The tag to be voted upon (off the 2.3 branch) is the 2.3.0 tag:
> https://github.com/apache/kafka/releases/tag/2.3.0-rc2
>
> best,
> Colin
>


Github PR 5454 needs a review/merge (Scala 2.13 gradle build)

2019-06-19 Thread Dejan Stojadinović
Skip to this comment, please:
https://github.com/apache/kafka/pull/5454#issuecomment-501969180

Regards,
Dejan


Re: Preliminary blog post for the Apache Kafka 2.3.0 release

2019-06-19 Thread Ron Dagostino
Looks great, Colin.

I have also enjoyed Stephane Maarek's "What's New in Kafka..." series of
videos (e.g. https://www.youtube.com/watch?v=kaWbp1Cnfo4&t=10s).  Having
summaries like this in both formats -- blog and video -- for every release
would be helpful as different people have different preferences.

Ron

On Tue, Jun 18, 2019 at 8:20 PM Colin McCabe  wrote:

> Thanks, Konstantine.  I reworked the wording a bit -- take a look.
>
> best,
> C.
>
> On Tue, Jun 18, 2019, at 14:31, Konstantine Karantasis wrote:
> > Thanks Colin.
> > Great initiative!
> >
> > Here's a small correction (between **) for KIP-415 with a small
> suggestion
> > as well (between _ _):
> >
> > In Kafka Connect, worker tasks are distributed among the available worker
> > nodes. When a connector is reconfigured or a new connector is deployed
> _as
> > well as when a worker is added or removed_, the *tasks* must be
> rebalanced
> > across the Connect cluster to help ensure that all of the worker nodes
> are
> > doing a fair share of the Connect work. In 2.2 and earlier, a Connect
> > rebalance caused all worker threads to pause while the rebalance
> proceeded.
> > As of KIP-415, rebalancing is no longer a stop-the-world affair, making
> > configuration changes a more pleasant thing.
> >
> > Cheers,
> > Konstantine
> >
> > On Tue, Jun 18, 2019 at 1:50 PM Swen Moczarski  >
> > wrote:
> >
> > > Nice overview!
> > >
> > > I found some typos:
> > > rbmainder
> > > bmits
> > > implbmentation
> > >
> > > Am Di., 18. Juni 2019 um 22:43 Uhr schrieb Boyang Chen <
> > > bche...@outlook.com
> > > >:
> > >
> > > > One typo:
> > > > KIP-428: Add in-mbmory window store
> > > > should be
> > > > KIP-428: Add in-memory window store
> > > >
> > > >
> > > > 
> > > > From: Colin McCabe 
> > > > Sent: Wednesday, June 19, 2019 4:22 AM
> > > > To: dev@kafka.apache.org
> > > > Subject: Re: Preliminary blog post for the Apache Kafka 2.3.0 release
> > > >
> > > > Sorry, I copied the wrong URL at first.  Try this URL instead:
> > > >
> > >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Tue, Jun 18, 2019, at 13:17, Colin McCabe wrote:
> > > > > Hmm.  I'm looking to see if there's any way to open up the
> > > > permissions... :|
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jun 18, 2019, at 13:12, M. Manna wrote:
> > > > > > It’s asking for credentials...?
> > > > > >
> > > > > > On Tue, 18 Jun 2019 at 15:10, Colin McCabe 
> > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've written up a preliminary blog post about the upcoming
> Apache
> > > > Kafka
> > > > > > > 2.3.0 release.  Take a look and let me know what you think.
> > > > > > >
> > > > > > >
> > > > > > >
> > > >
> > >
> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (KAFKA-8568) MirrorMaker 2.0 resource leak

2019-06-19 Thread JIRA
Péter Gergő Barna created KAFKA-8568:


 Summary: MirrorMaker 2.0 resource leak
 Key: KAFKA-8568
 URL: https://issues.apache.org/jira/browse/KAFKA-8568
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect, mirrormaker
Affects Versions: 2.2.2
Reporter: Péter Gergő Barna


This issue is produced by the KIP-382 branch (I am not sure which version is
affected by that).

While MirrorMaker 2.0 is running, the following command returns a number that
keeps getting larger and larger.

 
{noformat}
lsof -p  | grep ESTABLISHED | wc -l{noformat}
 

In the error log, NullPointerExceptions pop up from MirrorSourceTask.cleanup,
because either the consumer or the producer is null when the cleanup method
tries to close it.

 
{noformat}
Exception in thread "Thread-790" java.lang.NullPointerException
 at 
org.apache.kafka.connect.mirror.MirrorSourceTask.cleanup(MirrorSourceTask.java:110)
 at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-792" java.lang.NullPointerException
 at 
org.apache.kafka.connect.mirror.MirrorSourceTask.cleanup(MirrorSourceTask.java:110)
 at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-791" java.lang.NullPointerException
 at 
org.apache.kafka.connect.mirror.MirrorSourceTask.cleanup(MirrorSourceTask.java:116)
 at java.lang.Thread.run(Thread.java:748)
Exception in thread "Thread-793" java.lang.NullPointerException
 at 
org.apache.kafka.connect.mirror.MirrorSourceTask.cleanup(MirrorSourceTask.java:110)
 at java.lang.Thread.run(Thread.java:748){noformat}
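
For illustration only, a null-safe cleanup along the lines below would avoid
these NPEs; the field names are hypothetical and not taken from the actual
MirrorSourceTask code.
{noformat}
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;

// Hypothetical sketch of a null-guarded cleanup; the real MirrorSourceTask
// fields and close semantics may differ.
class CleanupSketch {
    private KafkaConsumer<byte[], byte[]> consumer;        // may still be null if start() failed early
    private KafkaProducer<byte[], byte[]> offsetProducer;  // same here

    void cleanup() {
        if (consumer != null) {
            consumer.close();
        }
        if (offsetProducer != null) {
            offsetProducer.close();
        }
    }
}{noformat}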
When the number of established connections (returned by lsof) reaches a
certain limit, new exceptions start to pop up in the logs: Too many open files

 

 
{noformat}
[2019-06-19 12:56:43,949] ERROR WorkerSourceTask{id=MirrorHeartbeatConnector-0} 
failed to send record to heartbeats: {} 
(org.apache.kafka.connect.runtime.WorkerSourceTask)
org.apache.kafka.common.errors.SaslAuthenticationException: An error: 
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Too many open 
files)]) occurred when evaluating SASL token received from the Kafka Broker. 
Kafka Client will go to AUTHENTICATION_FAILED state.
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Too many open 
files)]
        at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.lambda$createSaslToken$1(SaslClientAuthenticator.java:461)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.createSaslToken(SaslClientAuthenticator.java:461)
        at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.sendSaslClientToken(SaslClientAuthenticator.java:370)
        at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.sendInitialToken(SaslClientAuthenticator.java:290)
        at 
org.apache.kafka.common.security.authenticator.SaslClientAuthenticator.authenticate(SaslClientAuthenticator.java:230)
        at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:173)
        at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:536)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:472)
        at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:535)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
        at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
        at java.lang.Thread.run(Thread.java:748)
Caused by: GSSException: No valid credentials provided (Mechanism level: Too 
many open files)
        at 
sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:775)
        at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:248)
        at 
sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
        at 
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
        ... 14 more
Caused by: java.net.SocketException: Too many open files
        at java.net.Socket.createImpl(Socket.java:460)
        at java.net.Socket.connect(Socket.java:587)
        at sun.security.krb5.internal.TCPClient.<init>(NetClient.java:63)
        at sun.security.krb5.internal.NetClient.getInstance(NetClient.java:43)
        at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:393)
        at sun.security.krb5.KdcComm$KdcCommunication.run(KdcComm.java:364)
        at java.security.AccessController.doPrivileged(Native 

Build failed in Jenkins: kafka-trunk-jdk8 #3735

2019-06-19 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-8559: Allocate ArrayList with correct size in PartitionStates

--
[...truncated 2.14 MB...]
kafka.log.LogCleanerParameterizedIntegrationTest > cleanerTest[4] PASSED

kafka.log.LogCleanerParameterizedIntegrationTest > 
testCleanerWithMessageFormatV0[4] STARTED

kafka.log.LogCleanerParameterizedIntegrationTest > 
testCleanerWithMessageFormatV0[4] PASSED

kafka.log.TimeIndexTest > testTruncate STARTED

kafka.log.TimeIndexTest > testTruncate PASSED

kafka.log.TimeIndexTest > testEntry STARTED

kafka.log.TimeIndexTest > testEntry PASSED

kafka.log.TimeIndexTest > testAppend STARTED

kafka.log.TimeIndexTest > testAppend PASSED

kafka.log.TimeIndexTest > testEntryOverflow STARTED

kafka.log.TimeIndexTest > testEntryOverflow PASSED

kafka.log.TimeIndexTest > testLookUp STARTED

kafka.log.TimeIndexTest > testLookUp PASSED

kafka.log.TimeIndexTest > testSanityCheck STARTED

kafka.log.TimeIndexTest > testSanityCheck PASSED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[0] STARTED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[1] STARTED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[2] STARTED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[3] STARTED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[4] STARTED

kafka.log.LogCleanerLagIntegrationTest > cleanerTest[4] PASSED

kafka.log.ProducerStateManagerTest > 
testProducerSequenceWithWrapAroundBatchRecord STARTED

kafka.log.ProducerStateManagerTest > 
testProducerSequenceWithWrapAroundBatchRecord PASSED

kafka.log.ProducerStateManagerTest > testCoordinatorFencing STARTED

kafka.log.ProducerStateManagerTest > testCoordinatorFencing PASSED

kafka.log.ProducerStateManagerTest > testTruncate STARTED

kafka.log.ProducerStateManagerTest > testTruncate PASSED

kafka.log.ProducerStateManagerTest > testLoadFromTruncatedSnapshotFile STARTED

kafka.log.ProducerStateManagerTest > testLoadFromTruncatedSnapshotFile PASSED

kafka.log.ProducerStateManagerTest > testRemoveExpiredPidsOnReload STARTED

kafka.log.ProducerStateManagerTest > testRemoveExpiredPidsOnReload PASSED

kafka.log.ProducerStateManagerTest > 
testOutOfSequenceAfterControlRecordEpochBump STARTED

kafka.log.ProducerStateManagerTest > 
testOutOfSequenceAfterControlRecordEpochBump PASSED

kafka.log.ProducerStateManagerTest > testFirstUnstableOffsetAfterTruncation 
STARTED

kafka.log.ProducerStateManagerTest > testFirstUnstableOffsetAfterTruncation 
PASSED

kafka.log.ProducerStateManagerTest > testTakeSnapshot STARTED

kafka.log.ProducerStateManagerTest > testTakeSnapshot PASSED

kafka.log.ProducerStateManagerTest > testDeleteSnapshotsBefore STARTED

kafka.log.ProducerStateManagerTest > testDeleteSnapshotsBefore PASSED

kafka.log.ProducerStateManagerTest > 
testNonMatchingTxnFirstOffsetMetadataNotCached STARTED

kafka.log.ProducerStateManagerTest > 
testNonMatchingTxnFirstOffsetMetadataNotCached PASSED

kafka.log.ProducerStateManagerTest > testAppendEmptyControlBatch STARTED

kafka.log.ProducerStateManagerTest > testAppendEmptyControlBatch PASSED

kafka.log.ProducerStateManagerTest > testFirstUnstableOffsetAfterEviction 
STARTED

kafka.log.ProducerStateManagerTest > testFirstUnstableOffsetAfterEviction PASSED

kafka.log.ProducerStateManagerTest > testNoValidationOnFirstEntryWhenLoadingLog 
STARTED

kafka.log.ProducerStateManagerTest > testNoValidationOnFirstEntryWhenLoadingLog 
PASSED

kafka.log.ProducerStateManagerTest > testLoadFromEmptySnapshotFile STARTED

kafka.log.ProducerStateManagerTest > testLoadFromEmptySnapshotFile PASSED

kafka.log.ProducerStateManagerTest > 
testProducersWithOngoingTransactionsDontExpire STARTED

kafka.log.ProducerStateManagerTest > 
testProducersWithOngoingTransactionsDontExpire PASSED

kafka.log.ProducerStateManagerTest > testBasicIdMapping STARTED

kafka.log.ProducerStateManagerTest > testBasicIdMapping PASSED

kafka.log.ProducerStateManagerTest > updateProducerTransactionState STARTED

kafka.log.ProducerStateManagerTest > updateProducerTransactionState PASSED

kafka.log.ProducerStateManagerTest > testRecoverFromSnapshot STARTED

kafka.log.ProducerStateManagerTest > testRecoverFromSnapshot PASSED

kafka.log.ProducerStateManagerTest > testPrepareUpdateDoesNotMutate STARTED

kafka.log.ProducerStateManagerTest > testPrepareUpdateDoesNotMutate PASSED

kafka.log.ProducerStateManagerTest > 
testSequenceNotValidatedForGroupMetadataTopic STARTED

kafka.log.ProducerStateManagerTest > 
testSequenceNotValidatedForGroupMetadataTopic PASSED

kafka.log.ProducerStateManagerTest > testLastStableOffsetCompletedTxn STARTED

kafka.log.ProducerStateManagerTest > testLastS

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

2019-06-19 Thread Bruno Cadonna
Hi Guozhang,

Regarding your comments about the wiki page:

1) Exactly, I rephrased the paragraph to make it more clear.

2) Yes, I used the wrong term. All hit related metrics are ratios. I
corrected the names of the affected metrics.

Regarding your meta comments:

1) The plan is to expose the hit ratio. I used the wrong term. The
formulas compute ratios. Regarding your question about a metric to
know from where a read is served when it is not in the memtable, there
are metrics in RocksDB that give you the number of get() queries that
are served from L0, L1, and L2_AND_UP. I could not find any metric
that gives you information about whether a query was served from disk
vs. OS cache. One metric that could be used to indirectly measure
whether disk or OS cache is accessed seems to be READ_BLOCK_GET_MICROS
that gives you the time for an IO read of a block. If it is high, it
was read from disk, otherwise from the OS cache. A similar strategy to
monitor the performance is described in [1]. DISCLAIMER:
READ_BLOCK_GET_MICROS is not documented. I had to look into the C++
code to understand its meaning. I could have missed something.

2) There are some additional compaction statistics that contain sizes
of files on disk and numbers about write amplification that you can
get programmatically in RocksDB, but they are for debugging purposes
[2]. To get this data and publish it into a metric, one has to parse a
string. Since this data is for debugging purposes, I do not know how
stable the output format is. One thing we could do is to dump the
string with the compaction statistics into our log files at DEBUG
level. But that is outside of the scope of this KIP.

Best,
Bruno

[1] 
https://github.com/facebook/rocksdb/wiki/Perf-Context-and-IO-Stats-Context#block-cache-and-os-page-cache-efficiency
[2] 
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#rocksdb-statistics
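
As a rough illustration of the kind of data discussed above, here is a sketch
that reads ticker and histogram values directly from RocksDB's Java Statistics
API. It assumes a Statistics object was attached to the store's Options (e.g.
via options.setStatistics(new Statistics()) in a RocksDBConfigSetter) and that
a reference to it was kept somewhere reachable; it is not part of the KIP.

import org.rocksdb.HistogramData;
import org.rocksdb.HistogramType;
import org.rocksdb.Statistics;
import org.rocksdb.TickerType;

public final class RocksDbStatsProbe {

    public static void report(final Statistics stats) {
        // Hit ratios computed from the raw ticker counts, as in the KIP formulas.
        final long memHits = stats.getTickerCount(TickerType.MEMTABLE_HIT);
        final long memMisses = stats.getTickerCount(TickerType.MEMTABLE_MISS);
        final double memtableHitRatio =
            memHits + memMisses == 0 ? 0.0 : (double) memHits / (memHits + memMisses);

        final long cacheHits = stats.getTickerCount(TickerType.BLOCK_CACHE_HIT);
        final long cacheMisses = stats.getTickerCount(TickerType.BLOCK_CACHE_MISS);
        final double blockCacheHitRatio =
            cacheHits + cacheMisses == 0 ? 0.0 : (double) cacheHits / (cacheHits + cacheMisses);

        // READ_BLOCK_GET_MICROS is undocumented; high values suggest block reads
        // hitting disk rather than the OS page cache (see the discussion above).
        final HistogramData readBlock = stats.getHistogramData(HistogramType.READ_BLOCK_GET_MICROS);

        System.out.printf("memtable hit ratio=%.3f, block cache hit ratio=%.3f, read-block p99=%.1f us%n",
            memtableHitRatio, blockCacheHitRatio, readBlock.getPercentile99());
    }
}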

On Tue, Jun 18, 2019 at 8:24 PM Guozhang Wang  wrote:
>
> Hello Bruno,
>
> I've read through the aggregation section and I think they look good to me.
> There are a few minor comments about the wiki page itself:
>
> 1) A state store might consist of multiple state stores -> You mean a
> `logical` state store can consist of multiple `physical` store instances?
>
> 2) The "Hit Rates" calculation seems to be referring to the `Hit Ratio`
> (which is a percentage) rather than the `Hit Rate`?
>
> And a couple further meta comments:
>
> 1) For memtable / block cache, instead of the hit-rate do you think we
> should expose the hit-ratio? I felt it is more useful for users to debug
> what's the root cause of unexpected slow performance.
>
> And for block cache misses, is it easy to provide a metric as of "target
> read" of where a read is served (from which level, either in OS cache or in
> SST files), similar to Fig.11 in
> http://cidrdb.org/cidr2017/papers/p82-dong-cidr17.pdf?
>
> 2) As @Patrik mentioned, is there a good way we can expose the total amount
> of memory and disk usage for each state store as well? I think it would
> also be very helpful for users to understand their capacity needs and read
> / write amplifications.
>
>
> Guozhang
>
> On Fri, Jun 14, 2019 at 6:55 AM Bruno Cadonna  wrote:
>
> > Hi,
> >
> > I decided to go for the option in which metrics are exposed for each
> > logical state store. I revisited the KIP correspondingly and added a
> > section on how to aggregate metrics over multiple physical RocksDB
> > instances within one logical state store. Would be great, if you could
> > take a look and give feedback. If nobody has complaints about the
> > chosen option I would proceed with voting on this KIP since this was
> > the last open question.
> >
> > Best,
> > Bruno
> >
> > On Fri, Jun 7, 2019 at 9:38 PM Patrik Kleindl  wrote:
> > >
> > > Hi Sophie
> > > This will be a good change, I have been thinking about proposing
> > something similar or even passing the properties per store.
> > > RocksDB should probably know how much memory was reserved but maybe does
> > not expose it.
> > > We are limiting it already as you suggested but this is a rather crude
> > tool.
> > > Especially in a larger topology with mixed loads par topic it would be
> > helpful to get more insights which store puts a lot of load on memory.
> > > Regarding the limiting capability, I think I remember reading that those
> > only affect some parts of the memory and others can still exceed this
> > limit. I‘ll try to look up the difference.
> > > Best regards
> > > Patrik
> > >
> > > > Am 07.06.2019 um 21:03 schrieb Sophie Blee-Goldman <
> > sop...@confluent.io>:
> > > >
> > > > Hi Patrik,
> > > >
> > > > As of 2.3 you will be able to use the RocksDBConfigSetter to
> > effectively
> > > > bound the total memory used by RocksDB for a single app instance. You
> > > > should already be able to limit the memory used per rocksdb store,
> > though
> > > > as you mention there can be a lot of them. I'm not sure you can
> > monitor the
> > > > memory usage if you are not li

[jira] [Created] (KAFKA-8567) Application kafka-configs.sh can’t describe the configs of a topic

2019-06-19 Thread Alvaro Peris (JIRA)
Alvaro Peris created KAFKA-8567:
---

 Summary: Application kafka-configs.sh can’t describe the configs 
of a topic
 Key: KAFKA-8567
 URL: https://issues.apache.org/jira/browse/KAFKA-8567
 Project: Kafka
  Issue Type: Bug
  Components: clients, config
Affects Versions: 2.2.1
Reporter: Alvaro Peris


The CLI application kafka-configs.sh can’t describe the configs of a topic
when we try to connect via bootstrap-server using ACLs in the 2.2 version. This
is possible by using the Java API.
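
For reference, a minimal sketch of the working Java API path follows; the
broker address, topic name, and any security/SASL properties are placeholders.
{noformat}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

// Sketch: describe a topic's configs via the Java AdminClient, which works
// where kafka-configs.sh with --bootstrap-server currently does not.
public final class DescribeTopicConfigs {
    public static void main(final String[] args) throws Exception {
        final Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        // security/SASL properties would go here when ACLs are in use

        try (AdminClient admin = AdminClient.create(props)) {
            final ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            final Config config = admin.describeConfigs(Collections.singleton(topic)).all().get().get(topic);
            config.entries().forEach(e -> System.out.println(e.name() + " = " + e.value()));
        }
    }
}{noformat}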


 The problem may be in the ConfigCommand.scala object; maybe you have to adjust
the following validation:
{quote} if (entityTypeVals.contains(ConfigType.Client) || 
entityTypeVals.contains(ConfigType.Topic) || 
entityTypeVals.contains(ConfigType.User))

    CommandLineUtils.checkRequiredArgs(parser, options, zkConnectOpt, 
entityType)
{quote}
Thanks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-455: Create an Administrative API for Replica Reassignment

2019-06-19 Thread Stanislav Kozlovski
Hey there Colin,

Thanks for the work on this KIP. It is a much-needed improvement and I'm
excited to see it. Sorry for coming in so late to the discussion, I have
one question to better understand the change and a small suggestion.

I see we allow reassignment cancellation at the partition level - what is
the motivation behind that? I think that having the back-end structures
support it is a good choice since it allows us more flexibility in the
future but what are the reasons for allowing a user to cancel at a
partition level? I think allowing it might let users shoot themselves in
the foot easier and make tools harder to implement (needing to guard
against it).

In all regards, what do you think about an ease of use improvement where we
allow a user to cancel all reassignments for a topic without specifying its
partitions? Essentially, we could cancel all reassignments for a topic if
the Partitions field in AlterPartitionAssignmentsRequest is null.

Best,
Stanislav

On Mon, May 6, 2019 at 5:42 PM Colin McCabe  wrote:

> On Mon, May 6, 2019, at 07:39, Ismael Juma wrote:
> > Hi Colin,
> >
> > A quick comment.
> >
> > On Sat, May 4, 2019 at 11:18 PM Colin McCabe  wrote:
> >
> > > The big advantage of doing batching on the controller is that the
> > > controller has more information about what is going on in the
> cluster.  So
> > > it can schedule reassignments in a more optimal way.  For instance, it
> can
> > > schedule reassignments so that the load is distributed evenly across
> > > nodes.  This advantage is lost if we have to adhere to a rigid ordering
> > > that is set up in advance.  We don't know exactly when anything will
> > > complete in any case.  Just because one partition reassignment was
> started
> > > before another doesn't mean it will finish before another.
> >
> >
> > This is not quite true, right? The Controller doesn't know about
> partition
> > sizes, throughput per partition and other such information that external
> > tools like Cruise Control track.
>
> Hi Ismael,
>
> That's a good point, and one I should have included.
>
> I guess when I think about "do batching in the controller" versus "do
> batching in an external system" I tend to think about the information the
> controller could theoretically collect, rather than what it actually does
> :)  But certainly, adding this information to the controller would be a
> significant change, and maybe one we don't want to do if the external
> systems work well enough.
>
> Thinking about this a little bit more, I can see three advantages to
> controller-side batching.  Firstly, doing batching in the controller saves
> memory because we don't use a separate JVM, and don't duplicate the
> in-memory map of all the partitions.  Secondly, the information we're
> acting on would also be more up-to-date.  (I'm not sure how important this
> would be.)  Finally, it's one less thing to deploy.  I don't know if those
> are really enough to motivate switching now, but in a greenfield system I
> would probably choose controller-side rebalancing.
>
> In any case, this KIP is orthogonal to controller-side rebalancing versus
> external rebalancing.  That's why the KIP states that we will continue to
> perform all the given partition rebalances immediately.  I was just
> responding to the idea that maybe we should have an "ordering" of
> rebalancing partitions.  I don't think we want that, for controller-side
> rebalancing or externally batched rebalancing.
>
> best,
> Colin
>


-- 
Best,
Stanislav


[jira] [Created] (KAFKA-8566) Force Topic Deletion

2019-06-19 Thread Viktor Somogyi-Vass (JIRA)
Viktor Somogyi-Vass created KAFKA-8566:
--

 Summary: Force Topic Deletion
 Key: KAFKA-8566
 URL: https://issues.apache.org/jira/browse/KAFKA-8566
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Forcing topic deletion is sometimes useful if normal topic deletion gets
stuck. In this case we want to remove the topic data from ZooKeeper anyway,
and also the segment files from the online brokers. On offline brokers this
could be done on startup somehow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8565) Random IllegalStateException in producer

2019-06-19 Thread Patrick Oswald (JIRA)
Patrick Oswald created KAFKA-8565:
-

 Summary: Random IllegalStateException in producer
 Key: KAFKA-8565
 URL: https://issues.apache.org/jira/browse/KAFKA-8565
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.2.0
 Environment: confluent-consumer 2.12
Cluster with:
* 5 Kafka vms
* 3 Zookeeper vms
* 1 rest-proxy vm
* 1 schema-registry vm
Reporter: Patrick Oswald


While writing into a topic, I get an uncaught error in the Kafka producer I/O thread.
{noformat}
[2019-06-19 10:51:40,489] ERROR [Producer clientId=console-producer] Uncaught 
error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: No entry found for connection 0
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:339)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:143)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:926)
at 
org.apache.kafka.clients.NetworkClient.access$700(NetworkClient.java:67)
at 
org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1090)
at 
org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:976)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:533)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at java.base/java.lang.Thread.run(Thread.java:834)

{noformat}

The complete output looks like this:
{noformat}
bin/kafka-topics.sh --list --bootstrap-server 172.23.84.155:9092
test
patrick.oswald:10:24 
-> ~/kafka/kafka_2.12-2.2.0 $  bin/kafka-console-producer.sh --broker-list 
kafka02:9092 --topic test
>message 1
[2019-06-19 10:24:46,690] ERROR [Producer clientId=console-producer] Uncaught 
error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: No entry found for connection 0
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:339)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:143)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:926)
at 
org.apache.kafka.clients.NetworkClient.access$700(NetworkClient.java:67)
at 
org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1090)
at 
org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:976)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:533)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at java.base/java.lang.Thread.run(Thread.java:834)
>^[[A^Cpatrick.oswald:10:25 
-> ~/kafka/kafka_2.12-2.2.0 $ 130 bin/kafka-console-producer.sh --broker-list 
kafka01.services-lpz.rsint.net:9092 --topic test
>message 1
[2019-06-19 10:25:23,799] ERROR [Producer clientId=console-producer] Uncaught 
error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: No entry found for connection 0
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:339)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:143)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:926)
at 
org.apache.kafka.clients.NetworkClient.access$700(NetworkClient.java:67)
at 
org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:1090)
at 
org.apache.kafka.clients.NetworkClient$DefaultMetadataUpdater.maybeUpdate(NetworkClient.java:976)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:533)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:311)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
at java.base/java.lang.Thread.run(Thread.java:834)
[2019-06-19 10:25:23,828] ERROR [Producer clientId=console-producer] Uncaught 
error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: No entry found for connection 0
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:339)
at 
org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:143)
at 
org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkCli

[jira] [Created] (KAFKA-8564) NullPointerException when loading logs at startup

2019-06-19 Thread Mickael Maison (JIRA)
Mickael Maison created KAFKA-8564:
-

 Summary: NullPointerException when loading logs at startup
 Key: KAFKA-8564
 URL: https://issues.apache.org/jira/browse/KAFKA-8564
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.2.1, 2.3.0
Reporter: Mickael Maison


If brokers restart when topics are being deleted, it's possible to end up with 
a partition folder with the deleted suffix but without any log segments:

{{ls -la ./kafka-logs/3part3rep5-1.f2ce83b86df9416abe50d2e2299009c2-delete/
total 8
drwxr-xr-x@  4 mickael  staff   128  6 Jun 14:35 .
drwxr-xr-x@ 61 mickael  staff  1952  6 Jun 14:35 ..
-rw-r--r--@  1 mickael  staff10  6 Jun 14:32 23261863.snapshot
-rw-r--r--@  1 mickael  staff 0  6 Jun 14:35 leader-epoch-checkpoint}}

From 2.2.1, brokers fail to start when loading such folders:

{{[2019-06-19 09:40:48,123] ERROR There was an error in one of the threads 
during logs loading: java.lang.NullPointerException (kafka.log.LogManager)
[2019-06-19 09:40:48,126] ERROR [KafkaServer id=1] Fatal error during 
KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.NullPointerException
at kafka.log.Log.activeSegment(Log.scala:1896)
at kafka.log.Log.<init>(Log.scala:295)
at kafka.log.Log$.apply(Log.scala:2186)
at kafka.log.LogManager.loadLog(LogManager.scala:275)
at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:345)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:63)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)}}

With 2.2.0, upon loading such folders, brokers create a new empty log segment 
and load that successfully.

The change of behaviour was introduced in 
https://github.com/apache/kafka/commit/f000dab5442ce49c4852823c257b4fb0cdfe15aa
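
As a purely illustrative check (not part of the fix), an operator could scan
the log directories for partition folders that carry the -delete suffix but
contain no .log segment before restarting a broker on an affected version; the
log directory path below is a placeholder.
{noformat}
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Sketch: list partition directories that have the "-delete" suffix but no
// .log segment, i.e. the shape that triggers the NPE described above.
public final class FindEmptyDeletedPartitions {
    public static void main(final String[] args) throws Exception {
        final Path logDir = Paths.get("/var/kafka-logs");   // placeholder
        try (Stream<Path> dirs = Files.list(logDir)) {
            dirs.filter(Files::isDirectory)
                .filter(dir -> dir.getFileName().toString().endsWith("-delete"))
                .filter(FindEmptyDeletedPartitions::hasNoLogSegment)
                .forEach(dir -> System.out.println("empty deleted partition dir: " + dir));
        }
    }

    private static boolean hasNoLogSegment(final Path dir) {
        final File[] segments = dir.toFile().listFiles((d, name) -> name.endsWith(".log"));
        return segments == null || segments.length == 0;
    }
}{noformat}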




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-trunk-jdk11 #643

2019-06-19 Thread Apache Jenkins Server
See