[jira] [Created] (KAFKA-3395) prefix job id to internal topic names
Yasuhiro Matsuda created KAFKA-3395: --- Summary: prefix job id to internal topic names Key: KAFKA-3395 URL: https://issues.apache.org/jira/browse/KAFKA-3395 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.1 Reporter: Yasuhiro Matsuda Fix For: 0.10.0.0 Names of internal repartition topics are not prefixed by a job id right now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3271) Notification upon unclean leader election
Yasuhiro Matsuda created KAFKA-3271: --- Summary: Notification upon unclean leader election Key: KAFKA-3271 URL: https://issues.apache.org/jira/browse/KAFKA-3271 Project: Kafka Issue Type: New Feature Components: clients, core Reporter: Yasuhiro Matsuda Priority: Minor It is a legitimate restriction that unclean leader election results in some message loss. That said, it is always good to try to minimize the message loss. A notification of unclean leader election can reduce message loss in the following scenario. 1. The latest offset is L. 2. A consumer is at C, where C < L. 3. A slow broker (not in ISR) is at S, where S < C. 4. All brokers in ISR die. 5. The slow broker becomes the leader by unclean leader election. 6. The latest offset is now S. 7. New messages get offsets S, S+1, S+2, and so on. Currently the consumer won't receive the new messages with offsets between S and C. However, if the consumer is notified when an unclean leader election happens and resets its offset to S, it can receive the new messages between S and C. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
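The scenario above can be modeled with a few lines of plain Java. This is only a sketch of the offset arithmetic described in the ticket, not Kafka client code: the class and method names are hypothetical, and it assumes (as the ticket does) that a consumer positioned at C simply skips every new message whose offset is below C.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the KAFKA-3271 scenario; not part of any Kafka API. */
final class UncleanElectionModel {
    /**
     * New messages written after the unclean election occupy offsets
     * [s, s + produced). Returns the offsets a consumer reads when it
     * resumes fetching from consumerPosition.
     */
    static List<Long> delivered(long s, long produced, long consumerPosition) {
        List<Long> seen = new ArrayList<>();
        for (long offset = s; offset < s + produced; offset++) {
            // Assumption taken from the ticket: offsets below the
            // consumer's position are never fetched.
            if (offset >= consumerPosition) {
                seen.add(offset);
            }
        }
        return seen;
    }
}
```

With S = 5 and C = 8, a consumer that keeps its position at C misses the new messages at offsets 5, 6 and 7, while a consumer that is notified and resets to S reads all of them.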
[jira] [Updated] (KAFKA-3262) Make KafkaStreams debugging friendly
[ https://issues.apache.org/jira/browse/KAFKA-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-3262: Description: Current KafkaStreams polls records in the same thread as the data processing thread. This makes debugging user code, as well as KafkaStreams itself, difficult. When the thread is suspended by the debugger, the next heartbeat of the consumer tied to the thread won't be sent until the thread is resumed. This often results in missed heartbeats and causes a group rebalance. So it may well be a completely different context when the thread hits the breakpoint the next time. We should consider using separate threads for polling and processing. > Make KafkaStreams debugging friendly > > > Key: KAFKA-3262 > URL: https://issues.apache.org/jira/browse/KAFKA-3262 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.1.0 >Reporter: Yasuhiro Matsuda > > Current KafkaStreams polls records in the same thread as the data processing > thread. This makes debugging user code, as well as KafkaStreams itself, > difficult. When the thread is suspended by the debugger, the next heartbeat > of the consumer tied to the thread won't be sent until the thread is resumed. > This often results in missed heartbeats and causes a group rebalance. So it > may well be a completely different context when the thread hits the > breakpoint the next time. > We should consider using separate threads for polling and processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3262) Make KafkaStreams debugging friendly
Yasuhiro Matsuda created KAFKA-3262: --- Summary: Make KafkaStreams debugging friendly Key: KAFKA-3262 URL: https://issues.apache.org/jira/browse/KAFKA-3262 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.1.0 Reporter: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3245) need a way to specify the number of replicas for change log topics
Yasuhiro Matsuda created KAFKA-3245: --- Summary: need a way to specify the number of replicas for change log topics Key: KAFKA-3245 URL: https://issues.apache.org/jira/browse/KAFKA-3245 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.1.0 Reporter: Yasuhiro Matsuda Currently the number of replicas of auto-created change log topics is one. This makes stream processing not fault tolerant. A way to specify the number of replicas in the config is desired. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3153) Serializer/Deserializer Registration and Type inference
Yasuhiro Matsuda created KAFKA-3153: --- Summary: Serializer/Deserializer Registration and Type inference Key: KAFKA-3153 URL: https://issues.apache.org/jira/browse/KAFKA-3153 Project: Kafka Issue Type: Sub-task Components: kafka streams Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Fix For: 0.9.1.0 This changes the way serializers/deserializers are selected by the framework. The new scheme requires the app dev to register serializers/deserializers for types using an API. The framework infers the type of data from the topology and uses the appropriate serializer/deserializer. This is best effort. Type inference is not always possible due to Java's type erasure. If a type cannot be determined, user code can supply more information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
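As a rough illustration of the registration idea, the sketch below keeps a registry of serializers keyed by class. It is an assumption-laden toy, not the API proposed in the ticket: the `SerializerRegistry` name and its methods are hypothetical, and it looks serializers up by a value's runtime class rather than inferring types from a topology.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry; the real KAFKA-3153 API may look quite different. */
final class SerializerRegistry {
    private final Map<Class<?>, Function<Object, byte[]>> serializers = new HashMap<>();

    /** Registers a serializer for one concrete type. */
    <T> void register(Class<T> type, Function<T, byte[]> serializer) {
        serializers.put(type, value -> serializer.apply(type.cast(value)));
    }

    /** Picks the serializer from the runtime class of the value. */
    byte[] serialize(Object value) {
        Function<Object, byte[]> serializer = serializers.get(value.getClass());
        if (serializer == null) {
            // Corresponds to the case where the caller must supply more type information.
            throw new IllegalStateException(
                "no serializer registered for " + value.getClass().getName());
        }
        return serializer.apply(value);
    }
}
```

Type erasure is why this can only be best effort: at runtime an `ArrayList<String>` and an `ArrayList<Integer>` share the same `Class` object, so a class-keyed registry cannot tell them apart without extra information from user code.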
[jira] [Created] (KAFKA-3108) KStream custom Partitioner for windowed key
Yasuhiro Matsuda created KAFKA-3108: --- Summary: KStream custom Partitioner for windowed key Key: KAFKA-3108 URL: https://issues.apache.org/jira/browse/KAFKA-3108 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.1.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3108) KStream custom StreamPartitioner for windowed key
[ https://issues.apache.org/jira/browse/KAFKA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-3108: Priority: Minor (was: Major) > KStream custom StreamPartitioner for windowed key > - > > Key: KAFKA-3108 > URL: https://issues.apache.org/jira/browse/KAFKA-3108 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.1.0 >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3108) KStream custom StreamPartitioner for windowed key
[ https://issues.apache.org/jira/browse/KAFKA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-3108: Summary: KStream custom StreamPartitioner for windowed key (was: KStream custom Partitioner for windowed key) > KStream custom StreamPartitioner for windowed key > - > > Key: KAFKA-3108 > URL: https://issues.apache.org/jira/browse/KAFKA-3108 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.1.0 >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3060) Refactor MeteredXXStore
Yasuhiro Matsuda created KAFKA-3060: --- Summary: Refactor MeteredXXStore Key: KAFKA-3060 URL: https://issues.apache.org/jira/browse/KAFKA-3060 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.1 Reporter: Yasuhiro Matsuda Priority: Minor ** copied from a github comment by Guozhang Wang ** The original motivation of having the MeteredXXStore is to wrap all metrics / logging semantics into one place so they do not need to be re-implemented, but this seems to be an obstacle with the current pattern. For example, MeteredWindowStore.putAndReturnInternalKey is only used for logging, and MeteredWindowStore.putInternal / MeteredWindowStore.getInternal are never used since only its inner store triggers these functions. So how about refactoring this piece as follows: 1. WindowStore only exposes two APIs: put(K, V) and get(K, long). 2. Add a RollingRocksDBStores that does not extend any interface, but only implements putInternal, getInternal and putAndReturnInternalKey, and that uses the underlying RocksDBStore as Segments. 3. RocksDBWindowStore implements WindowStore with a RollingRocksDBStores inner. 4. Let MeteredXXStore only maintain the metrics recording logic, and let different stores implement their own logging logic, since this is now different across different types and is better handled separately. Also, some types of stores may not even have a loggingEnabled flag, if they will always log or will never log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3016) Add KStream-KStream window joins
Yasuhiro Matsuda created KAFKA-3016: --- Summary: Add KStream-KStream window joins Key: KAFKA-3016 URL: https://issues.apache.org/jira/browse/KAFKA-3016 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.1 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2984) KTable should send old values along with new values to downstreams
[ https://issues.apache.org/jira/browse/KAFKA-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2984: Summary: KTable should send old values along with new values to downstreams (was: KTable should send old values to downstreams) > KTable should send old values along with new values to downstreams > -- > > Key: KAFKA-2984 > URL: https://issues.apache.org/jira/browse/KAFKA-2984 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.0.1 >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > > Old values are necessary for implementing aggregate functions. KTable should > augment an event with its old value. Basically KTable stream is a stream of > (key, (new value, old value)) internally. The old value may be omitted when > it is not used in the topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2984) KTable should send old values to downstreams
Yasuhiro Matsuda created KAFKA-2984: --- Summary: KTable should send old values to downstreams Key: KAFKA-2984 URL: https://issues.apache.org/jira/browse/KAFKA-2984 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.1 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Old values are necessary for implementing aggregate functions. KTable should augment an event with its old value. Basically KTable stream is a stream of (key, (new value, old value)) internally. The old value may be omitted when it is not used in the topology. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
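To see why an aggregate needs the old value, consider a running sum over a table. The sketch below is a hypothetical plain-Java model (the names `Change`, `TableWithOldValues` and `RunningSum` are illustrative assumptions, not the Kafka Streams implementation): each update carries a (new value, old value) pair so the aggregate can subtract the old contribution before adding the new one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical (new value, old value) pair attached to a key. */
final class Change<V> {
    final V newValue;
    final V oldValue;

    Change(V newValue, V oldValue) {
        this.newValue = newValue;
        this.oldValue = oldValue;
    }
}

/** Toy table that reports the previous value with every update. */
final class TableWithOldValues {
    private final Map<String, Long> state = new HashMap<>();

    /** A null newValue is treated as a delete, following the changelog convention. */
    Change<Long> put(String key, Long newValue) {
        Long old = (newValue == null) ? state.remove(key) : state.put(key, newValue);
        return new Change<>(newValue, old);
    }
}

/** Sum over all current values, maintained incrementally. */
final class RunningSum {
    private long sum;

    void apply(Change<Long> change) {
        if (change.oldValue != null) sum -= change.oldValue; // retract old contribution
        if (change.newValue != null) sum += change.newValue; // add new contribution
    }

    long sum() {
        return sum;
    }
}
```

Without the old value, an update from 3 to 5 for the same key would be indistinguishable from two separate rows, and the sum would drift to 8 instead of 5.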
[jira] [Created] (KAFKA-2962) Add Join API
Yasuhiro Matsuda created KAFKA-2962: --- Summary: Add Join API Key: KAFKA-2962 URL: https://issues.apache.org/jira/browse/KAFKA-2962 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.1.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2962) Add Simple Join API
[ https://issues.apache.org/jira/browse/KAFKA-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2962: Description: Stream-Table and Table-Table joins > Add Simple Join API > --- > > Key: KAFKA-2962 > URL: https://issues.apache.org/jira/browse/KAFKA-2962 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.1.0 >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > > Stream-Table and Table-Table joins -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2856) add KTable
Yasuhiro Matsuda created KAFKA-2856: --- Summary: add KTable Key: KAFKA-2856 URL: https://issues.apache.org/jira/browse/KAFKA-2856 Project: Kafka Issue Type: Sub-task Components: kafka streams Reporter: Yasuhiro Matsuda KTable is a special type of stream that represents a changelog of a database table (or a key-value store). A changelog has to meet the following requirements. * Key-value mapping is surjective in the database table (the key must be the primary key). * All insert/update/delete events are delivered in order for the same key. * An update event has the whole data (not just a delta). * A delete event is represented by the null value. KTable is not necessarily materialized as a local store. It may be materialized when necessary. (see below) KTable supports look-up by key. KTable is materialized implicitly when look-up is necessary. * KTable may be created from a topic. (Base KTable) * KTable may be created from another KTable by filter(), filterOut(), mapValues(). (Derived KTable) * A call to the user-supplied function is skipped when the value is null since such an event represents a deletion. * Instead of being dropped, events filtered out by filter() or filterOut() are converted to delete events. (Can we avoid this?) * map(), flatMap() and flatMapValues() are not supported since they may violate the changelog requirements. A derived KTable may be persisted to a topic by to() or through(). through() creates another base KTable. KTable can be converted to KStream by the toStream() method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
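The changelog rules above can be illustrated with a small plain-Java model. This is a sketch under stated assumptions, not the KTable implementation: `ChangelogTable` and `filterValue` are hypothetical names, keys and values are plain strings, and the table is always materialized in a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical materialized view of a changelog; not the Kafka Streams KTable. */
final class ChangelogTable {
    private final Map<String, String> rows = new HashMap<>();

    /** Applies one event: a null value deletes the key, anything else replaces the whole row. */
    void apply(String key, String value) {
        if (value == null) {
            rows.remove(key);
        } else {
            rows.put(key, value);
        }
    }

    /** Look-up by key. */
    String get(String key) {
        return rows.get(key);
    }

    /**
     * Value transformation for a derived, filtered table. An event that fails
     * the predicate becomes a delete event (null) instead of being dropped.
     */
    static String filterValue(String value, Predicate<String> predicate) {
        if (value == null) {
            return null; // deletion: the user-supplied predicate is not called
        }
        return predicate.test(value) ? value : null;
    }
}
```

The conversion to delete events matters when a key's value changes from one that passes the filter to one that does not: if the second event were simply dropped, the derived table would keep serving the stale first value.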
[jira] [Assigned] (KAFKA-2763) Reduce stream task migrations and initialization costs
[ https://issues.apache.org/jira/browse/KAFKA-2763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda reassigned KAFKA-2763: --- Assignee: Yasuhiro Matsuda > Reduce stream task migrations and initialization costs > -- > > Key: KAFKA-2763 > URL: https://issues.apache.org/jira/browse/KAFKA-2763 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.0.0 >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > > Stream task assignment is not aware of either the previous task assignment or > local states of participating clients. By making the assignment logic aware > of them, we can reduce task migrations and initialization cost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-2811) Add standby tasks
[ https://issues.apache.org/jira/browse/KAFKA-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda reassigned KAFKA-2811: --- Assignee: Yasuhiro Matsuda > Add standby tasks > - > > Key: KAFKA-2811 > URL: https://issues.apache.org/jira/browse/KAFKA-2811 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > > Restoring local state from state change-log topics can be expensive. To > alleviate this, we want to have an option to keep replications of local > states that are kept up to date. The task assignment logic should be aware of > existence of such replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2811) Add standby tasks
Yasuhiro Matsuda created KAFKA-2811: --- Summary: Add standby tasks Key: KAFKA-2811 URL: https://issues.apache.org/jira/browse/KAFKA-2811 Project: Kafka Issue Type: Sub-task Components: kafka streams Reporter: Yasuhiro Matsuda Restoring local state from state change-log topics can be expensive. To alleviate this, we want to have an option to keep replications of local states that are kept up to date. The task assignment logic should be aware of existence of such replicas. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2763) Reduce stream task migrations and initialization costs
Yasuhiro Matsuda created KAFKA-2763: --- Summary: Reduce stream task migrations and initialization costs Key: KAFKA-2763 URL: https://issues.apache.org/jira/browse/KAFKA-2763 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.0 Reporter: Yasuhiro Matsuda Stream task assignment is not aware of either the previous task assignment or local states of participating clients. By making the assignment logic aware of them, we can reduce task migrations and initialization cost. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2727) initialize only the part of the topology relevant to the task
[ https://issues.apache.org/jira/browse/KAFKA-2727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2727: Description: Currently each streaming task initializes the entire topology regardless of the assigned topic-partitions. This is wasteful, especially when the topology has local state stores. All local state stores are restored from their change log topics even when they are not actually used in the task execution. To fix this, the task initialization should be aware of the relevant subgraph of the topology and initialize only the processors and state stores in the subgraph. > initialize only the part of the topology relevant to the task > - > > Key: KAFKA-2727 > URL: https://issues.apache.org/jira/browse/KAFKA-2727 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Affects Versions: 0.9.0.0 >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > > Currently each streaming task initializes the entire topology regardless of > the assigned topic-partitions. This is wasteful, especially when the topology > has local state stores. All local state stores are restored from their change > log topics even when they are not actually used in the task execution. To fix > this, the task initialization should be aware of the relevant subgraph of the > topology and initialize only the processors and state stores in the subgraph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2727) initialize only the part of the topology relevant to the task
Yasuhiro Matsuda created KAFKA-2727: --- Summary: initialize only the part of the topology relevant to the task Key: KAFKA-2727 URL: https://issues.apache.org/jira/browse/KAFKA-2727 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2706) Make state stores first class citizens in the processor DAG
Yasuhiro Matsuda created KAFKA-2706: --- Summary: Make state stores first class citizens in the processor DAG Key: KAFKA-2706 URL: https://issues.apache.org/jira/browse/KAFKA-2706 Project: Kafka Issue Type: Sub-task Components: kafka streams Affects Versions: 0.9.0.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2707) Make KStream processor names deterministic
[ https://issues.apache.org/jira/browse/KAFKA-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2707: Summary: Make KStream processor names deterministic (was: Make KStream processor names deteministic) > Make KStream processor names deterministic > -- > > Key: KAFKA-2707 > URL: https://issues.apache.org/jira/browse/KAFKA-2707 > Project: Kafka > Issue Type: Sub-task >Reporter: Yasuhiro Matsuda > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2707) Make KStream processor names deterministic
[ https://issues.apache.org/jira/browse/KAFKA-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2707: Description: Currently KStream processor names are generated from a static AtomicInteger member of KStreamImpl. It is incremented every time a new processor is created. The problem is that the name depends on the history of its use in the same JVM, thus the corresponding processors may have different names in different processes. This makes debugging difficult. > Make KStream processor names deterministic > -- > > Key: KAFKA-2707 > URL: https://issues.apache.org/jira/browse/KAFKA-2707 > Project: Kafka > Issue Type: Sub-task >Reporter: Yasuhiro Matsuda > > Currently KStream processor names are generated from a static AtomicInteger > member of KStreamImpl. It is incremented every time a new processor is > created. The problem is that the name depends on the history of its use in > the same JVM, thus the corresponding processors may have different names in > different processes. This makes debugging difficult. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
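The problem can be reproduced in a few lines. In the sketch below, `StaticCounterNamer` mimics the JVM-wide counter described in the ticket, and `PerTopologyNamer` shows one possible remedy, a counter owned by each topology builder. Both class names, the name format, and the remedy itself are assumptions made for illustration; the ticket does not state how the fix is implemented.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Mimics the reported behavior: one counter shared by everything in the JVM. */
final class StaticCounterNamer {
    private static final AtomicInteger INDEX = new AtomicInteger();

    static String newName(String prefix) {
        // The result depends on how many names were generated earlier in this JVM.
        return prefix + INDEX.getAndIncrement();
    }
}

/** One possible alternative (an assumption): each topology owns its counter. */
final class PerTopologyNamer {
    private int index = 0;

    String newName(String prefix) {
        // The result depends only on the order of calls for this topology.
        return prefix + index++;
    }
}
```

Two processes that build the same topology with a fresh `PerTopologyNamer` get identical names, whereas the static counter yields different names as soon as one process has built anything else first.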
[jira] [Created] (KAFKA-2707) Make KStream processor names deteministic
Yasuhiro Matsuda created KAFKA-2707: --- Summary: Make KStream processor names deteministic Key: KAFKA-2707 URL: https://issues.apache.org/jira/browse/KAFKA-2707 Project: Kafka Issue Type: Sub-task Reporter: Yasuhiro Matsuda -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-2592) Stop Writing the Change-log in store.put() / delete() for Non-transactional Store
[ https://issues.apache.org/jira/browse/KAFKA-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda resolved KAFKA-2592. - Resolution: Won't Fix > Stop Writing the Change-log in store.put() / delete() for Non-transactional > Store > - > > Key: KAFKA-2592 > URL: https://issues.apache.org/jira/browse/KAFKA-2592 > Project: Kafka > Issue Type: Sub-task >Reporter: Guozhang Wang >Assignee: Yasuhiro Matsuda > Fix For: 0.9.0.0 > > > Today we keep a dirty threshold and try to send to change-log in store.put() > / delete() when the threshold has been exceeded. Doing this will largely > increase the likelihood of inconsistent state upon unclean shutdown. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2694) Make a task id be a composite id of a task group id and a partition id
Yasuhiro Matsuda created KAFKA-2694: --- Summary: Make a task id be a composite id of a task group id and a partition id Key: KAFKA-2694 URL: https://issues.apache.org/jira/browse/KAFKA-2694 Project: Kafka Issue Type: Task Components: kafka streams Reporter: Yasuhiro Matsuda Fix For: 0.9.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2694) Make a task id be a composite id of a topic group id and a partition id
[ https://issues.apache.org/jira/browse/KAFKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2694: Summary: Make a task id be a composite id of a topic group id and a partition id (was: Make a task id be a composite id of a task group id and a partition id) > Make a task id be a composite id of a topic group id and a partition id > --- > > Key: KAFKA-2694 > URL: https://issues.apache.org/jira/browse/KAFKA-2694 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Reporter: Yasuhiro Matsuda > Fix For: 0.9.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (KAFKA-2694) Make a task id be a composite id of a topic group id and a partition id
[ https://issues.apache.org/jira/browse/KAFKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda reassigned KAFKA-2694: --- Assignee: Yasuhiro Matsuda > Make a task id be a composite id of a topic group id and a partition id > --- > > Key: KAFKA-2694 > URL: https://issues.apache.org/jira/browse/KAFKA-2694 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Reporter: Yasuhiro Matsuda >Assignee: Yasuhiro Matsuda > Fix For: 0.9.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2694) Make a task id be a composite id of a task group id and a partition id
[ https://issues.apache.org/jira/browse/KAFKA-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2694: Issue Type: Sub-task (was: Task) Parent: KAFKA-2590 > Make a task id be a composite id of a task group id and a partition id > -- > > Key: KAFKA-2694 > URL: https://issues.apache.org/jira/browse/KAFKA-2694 > Project: Kafka > Issue Type: Sub-task > Components: kafka streams >Reporter: Yasuhiro Matsuda > Fix For: 0.9.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2350) Add KafkaConsumer pause capability
[ https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640958#comment-14640958 ] Yasuhiro Matsuda commented on KAFKA-2350: - Throwing exceptions makes sense. In addition, I think a consumer should not keep pause/unpause states across rebalance. Forgetting the states makes the consumer/application logic cleaner. Add KafkaConsumer pause capability -- Key: KAFKA-2350 URL: https://issues.apache.org/jira/browse/KAFKA-2350 Project: Kafka Issue Type: Improvement Reporter: Jason Gustafson Assignee: Jason Gustafson There are some use cases in stream processing where it is helpful to be able to pause consumption of a topic. For example, when joining two topics, you may need to delay processing of one topic while you wait for the consumer of the other topic to catch up. The new consumer currently doesn't provide a nice way to do this. If you skip poll() or if you unsubscribe, then a rebalance will be triggered and your partitions will be reassigned. One way to achieve this would be to add two new methods to KafkaConsumer: {code} void pause(String... topics); void unpause(String... topics); {code} When a topic is paused, a call to KafkaConsumer.poll will not initiate any new fetches for that topic. After it is unpaused, fetches will begin again. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2350) Add KafkaConsumer pause capability
[ https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639460#comment-14639460 ] Yasuhiro Matsuda commented on KAFKA-2350: - Suppose we are using auto assignment. {code} subscribe(topic) // A unsubscribe(partition) // B subscribe(partition) // C {code} If rebalancing happens and the coordinator assigns the partition to a different instance, is the client still subscribed to the partition? Rebalance can happen 1) between A and B, 2) between B and C, or 3) after C. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2350) Add KafkaConsumer pause capability
[ https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637219#comment-14637219 ] Yasuhiro Matsuda commented on KAFKA-2350: - I think overloading subscribe/unsubscribe is very confusing. Subscribe/unsubscribe and pause/unpause are two very different behaviors. And overloading the same method names is not really simplifying the API. I want pause/unpause to be pure flow control. It shouldn't be mixed up with subscription. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2350) Add KafkaConsumer pause capability
[ https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634229#comment-14634229 ] Yasuhiro Matsuda commented on KAFKA-2350: - Can we have TopicPartition rather than String for finer control? {code} void pause(TopicPartition... partitions) void unpause(TopicPartition... partitions) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2350) Add KafkaConsumer pause capability
[ https://issues.apache.org/jira/browse/KAFKA-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634322#comment-14634322 ] Yasuhiro Matsuda commented on KAFKA-2350: - To me the state preservation is not a requirement. The application code should track pausing states. I feel that a consumer should reset pausing state after a rebalance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
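The behavior argued for in these comments (partition-level pause as pure flow control, with pause state forgotten on rebalance) can be sketched as a small state model. This is a hypothetical stand-alone class, not KafkaConsumer: partitions are represented as plain strings such as "t-0", and throwing on an unassigned partition is one reading of the "throwing exceptions" remark, labeled here as an assumption.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of pause/unpause state; not the KafkaConsumer API. */
final class PauseStateModel {
    private final Set<String> assigned = new HashSet<>();
    private final Set<String> paused = new HashSet<>();

    /** Called after a rebalance: the new assignment replaces the old one. */
    void onAssignment(Collection<String> partitions) {
        assigned.clear();
        assigned.addAll(partitions);
        paused.clear(); // pause state is not kept across a rebalance
    }

    void pause(String partition) {
        requireAssigned(partition);
        paused.add(partition);
    }

    void unpause(String partition) {
        requireAssigned(partition);
        paused.remove(partition);
    }

    /** A poll would only start fetches for partitions where this is true. */
    boolean shouldFetch(String partition) {
        return assigned.contains(partition) && !paused.contains(partition);
    }

    private void requireAssigned(String partition) {
        // Assumption: pausing a partition that is not assigned is an error.
        if (!assigned.contains(partition)) {
            throw new IllegalStateException("partition not assigned: " + partition);
        }
    }
}
```

Because the subscription is never touched by pause or unpause, the model keeps flow control separate from group membership, and the application re-applies any pauses it still wants after each rebalance.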
[jira] [Comment Edited] (KAFKA-2260) Allow specifying expected offset on produce
[ https://issues.apache.org/jira/browse/KAFKA-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632259#comment-14632259 ] Yasuhiro Matsuda edited comment on KAFKA-2260 at 7/18/15 4:16 AM: -- Here is the outline of the variant Jay mentioned. - A broker holds a fixed-size array of offsets for each partition. Array indexes are hashes of keys. In a sense, an array element works as a sub-partition. Sub-partitions do not hold data (messages). All they have are the high water marks. - The broker maintains high water marks for each sub-partition. A sub-partition high water mark is updated when a message whose key belongs to the sub-partition is appended to the log. - An application maintains the high water mark of each partition (not sub-partition!) as it consumes messages. It doesn't need to know anything about sub-partitions in a broker. A produce request is processed as follows. 1. The producer sends the known high water mark of the partition with a message. 2. The broker compares the high water mark in the produce request and the high water mark of the sub-partition corresponding to the message key. 3. If the former is greater than or equal to the latter, the broker accepts the produce request. (Note that this is not an equality test!) 4. Otherwise, the broker rejects the request. A nice thing about this is that it is easy to increase the concurrency without re-partitioning, and its overhead is predictable. When changing the number of sub-partitions, the broker doesn't have to recompute sub-partition high water marks. It can initialize all array elements with the partition's high water mark. was (Author: yasuhiro.matsuda): Here is the outline of the variant Jay mentioned. - A broker holds a fixed size array of offsets for each partition. Array indexes are hash of keys. In a sense, an array element works as a sub-partition. Sub-partitions do not hold data (messages). All they have are the high water marks.
- The broker maintains high water marks for each sub-partition. A sub-partition high water mark is updated when a message whose key belongs to the sub-partition is appended to the log. - An application maintains the high water mark of each partition (not sub-partition!) as it consumes messages. It doesn't need to know anything about sub-partitions in a broker. A produce request is processed as follows. 1. The producer sends the known high water mark of the partition with a message. 2. The broker compares the high water mark in the produce request with the high water mark of the sub-partition corresponding to the message key. 3. If the former is greater than the latter, the broker accepts the produce request. (Note that this is not an equality test!) 4. Otherwise, the broker rejects the request. A nice thing about this is that it is easy to increase the concurrency without re-partitioning, and its overhead is predictable. When changing the number of sub-partitions, the broker doesn't have to recompute sub-partition high water marks. It can initialize all array elements with the partition's high water mark. Allow specifying expected offset on produce --- Key: KAFKA-2260 URL: https://issues.apache.org/jira/browse/KAFKA-2260 Project: Kafka Issue Type: Improvement Reporter: Ben Kirwin Assignee: Ewen Cheslack-Postava Priority: Minor Attachments: expected-offsets.patch I'd like to propose a change that adds a simple CAS-like mechanism to the Kafka producer. This update has a small footprint, but enables a bunch of interesting uses in stream processing or as a commit log for process state. h4. Proposed Change In short: - Allow the user to attach a specific offset to each message produced. - The server assigns offsets to messages in the usual way. However, if the expected offset doesn't match the actual offset, the server should fail the produce request instead of completing the write.
This is a form of optimistic concurrency control, like the ubiquitous check-and-set -- but instead of checking the current value of some state, it checks the current offset of the log. h4. Motivation Much like check-and-set, this feature is only useful when there's very low contention. Happily, when Kafka is used as a commit log or as a stream-processing transport, it's common to have just one producer (or a small number) for a given partition -- and in many of these cases, predicting offsets turns out to be quite useful. - We get the same benefits as the 'idempotent producer' proposal: a producer can retry a write indefinitely and be sure that at most one of those attempts will succeed; and if two producers accidentally write to the end of the partition at once, we can be certain that at least one of them will fail.
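The sub-partition variant outlined in the edited comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Kafka or from any attached patch: the class and method names are hypothetical, keys are plain strings hashed with String.hashCode, and "high water mark" is taken to mean the next offset to be assigned.

```java
import java.util.Arrays;

// Hypothetical broker-side bookkeeping for one partition (illustrative names only).
final class SubPartitionOffsets {
    private long[] subHighWaterMarks; // one mark per sub-partition; no message data is held
    private long logEndOffset;        // assumption: high water mark == next offset to assign

    SubPartitionOffsets(int numSubPartitions, long partitionHighWaterMark) {
        this.logEndOffset = partitionHighWaterMark;
        resize(numSubPartitions);
    }

    // Changing the number of sub-partitions needs no recomputation:
    // every element restarts from the partition's current high water mark.
    synchronized void resize(int numSubPartitions) {
        subHighWaterMarks = new long[numSubPartitions];
        Arrays.fill(subHighWaterMarks, logEndOffset);
    }

    // The array index is derived from the hash of the key.
    private int subPartitionOf(String key) {
        return Math.floorMod(key.hashCode(), subHighWaterMarks.length);
    }

    // Returns the assigned offset, or -1 if the produce request is rejected.
    synchronized long tryAppend(String key, long knownHighWaterMark) {
        int i = subPartitionOf(key);
        // Accept when the producer's mark is greater than or equal to the
        // sub-partition's mark (deliberately not an equality test).
        if (knownHighWaterMark < subHighWaterMarks[i]) {
            return -1L;
        }
        long assigned = logEndOffset++;
        subHighWaterMarks[i] = logEndOffset; // the mark moves past the appended message
        return assigned;
    }
}
```

In this sketch a producer whose known mark is stale is rejected only if a message with a key in the same sub-partition has been appended since; writes to other sub-partitions do not invalidate it, which is where the extra concurrency would come from.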
[jira] [Commented] (KAFKA-2260) Allow specifying expected offset on produce
[ https://issues.apache.org/jira/browse/KAFKA-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632259#comment-14632259 ] Yasuhiro Matsuda commented on KAFKA-2260: - Here is the outline of the variant Jay mentioned. - A broker holds a fixed-size array of offsets for each partition. Array indexes are hashes of keys. In a sense, an array element works as a sub-partition. Sub-partitions do not hold data (messages). All they have are the high water marks. - The broker maintains high water marks for each sub-partition. A sub-partition high water mark is updated when a message whose key belongs to the sub-partition is appended to the log. - An application maintains the high water mark of each partition (not sub-partition!) as it consumes messages. It doesn't need to know anything about sub-partitions in a broker. A produce request is processed as follows. 1. The producer sends the known high water mark of the partition with a message. 2. The broker compares the high water mark in the produce request with the high water mark of the sub-partition corresponding to the message key. 3. If the former is greater than the latter, the broker accepts the produce request. (Note that this is not an equality test!) 4. Otherwise, the broker rejects the request. A nice thing about this is that it is easy to increase the concurrency without re-partitioning, and its overhead is predictable. When changing the number of sub-partitions, the broker doesn't have to recompute sub-partition high water marks. It can initialize all array elements with the partition's high water mark. Allow specifying expected offset on produce --- Key: KAFKA-2260 URL: https://issues.apache.org/jira/browse/KAFKA-2260 Project: Kafka Issue Type: Improvement Reporter: Ben Kirwin Assignee: Ewen Cheslack-Postava Priority: Minor Attachments: expected-offsets.patch I'd like to propose a change that adds a simple CAS-like mechanism to the Kafka producer.
This update has a small footprint, but enables a bunch of interesting uses in stream processing or as a commit log for process state. h4. Proposed Change In short: - Allow the user to attach a specific offset to each message produced. - The server assigns offsets to messages in the usual way. However, if the expected offset doesn't match the actual offset, the server should fail the produce request instead of completing the write. This is a form of optimistic concurrency control, like the ubiquitous check-and-set -- but instead of checking the current value of some state, it checks the current offset of the log. h4. Motivation Much like check-and-set, this feature is only useful when there's very low contention. Happily, when Kafka is used as a commit log or as a stream-processing transport, it's common to have just one producer (or a small number) for a given partition -- and in many of these cases, predicting offsets turns out to be quite useful. - We get the same benefits as the 'idempotent producer' proposal: a producer can retry a write indefinitely and be sure that at most one of those attempts will succeed; and if two producers accidentally write to the end of the partition at once, we can be certain that at least one of them will fail. - It's possible to 'bulk load' Kafka this way -- you can write a list of n messages consecutively to a partition, even if the list is much larger than the buffer size or the producer has to be restarted. - If a process is using Kafka as a commit log -- reading from a partition to bootstrap, then writing any updates to that same partition -- it can be sure that it's seen all of the messages in that partition at the moment it does its first (successful) write. There's a bunch of other similar use-cases here, but they all have roughly the same flavour. h4. 
Implementation The major advantage of this proposal over other suggested transaction / idempotency mechanisms is its minimality: it gives the 'obvious' meaning to a currently-unused field, adds no new APIs, and requires very little new code or additional work from the server. - Produced messages already carry an offset field, which is currently ignored by the server. This field could be used for the 'expected offset', with a sigil value for the current behaviour. (-1 is a natural choice, since it's already used to mean 'next available offset'.) - We'd need a new error and error code for a 'CAS failure'. - The server assigns offsets to produced messages in {{ByteBufferMessageSet.validateMessagesAndAssignOffsets}}. After this change, this method would assign offsets in the same way -- but if they don't match the offset in the message, we'd return an error instead of completing the write. - To avoid
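The expected-offset check described in the quoted proposal can be illustrated with a small in-memory log. This is a hedged sketch, not the attached expected-offsets.patch and not Kafka's ByteBufferMessageSet code: the class, the exception type, and the use of a list as the log are all assumptions made for the example; only the -1 sentinel and the "fail instead of completing the write" behaviour come from the proposal text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-partition log with a CAS-like append (illustrative names only).
final class ExpectedOffsetLog {
    // Sentinel from the proposal: -1 means "next available offset", i.e. today's behaviour.
    static final long NEXT_AVAILABLE = -1L;

    // Stand-in for the new 'CAS failure' error the proposal says would be needed.
    static final class OffsetMismatchException extends RuntimeException {
        OffsetMismatchException(long expected, long actual) {
            super("expected offset " + expected + " but next offset is " + actual);
        }
    }

    private final List<String> messages = new ArrayList<>();

    // Assigns offsets in the usual way, but fails the write when the caller's
    // expected offset does not match the offset that would be assigned.
    synchronized long append(String message, long expectedOffset) {
        long nextOffset = messages.size();
        if (expectedOffset != NEXT_AVAILABLE && expectedOffset != nextOffset) {
            throw new OffsetMismatchException(expectedOffset, nextOffset);
        }
        messages.add(message);
        return nextOffset;
    }
}
```

With this shape, a producer that retries the same append with the same expected offset can succeed at most once, and two producers racing for the same offset cannot both succeed, which matches the benefits listed under Motivation.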
[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2226: Attachment: KAFKA-2226_2015-05-29_10:49:34.patch NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, KAFKA-2226_2015-05-29_10:49:34.patch A NullPointerException sometimes shows up in TimerTaskList.remove while running TestPurgatoryPerformance. I’m on the HEAD of trunk. {code} ./bin/kafka-run-class.sh kafka.TestPurgatoryPerformance --key-space-size 10 --keys 3 --num 10 --pct50 50 --pct75 75 --rate 1000 --size 50 --timeout 20 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/Users/okaraman/code/kafka/core/build/dependant-libs-2.10.5/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] [2015-05-27 10:02:14,782] ERROR [completion thread], Error due to (kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1) java.lang.NullPointerException at kafka.utils.timer.TimerTaskList.remove(TimerTaskList.scala:80) at kafka.utils.timer.TimerTaskEntry.remove(TimerTaskList.scala:128) at kafka.utils.timer.TimerTask$class.cancel(TimerTask.scala:27) at kafka.server.DelayedOperation.cancel(DelayedOperation.scala:50) at kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:71) at kafka.TestPurgatoryPerformance$CompletionQueue$$anon$1.doWork(TestPurgatoryPerformance.scala:263) at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565133#comment-14565133 ] Yasuhiro Matsuda commented on KAFKA-2226: - Updated reviewboard https://reviews.apache.org/r/34734/diff/ against branch origin/trunk NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, KAFKA-2226_2015-05-29_10:49:34.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565522#comment-14565522 ] Yasuhiro Matsuda commented on KAFKA-2226: - Updated reviewboard https://reviews.apache.org/r/34734/diff/ against branch origin/trunk NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2226: Attachment: KAFKA-2226_2015-05-29_15:04:35.patch NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565538#comment-14565538 ] Yasuhiro Matsuda commented on KAFKA-2226: - Updated reviewboard https://reviews.apache.org/r/34734/diff/ against branch origin/trunk NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch, KAFKA-2226_2015-05-29_15:10:24.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2226: Attachment: KAFKA-2226_2015-05-29_15:10:24.patch NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch, KAFKA-2226_2015-05-29_10:49:34.patch, KAFKA-2226_2015-05-29_15:04:35.patch, KAFKA-2226_2015-05-29_15:10:24.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14563965#comment-14563965 ] Yasuhiro Matsuda commented on KAFKA-2226: - Updated reviewboard https://reviews.apache.org/r/34734/diff/ against branch origin/trunk NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2226: Attachment: KAFKA-2226_2015-05-28_17:18:55.patch NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch, KAFKA-2226_2015-05-28_17:18:55.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561470#comment-14561470 ] Yasuhiro Matsuda commented on KAFKA-1989: - Thanks. I am looking into it. Would you file a separate ticket for this? New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Fix For: 0.8.3 Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch, KAFKA-1989_2015-04-07_14:59:33.patch, KAFKA-1989_2015-04-08_13:27:59.patch, KAFKA-1989_2015-04-08_14:29:51.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2226) NullPointerException in TestPurgatoryPerformance
[ https://issues.apache.org/jira/browse/KAFKA-2226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561724#comment-14561724 ] Yasuhiro Matsuda commented on KAFKA-2226: - Created reviewboard https://reviews.apache.org/r/34734/diff/ against branch origin/trunk NullPointerException in TestPurgatoryPerformance Key: KAFKA-2226 URL: https://issues.apache.org/jira/browse/KAFKA-2226 Project: Kafka Issue Type: Bug Reporter: Onur Karaman Assignee: Yasuhiro Matsuda Attachments: KAFKA-2226.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2112) make overflowWheel volatile
Yasuhiro Matsuda created KAFKA-2112: --- Summary: make overflowWheel volatile Key: KAFKA-2112 URL: https://issues.apache.org/jira/browse/KAFKA-2112 Project: Kafka Issue Type: Bug Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy overflowWheel in TimingWheel needs to be volatile due to the issue of Double-Checked Locking pattern with JVM. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
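The double-checked locking issue referenced in the ticket above can be shown with a minimal Java sketch. Kafka's TimingWheel is written in Scala, so this is not the project's code; the class, field, and method names below are hypothetical stand-ins for a lazily created overflow wheel. Without volatile, the Java memory model allows another thread to observe a non-null reference to an object whose fields are not yet visible; declaring the field volatile closes that gap.

```java
// Hypothetical stand-in for a timing wheel that lazily creates its overflow wheel.
final class LazyOverflow {
    // Illustrative payload type; the real overflow wheel would be another timing wheel.
    static final class Wheel {
        final long tickMs;
        Wheel(long tickMs) { this.tickMs = tickMs; }
    }

    private final long intervalMs;

    // volatile is what makes double-checked locking safe under the Java memory model:
    // the write below happens-before any later read that sees the non-null value.
    private volatile Wheel overflowWheel;

    LazyOverflow(long intervalMs) { this.intervalMs = intervalMs; }

    Wheel getOrCreateOverflowWheel() {
        Wheel wheel = overflowWheel;          // first check, without locking
        if (wheel == null) {
            synchronized (this) {
                wheel = overflowWheel;        // second check, under the lock
                if (wheel == null) {
                    wheel = new Wheel(intervalMs);
                    overflowWheel = wheel;    // publish the fully constructed object
                }
            }
        }
        return wheel;
    }
}
```

Reading the field once into a local variable keeps the fast path to a single volatile read when the wheel already exists.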
[jira] [Updated] (KAFKA-2112) make overflowWheel volatile
[ https://issues.apache.org/jira/browse/KAFKA-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2112: Assignee: Yasuhiro Matsuda (was: Joel Koshy) Status: Patch Available (was: Open) make overflowWheel volatile --- Key: KAFKA-2112 URL: https://issues.apache.org/jira/browse/KAFKA-2112 Project: Kafka Issue Type: Bug Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Attachments: KAFKA-2112.patch overflowWheel in TimingWheel needs to be volatile due to the issue of Double-Checked Locking pattern with JVM. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2112) make overflowWheel volatile
[ https://issues.apache.org/jira/browse/KAFKA-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14487830#comment-14487830 ] Yasuhiro Matsuda commented on KAFKA-2112: - Created reviewboard https://reviews.apache.org/r/33028/diff/ against branch origin/trunk make overflowWheel volatile --- Key: KAFKA-2112 URL: https://issues.apache.org/jira/browse/KAFKA-2112 Project: Kafka Issue Type: Bug Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Attachments: KAFKA-2112.patch overflowWheel in TimingWheel needs to be volatile due to the issue of Double-Checked Locking pattern with JVM. http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1989: Attachment: KAFKA-1989_2015-04-08_13:27:59.patch New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch, KAFKA-1989_2015-04-07_14:59:33.patch, KAFKA-1989_2015-04-08_13:27:59.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14485950#comment-14485950 ] Yasuhiro Matsuda commented on KAFKA-1989: - Updated reviewboard https://reviews.apache.org/r/31568/diff/ against branch origin/trunk New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch, KAFKA-1989_2015-04-07_14:59:33.patch, KAFKA-1989_2015-04-08_13:27:59.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1989: Attachment: KAFKA-1989_2015-04-08_14:29:51.patch New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch, KAFKA-1989_2015-04-07_14:59:33.patch, KAFKA-1989_2015-04-08_13:27:59.patch, KAFKA-1989_2015-04-08_14:29:51.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486099#comment-14486099 ] Yasuhiro Matsuda commented on KAFKA-1989: - Updated reviewboard https://reviews.apache.org/r/31568/diff/ against branch origin/trunk New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch, KAFKA-1989_2015-04-07_14:59:33.patch, KAFKA-1989_2015-04-08_13:27:59.patch, KAFKA-1989_2015-04-08_14:29:51.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484170#comment-14484170 ] Yasuhiro Matsuda commented on KAFKA-1989: - Updated reviewboard https://reviews.apache.org/r/31568/diff/ against branch origin/trunk New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch, KAFKA-1989_2015-04-07_14:59:33.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391422#comment-14391422 ] Yasuhiro Matsuda commented on KAFKA-1989: - Updated reviewboard https://reviews.apache.org/r/31568/diff/ against branch origin/trunk New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1989: Attachment: KAFKA-1989_2015-04-01_13:49:58.patch New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch, KAFKA-1989_2015-04-01_13:49:58.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389764#comment-14389764 ] Yasuhiro Matsuda commented on KAFKA-2013: - Updated reviewboard https://reviews.apache.org/r/31893/diff/ against branch origin/trunk benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch, KAFKA-2013_2015-03-16_14:39:07.patch, KAFKA-2013_2015-03-19_16:30:52.patch, KAFKA-2013_2015-03-31_17:30:56.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Attachment: KAFKA-2013_2015-03-31_17:30:56.patch benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch, KAFKA-2013_2015-03-16_14:39:07.patch, KAFKA-2013_2015-03-19_16:30:52.patch, KAFKA-2013_2015-03-31_17:30:56.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Attachment: KAFKA-527_2015-03-25_12:08:00.patch Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, KAFKA-527_2015-03-16_15:19:29.patch, KAFKA-527_2015-03-19_21:32:24.patch, KAFKA-527_2015-03-25_12:08:00.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380547#comment-14380547 ] Yasuhiro Matsuda commented on KAFKA-527: Updated reviewboard https://reviews.apache.org/r/31742/diff/ against branch origin/trunk Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, KAFKA-527_2015-03-16_15:19:29.patch, KAFKA-527_2015-03-19_21:32:24.patch, KAFKA-527_2015-03-25_12:08:00.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371503#comment-14371503 ] Yasuhiro Matsuda commented on KAFKA-1989: - Updated reviewboard https://reviews.apache.org/r/31568/diff/ against branch origin/trunk New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1989: Attachment: KAFKA-1989_2015-03-20_08:44:57.patch New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch, KAFKA-1989_2015-03-20_08:44:57.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Attachment: KAFKA-2013_2015-03-19_16:30:52.patch benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch, KAFKA-2013_2015-03-16_14:39:07.patch, KAFKA-2013_2015-03-19_16:30:52.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370341#comment-14370341 ] Yasuhiro Matsuda commented on KAFKA-2013: - Updated reviewboard https://reviews.apache.org/r/31893/diff/ against branch origin/trunk benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch, KAFKA-2013_2015-03-16_14:39:07.patch, KAFKA-2013_2015-03-19_16:30:52.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14370703#comment-14370703 ] Yasuhiro Matsuda commented on KAFKA-527: Updated reviewboard https://reviews.apache.org/r/31742/diff/ against branch origin/trunk Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, KAFKA-527_2015-03-16_15:19:29.patch, KAFKA-527_2015-03-19_21:32:24.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Attachment: KAFKA-527_2015-03-19_21:32:24.patch Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, KAFKA-527_2015-03-16_15:19:29.patch, KAFKA-527_2015-03-19_21:32:24.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Attachment: KAFKA-2013_2015-03-16_14:13:20.patch benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363843#comment-14363843 ] Yasuhiro Matsuda commented on KAFKA-2013: - Updated reviewboard https://reviews.apache.org/r/31893/diff/ against branch origin/trunk benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Attachment: KAFKA-2013_2015-03-16_13:23:38.patch benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363957#comment-14363957 ] Yasuhiro Matsuda commented on KAFKA-2013: - Updated reviewboard https://reviews.apache.org/r/31893/diff/ against branch origin/trunk benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Attachment: KAFKA-2013_2015-03-16_14:39:07.patch benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch, KAFKA-2013_2015-03-16_13:23:38.patch, KAFKA-2013_2015-03-16_14:13:20.patch, KAFKA-2013_2015-03-16_14:39:07.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364085#comment-14364085 ] Yasuhiro Matsuda commented on KAFKA-527: Updated reviewboard https://reviews.apache.org/r/31742/diff/ against branch origin/trunk Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, KAFKA-527_2015-03-16_15:19:29.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Attachment: KAFKA-527_2015-03-16_15:19:29.patch Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, KAFKA-527_2015-03-16_15:19:29.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-2013) benchmark test for the purgatory
Yasuhiro Matsuda created KAFKA-2013: --- Summary: benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Attachment: KAFKA-2013.patch benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-2013: Status: Patch Available (was: Open) benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355189#comment-14355189 ] Yasuhiro Matsuda edited comment on KAFKA-2013 at 3/10/15 4:53 PM: -- This benchmark test measures the enqueue rate. Parameters - the number of requests - the target enqueue rate (request per second) A request interval follows an exponential distribution. - the timeout (milliseconds) - a distribution of time of request completion (75th percentile and 50th percentile) It follows a log-normal distribution - a data size per request Each request has three keys. All requests have the identical set of keys. After a run it shows the actual target rate and the actual enqueue rate. was (Author: yasuhiro.matsuda): This benchmark test measures the enqueue rate. Parameters - the number of requests - the target enqueue rate (request per second) A request interval follows an exponential distribution. - the timeout (milliseconds) - a distribution of time to request completion (75th percentile and 50th percentile) It follows a log-normal distribution - a data size per request Each request has three keys. All requests have the identical set of keys. After a run it shows the actual target rate and the actual enqueue rate. benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355189#comment-14355189 ] Yasuhiro Matsuda commented on KAFKA-2013: - This benchmark test measures the enqueue rate. Parameters - the number of requests - the target enqueue rate (request per second) A request interval follows an exponential distribution. - the timeout (milliseconds) - a distribution of time to completion (75th percentile and 50th percentile) It follows a log-normal distribution - a data size per request Each request has three keys. All requests have the identical set of keys. After a run it shows the actual target rate and the actual enqueue rate. benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
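The two distributions named in the KAFKA-2013 comment can be sampled roughly as follows. This is a sketch under stated assumptions, not code from the attached patch: the class BenchmarkSamplers and its method names are hypothetical, and the log-normal parameters are derived from the 50th and 75th percentiles using the standard identities median = exp(mu) and p75 = exp(mu + sigma * z75).

```java
import java.util.Random;

// Hypothetical helper; names are illustrative and not taken from KAFKA-2013.patch.
class BenchmarkSamplers {
    // z-score of the 75th percentile of the standard normal distribution
    static final double Z75 = 0.6744897501960817;

    // Exponentially distributed gap between requests, in milliseconds,
    // for a target rate given in requests per second (inverse-CDF sampling).
    static double nextIntervalMs(Random rng, double ratePerSec) {
        return -Math.log(1.0 - rng.nextDouble()) * 1000.0 / ratePerSec;
    }

    // Log-normal location parameter: the median of a log-normal is exp(mu).
    static double mu(double p50) {
        return Math.log(p50);
    }

    // Log-normal scale parameter, solved from p75 = exp(mu + sigma * Z75).
    static double sigma(double p75, double p50) {
        return (Math.log(p75) - Math.log(p50)) / Z75;
    }

    // Log-normally distributed time to completion, in the same unit as p50 and p75.
    static double nextCompletionMs(Random rng, double p75, double p50) {
        return Math.exp(mu(p50) + sigma(p75, p50) * rng.nextGaussian());
    }

    public static void main(String[] args) {
        Random rng = new Random(42L);
        System.out.println("interval ms:   " + nextIntervalMs(rng, 1000.0));
        System.out.println("completion ms: " + nextCompletionMs(rng, 200.0, 100.0));
    }
}
```

Specifying the completion-time distribution by two percentiles, as the comment does, is convenient because both log-normal parameters follow from them in closed form.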
[jira] [Comment Edited] (KAFKA-2013) benchmark test for the purgatory
[ https://issues.apache.org/jira/browse/KAFKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355189#comment-14355189 ] Yasuhiro Matsuda edited comment on KAFKA-2013 at 3/10/15 4:53 PM: -- This benchmark test measures the enqueue rate. Parameters - the number of requests - the target enqueue rate (request per second) A request interval follows an exponential distribution. - the timeout (milliseconds) - a distribution of time to request completion (75th percentile and 50th percentile) It follows a log-normal distribution - a data size per request Each request has three keys. All requests have the identical set of keys. After a run it shows the actual target rate and the actual enqueue rate. was (Author: yasuhiro.matsuda): This benchmark test measures the enqueue rate. Parameters - the number of requests - the target enqueue rate (request per second) A request interval follows an exponential distribution. - the timeout (milliseconds) - a distribution of time to completion (75th percentile and 50th percentile) It follows a log-normal distribution - a data size per request Each request has three keys. All requests have the identical set of keys. After a run it shows the actual target rate and the actual enqueue rate. benchmark test for the purgatory Key: KAFKA-2013 URL: https://issues.apache.org/jira/browse/KAFKA-2013 Project: Kafka Issue Type: Test Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Trivial Attachments: KAFKA-2013.patch We need a micro benchmark test for measuring the purgatory performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14352290#comment-14352290 ] Yasuhiro Matsuda commented on KAFKA-527: This patch is mainly aimed at #1 above. If you read the patch carefully, there is more to it for the compression part. It avoids copies to an intermediate buffer (byte array) when we go from ByteArrayOutputStream to ByteBuffer, and also a copy from ByteBuffer to ByteBuffer when we create a MessageSet from a Message at the end of compression. For the decompression part, your iterator patch looks nice. It seems to make ByteBufferMessageSet.decompress obsolete if you clean up all callers by using your iterator. Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Attachment: KAFKA-527.patch Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347441#comment-14347441 ] Yasuhiro Matsuda commented on KAFKA-527: Created reviewboard https://reviews.apache.org/r/31742/diff/ against branch origin/trunk Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-527: --- Assignee: Yasuhiro Matsuda Status: Patch Available (was: Open) Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-527) Compression support does numerous byte copies
[ https://issues.apache.org/jira/browse/KAFKA-527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347478#comment-14347478 ] Yasuhiro Matsuda commented on KAFKA-527: This patch introduces BufferingOutputStream, an alternative to ByteArrayOutputStream. It is backed by a chain of byte arrays, so it does not copy bytes when increasing its capacity. It also has a method that writes its content to a ByteBuffer directly, so there is no need to create an intermediate array to transfer the content to the ByteBuffer. Lastly, it supports deferred writes: you reserve a number of bytes before knowing the value and fill them in later. MessageWriter (a new class) uses this for writing the CRC value and the payload length. On a laptop, I tested the performance using TestLinearWriteSpeed with snappy. Previously: 26.64786026813998 MB per sec. With the patch: 35.78401869390889 MB per sec. That is about 34% better throughput. Compression support does numerous byte copies - Key: KAFKA-527 URL: https://issues.apache.org/jira/browse/KAFKA-527 Project: Kafka Issue Type: Bug Components: compression Reporter: Jay Kreps Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-527.message-copy.history, KAFKA-527.patch, java.hprof.no-compression.txt, java.hprof.snappy.text The data path for compressing or decompressing messages is extremely inefficient. We do something like 7 (?) complete copies of the data, often for simple things like adding a 4 byte size to the front. I am not sure how this went by unnoticed. This is likely the root cause of the performance issues we saw in doing bulk recompression of data in mirror maker. The mismatch between the InputStream and OutputStream interfaces and the Message/MessageSet interfaces which are based on byte buffers is the cause of many of these. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
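The three ideas in the comment above (a chain of byte arrays, a direct write to a ByteBuffer, and a deferred write) can be illustrated with a minimal sketch. This is not the BufferingOutputStream from the attached patch: the class name ChunkedOutputStream and the methods reserveInt, putInt and writeTo are hypothetical names chosen for illustration, and the real class may differ in every detail.

```java
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the class from the KAFKA-527 patch.
class ChunkedOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private int size = 0; // total number of bytes written or reserved

    ChunkedOutputStream(int chunkSize) {
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) {
        // Grow by appending a new chunk; existing bytes are never copied.
        if (size == chunks.size() * chunkSize) {
            chunks.add(new byte[chunkSize]);
        }
        chunks.get(size / chunkSize)[size % chunkSize] = (byte) b;
        size++;
    }

    // Deferred write, step 1: reserve 4 bytes and return their position.
    int reserveInt() {
        int pos = size;
        for (int i = 0; i < 4; i++) {
            write(0);
        }
        return pos;
    }

    // Deferred write, step 2: fill the reserved bytes (big-endian).
    void putInt(int pos, int value) {
        for (int i = 0; i < 4; i++) {
            int p = pos + i;
            chunks.get(p / chunkSize)[p % chunkSize] = (byte) (value >>> (24 - 8 * i));
        }
    }

    int size() {
        return size;
    }

    // Copy each chunk straight into the target buffer, with no
    // intermediate array holding the whole content.
    void writeTo(ByteBuffer buffer) {
        int remaining = size;
        for (byte[] chunk : chunks) {
            int n = Math.min(remaining, chunkSize);
            buffer.put(chunk, 0, n);
            remaining -= n;
        }
    }
}
```

A caller would reserve the length field with reserveInt(), write the payload, then call putInt() with the now-known length before draining the stream with writeTo().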
[jira] [Created] (KAFKA-1989) New purgatory design
Yasuhiro Matsuda created KAFKA-1989: --- Summary: New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
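For readers unfamiliar with hierarchical timing wheels, the general technique can be sketched as follows. This is not the code from the KAFKA-1989 patch or the linked proposal; the class TimingWheel, its Task type and the add/advance methods are hypothetical names. The sketch also advances the clock one tick at a time for simplicity, which a production design would likely avoid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a hierarchical timing wheel.
class TimingWheel {
    static final class Task {
        final String name;
        final long expirationMs;

        Task(String name, long expirationMs) {
            this.name = name;
            this.expirationMs = expirationMs;
        }
    }

    private final long tickMs;     // time span of one bucket
    private final int wheelSize;   // number of buckets
    private final long interval;   // tickMs * wheelSize
    private final List<List<Task>> buckets = new ArrayList<>();
    private long currentTime;      // always a multiple of tickMs
    private TimingWheel overflow;  // coarser wheel, created on demand

    TimingWheel(long tickMs, int wheelSize, long startMs) {
        this.tickMs = tickMs;
        this.wheelSize = wheelSize;
        this.interval = tickMs * wheelSize;
        this.currentTime = startMs - (startMs % tickMs);
        for (int i = 0; i < wheelSize; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Returns false if the task is already due and was not stored.
    boolean add(Task task) {
        if (task.expirationMs < currentTime + tickMs) {
            return false;
        }
        if (task.expirationMs < currentTime + interval) {
            int index = (int) ((task.expirationMs / tickMs) % wheelSize);
            buckets.get(index).add(task);
        } else {
            // Too far in the future: hand over to a wheel whose tick
            // equals this wheel's whole interval.
            if (overflow == null) {
                overflow = new TimingWheel(interval, wheelSize, currentTime);
            }
            overflow.add(task);
        }
        return true;
    }

    // Advances the clock to nowMs and appends due tasks to 'expired'.
    void advance(long nowMs, List<Task> expired) {
        while (currentTime + tickMs <= nowMs) {
            currentTime += tickMs;
            int index = (int) ((currentTime / tickMs) % wheelSize);
            List<Task> bucket = buckets.get(index);
            expired.addAll(bucket);
            bucket.clear();
            if (overflow != null) {
                // Tasks leaving the coarser wheel are re-inserted here,
                // where they land in a finer-grained bucket.
                List<Task> cascaded = new ArrayList<>();
                overflow.advance(currentTime, cascaded);
                for (Task t : cascaded) {
                    if (!add(t)) {
                        expired.add(t);
                    }
                }
            }
        }
    }
}
```

Insertion is constant time per level because a task's bucket is computed directly from its expiration time, rather than by ordering it against the other pending tasks.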
[jira] [Updated] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1989: Status: Patch Available (was: Open) New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1989: Attachment: KAFKA-1989.patch New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1989) New purgatory design
[ https://issues.apache.org/jira/browse/KAFKA-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341070#comment-14341070 ] Yasuhiro Matsuda commented on KAFKA-1989: - Created reviewboard https://reviews.apache.org/r/31568/diff/ against branch origin/trunk New purgatory design Key: KAFKA-1989 URL: https://issues.apache.org/jira/browse/KAFKA-1989 Project: Kafka Issue Type: Improvement Components: purgatory Affects Versions: 0.8.2.0 Reporter: Yasuhiro Matsuda Assignee: Yasuhiro Matsuda Priority: Critical Attachments: KAFKA-1989.patch This is a new design of purgatory based on Hierarchical Timing Wheels. https://cwiki.apache.org/confluence/display/KAFKA/Purgatory+Redesign+Proposal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329725#comment-14329725 ] Yasuhiro Matsuda commented on KAFKA-1965: - Updated reviewboard https://reviews.apache.org/r/31199/diff/ against branch origin/trunk Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch, KAFKA-1965.patch, KAFKA-1965_2015-02-20_15:08:26.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
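The idea in the issue description, storing one due timestamp instead of a creation timestamp plus a delay, can be sketched as below. This is an illustration under assumptions, not the patch itself: the class name DueItem and its constructor parameters are hypothetical, and it is written against java.util.concurrent.Delayed only because that is a standard interface for delayed elements.

```java
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the DelayedItem class from Kafka.
class DueItem implements Delayed {
    // The only stored field: the absolute time at which the item is due.
    private final long dueMs;

    DueItem(long delayMs, long nowMs) {
        this.dueMs = nowMs + delayMs;
    }

    long dueMs() {
        return dueMs;
    }

    @Override
    public long getDelay(TimeUnit unit) {
        // Remaining delay is derived from the due time on each call.
        long remainingMs = Math.max(dueMs - System.currentTimeMillis(), 0);
        return unit.convert(remainingMs, TimeUnit.MILLISECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        if (other instanceof DueItem) {
            // Ordering needs only the due time, not the creation time.
            return Long.compare(dueMs, ((DueItem) other).dueMs);
        }
        return Long.compare(getDelay(TimeUnit.MILLISECONDS),
                            other.getDelay(TimeUnit.MILLISECONDS));
    }
}
```

Both the remaining delay and the ordering between items follow from the due time alone, which is why the second field is redundant.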
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Attachment: KAFKA-1965_2015-02-20_15:08:26.patch Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch, KAFKA-1965.patch, KAFKA-1965_2015-02-20_15:08:26.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Attachment: KAFKA-1965.patch Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch, KAFKA-1965.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327795#comment-14327795 ] Yasuhiro Matsuda commented on KAFKA-1965: - Created reviewboard https://reviews.apache.org/r/31199/diff/ against branch origin/trunk Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch, KAFKA-1965.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-1965) Leaner DelayedItem
Yasuhiro Matsuda created KAFKA-1965: --- Summary: Leaner DelayedItem Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Status: Patch Available (was: Open) Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Description: In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. (was: In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. ) Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Description: In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Attachment: KAFKA-1965.patch Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1965) Leaner DelayedItem
[ https://issues.apache.org/jira/browse/KAFKA-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yasuhiro Matsuda updated KAFKA-1965: Status: Patch Available (was: Open) Leaner DelayedItem -- Key: KAFKA-1965 URL: https://issues.apache.org/jira/browse/KAFKA-1965 Project: Kafka Issue Type: Improvement Components: purgatory Reporter: Yasuhiro Matsuda Assignee: Joel Koshy Priority: Trivial Attachments: KAFKA-1965.patch In DelayedItem, which is a superclass of DelayedOperation, both the creation timestamp and the length delay are stored. However, all it needs is one timestamp that is the due of the item. -- This message was sent by Atlassian JIRA (v6.3.4#6332)