[jira] [Created] (KAFKA-7849) Warning when adding GlobalKTable
Dmitry Minkovsky created KAFKA-7849: --- Summary: Warning when adding GlobalKTable Key: KAFKA-7849 URL: https://issues.apache.org/jira/browse/KAFKA-7849 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 2.1.0 Reporter: Dmitry Minkovsky Per https://lists.apache.org/thread.html/59c119be8a2723c501e0653fa3ed571e8c09be40d5b5170c151528b5@%3Cusers.kafka.apache.org%3E When I add a GlobalKTable for topic "message-write-service-user-ids-by-email" to my topology, I get this warning: [2019-01-19 12:18:14,008] WARN (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:421) [Consumer clientId=message-write-service-55f2ca4d-0efc-4344-90d3-955f9f5a65fd-StreamThread-2-consumer, groupId=message-write-service] The following subscribed topics are not assigned to any members: [message-write-service-user-ids-by-email] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7536) TopologyTestDriver cannot pre-populate GlobalKTable
Dmitry Minkovsky created KAFKA-7536: --- Summary: TopologyTestDriver cannot pre-populate GlobalKTable Key: KAFKA-7536 URL: https://issues.apache.org/jira/browse/KAFKA-7536 Project: Kafka Issue Type: Bug Reporter: Dmitry Minkovsky I have a GlobalKTable that's defined as {code} GlobalKTable userIdsByEmail = topology .globalTable(USER_IDS_BY_EMAIL.name, USER_IDS_BY_EMAIL.consumed(), Materialized.as("user-ids-by-email")); {code} And the following test in Spock: {code} def topology = // my topology def driver = new TopologyTestDriver(topology, config()) def cleanup() { driver.close() } def "create from email request"() { def store = driver.getKeyValueStore('user-ids-by-email') store.put('string', ByteString.copyFrom(new byte[0])) {code} When I run this, I get the following: {code} [2018-10-23 19:35:27,055] INFO (org.apache.kafka.streams.processor.internals.GlobalStateManagerImpl) mock Restoring state for global store user-ids-by-email java.lang.NullPointerException at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.topic(AbstractProcessorContext.java:115) at org.apache.kafka.streams.state.internals.CachingKeyValueStore.putInternal(CachingKeyValueStore.java:237) at org.apache.kafka.streams.state.internals.CachingKeyValueStore.put(CachingKeyValueStore.java:220) at org.apache.kafka.streams.state.internals.CachingKeyValueStore.put(CachingKeyValueStore.java:38) at org.apache.kafka.streams.state.internals.InnerMeteredKeyValueStore.put(InnerMeteredKeyValueStore.java:206) at org.apache.kafka.streams.state.internals.MeteredKeyValueBytesStore.put(MeteredKeyValueBytesStore.java:117) at pony.message.MessageWriteStreamsTest.create from mailgun email request(MessageWriteStreamsTest.groovy:52) [2018-10-23 19:35:27,189] INFO (org.apache.kafka.streams.processor.internals.StateDirectory) stream-thread [main] Deleting state directory 0_0 for task 0_0 as user calling cleanup. {code} I've noticed that I can {{put()}} to the store if I first write to it with {{driver.pipeInput}}. But otherwise I get the above error. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-5824) Cannot write to key value store provided by ProcessorTopologyTestDriver
Dmitry Minkovsky created KAFKA-5824: --- Summary: Cannot write to key value store provided by ProcessorTopologyTestDriver Key: KAFKA-5824 URL: https://issues.apache.org/jira/browse/KAFKA-5824 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.11.0.0 Reporter: Dmitry Minkovsky I am trying to `put()` to a KeyValueStore that I got from ProcessorTopologyTestDriver#getKeyValueStore() as part of setup for a test. The JavaDoc endorses this use-case: * This is often useful in test cases to pre-populate the store before the test case instructs the topology to * {@link #process(String, byte[], byte[]) process an input message}, and/or to check the store afterward. However, the `put()` results in the following error: {{ java.lang.IllegalStateException: This should not happen as offset() should only be called while a record is processed at org.apache.kafka.streams.processor.internals.AbstractProcessorContext.offset(AbstractProcessorContext.java:139) at org.apache.kafka.streams.state.internals.CachingKeyValueStore.put(CachingKeyValueStore.java:193) at org.apache.kafka.streams.state.internals.CachingKeyValueStore.put(CachingKeyValueStore.java:188) at pony.UserEntityTopologySupplierTest.confirm-settings-requests(UserEntityTopologySupplierTest.groovy:81) }} This error seems straightforward: I am not doing the `put` within the context of stream processing. How do I reconcile this with the fact that I am trying to populate the store for a test, which the JavaDoc endorses? Thank you, Dmitry -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-5573) kaka-clients 0.11.0.0 AdminClient#createTopics() does not set configs
[ https://issues.apache.org/jira/browse/KAFKA-5573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Minkovsky resolved KAFKA-5573. - Resolution: Not A Bug Fix Version/s: 0.11.1.0 > kaka-clients 0.11.0.0 AdminClient#createTopics() does not set configs > - > > Key: KAFKA-5573 > URL: https://issues.apache.org/jira/browse/KAFKA-5573 > Project: Kafka > Issue Type: Bug > Components: clients >Affects Versions: 0.11.0.0 >Reporter: Dmitry Minkovsky > Fix For: 0.11.1.0 > > > I am creating topics like > ``` >private void createTopics(String[] topics, Map config) { > log.info("creating topics: {} with config: {}", names, config); > CreateTopicsResult result = admin.createTopics( > Arrays > .stream(topics) > .map(topic -> new NewTopic(topic, partitions, replication)) > .collect(Collectors.toList()) > ); > for (Map.Entry> entry : > result.values().entrySet()) { > try { > entry.getValue().get(); > log.info("topic {} created", entry.getKey()); > } catch (InterruptedException | ExecutionException e) { > if (Throwables.getRootCause(e) instanceof > TopicExistsException) { > log.info("topic {} existed", entry.getKey()); > } > } > } > } > ``` > where I call this function like > ``` > Map config = new HashMap<>(); > config.put("cleanup.policy", "compact"); > createTopics(new String[]{"topic"}, config); > ``` > However, when I inspect the topic with > ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics > --entity-name topic --describe > or > ./kafka-topics.sh --zookeeper localhost:2181 --describe --topic topic > there are no configs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (KAFKA-5573) kaka-clients 0.11.0.0 AdminClient#createTopics does not set configs
Dmitry Minkovsky created KAFKA-5573: --- Summary: kaka-clients 0.11.0.0 AdminClient#createTopics does not set configs Key: KAFKA-5573 URL: https://issues.apache.org/jira/browse/KAFKA-5573 Project: Kafka Issue Type: Bug Components: clients Affects Versions: 0.11.0.0 Reporter: Dmitry Minkovsky I am creating topics like ``` private void createTopics(String[] topics, Map config) { log.info("creating topics: {} with config: {}", names, config); CreateTopicsResult result = admin.createTopics( Arrays .stream(topics) .map(topic -> new NewTopic(topic, partitions, replication)) .collect(Collectors.toList()) ); for (Map.Entry> entry : result.values().entrySet()) { try { entry.getValue().get(); log.info("topic {} created", entry.getKey()); } catch (InterruptedException | ExecutionException e) { if (Throwables.getRootCause(e) instanceof TopicExistsException) { log.info("topic {} existed", entry.getKey()); } } } } ``` where I call this function like ``` Map config = new HashMap<>(); config.put("cleanup.policy", "compact"); createTopics(new String[]{"topic"}, config); ``` However, when I inspect the topic with ./kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name topic --describe or ./kafka-topics.sh --zookeeper localhost:2181 --describe --topic topic there are no configs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (KAFKA-4628) Support KTable/GlobalKTable Joins
[ https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041869#comment-16041869 ] Dmitry Minkovsky edited comment on KAFKA-4628 at 6/7/17 11:15 PM: -- on KAFKA-4880 (a dupe of this ticket), @Michael Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration"/denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety on all streams app instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. was (Author: dminkovsky): on KAFKA-4880 (a dupe of this ticket), @Michael Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration"/denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. > Support KTable/GlobalKTable Joins > - > > Key: KAFKA-4628 > URL: https://issues.apache.org/jira/browse/KAFKA-4628 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Damian Guy > Fix For: 0.11.1.0 > > > In KIP-99 we have added support for GlobalKTables, however we don't currently > support KTable/GlobalKTable joins as they require materializing a state store > for the join. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4628) Support KTable/GlobalKTable Joins
[ https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041869#comment-16041869 ] Dmitry Minkovsky edited comment on KAFKA-4628 at 6/7/17 11:14 PM: -- on KAFKA-4880 (a dupe of this ticket), @Michael Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration"/denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. was (Author: dminkovsky): on KAFKA-4880 (a dupe of this ticket), @Michael Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration" or denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. > Support KTable/GlobalKTable Joins > - > > Key: KAFKA-4628 > URL: https://issues.apache.org/jira/browse/KAFKA-4628 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Damian Guy > Fix For: 0.11.1.0 > > > In KIP-99 we have added support for GlobalKTables, however we don't currently > support KTable/GlobalKTable joins as they require materializing a state store > for the join. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4628) Support KTable/GlobalKTable Joins
[ https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041869#comment-16041869 ] Dmitry Minkovsky edited comment on KAFKA-4628 at 6/7/17 11:13 PM: -- on KAFKA-4880 (a dupe of this ticket), @Michale Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration" or denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. was (Author: dminkovsky): on KAFKA-4880 (a dupe of this ticket), Michale Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration" or denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. > Support KTable/GlobalKTable Joins > - > > Key: KAFKA-4628 > URL: https://issues.apache.org/jira/browse/KAFKA-4628 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Damian Guy > Fix For: 0.11.1.0 > > > In KIP-99 we have added support for GlobalKTables, however we don't currently > support KTable/GlobalKTable joins as they require materializing a state store > for the join. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-4628) Support KTable/GlobalKTable Joins
[ https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041869#comment-16041869 ] Dmitry Minkovsky edited comment on KAFKA-4628 at 6/7/17 11:14 PM: -- on KAFKA-4880 (a dupe of this ticket), @Michael Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration" or denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. was (Author: dminkovsky): on KAFKA-4880 (a dupe of this ticket), @Michale Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration" or denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. > Support KTable/GlobalKTable Joins > - > > Key: KAFKA-4628 > URL: https://issues.apache.org/jira/browse/KAFKA-4628 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Damian Guy > Fix For: 0.11.1.0 > > > In KIP-99 we have added support for GlobalKTables, however we don't currently > support KTable/GlobalKTable joins as they require materializing a state store > for the join. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4628) Support KTable/GlobalKTable Joins
[ https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041869#comment-16041869 ] Dmitry Minkovsky commented on KAFKA-4628: - on KAFKA-4880 (a dupe of this ticket), Michale Noll writes: > FWIW, I wonder how much interest there actually is for this functionality. > Users have been requesting stream-globalTable joins, but personally I have > yet to run into a person that wants table-globalTable joins. Just saying. I hadn't noticed that GlobalKTables were severely limited and assumed they had all the functionality of regular tables. I had been really excited to use them for "hydration" or denormalization. I have two models which will be present in my application in different quantities. Instances of one model will be very few in quantity compared to instances of the second model. I want to "hydrate" instances of the high-quantity model with instances of the low-quantity model, which can be available in its entirety at all instances. KTable/KTable joins are great for denormalization because you can drive the update from both sides of the relationship. KTable/GlobalKTable joins would facilitates such updates while removing the need to repartition the left table. > Support KTable/GlobalKTable Joins > - > > Key: KAFKA-4628 > URL: https://issues.apache.org/jira/browse/KAFKA-4628 > Project: Kafka > Issue Type: Sub-task > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Damian Guy > Fix For: 0.11.1.0 > > > In KIP-99 we have added support for GlobalKTables, however we don't currently > support KTable/GlobalKTable joins as they require materializing a state store > for the join. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4750) KeyValueIterator returns null values
[ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006486#comment-16006486 ] Dmitry Minkovsky commented on KAFKA-4750: - Affects 0.10.2.1 also > KeyValueIterator returns null values > > > Key: KAFKA-4750 > URL: https://issues.apache.org/jira/browse/KAFKA-4750 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.1.1 >Reporter: Michal Borowiecki >Assignee: Kamal Chandraprakash > Labels: newbie > > The API for ReadOnlyKeyValueStore.range method promises the returned iterator > will not return null values. However, after upgrading from 0.10.0.0 to > 0.10.1.1 we found null values are returned causing NPEs on our side. > I found this happens after removing entries from the store and I found > resemblance to SAMZA-94 defect. The problem seems to be as it was there, when > deleting entries and having a serializer that does not return null when null > is passed in, the state store doesn't actually delete that key/value pair but > the iterator will return null value for that key. > When I modified our serilizer to return null when null is passed in, the > problem went away. However, I believe this should be fixed in kafka streams, > perhaps with a similar approach as SAMZA-94. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (KAFKA-5105) ReadOnlyKeyValueStore range scans are not ordered
[ https://issues.apache.org/jira/browse/KAFKA-5105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Minkovsky updated KAFKA-5105: Description: Following up with this thread https://www.mail-archive.com/users@kafka.apache.org/msg25373.html Although ReadOnlyKeyValueStore's #range() is documented not to returns values in order, it would be great if it would for keys within a single partition. This would facilitate using interactive queries and local state as one would use HBase to index data by prefixed keys. If range returned keys in lexicographical order, I could use interactive queries for all my data needs except search. was: Following up with this thread https://www.mail-archive.com/users@kafka.apache.org/msg25373.html Although ReadOnlyKeyValueStore's #range() is documented not to returns values in order, it would be great if it would for keys within a single partition. This would facilitate using interactive queries and local state as one would use HBase to index data by prefixed keys. > ReadOnlyKeyValueStore range scans are not ordered > - > > Key: KAFKA-5105 > URL: https://issues.apache.org/jira/browse/KAFKA-5105 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.2.0 >Reporter: Dmitry Minkovsky > > Following up with this thread > https://www.mail-archive.com/users@kafka.apache.org/msg25373.html > Although ReadOnlyKeyValueStore's #range() is documented not to returns values > in order, it would be great if it would for keys within a single partition. > This would facilitate using interactive queries and local state as one would > use HBase to index data by prefixed keys. If range returned keys in > lexicographical order, I could use interactive queries for all my data needs > except search. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (KAFKA-5105) ReadOnlyKeyValueStore range scans are not ordered
Dmitry Minkovsky created KAFKA-5105: --- Summary: ReadOnlyKeyValueStore range scans are not ordered Key: KAFKA-5105 URL: https://issues.apache.org/jira/browse/KAFKA-5105 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.10.2.0 Reporter: Dmitry Minkovsky Following up with this thread https://www.mail-archive.com/users@kafka.apache.org/msg25373.html Although ReadOnlyKeyValueStore's #range() is documented not to returns values in order, it would be great if it would for keys within a single partition. This would facilitate using interactive queries and local state as one would use HBase to index data by prefixed keys. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4750) KeyValueIterator returns null values
[ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15930379#comment-15930379 ] Dmitry Minkovsky commented on KAFKA-4750: - Can confirm I experienced this on 0.10.1 as well. > KeyValueIterator returns null values > > > Key: KAFKA-4750 > URL: https://issues.apache.org/jira/browse/KAFKA-4750 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.1.1 >Reporter: Michal Borowiecki > > The API for ReadOnlyKeyValueStore.range method promises the returned iterator > will not return null values. However, after upgrading from 0.10.0.0 to > 0.10.1.1 we found null values are returned causing NPEs on our side. > I found this happens after removing entries from the store and I found > resemblance to SAMZA-94 defect. The problem seems to be as it was there, when > deleting entries and having a serializer that does not return null when null > is passed in, the state store doesn't actually delete that key/value pair but > the iterator will return null value for that key. > When I modified our serilizer to return null when null is passed in, the > problem went away. However, I believe this should be fixed in kafka streams, > perhaps with a similar approach as SAMZA-94. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (KAFKA-3901) KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards null values
[ https://issues.apache.org/jira/browse/KAFKA-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649270#comment-15649270 ] Dmitry Minkovsky edited comment on KAFKA-3901 at 11/9/16 12:08 AM: --- Also affects 0.10.1.0. How about a PR? was (Author: dminkovsky): Also affects 0.10.1.0 <3 > KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards > null values > - > > Key: KAFKA-3901 > URL: https://issues.apache.org/jira/browse/KAFKA-3901 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > > Did this get missed in [KAFKA-3519: Refactor Transformer's transform / > punctuate to return nullable > value|https://github.com/apache/kafka/commit/40fd456649b5df29d030da46865b5e7e0ca6db15#diff-338c230fd5a15d98550230007651a224]? > I think it may have, because that processor's #punctuate() does not forward > null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3901) KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards null values
[ https://issues.apache.org/jira/browse/KAFKA-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649270#comment-15649270 ] Dmitry Minkovsky commented on KAFKA-3901: - Also affects 0.10.1.0 <3 > KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards > null values > - > > Key: KAFKA-3901 > URL: https://issues.apache.org/jira/browse/KAFKA-3901 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > > Did this get missed in [KAFKA-3519: Refactor Transformer's transform / > punctuate to return nullable > value|https://github.com/apache/kafka/commit/40fd456649b5df29d030da46865b5e7e0ca6db15#diff-338c230fd5a15d98550230007651a224]? > I think it may have, because that processor's #punctuate() does not forward > null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3901) KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards null values
[ https://issues.apache.org/jira/browse/KAFKA-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348624#comment-15348624 ] Dmitry Minkovsky commented on KAFKA-3901: - ...and KStreamTransform$KStreamTransformProcessor#process() does not forward null values. Anyway, I marked this minor because I can use the non-value process, so there is a workaround. > KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards > null values > - > > Key: KAFKA-3901 > URL: https://issues.apache.org/jira/browse/KAFKA-3901 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > > Did this get missed in [KAFKA-3519: Refactor Transformer's transform / > punctuate to return nullable > value|https://github.com/apache/kafka/commit/40fd456649b5df29d030da46865b5e7e0ca6db15#diff-338c230fd5a15d98550230007651a224]? > I think it may have, because that processor's #punctuate() does not forward > null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3901) KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards null values
Dmitry Minkovsky created KAFKA-3901: --- Summary: KStreamTransformValues$KStreamTransformValuesProcessor#process() forwards null values Key: KAFKA-3901 URL: https://issues.apache.org/jira/browse/KAFKA-3901 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.10.0.0 Reporter: Dmitry Minkovsky Assignee: Guozhang Wang Priority: Minor Did this get missed in [KAFKA-3519: Refactor Transformer's transform / punctuate to return nullable value|https://github.com/apache/kafka/commit/40fd456649b5df29d030da46865b5e7e0ca6db15#diff-338c230fd5a15d98550230007651a224]? I think it may have, because that processor's #punctuate() does not forward null. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-3862) Kafka streams documentation: partition<->thread<->task assignment unclear
[ https://issues.apache.org/jira/browse/KAFKA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336800#comment-15336800 ] Dmitry Minkovsky commented on KAFKA-3862: - Oh hey, I somehow missed 0.10 and Confluent 3.0. Current version of docs are much clearer/don't have this issue. > Kafka streams documentation: partition<->thread<->task assignment unclear > - > > Key: KAFKA-3862 > URL: https://issues.apache.org/jira/browse/KAFKA-3862 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 > Environment: N/A >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.0.0 > > > Hi. Not sure if this is the right place to post about documentation (this > isn't actually a software bug). > I believe the following span of documentation is self-contradictory: > !http://i.imgur.com/WM8ada2.png! > In the paragraph text before and after the image, it says the partitions are > *evenly distributed* among the threads. However, in the image, Thread 1 gets > p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). > That's "even" in the sense that you can't do any better, but still not even, > and in conflict with the text that follows the image: > !http://i.imgur.com/xq3SOre.png! > This doesn't make sense because why would p2 from topicA and topicB go to > different threads? if this were the cause, you couldn't join the partitions, > right? > Either way, it seems like the two things are in conflict. > Finally, the wording: > bq. ...assume that each thread executes a single stream task... > What does "assume" mean here? I've not seen any indication elsewhere in the > documentation that shows you can configure this behavior. Sorry if I missed > this. > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (KAFKA-3862) Kafka streams documentation: partition<->thread<->task assignment unclear
[ https://issues.apache.org/jira/browse/KAFKA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Minkovsky resolved KAFKA-3862. - Resolution: Fixed Fix Version/s: 0.10.0.0 > Kafka streams documentation: partition<->thread<->task assignment unclear > - > > Key: KAFKA-3862 > URL: https://issues.apache.org/jira/browse/KAFKA-3862 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 > Environment: N/A >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > Fix For: 0.10.0.0 > > > Hi. Not sure if this is the right place to post about documentation (this > isn't actually a software bug). > I believe the following span of documentation is self-contradictory: > !http://i.imgur.com/WM8ada2.png! > In the paragraph text before and after the image, it says the partitions are > *evenly distributed* among the threads. However, in the image, Thread 1 gets > p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). > That's "even" in the sense that you can't do any better, but still not even, > and in conflict with the text that follows the image: > !http://i.imgur.com/xq3SOre.png! > This doesn't make sense because why would p2 from topicA and topicB go to > different threads? if this were the cause, you couldn't join the partitions, > right? > Either way, it seems like the two things are in conflict. > Finally, the wording: > bq. ...assume that each thread executes a single stream task... > What does "assume" mean here? I've not seen any indication elsewhere in the > documentation that shows you can configure this behavior. Sorry if I missed > this. > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3862) Kafka streams documentation: partition<->thread<->task assignment unclear
[ https://issues.apache.org/jira/browse/KAFKA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Minkovsky updated KAFKA-3862: Description: Hi. Not sure if this is the right place to post about documentation (this isn't actually a software bug). I believe the following span of documentation is self-contradictory: !http://i.imgur.com/WM8ada2.png! In the paragraph text before and after the image, it says the partitions are *evenly distributed* among the threads. However, in the image, Thread 1 gets p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). That's "even" in the sense that you can't do any better, but still not even, and in conflict with the text that follows the image: !http://i.imgur.com/xq3SOre.png! This doesn't make sense because why would p2 from topicA and topicB go to different threads? if this were the cause, you couldn't join the partitions, right? Either way, it seems like the two things are in conflict. Finally, the wording: bq. ...assume that each thread executes a single stream task... What does "assume" mean here? I've not seen any indication elsewhere in the documentation that show you can configure this behavior. Sorry if I missed this. Thank you! was: Hi. Not sure if this is the right place to post about documentation (this isn't actually a software bug). I believe the following span of documentation is self-contradictory: !http://i.imgur.com/WM8ada2.png! In the paragraph text before and after the image, it says the partitions are *evenly distributed* among the threads. However, in the image, Thread 1 gets p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). That's "even" in the sense that you can't do any better, but still not even, and in conflict with the text that follows the image: !http://i.imgur.com/xq3SOre.png! This doesn't make sense because why would p2 from topicA and topicB go to different threads? if this were the cause, you couldn't join the partitions, right? Either way, it seems like the two things are in conflict. Finally, the wording: bq. ...assume that each thread executes a single stream task... What does "assume" mean here. I've not seen any indication elsewhere in the documentation that show you can configure this behavior. Sorry if I missed this. Thank you! > Kafka streams documentation: partition<->thread<->task assignment unclear > - > > Key: KAFKA-3862 > URL: https://issues.apache.org/jira/browse/KAFKA-3862 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 > Environment: N/A >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > > Hi. Not sure if this is the right place to post about documentation (this > isn't actually a software bug). > I believe the following span of documentation is self-contradictory: > !http://i.imgur.com/WM8ada2.png! > In the paragraph text before and after the image, it says the partitions are > *evenly distributed* among the threads. However, in the image, Thread 1 gets > p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). > That's "even" in the sense that you can't do any better, but still not even, > and in conflict with the text that follows the image: > !http://i.imgur.com/xq3SOre.png! > This doesn't make sense because why would p2 from topicA and topicB go to > different threads? if this were the cause, you couldn't join the partitions, > right? > Either way, it seems like the two things are in conflict. > Finally, the wording: > bq. ...assume that each thread executes a single stream task... > What does "assume" mean here? I've not seen any indication elsewhere in the > documentation that show you can configure this behavior. Sorry if I missed > this. > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-3862) Kafka streams documentation: partition<->thread<->task assignment unclear
[ https://issues.apache.org/jira/browse/KAFKA-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Minkovsky updated KAFKA-3862: Description: Hi. Not sure if this is the right place to post about documentation (this isn't actually a software bug). I believe the following span of documentation is self-contradictory: !http://i.imgur.com/WM8ada2.png! In the paragraph text before and after the image, it says the partitions are *evenly distributed* among the threads. However, in the image, Thread 1 gets p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). That's "even" in the sense that you can't do any better, but still not even, and in conflict with the text that follows the image: !http://i.imgur.com/xq3SOre.png! This doesn't make sense because why would p2 from topicA and topicB go to different threads? if this were the cause, you couldn't join the partitions, right? Either way, it seems like the two things are in conflict. Finally, the wording: bq. ...assume that each thread executes a single stream task... What does "assume" mean here? I've not seen any indication elsewhere in the documentation that shows you can configure this behavior. Sorry if I missed this. Thank you! was: Hi. Not sure if this is the right place to post about documentation (this isn't actually a software bug). I believe the following span of documentation is self-contradictory: !http://i.imgur.com/WM8ada2.png! In the paragraph text before and after the image, it says the partitions are *evenly distributed* among the threads. However, in the image, Thread 1 gets p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). That's "even" in the sense that you can't do any better, but still not even, and in conflict with the text that follows the image: !http://i.imgur.com/xq3SOre.png! This doesn't make sense because why would p2 from topicA and topicB go to different threads? if this were the cause, you couldn't join the partitions, right? Either way, it seems like the two things are in conflict. Finally, the wording: bq. ...assume that each thread executes a single stream task... What does "assume" mean here? I've not seen any indication elsewhere in the documentation that show you can configure this behavior. Sorry if I missed this. Thank you! > Kafka streams documentation: partition<->thread<->task assignment unclear > - > > Key: KAFKA-3862 > URL: https://issues.apache.org/jira/browse/KAFKA-3862 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.0.0 > Environment: N/A >Reporter: Dmitry Minkovsky >Assignee: Guozhang Wang >Priority: Minor > > Hi. Not sure if this is the right place to post about documentation (this > isn't actually a software bug). > I believe the following span of documentation is self-contradictory: > !http://i.imgur.com/WM8ada2.png! > In the paragraph text before and after the image, it says the partitions are > *evenly distributed* among the threads. However, in the image, Thread 1 gets > p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). > That's "even" in the sense that you can't do any better, but still not even, > and in conflict with the text that follows the image: > !http://i.imgur.com/xq3SOre.png! > This doesn't make sense because why would p2 from topicA and topicB go to > different threads? if this were the cause, you couldn't join the partitions, > right? > Either way, it seems like the two things are in conflict. > Finally, the wording: > bq. ...assume that each thread executes a single stream task... > What does "assume" mean here? I've not seen any indication elsewhere in the > documentation that shows you can configure this behavior. Sorry if I missed > this. > Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (KAFKA-3862) Kafka streams documentation: partition<->thread<->task assignment unclear
Dmitry Minkovsky created KAFKA-3862: --- Summary: Kafka streams documentation: partition<->thread<->task assignment unclear Key: KAFKA-3862 URL: https://issues.apache.org/jira/browse/KAFKA-3862 Project: Kafka Issue Type: Bug Components: streams Affects Versions: 0.10.0.0 Environment: N/A Reporter: Dmitry Minkovsky Assignee: Guozhang Wang Priority: Minor Hi. Not sure if this is the right place to post about documentation (this isn't actually a software bug). I believe the following span of documentation is self-contradictory: !http://i.imgur.com/WM8ada2.png! In the paragraph text before and after the image, it says the partitions are *evenly distributed* among the threads. However, in the image, Thread 1 gets p1 and p3 (4 partitions, total) while Thread 2 gets p2 (2 partitions, total). That's "even" in the sense that you can't do any better, but still not even, and in conflict with the text that follows the image: !http://i.imgur.com/xq3SOre.png! This doesn't make sense because why would p2 from topicA and topicB go to different threads? if this were the cause, you couldn't join the partitions, right? Either way, it seems like the two things are in conflict. Finally, the wording: bq. ...assume that each thread executes a single stream task... What does "assume" mean here. I've not seen any indication elsewhere in the documentation that show you can configure this behavior. Sorry if I missed this. Thank you! -- This message was sent by Atlassian JIRA (v6.3.4#6332)