[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869382#comment-15869382
 ] 

huxi commented on KAFKA-4767:
-

I think I see what you mean. What really concerns you is that the IO thread 
might fail to shut down, or be left in an inconsistent state, if the user 
thread is interrupted. Am I right?

1. `this.sender.initiateClose()` is not interruptible, which means the user 
thread is always able to initiate a close of the IO thread, even after the 
user thread has been interrupted somewhere.
2. I agree that restoring the interruption status of the user thread after 
catching the InterruptedException is good practice.
3. The current logic already handles the situation where the user thread does 
not wait long enough for the IO thread to finish its work: it sets forceClose 
and runs the corresponding code to force-close the IO thread. Given that, we 
don't have to do the same thing again in the catch clause as you suggest.

Does that make sense?

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException e) {
>         firstException.compareAndSet(null, e);
>         log.error("Interrupted while joining ioThread", e);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Kafka exception

2017-02-15 Thread 揣立武

Hi, all! Our program uses the high-level consumer API (version 0.8.x). 
Sometimes the program throws an exception from line 42 of the 
kafka.utils.IteratorTemplate class: throw new IllegalStateException("Expected 
item but none found.").

I think it is a race condition between the close thread and the consume 
thread. When the close thread calls ConsumerConnector.shutdown(), it sets the 
ConsumerIterator's state to NOT_READY. But at the same time, the consume 
thread calls ConsumerIterator.hasNext() and reaches line 67 of the 
kafka.utils.IteratorTemplate class, "if (state == DONE) {"; the condition is 
false, which means hasNext() reports that an item is available. Then, when 
ConsumerIterator.next() is called, it throws that exception.

Have you ever had this problem? Please tell me how to deal with it, thanks!
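One common way to avoid this race is to have the consume thread stop iterating 
first, and to call ConsumerConnector.shutdown() from the close thread only 
after the consume thread has finished. A minimal sketch against the 0.8 
high-level consumer API (assuming consumer.timeout.ms is set so that hasNext() 
wakes up periodically instead of blocking forever):

import java.util.concurrent.atomic.AtomicBoolean;

import kafka.consumer.ConsumerIterator;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.message.MessageAndMetadata;

public class ConsumeLoop implements Runnable {
    private final KafkaStream<byte[], byte[]> stream;
    private final AtomicBoolean running = new AtomicBoolean(true);

    public ConsumeLoop(KafkaStream<byte[], byte[]> stream) {
        this.stream = stream;
    }

    @Override
    public void run() {
        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (running.get()) {
            try {
                if (it.hasNext()) {
                    MessageAndMetadata<byte[], byte[]> record = it.next();
                    // ... application logic ...
                }
            } catch (ConsumerTimeoutException e) {
                // no message within consumer.timeout.ms; loop and re-check the flag
            }
        }
    }

    // Called by the close thread; once the consuming thread has been joined,
    // it is safe to call ConsumerConnector.shutdown().
    public void stop() {
        running.set(false);
    }
}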





[jira] [Updated] (KAFKA-4769) Add Float serializer, deserializer, serde

2017-02-15 Thread Michael Noll (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Noll updated KAFKA-4769:

Status: Patch Available  (was: Open)

> Add Float serializer, deserializer, serde
> -
>
> Key: KAFKA-4769
> URL: https://issues.apache.org/jira/browse/KAFKA-4769
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.1.1, 0.10.2.0
>Reporter: Michael Noll
>Assignee: Michael Noll
>Priority: Minor
>
> We currently provide serializers/deserializers/serdes for a few data types 
> such as String, Long, Double, but not yet for Float.
> Adding built-in support for Float is helpful when, for example, you are using 
> Kafka Connect to write data from a MySQL database where a field was defined 
> as FLOAT, so the schema was generated as FLOAT, and you would like to 
> subsequently process the data with Kafka Streams.
> Possible workaround:
> Instead of adding Float support, users can manually convert from float to 
> double.  The point of this ticket however is to save the user from being 
> forced to convert manually, thus providing more convenience and slightly 
> better Connect-Streams interoperability in a scenario such as above.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4769) Add Float serializer, deserializer, serde

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869281#comment-15869281
 ] 

ASF GitHub Bot commented on KAFKA-4769:
---

GitHub user miguno opened a pull request:

https://github.com/apache/kafka/pull/2554

KAFKA-4769: Add Float serializer, deserializer, serde



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/miguno/kafka KAFKA-4769

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2554


commit 6c1be74b5926e93ddca1cc44c733b63f63e6346b
Author: Michael G. Noll 
Date:   2017-02-16T05:33:14Z

KAFKA-4769: Add Float serializer, deserializer, serde




> Add Float serializer, deserializer, serde
> -
>
> Key: KAFKA-4769
> URL: https://issues.apache.org/jira/browse/KAFKA-4769
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.1.1, 0.10.2.0
>Reporter: Michael Noll
>Assignee: Michael Noll
>Priority: Minor
>
> We currently provide serializers/deserializers/serdes for a few data types 
> such as String, Long, Double, but not yet for Float.
> Adding built-in support for Float is helpful when, for example, you are using 
> Kafka Connect to write data from a MySQL database where a field was defined 
> as FLOAT, so the schema was generated as FLOAT, and you would like to 
> subsequently process the data with Kafka Streams.
> Possible workaround:
> Instead of adding Float support, users can manually convert from float to 
> double.  The point of this ticket however is to save the user from being 
> forced to convert manually, thus providing more convenience and slightly 
> better Connect-Streams interoperability in a scenario such as above.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2554: KAFKA-4769: Add Float serializer, deserializer, se...

2017-02-15 Thread miguno
GitHub user miguno opened a pull request:

https://github.com/apache/kafka/pull/2554

KAFKA-4769: Add Float serializer, deserializer, serde



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/miguno/kafka KAFKA-4769

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2554.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2554


commit 6c1be74b5926e93ddca1cc44c733b63f63e6346b
Author: Michael G. Noll 
Date:   2017-02-16T05:33:14Z

KAFKA-4769: Add Float serializer, deserializer, serde




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-15 Thread Apurva Mehta
Hi Jun,

Answers inline:

210. Pid snapshots: Is the number of pid snapshot configurable or hardcoded
> with 2? When do we decide to roll a new snapshot? Based on time, byte, or
> offset? Is that configurable too?
>

These are good questions. We haven't fleshed out the policy by which the
snapshots will be generated. I guess there will be some scheduled task that
takes snapshots, and we will retain the two latest ones. These will map
to different points in time, but each will be the complete view of the
PID->Sequence map at the point in time it was created. At start time, the
PID mapping will be built from the latest snapshot unless it is somehow
corrupt, in which case the older one will be used. Otherwise, the older one
will be ignored.

I don't think there is a good reason to keep more than one, to be honest. If
the snapshot is corrupt, we can always rebuild the map from the log itself.
I have updated the doc to state that there will be exactly one snapshot
file.

With one snapshot, we don't need additional configs. Do you agree?


>
> 211. I am wondering if we should store ExpirationTime in the producer
> transactionalId mapping message as we do in the producer transaction status
> message. If a producer only calls initTransactions(), but never publishes
> any data, we still want to be able to expire and remove the producer
> transactionalId mapping message.
>
>
Actually, the document was inaccurate. The transactionalId will be expired
only if there is no active transaction, and the age of the last transaction
with that transactionalId is older than the transactionalId expiration
time. With these semantics, storing the expiration time in the
transactionalId mapping message won't be useful, since the expiration time
is a moving target based on transaction activity.

I have updated the doc with a clarification.



> 212. The doc says "The LSO is always equal to one less than the minimum of
> the initial offsets across all active transactions". This implies that LSO
> is inclusive. However, currently, both high watermark and log end offsets
> are exclusive. For consistency, it seems that we should make LSO exclusive
> as well.
>

Sounds good. Doc updated.
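
For illustration, the exclusive semantics could be summarized roughly as
follows (illustrative code only, not broker internals):

final class LsoSketch {
    // Exclusive LSO: the smallest first offset of any open transaction, or the
    // (exclusive) high watermark when no transaction is in flight.
    static long lastStableOffset(java.util.Collection<Long> firstOffsetsOfOpenTransactions,
                                 long highWatermark) {
        long lso = highWatermark;
        for (long firstOffset : firstOffsetsOfOpenTransactions) {
            lso = Math.min(lso, firstOffset);
        }
        return lso;
    }
}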


>
> 213. The doc says "If the topic is configured for compaction and deletion,
> we will use the topic’s own retention limit. Otherwise, we will use the
> default topic retention limit. Once the last message set produced by a
> given PID has aged beyond the retention time, the PID will be expired." For
> topics configured with just compaction, it seems it's more intuitive to
> expire PID based on transactional.id.expiration.ms?
>

This could work. The problem is that all idempotent producers get a PID,
but only transactional producers have a transactionalId. Since they are
separate concepts, it seems better not to conflate PID expiration with the
settings for transactionalId expiration.

Another way of putting it: it seems more natural for the retention of the
PID to be based on the retention of the messages in a topic. The
transactionalId expiration is based on the transaction activity of a
producer, which is quite a different thing.

But I see your point: having these separate can put us in a situation where
the PID gets expired even if the transactionalId is not. To solve this we
should set the default transactionalId expiration time to be less than the
default topic retention time, so that the transactionalId can be expired
before the PID. Does that seem reasonable?


>
> 214. In the Coordinator-Broker request handling section, the doc says "If
> the broker has a corresponding PID, verify that the received epoch is
> greater than or equal to the current epoch. " Is the epoch the coordinator
> epoch or the producer epoch?
>

This is the producer epoch, as known by the coordinator. I have clarified
this in the document. However, there may be producers with the same PID on
a different epoch, which will be fenced out of future calls.


> 215. The doc says "Append to the offset topic, but skip updating the offset
> cache in the delayed produce callback, until a WriteTxnMarkerRequest from
> the transaction coordinator is received including the offset topic
> partitions." How do we do this efficiently? Do we need to cache pending
> offsets per pid?
>

I think we would need to cache the consumerGroup/partition-> offset for the
written but uncommitted offsets. Once the transaction commits, these
entries can be moved to the main cache.

However, I think that the overhead for these duplicate temporary entries
should be marginal because it would be proportional to the number of input
topics per transaction.
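
As a rough sketch of the bookkeeping described above (the names are
illustrative, not actual broker internals): offsets appended within an open
transaction would be parked per PID and only folded into the main offset cache
once the commit marker arrives.

import java.util.HashMap;
import java.util.Map;

final class PendingTxnOffsets {
    // PID -> (group/topic-partition -> offset) awaiting a transaction marker
    private final Map<Long, Map<String, Long>> pendingByPid = new HashMap<>();
    private final Map<String, Long> mainOffsetCache = new HashMap<>();

    void addPending(long pid, String groupAndPartition, long offset) {
        pendingByPid.computeIfAbsent(pid, k -> new HashMap<>())
                    .put(groupAndPartition, offset);
    }

    // Invoked when a COMMIT marker for this PID is processed.
    void commit(long pid) {
        Map<String, Long> pending = pendingByPid.remove(pid);
        if (pending != null)
            mainOffsetCache.putAll(pending);
    }

    // Invoked when an ABORT marker for this PID is processed.
    void abort(long pid) {
        pendingByPid.remove(pid);
    }
}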


>
> 216. Do we need to add any new JMX metrics? For example, on the broker and
> transaction coordinator side, it would be useful to know the number of live
> pids.
>

Yes, I think we will definitely need to introduce new metrics. Some would
be:

1. Number of live PIDs (a proxy for the size of the PID->Sequence map)
2. Current LSO per partition (useful to 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-15 Thread Jason Gustafson
We have proposed a few significant changes to the message format in KIP-98
which now seems likely to pass (perhaps with some iterations on
implementation details). It would be good to try and coordinate the changes
in both of the proposals to make sure they are consistent and compatible.

I think using the attributes to indicate null headers is a reasonable
approach. We have proposed to do the same thing for the message key and
value. That said, I sympathize with Jay's argument. Having multiple ways to
specify a null value increases the overall complexity of the protocol. You
can see this just from the fact that you need the extra verbiage in the
protocol specification in this KIP and in KIP-98 to describe the dependence
between the fields and the attributes. It seems like a slippery slope if
you start allowing different request types to implement the protocol
specification differently.

You can also argue that the messages already are and are likely to remain a
special case. For example, there is currently no generality in how
compressed message sets are represented that would be applicable for other
request types. Some might see this divergence as an unfortunate protocol
deficiency which should be fixed; others might see it as sort of the
inevitability of needing to optimize where it counts most. I'm probably
somewhere in between, but I think we probably all share the intuition that
the protocol should be kept as consistent as possible. With that in mind,
here are a few comments:

1. One thing I found a little odd when reading the current proposal is that
the headers are both represented as an array of bytes and as an array of
key/value pairs. I'd probably suggest something like this:

Headers => [HeaderKey HeaderValue]
 HeaderKey => String
 HeaderValue => Bytes

An array in the Kafka protocol is represented as a 4-byte integer
indicating the number of elements in the array followed by the
serialization of the elements. Unless I'm misunderstanding, what you have
instead is the total size of the headers in bytes followed by the elements.
I'm not sure I see any reason for this inconsistency.

2. In KIP-98, we've introduced variable-length integer fields. Effectively,
we've enriched (or "complicated" as Jay might say ;) the protocol
specification to include the following types: VarInt, VarLong,
UnsignedVarInt and UnsignedVarLong.

Along with these primitives, we could introduce the following types:

VarSizeArray => NumberOfItems Item1 Item2 .. ItemN
  NumberOfItems => UnsignedVarInt

VarSizeNullableArray => NumberOfItemsOrNull Item1 Item2 .. ItemN
  NumberOfItemsOrNull => VarInt (-1 means null)

And similarly for the `String` and `Bytes` types. These types can save a
considerable amount of space in this proposal because they can be used for
both the number of headers included in the message and the lengths of the
header keys and values. We could do this instead:

Headers => VarSizeArray[HeaderKey HeaderValue]
  HeaderKey => VarSizeString
  HeaderValue => VarSizeBytes

Combining the savings from the use of variable length fields, the benefit
of using the attributes to represent null seems pretty small.
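
To make the size argument concrete, here is a rough, non-normative sketch of
such a varint-based header encoding (illustrative only; not the exact wire
format proposed in either KIP):

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class HeaderEncodingSketch {

    // Unsigned varint: 7 payload bits per byte, high bit set on all but the last byte.
    static void writeUnsignedVarInt(int value, ByteArrayOutputStream out) {
        while ((value & 0xFFFFFF80) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Headers => VarSizeArray[HeaderKey HeaderValue]
    //   HeaderKey   => VarSizeString
    //   HeaderValue => VarSizeBytes
    static byte[] encodeHeaders(Map<String, byte[]> headers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUnsignedVarInt(headers.size(), out);                 // number of headers
        for (Map.Entry<String, byte[]> header : headers.entrySet()) {
            byte[] key = header.getKey().getBytes(StandardCharsets.UTF_8);
            writeUnsignedVarInt(key.length, out);                 // key length
            out.write(key, 0, key.length);
            writeUnsignedVarInt(header.getValue().length, out);   // value length
            out.write(header.getValue(), 0, header.getValue().length);
        }
        return out.toByteArray();
    }
}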

3. Whichever way we go (whether we use the attributes or not), we should at
least be consistent between this KIP and KIP-98. It would be very strange
to have two ways to represent null values in the same schema. Either way is
OK with me. I think some message-level optimizations are justifiable, but
the savings here seem minimal (a few bytes per message), so maybe it's not
worth the cost of letting the message diverge even further from the rest of
the protocol.

-Jason


On Wed, Feb 15, 2017 at 8:52 AM, radai  wrote:

> I've trimmed the inline contents as this mail is getting too big for the
> apache mailing list software to deliver :-(
>
> 1. the important thing for interoperability is for different "interested
> parties" (plugins, infra layers/wrappers, user-code) to be able to stick
> pieces of metadata onto msgs without getting in each other's way. a common
> key scheme (Strings, as of the time of this writing?) is all that's required
> for that. it is assumed that the other end interested in any such piece of
> metadata knows the encoding, and byte[] provides for the most flexibility.
> i believe this is the same logic behind core kafka being byte[]/byte[] -
> Strings are more "usable" but bytes are flexible and so were chosen.
> Also - core kafka doesnt even do that good of a job on usability of the
> payload (example - i have to specify the nop byte[] "decoders" explicitly
> in conf), and again sacrificies usability for the sake of performance (no
> convenient single-record processing as poll is a batch, lots of obscure
> little config details exposing internals of the batching mechanism, etc)
>
> this is also why i really dislike the idea of a "type system" for header
> values, it further degrades the usability, adds complexity and will
> eventually get in people's way, 

[jira] [Created] (KAFKA-4769) Add Float serializer, deserializer, serde

2017-02-15 Thread Michael Noll (JIRA)
Michael Noll created KAFKA-4769:
---

 Summary: Add Float serializer, deserializer, serde
 Key: KAFKA-4769
 URL: https://issues.apache.org/jira/browse/KAFKA-4769
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 0.10.1.1, 0.10.2.0
Reporter: Michael Noll
Assignee: Michael Noll
Priority: Minor


We currently provide serializers/deserializers/serdes for a few data types such 
as String, Long, Double, but not yet for Float.

Adding built-in support for Float is helpful when, for example, you are using 
Kafka Connect to write data from a MySQL database where a field was defined as 
FLOAT, so the schema was generated as FLOAT, and you would like to subsequently 
process the data with Kafka Streams.

Possible workaround:
Instead of adding Float support, users can manually convert from float to 
double.  The point of this ticket however is to save the user from being forced 
to convert manually, thus providing more convenience and slightly better 
Connect-Streams interoperability in a scenario such as above.
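
For reference, a minimal sketch of what such a serializer/deserializer pair 
could look like, modeled on the existing Double variants (illustrative only, 
not necessarily the code in the eventual patch):

{code}
import java.util.Map;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

public class FloatSerializer implements Serializer<Float> {
    @Override public void configure(Map<String, ?> configs, boolean isKey) {}

    @Override
    public byte[] serialize(String topic, Float data) {
        if (data == null)
            return null;
        // pack the 4 bytes of the IEEE 754 representation, big-endian
        int bits = Float.floatToRawIntBits(data);
        return new byte[] {
            (byte) (bits >>> 24), (byte) (bits >>> 16), (byte) (bits >>> 8), (byte) bits
        };
    }

    @Override public void close() {}
}

class FloatDeserializer implements Deserializer<Float> {
    @Override public void configure(Map<String, ?> configs, boolean isKey) {}

    @Override
    public Float deserialize(String topic, byte[] data) {
        if (data == null)
            return null;
        if (data.length != 4)
            throw new SerializationException("Size of data received by FloatDeserializer is not 4");
        int bits = 0;
        for (byte b : data)
            bits = (bits << 8) | (b & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    @Override public void close() {}
}
{code}

A Serde could then be obtained via Serdes.serdeFrom(new FloatSerializer(), new 
FloatDeserializer()).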



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869240#comment-15869240
 ] 

Buğra Gedik edited comment on KAFKA-4767 at 2/16/17 5:26 AM:
-

IMO:
  * An interrupt means immediate exit from whatever you are doing, whereas the 
regular {{close}} flow can take up to {{timeout}}. So the two are different 
scenarios. That's why I thought interrupting the IO thread was needed in this 
case.
  * Join should not be left incomplete just because we were asked to be 
interrupted.
  * The interrupt status of the current thread should be restored.


was (Author: bgedik):
IMO:
  * An interrupt means immediate exit from whatever you are doing, whereas the 
regular {{close}} flow can take up to {{timeout}}. So the two are different 
scenarios. That's why I thought interrupting the IO thread was needed in this 
case.
  * Join should not be left incomplete just because we were asked to be 
interrupted.
  * The interrupt status of the original thread should be restored.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException e) {
>         firstException.compareAndSet(null, e);
>         log.error("Interrupted while joining ioThread", e);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869240#comment-15869240
 ] 

Buğra Gedik commented on KAFKA-4767:


IMO:
  * An interrupt means immediate exit from whatever you are doing, whereas the 
regular {{close}} flow can take up to {{timeout}}. So the two are different 
scenarios. That's why I thought interrupting the IO thread was needed in this 
case.
  * Join should not be left incomplete just because we were asked to be 
interrupted.
  * The interrupt status of the original thread should be restored.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException e) {
>         firstException.compareAndSet(null, e);
>         log.error("Interrupted while joining ioThread", e);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4720) Add KStream.peek(ForeachAction<K,V>)

2017-02-15 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869239#comment-15869239
 ] 

Guozhang Wang commented on KAFKA-4720:
--

[~stevenschlansker] I have added you to contributor list, you can assign JIRAs 
to yourself in the future.

> Add KStream.peek(ForeachAction<K,V>)
> 
>
> Key: KAFKA-4720
> URL: https://issues.apache.org/jira/browse/KAFKA-4720
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>Assignee: Steven Schlansker
>  Labels: needs-kip
> Fix For: 0.10.3.0
>
>
> Java's Stream provides a handy peek method that observes elements in the 
> stream without transforming or filtering them.  While you can emulate this 
> functionality with either a filter or map, peek provides potentially useful 
> semantic information (doesn't modify the stream) and is much more concise.
> Example usage: using Dropwizard Metrics to provide event counters
> {code}
> KStream s = ...;
> s.map(this::mungeData)
>  .peek((i, s) -> metrics.noteMungedEvent(i, s))
>  .filter(this::hadProcessingError)
>  .print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buğra Gedik updated KAFKA-4767:
---
Description: 
The {{KafkaProducer}} is not properly joining the thread it creates. The code 
is like this:

{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
}
{code}

If the code is interrupted while performing the join, it will end up leaving 
the IO thread running. The correct way of handling this is as follows:
{code}
try {
    this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
    // propagate the interrupt
    this.ioThread.interrupt();
    try {
        this.ioThread.join();
    } catch (InterruptedException e) {
        firstException.compareAndSet(null, e);
        log.error("Interrupted while joining ioThread", e);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}



  was:
The {{KafkaProducer}} is not properly joining the thread it creates. The code 
is like this:

{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
}
{code}

If the code is interrupted while performing the join, it will end up leaving 
the IO thread running. The correct way of handling this is as follows:
{code}
try {
    this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
    // propagate the interrupt
    this.ioThread.interrupt();
    try {
        // join again (if you want to be more accurate, you can re-adjust the timeout)
        this.ioThread.join(timeUnit.toMillis(timeout));
    } catch (InterruptedException e) {
        firstException.compareAndSet(null, e);
        log.error("Interrupted while joining ioThread", e);
    } finally {
        // make sure we maintain the interrupted status
        Thread.currentThread().interrupt();
    }
}
{code}




> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         this.ioThread.join();
>     } catch (InterruptedException e) {
>         firstException.compareAndSet(null, e);
>         log.error("Interrupted while joining ioThread", e);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-4720) Add KStream.peek(ForeachAction<K,V>)

2017-02-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang reassigned KAFKA-4720:


Assignee: Steven Schlansker

> Add KStream.peek(ForeachAction<K,V>)
> 
>
> Key: KAFKA-4720
> URL: https://issues.apache.org/jira/browse/KAFKA-4720
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>Assignee: Steven Schlansker
>  Labels: needs-kip
> Fix For: 0.10.3.0
>
>
> Java's Stream provides a handy peek method that observes elements in the 
> stream without transforming or filtering them.  While you can emulate this 
> functionality with either a filter or map, peek provides potentially useful 
> semantic information (doesn't modify the stream) and is much more concise.
> Example usage: using Dropwizard Metrics to provide event counters
> {code}
> KStream s = ...;
> s.map(this::mungeData)
>  .peek((i, s) -> metrics.noteMungedEvent(i, s))
>  .filter(this::hadProcessingError)
>  .print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4720) Add KStream.peek(ForeachAction<K,V>)

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869229#comment-15869229
 ] 

ASF GitHub Bot commented on KAFKA-4720:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2493


> Add KStream.peek(ForeachAction<K,V>)
> 
>
> Key: KAFKA-4720
> URL: https://issues.apache.org/jira/browse/KAFKA-4720
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>  Labels: needs-kip
> Fix For: 0.10.3.0
>
>
> Java's Stream provides a handy peek method that observes elements in the 
> stream without transforming or filtering them.  While you can emulate this 
> functionality with either a filter or map, peek provides potentially useful 
> semantic information (doesn't modify the stream) and is much more concise.
> Example usage: using Dropwizard Metrics to provide event counters
> {code}
> KStream s = ...;
> s.map(this::mungeData)
>  .peek((i, s) -> metrics.noteMungedEvent(i, s))
>  .filter(this::hadProcessingError)
>  .print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-4720) Add KStream.peek(ForeachAction<K,V>)

2017-02-15 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-4720.
--
   Resolution: Fixed
Fix Version/s: 0.10.3.0

Issue resolved by pull request 2493
[https://github.com/apache/kafka/pull/2493]

> Add KStream.peek(ForeachAction<K,V>)
> 
>
> Key: KAFKA-4720
> URL: https://issues.apache.org/jira/browse/KAFKA-4720
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>  Labels: needs-kip
> Fix For: 0.10.3.0
>
>
> Java's Stream provides a handy peek method that observes elements in the 
> stream without transforming or filtering them.  While you can emulate this 
> functionality with either a filter or map, peek provides potentially useful 
> semantic information (doesn't modify the stream) and is much more concise.
> Example usage: using Dropwizard Metrics to provide event counters
> {code}
> KStream s = ...;
> s.map(this::mungeData)
>  .peek((i, s) -> metrics.noteMungedEvent(i, s))
>  .filter(this::hadProcessingError)
>  .print();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2493: KAFKA-4720: add a KStream#peek(ForeachAction<K, V>...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2493


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4766) Document lz4 and lz4hc in confluence

2017-02-15 Thread Lee Dongjin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869092#comment-15869092
 ] 

Lee Dongjin commented on KAFKA-4766:


I just updated the wiki page. Does this correspond exactly to what you intended?

> Document lz4 and lz4hc in confluence
> 
>
> Key: KAFKA-4766
> URL: https://issues.apache.org/jira/browse/KAFKA-4766
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.8.2.0
>Reporter: Daniel Pinyol
> Fix For: 0.8.2.0
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/Compression does not 
> mention that lz4 and lz4hc compressions are supported 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4766) Document lz4 and lz4hc in confluence

2017-02-15 Thread Lee Dongjin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15869089#comment-15869089
 ] 

Lee Dongjin commented on KAFKA-4766:


Yes, it seems like LZ4 compression is omitted from the wiki page. However, LZ4HC 
compression is not currently supported - it was introduced in commit 547cced 
but removed in commit 37356bf. (For details, please see 
https://www.mail-archive.com/dev@kafka.apache.org/msg65485.html)

It seems like we only have to update the page for the LZ4 codec.

> Document lz4 and lz4hc in confluence
> 
>
> Key: KAFKA-4766
> URL: https://issues.apache.org/jira/browse/KAFKA-4766
> Project: Kafka
>  Issue Type: Sub-task
>  Components: documentation
>Affects Versions: 0.8.2.0
>Reporter: Daniel Pinyol
> Fix For: 0.8.2.0
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/Compression does not 
> mention that lz4 and lz4hc compressions are supported 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-15 Thread Mathieu Fenniak
On Wed, Feb 15, 2017 at 5:04 PM, Matthias J. Sax 
wrote:

> - We also removed method #topologyBuilder() from KStreamBuilder because
> we think #transform() should provide all functionality you need to
> mix and match the Processor API and the DSL. If there is any further concern
> about this, please let us know.
>

Hi Matthias,

Yes, I'm sorry I didn't respond sooner, but I still have a lot of concerns
about this.  You're correct to point out that transform() can be used for
some of the output situations I pointed out, although it seems somewhat
awkward to do so in a "transform" method; what do you do with the return value?

The PAPI processors I use in my KStreams app are all functioning on KTable
internals.  I wouldn't be able to convert them to process()/transform().

What's the harm in permitting both APIs to be used in the same application?

Mathieu


Jenkins build is back to normal : kafka-trunk-jdk7 #1940

2017-02-15 Thread Apache Jenkins Server
See 



Re: [kafka-clients] [VOTE] 0.10.2.0 RC2

2017-02-15 Thread Jun Rao
Hi, Ewen,

Thanks for running the release. +1. Verified quickstart on 2.10 binary.

Jun

On Tue, Feb 14, 2017 at 10:39 AM, Ewen Cheslack-Postava 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> feature highlights: SASL-SCRAM support, improved client compatibility to
> allow use of clients newer than the broker, session windows and global
> tables in the Kafka Streams API, single message transforms in the Kafka
> Connect framework.
>
> Important note: in addition to the artifacts generated using JDK7 for
> Scala 2.10 and 2.11, this release also includes experimental artifacts
> built using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/
> kafka-0.10.2-jdk7/77/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/
> 29/
>
> /**
>
> Thanks,
> Ewen
>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/kafka-clients/CAE1jLMORScgr1RekNgY0fLykSPh_%
> 2BgkRYN7vok3fz1ou%3DuW3kw%40mail.gmail.com
> 
> .
> For more options, visit https://groups.google.com/d/optout.
>


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-02-15 Thread Elias Levy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868971#comment-15868971
 ] 

Elias Levy commented on KAFKA-2729:
---

Hit this again during testing with 0.10.0.1 on a 10 node broker cluster with a 
3 node ZK ensemble.  This should have priority Blocker instead of Major.

> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the affected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-15 Thread Xavier Léauté
One current benefit of having those classes extensible is the ability to
write simple wrappers around KStreamBuilder to add functionality that
currently doesn't exist.

In my case, I extend KStreamBuilder mainly to provide syntactic sugar and
make it easier to work with. For instance, I overload implementations of
some methods, such as .stream(...) to take custom objects that
intrinsically define topic names and key/value types, so I don't have to
worry about casting to the correct type.
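
For illustration, a minimal sketch of that kind of wrapper; TopicSpec is a
hypothetical application-side class, and the overload simply delegates to the
existing stream(Serde, Serde, String...) method:

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class TypedStreamBuilder extends KStreamBuilder {

    // Hypothetical helper bundling a topic name with its key/value serdes.
    public static class TopicSpec<K, V> {
        final String name;
        final Serde<K> keySerde;
        final Serde<V> valueSerde;

        public TopicSpec(String name, Serde<K> keySerde, Serde<V> valueSerde) {
            this.name = name;
            this.keySerde = keySerde;
            this.valueSerde = valueSerde;
        }
    }

    // Overload that derives the topic name and serdes from the spec, so call
    // sites get a correctly typed KStream<K, V> without casts.
    public <K, V> KStream<K, V> stream(TopicSpec<K, V> spec) {
        return stream(spec.keySerde, spec.valueSerde, spec.name);
    }
}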

I agree that extending those classes is probably not a sound practice,
however, absent any kind of interface to implement, it becomes difficult to
use a delegate pattern.

My proposal would be that if we make those classes final, we should also
define actual interfaces that KStreamBuilder and TopologyBuilder implement,
so that users could still easily create custom wrappers delegating most
methods.

An added benefit is that this would also codify the methods that are shared
between KStreamBuilder and TopologyBuilder, avoiding the need to manually
keep them in sync.

Xavier

On Sat, Feb 4, 2017 at 11:31 AM Matthias J. Sax 
wrote:

> I think the right pattern should be to use TopologyBuilder as a class
> member, like new KStreamBuilder will do, instead of a class hierarchy,
> in case you want to offer your own higher level abstraction to describe
> a topology.
>
> Why is it important that you can derive a new class from it?
>
> I am not too strict about it. It's just a suggestion to make it final
> to enforce a different usage pattern that IMHO is better than the
> current one.
>
>
> -Matthias
>
>
> On 2/3/17 5:15 PM, radai wrote:
> > why is it so important to make those classes final ?
> >
> > On Fri, Feb 3, 2017 at 3:33 PM, Matthias J. Sax 
> > wrote:
> >
> >> Hi All,
> >>
> >> I did prepare a KIP to do some cleanup some of Kafka's Streaming API.
> >>
> >> Please have a look here:
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 120%3A+Cleanup+Kafka+Streams+builder+API
> >>
> >> Looking forward to your feedback!
> >>
> >>
> >> -Matthias
> >>
> >>
> >
>
>


[jira] [Commented] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868943#comment-15868943
 ] 

huxi commented on KAFKA-4767:
-

It seems that KafkaProducer already initiates a close of the IO thread by 
setting running to false, so it is not reasonable to call ioThread.interrupt() 
directly.

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
>     this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
>     // propagate the interrupt
>     this.ioThread.interrupt();
>     try {
>         // join again (if you want to be more accurate, you can re-adjust the timeout)
>         this.ioThread.join(timeUnit.toMillis(timeout));
>     } catch (InterruptedException e) {
>         firstException.compareAndSet(null, e);
>         log.error("Interrupted while joining ioThread", e);
>     } finally {
>         // make sure we maintain the interrupted status
>         Thread.currentThread().interrupt();
>     }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2553: MINOR: fix indention in tags

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2553


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2553: MINOR: fix indention in tags

2017-02-15 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2553

MINOR: fix indention in  tags



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka hotfixDocs2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2553


commit ca4c81a8de52cf41b6f10a4a5d07e47526de49a1
Author: Matthias J. Sax 
Date:   2017-02-16T00:50:38Z

MINOR: fix indention in  tags




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2547: MINOR: add session windows doc to streams.html

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2547


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-15 Thread Jun Rao
Hi, Jason,

Thanks for the updated doc. Looks good to me overall. Just a few more minor
comments.

210. Pid snapshots: Is the number of pid snapshot configurable or hardcoded
with 2? When do we decide to roll a new snapshot? Based on time, byte, or
offset? Is that configurable too?

211. I am wondering if we should store ExpirationTime in the producer
transactionalId mapping message as we do in the producer transaction status
message. If a producer only calls initTransactions(), but never publishes
any data, we still want to be able to expire and remove the producer
transactionalId mapping message.

212. The doc says "The LSO is always equal to one less than the minimum of
the initial offsets across all active transactions". This implies that LSO
is inclusive. However, currently, both high watermark and log end offsets
are exclusive. For consistency, it seems that we should make LSO exclusive
as well.

213. The doc says "If the topic is configured for compaction and deletion,
we will use the topic’s own retention limit. Otherwise, we will use the
default topic retention limit. Once the last message set produced by a
given PID has aged beyond the retention time, the PID will be expired." For
topics configured with just compaction, it seems it's more intuitive to
expire PID based on transactional.id.expiration.ms?

214. In the Coordinator-Broker request handling section, the doc says "If
the broker has a corresponding PID, verify that the received epoch is
greater than or equal to the current epoch. " Is the epoch the coordinator
epoch or the producer epoch?

215. The doc says "Append to the offset topic, but skip updating the offset
cache in the delayed produce callback, until a WriteTxnMarkerRequest from
the transaction coordinator is received including the offset topic
partitions." How do we do this efficiently? Do we need to cache pending
offsets per pid?

216. Do we need to add any new JMX metrics? For example, on the broker and
transaction coordinator side, it would be useful to know the number of live
pids.

Thanks,

Jun

On Wed, Feb 15, 2017 at 11:04 AM, Jason Gustafson 
wrote:

> Thanks everyone who has voted so far!
>
> Jun brought up a good point offline that the BeginTxnRequest was not
> strictly needed since there is no state to recover until a partition has
> been added to the transaction. Instead we can start the transaction
> implicitly upon receiving the first AddPartitionsToTxn request. This
> results in a slight change of behavior since the transaction timeout will
> be enforced only after the first send() instead of the beginTransaction().
> However, the main point of the timeout is to avoid blocking downstream
> consumers, which is only possible once you've added a partition to the
> transaction, so we feel the simplification is justified. I've updated the
> document accordingly.
>
> Thanks,
> Jason
>
> On Tue, Feb 14, 2017 at 2:03 PM, Jay Kreps  wrote:
>
> > +1
> >
> > Super happy with how this turned out. It's been a long journey since we
> > started thinking about this 3+ years ago. Can't wait to see it in
> > code---this is a big one! :-)
> >
> > -Jay
> >
> > On Wed, Feb 1, 2017 at 8:13 PM, Guozhang Wang 
> wrote:
> >
> > > Hi all,
> > >
> > > We would like to start the voting process for KIP-98. The KIP can be
> > found
> > > at
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> > >
> > > Discussion thread can be found here:
> > >
> > > http://search-hadoop.com/m/Kafka/uyzND1jwZrr7HRHf?subj=+
> > > DISCUSS+KIP+98+Exactly+Once+Delivery+and+Transactional+Messaging
> > >
> > > Thanks,
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


[GitHub] kafka pull request #2542: MINOR: Stream metrics documentation

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2542


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] 0.10.2.0 RC2

2017-02-15 Thread Ismael Juma
+1 (non-binding).

Verified source and Scala 2.11 binary artifacts, quick start on source
artifact and Scala 2.11 binary artifacts.

Thanks for managing the release!

Ismael

On Tue, Feb 14, 2017 at 6:39 PM, Ewen Cheslack-Postava 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.10.2.0
> )
> for more details. A few feature
> highlights: SASL-SCRAM support, improved client compatibility to allow use
> of clients newer than the broker, session windows and global tables in the
> Kafka Streams API, single message transforms in the Kafka Connect
> framework.
>
> Important note: in addition to the artifacts generated using JDK7 for Scala
> 2.10 and 2.11, this release also includes experimental artifacts built
> using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/
> kafka-0.10.2-jdk7/77/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/29
> /
>
> /**
>
> Thanks,
> Ewen
>


Re: [DISCUSS] KIP-120: Cleanup Kafka Streams builder API

2017-02-15 Thread Matthias J. Sax
Hi,

according to the feedback, I updated the KIP, and limited its scope to
some extend:

- Instead of changing the creation of KafkaStreams instances, we keep
the current pattern (we might do a follow up KIP on this though).

- We also added a new method #describe() that returns a
TopologyDescription that can be used for any "pretty print"
functionality or similar.

- We also removed method #topologyBuilder() from KStreamBuilder because
we think #transform() should provide all the functionality you need to
mix and match the Processor API and the DSL (see the sketch below for
forwarding multiple records from transform()). If there is any further
concern about this, please let us know.
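
As a sketch of the multi-record case against the current (0.10.x) API, a
Transformer can hold on to the ProcessorContext it receives in init() and call
forward() as many times as needed, returning null from transform():

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

public class FanOutTransformer implements Transformer<String, String, KeyValue<String, String>> {
    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public KeyValue<String, String> transform(String key, String value) {
        // forward as many records downstream as needed
        for (String part : value.split(",")) {
            context.forward(key, part);
        }
        return null; // nothing extra to emit via the return value
    }

    @Override
    public KeyValue<String, String> punctuate(long timestamp) {
        return null;
    }

    @Override
    public void close() {}
}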



-Matthias


On 2/14/17 9:59 AM, Matthias J. Sax wrote:
> You can already output any number of record within .transform() using
> the provided Context object from init()...
> 
> 
> -Matthias
> 
> On 2/14/17 9:16 AM, Guozhang Wang wrote:
>>> and you can't output multiple records or branching logic from a
>> transform();
>>
>> For output multiple records in transform, we are currently working on
>> https://issues.apache.org/jira/browse/KAFKA-4217, I think that should cover
>> this use case.
>>
>> For branching the output in transform, I agree this is not perfect but I
>> think users can follow some patterns like "stream.transform().branch()",
>> would that work for you?
>>
>>
>> Guozhang
>>
>>
>> On Tue, Feb 14, 2017 at 8:29 AM, Mathieu Fenniak <
>> mathieu.fenn...@replicon.com> wrote:
>>
>>> On Tue, Feb 14, 2017 at 1:14 AM, Guozhang Wang  wrote:
>>>
 Some thoughts on the mixture usage of DSL / PAPI:

 There were some suggestions on mixing the usage of DSL and PAPI:
 https://issues.apache.org/jira/browse/KAFKA-3455, and after thinking it
>>> a
 bit more carefully, I'd rather not recommend users following this
>>> pattern,
 since in DSL this can always be achieved in process() / transform().
>>> Hence
 I think it is okay to prevent such patterns in the new APIs. And for the
 same reasons, I think we can remove KStreamBuilder#newName() from the
 public APIs.

>>>
>>> I'm not sure that things can always be achieved by process() /
>>> transform()... there are some limitations to these APIs.  You can't output
>>> from a process(), and you can't output multiple records or branching logic
>>> from a transform(); these are things that can be done in the PAPI quite
>>> easily.
>>>
>>> I definitely understand a preference for using process()/transform() where
>>> possible, but, they don't seem to replace the PAPI.
>>>
>>> I would love to be operating in a world that was entirely DSL.  But the DSL
>>> is limited, and it isn't extensible (... by any stable API).  I don't mind
>>> reaching into internals today and making my own life difficult to extend
>>> it, and I'd continue to find a way to do that if you made the APIs distinct
>>> and split, but I'm just expressing my preference that you not do that. :-)
>>>
>>> And about printing the topology for debuggability: I agrees this is a
 potential drawback, and I'd suggest maintain some functionality to build
>>> a
 "dry topology" as Mathieu suggested; the difficulty is that, internally
>>> we
 need a different "copy" of the topology for each thread so that they will
 not share any states, so we cannot directly pass in the topology into
 KafkaStreams instead of the topology builder. So how about adding a
 `ToplogyBuilder#toString` function which calls `build()` internally then
 prints the built dry topology?

>>>
>>> Well, this sounds better than KafkaStreams#toString() in that it doesn't
>>> require a running processor.  But I'd really love to have a simple object
>>> model for the topology, not a string output, so that I can output my own
>>> debug format.  I currently have that in the form of
>>> TopologyBuilder#nodeGroups() & TopologyBuilder#build(Integer).
>>>
>>> Mathieu
>>>
>>
>>
>>
> 





[jira] [Comment Edited] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-02-15 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868777#comment-15868777
 ] 

Jiangjie Qin edited comment on KAFKA-4340 at 2/15/17 11:05 PM:
---

[~junrao] [~ijuma] [~cmccabe] I created 
https://github.com/apache/kafka/pull/2544 for this. I am running the branch builder 
now.


was (Author: becket_qin):
[~junrao] [~ijuma] [~cmccabe] I created 
https://github.com/apache/kafka/pull/2071 for this. I am running the branch builder 
now.

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.3.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data whose timestamps have already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by the log retention thread when it kicks in, and the messages will be 
> deleted. 
> To avoid this produce-and-delete scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-02-15 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868777#comment-15868777
 ] 

Jiangjie Qin commented on KAFKA-4340:
-

[~junrao] [~ijuma] [~cmccabe] I created 
https://github.com/apache/kafka/pull/2071 for this. I am running the branch builder 
now.

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.3.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data whose timestamps have already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by the log retention thread when it kicks in, and the messages will be 
> deleted. 
> To avoid this produce-and-delete scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-15 Thread Eno Thereska
KIP is accepted, discussion now moves to PR.

Thanks
Eno
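
For anyone skimming the archive, a short sketch of the usage the KIP adds; the
signature is the one proposed in KIP-121 (peek takes a ForeachAction and returns
the same KStream), so treat it as illustrative until the PR lands:

import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class PeekSketch {
    public static void main(String[] args) {
        KStreamBuilder builder = new KStreamBuilder();
        KStream<String, String> stream = builder.stream("input-topic");

        // peek() runs a side effect (logging, metrics, debugging) per record and
        // passes the stream through unchanged, so the rest of the chain is untouched.
        stream.peek((key, value) -> System.out.println("key=" + key + " value=" + value))
              .to("output-topic");
    }
}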

On Wed, Feb 15, 2017 at 12:28 PM, Steven Schlansker <
sschlans...@opentable.com> wrote:

> Oops, sorry, a number of votes were sent only to -dev and not to
> -user and so I missed those in the email I just sent.  The actual count is
> more like +8
>
> > On Feb 15, 2017, at 12:24 PM, Steven Schlansker <
> sschlans...@opentable.com> wrote:
> >
> > From reading the bylaws it's not entirely clear who closes the vote or
> how they
> > decide when to do so.
> >
> > Given a week has passed and assuming Jay's and Matthias's votes are
> binding,
> > we have a result of +4 votes with no other votes cast.
> >
> > I'll update the KIP with the result shortly :)
> >
> >> On Feb 14, 2017, at 3:36 PM, Zakee  wrote:
> >>
> >> +1
> >>
> >> -Zakee
> >>> On Feb 14, 2017, at 1:56 PM, Jay Kreps  wrote:
> >>>
> >>> +1
> >>>
> >>> Nice improvement.
> >>>
> >>> -Jay
> >>>
> >>> On Tue, Feb 14, 2017 at 1:22 PM, Steven Schlansker <
> >>> sschlans...@opentable.com> wrote:
> >>>
>  Hi, it looks like I have 2 of the 3 minimum votes, can a third voter
>  please consider this KIP?
>  Thanks.
> 
>  (PS - new revision on GitHub PR with hopefully the last round of
>  improvements)
> 
> > On Feb 8, 2017, at 9:06 PM, Matthias J. Sax 
>  wrote:
> >
> > +1
> >
> > On 2/8/17 4:51 PM, Gwen Shapira wrote:
> >> +1 (binding)
> >>
> >> On Wed, Feb 8, 2017 at 4:45 PM, Steven Schlansker
> >>  wrote:
> >>> Hi everyone,
> >>>
> >>> Thank you for constructive feedback on KIP-121,
>  KStream.peek(ForeachAction) ;
> >>> it seems like it is time to call a vote which I hope will pass
> easily
>  :)
> >>>
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>  121%3A+Add+KStream+peek+method
> >>>
> >>> I believe the PR attached is already in good shape to consider
> merging:
> >>>
> >>> https://github.com/apache/kafka/pull/2493
> >>>
> >>> Thanks!
> >>> Steven
> >>>
> >>
> >>
> >>
> >
> 
> 
> >>
> >> 
> >
>
>


[GitHub] kafka pull request #2492: HOTFIX: fixed section incompatible Streams API chan...

2017-02-15 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2492


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-15 Thread Ismael Juma
+1 (binding) from me.

For the record, there were 4 binding +1s (Gwen, Guozhang, Jay and myself).

Ismael

On Wed, Feb 15, 2017 at 8:24 PM, Steven Schlansker <
sschlans...@opentable.com> wrote:

> From reading the bylaws it's not entirely clear who closes the vote or how
> they
> decide when to do so.
>
> Given a week has passed and assuming Jay's and Matthias's votes are
> binding,
> we have a result of +4 votes with no other votes cast.
>
> I'll update the KIP with the result shortly :)
>
> > On Feb 14, 2017, at 3:36 PM, Zakee  wrote:
> >
> > +1
> >
> > -Zakee
> >> On Feb 14, 2017, at 1:56 PM, Jay Kreps  wrote:
> >>
> >> +1
> >>
> >> Nice improvement.
> >>
> >> -Jay
> >>
> >> On Tue, Feb 14, 2017 at 1:22 PM, Steven Schlansker <
> >> sschlans...@opentable.com> wrote:
> >>
> >>> Hi, it looks like I have 2 of the 3 minimum votes, can a third voter
> >>> please consider this KIP?
> >>> Thanks.
> >>>
> >>> (PS - new revision on GitHub PR with hopefully the last round of
> >>> improvements)
> >>>
>  On Feb 8, 2017, at 9:06 PM, Matthias J. Sax 
> >>> wrote:
> 
>  +1
> 
>  On 2/8/17 4:51 PM, Gwen Shapira wrote:
> > +1 (binding)
> >
> > On Wed, Feb 8, 2017 at 4:45 PM, Steven Schlansker
> >  wrote:
> >> Hi everyone,
> >>
> >> Thank you for constructive feedback on KIP-121,
> >>> KStream.peek(ForeachAction) ;
> >> it seems like it is time to call a vote which I hope will pass
> easily
> >>> :)
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>> 121%3A+Add+KStream+peek+method
> >>
> >> I believe the PR attached is already in good shape to consider
> merging:
> >>
> >> https://github.com/apache/kafka/pull/2493
> >>
> >> Thanks!
> >> Steven
> >>
> >
> >
> >
> 
> >>>
> >>>
> >
> > 
>
>


Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-15 Thread Steven Schlansker
Oops, sorry, a number of votes were sent only to -dev and not to
-user and so I missed those in the email I just sent.  The actual count is more 
like +8

> On Feb 15, 2017, at 12:24 PM, Steven Schlansker  
> wrote:
> 
> From reading the bylaws it's not entirely clear who closes the vote or how 
> they
> decide when to do so.
> 
> Given a week has passed and assuming Jay's and Matthias's votes are binding,
> we have a result of +4 votes with no other votes cast.
> 
> I'll update the KIP with the result shortly :)
> 
>> On Feb 14, 2017, at 3:36 PM, Zakee  wrote:
>> 
>> +1
>> 
>> -Zakee
>>> On Feb 14, 2017, at 1:56 PM, Jay Kreps  wrote:
>>> 
>>> +1
>>> 
>>> Nice improvement.
>>> 
>>> -Jay
>>> 
>>> On Tue, Feb 14, 2017 at 1:22 PM, Steven Schlansker <
>>> sschlans...@opentable.com> wrote:
>>> 
 Hi, it looks like I have 2 of the 3 minimum votes, can a third voter
 please consider this KIP?
 Thanks.
 
 (PS - new revision on GitHub PR with hopefully the last round of
 improvements)
 
> On Feb 8, 2017, at 9:06 PM, Matthias J. Sax 
 wrote:
> 
> +1
> 
> On 2/8/17 4:51 PM, Gwen Shapira wrote:
>> +1 (binding)
>> 
>> On Wed, Feb 8, 2017 at 4:45 PM, Steven Schlansker
>>  wrote:
>>> Hi everyone,
>>> 
>>> Thank you for constructive feedback on KIP-121,
 KStream.peek(ForeachAction) ;
>>> it seems like it is time to call a vote which I hope will pass easily
 :)
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
 121%3A+Add+KStream+peek+method
>>> 
>>> I believe the PR attached is already in good shape to consider merging:
>>> 
>>> https://github.com/apache/kafka/pull/2493
>>> 
>>> Thanks!
>>> Steven
>>> 
>> 
>> 
>> 
> 
 
 
>> 
>> 
> 





Re: KIP-121 [VOTE]: Add KStream peek method

2017-02-15 Thread Steven Schlansker
From reading the bylaws it's not entirely clear who closes the vote or how they
decide when to do so.

Given a week has passed and assuming Jay's and Matthias's votes are binding,
we have a result of +4 votes with no other votes cast.

I'll update the KIP with the result shortly :)

> On Feb 14, 2017, at 3:36 PM, Zakee  wrote:
> 
> +1
> 
> -Zakee
>> On Feb 14, 2017, at 1:56 PM, Jay Kreps  wrote:
>> 
>> +1
>> 
>> Nice improvement.
>> 
>> -Jay
>> 
>> On Tue, Feb 14, 2017 at 1:22 PM, Steven Schlansker <
>> sschlans...@opentable.com> wrote:
>> 
>>> Hi, it looks like I have 2 of the 3 minimum votes, can a third voter
>>> please consider this KIP?
>>> Thanks.
>>> 
>>> (PS - new revision on GitHub PR with hopefully the last round of
>>> improvements)
>>> 
 On Feb 8, 2017, at 9:06 PM, Matthias J. Sax 
>>> wrote:
 
 +1
 
 On 2/8/17 4:51 PM, Gwen Shapira wrote:
> +1 (binding)
> 
> On Wed, Feb 8, 2017 at 4:45 PM, Steven Schlansker
>  wrote:
>> Hi everyone,
>> 
>> Thank you for constructive feedback on KIP-121,
>>> KStream.peek(ForeachAction) ;
>> it seems like it is time to call a vote which I hope will pass easily
>>> :)
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>> 121%3A+Add+KStream+peek+method
>> 
>> I believe the PR attached is already in good shape to consider merging:
>> 
>> https://github.com/apache/kafka/pull/2493
>> 
>> Thanks!
>> Steven
>> 
> 
> 
> 
 
>>> 
>>> 
> 
> 





[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-02-15 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868449#comment-15868449
 ] 

Ismael Juma commented on KAFKA-4340:


[~becket_qin], do you think you'll be able to look into this today? If not, we 
may need to temporarily revert the merged commit in order to get the system 
tests passing again.

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.3.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data whose timestamps have already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by the log retention thread when it kicks in, and the messages will be 
> deleted. 
> To avoid this produce-and-delete scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-02-15 Thread Colin P. McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868437#comment-15868437
 ] 

Colin P. McCabe commented on KAFKA-4340:


ClientCompatibilityTestNewBroker, TestUpgrade, TestSecurityRollingUpgrade, and 
MessageFormatChangeTest also fail after this change.

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.3.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data whose timestamps have already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by the log retention thread when it kicks in, and the messages will be 
> deleted. 
> To avoid this produce-and-delete scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2017-02-15 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868433#comment-15868433
 ] 

Jun Rao commented on KAFKA-4340:


[~becket_qin], after committing the patch in this jira, one of our system tests 
started to fail consistently: 
https://testbreak.confluent.io/kiosk/test_result?id=15685

In the failed tests, we use the 0.9.0.1 producer to publish to trunk brokers and 
the producer keeps getting the following error.

{code}
[2017-02-15 10:10:17,275] INFO Kafka version : 0.9.0.1 
(org.apache.kafka.common.utils.AppInfoParser)
[2017-02-15 10:10:17,276] INFO Kafka commitId : 23c69d62a0cabf06 
(org.apache.kafka.common.utils.AppInfoParser)
[2017-02-15 10:10:17,467] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:17,697] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:17,730] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:17,735] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:17,739] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:17,744] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,108] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,123] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,126] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,128] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,130] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,295] INFO Closing the Kafka producer with timeoutMillis = 
9223372036854775807 ms. (org.apache.kafka.clients.producer.KafkaProducer)
[2017-02-15 10:10:18,319] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,325] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,327] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,328] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
[2017-02-15 10:10:18,329] WARN Unexpected error code: 32. 
(org.apache.kafka.common.protocol.Errors)
{code}

It seems that the issue is that messages from the 0.9 producer will get the 
default -1 timestamp and will be rejected by the new default 
message.timestamp.difference.max.ms. For backward compatibility, perhaps we 
should not reject messages with -1 timestamp?
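
For illustration only (this is a sketch of the backward-compatible check being
suggested above, not the actual broker code; the names are made up):

{code}
// Records from pre-0.10 clients carry the sentinel NO_TIMESTAMP (-1), so the
// difference check would simply be skipped for them.
private static final long NO_TIMESTAMP = -1L;

static boolean exceedsAllowedTimestampDifference(long messageTimestampMs, long nowMs, long maxDiffMs) {
    if (messageTimestampMs == NO_TIMESTAMP)
        return false;                                        // old clients: nothing to validate
    return Math.abs(nowMs - messageTimestampMs) > maxDiffMs; // otherwise enforce the configured bound
}
{code}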

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.3.0
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data whose timestamps have already passed log.retention.ms into 
> Kafka, the messages will be appended to the log but will be immediately 
> rolled out by the log retention thread when it kicks in, and the messages will be 
> deleted. 
> To avoid this produce-and-delete scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-98: Exactly Once Delivery and Transactional Messaging

2017-02-15 Thread Jason Gustafson
Thanks everyone who has voted so far!

Jun brought up a good point offline that the BeginTxnRequest was not
strictly needed since there is no state to recover until a partition has
been added to the transaction. Instead we can start the transaction
implicitly upon receiving the first AddPartitionsToTxn request. This
results in a slight change of behavior since the transaction timeout will
be enforced only after the first send() instead of the beginTransaction().
However, the main point of the timeout is to avoid blocking downstream
consumers, which is only possible once you've added a partition to the
transaction, so we feel the simplification is justified. I've updated the
document accordingly.

Thanks,
Jason
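
For readers who have not gone through the design document, a rough sketch of the
producer-side flow being voted on (method and config names are the ones proposed
in KIP-98 and may still change before the code lands); it shows why the timeout
is now effectively enforced from the first send() rather than beginTransaction():

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TransactionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "my-app-txn-1");   // proposed config name
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();

        producer.beginTransaction();                     // under the updated design, no request is sent here
        producer.send(new ProducerRecord<>("topic", "k", "v")); // the first send adds the partition to the
                                                         // transaction, so the timeout starts counting here
        producer.commitTransaction();
        producer.close();
    }
}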

On Tue, Feb 14, 2017 at 2:03 PM, Jay Kreps  wrote:

> +1
>
> Super happy with how this turned out. It's been a long journey since we
> started thinking about this 3+ years ago. Can't wait to see it in
> code---this is a big one! :-)
>
> -Jay
>
> On Wed, Feb 1, 2017 at 8:13 PM, Guozhang Wang  wrote:
>
> > Hi all,
> >
> > We would like to start the voting process for KIP-98. The KIP can be
> found
> > at
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 98+-+Exactly+Once+Delivery+and+Transactional+Messaging
> >
> > Discussion thread can be found here:
> >
> > http://search-hadoop.com/m/Kafka/uyzND1jwZrr7HRHf?subj=+
> > DISCUSS+KIP+98+Exactly+Once+Delivery+and+Transactional+Messaging
> >
> > Thanks,
> >
> > --
> > -- Guozhang
> >
>


Reg: Kafka HDFS Connector with (HDFS SSL enabled)

2017-02-15 Thread BigData dev
Hi,

Does the Kafka HDFS Connector work with HDFS when SSL is enabled? The only
security-related properties I see are
hdfs.authentication.kerberos, connect.hdfs.keytab, and hdfs.namenode.principal,
and these properties are all related to HDFS Kerberos.

From the configuration and the code I see that we only pass Kerberos parameters
and no SSL configuration, so I want to confirm: will the Kafka HDFS
Connector work with HDFS (SSL enabled)?

Could you please provide any information on this.


Thanks


[jira] [Commented] (KAFKA-3893) Kafka Broker ID disappears from /brokers/ids

2017-02-15 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868331#comment-15868331
 ] 

Armin Braun commented on KAFKA-3893:


I think this is a duplicate of https://issues.apache.org/jira/browse/KAFKA-4041 
right ? (they already found the root cause for this lying in ZK there 
apparently)

> Kafka Broker ID disappears from /brokers/ids
> 
>
> Key: KAFKA-3893
> URL: https://issues.apache.org/jira/browse/KAFKA-3893
> Project: Kafka
>  Issue Type: Bug
>Reporter: chaitra
>Priority: Critical
>
> Kafka version used : 0.8.2.1 
> Zookeeper version: 3.4.6
> We have a scenario where the Kafka broker's entry in the ZooKeeper path /brokers/ids just 
> disappears.
> We see the ZooKeeper connection active and no network issue.
> The ZooKeeper connection timeout is set to 6000ms in server.properties.
> Hence Kafka is not participating in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-15 Thread Mayuresh Gharat
@Jun thanks for the comments. Please see the replies inline.

Currently kafka-acl.sh just creates an ACL path in ZK with the principal
name string.
> Yes, the kafka-acl.sh calls the addAcl() on the inbuilt
SimpleAclAuthorizer which in turn creates an ACL in ZK with the Principal
name string. This is because we supply the SimpleAclAuthorizer as a
commandline argument in the Kafka-acls.sh command.

The authorizer module in the broker reads the principal name
string from the acl path in ZK and creates the expected KafkaPrincipal for
matching. As you can see, the expected principal is created on the broker
side, not by the kafka-acl.sh tool.
> This is considering the fact that the user is using the
SimpleAclAuthorizer on the broker side and not his own custom Authorizer.
The SimpleAclAuthorizer will take the Principal it gets from the Session
class. Currently the Principal is a KafkaPrincipal. This KafkaPrincipal is
generated from the name of the actual channel Principal, in SocketServer
class when processing completed receives.
With this KIP, this will no longer be the case as the Session class will
store a java.security.Principal instead of specific KafkaPrincipal. So the
SimpleAclAuthorizer will construct the KafkaPrincipal from the channel
Principal it gets from the Session class.
User might not want to use the SimpleAclAuthorizer but use his/her own
custom Authorizer.

The broker already has the ability to
configure PrincipalBuilder. That's why I am not sure if there is a need for
kafka-acl.sh to customize PrincipalBuilder.
> This is exactly the reason why we want to propose a PrincipalBuilder
in kafka-acls.sh so that the Principal generated by the PrincipalBuilder on
broker is consistent with that generated while creating ACLs using the
kafka-acls.sh command line tool.


*To summarize the above discussions :*
What if we only make the following changes: pass the java principal in
session and in
SimpleAuthorizer, construct KafkaPrincipal from java principal name. Will
that work for LinkedIn?
--> Yes, this works for Linkedin as we are not using the kafka-acls.sh
tool to create/update/add ACLs, for now.

Do you think there is a use case for a customized authorizer and kafka-acl
at the
same time? If not, it's better not to complicate the kafka-acl api.
-> At Linkedin, we don't use this tool for now. So we are fine with the
minimal change for now.

Initially, our change was minimal, just getting Kafka to preserve the
channel principal. Since there was a discussion about how kafka-acls.sh would
work with this change, on the ticket, we designed a detailed solution to
make this tool generally usable with all sorts of combinations of
Authorizers and PrincipalBuilders and give more flexibility to the end
users.
Without the changes proposed for kafka-acls.sh in this KIP, it cannot be
used with a custom Authorizer/PrincipalBuilder but will only work with
SimpleAclAuthorizer.

Although I would actually like it to work for the general scenario, I am fine
with separating it into a separate KIP and limiting the scope of this KIP.
I will update the KIP accordingly and put this under rejected alternatives
and create a new KIP for the Kafka-acls.sh changes.

@Manikumar
Since we are limiting the scope of this KIP by not making any changes to
kafka-acls.sh, I will cover your concern in a separate KIP that I will put
up for kafka-acls.sh. Does that work?

Thanks,

Mayuresh
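
To make the summary above concrete, here is a tiny sketch (illustrative only,
not actual Kafka code) of the mapping a SimpleAclAuthorizer-style implementation
would do once the Session carries the raw channel principal:

import java.security.Principal;
import org.apache.kafka.common.security.auth.KafkaPrincipal;

final class PrincipalMapping {
    // The authorizer rebuilds the KafkaPrincipal it matches ACLs against from the
    // name of whatever Principal the (possibly custom) PrincipalBuilder produced.
    static KafkaPrincipal toKafkaPrincipal(Principal channelPrincipal) {
        return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, channelPrincipal.getName());
    }
}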


On Wed, Feb 15, 2017 at 9:18 AM, radai  wrote:

> @jun:
> "Currently kafka-acl.sh just creates an ACL path in ZK with the principal
> name string" - yes, but not directly. all it actually does is spin up the
> Authorizer and call Authorizer.addAcl() on it.
> the vanilla Authorizer goes to ZK.
> but generally speaking, users can plug in their own Authorizers (that can
> store/load ACLs to/from wherever).
>
> it would be nice if users who customize Authorizers (and PrincipalBuilders)
> did not immediately lose the ability to use kafka-acl.sh with their new
> Authorizers.
>
> On Wed, Feb 15, 2017 at 5:50 AM, Manikumar 
> wrote:
>
> > Sorry, I am late to this discussion.
> >
> > PrincipalBuilder is only used for SSL Protocol.
> > For SASL, we use "sasl.kerberos.principal.to.local.rules" config to map
> > SASL principal names to short names. To make it consistent,
> > Do we also need to pass the SASL full principal name to authorizer ?
> > We may need to use PrincipalBuilder for mapping SASL names.
> >
> > Related JIRA is here:
> > https://issues.apache.org/jira/browse/KAFKA-2854
> >
> >
> > On Wed, Feb 15, 2017 at 7:47 AM, Jun Rao  wrote:
> >
> > > Hi, Radai,
> > >
> > > Currently kafka-acl.sh just creates an ACL path in ZK with the
> principal
> > > name string. The authorizer module in the broker reads the principal
> name
> > > string from the acl path in ZK and creates the expected KafkaPrincipal
> > for
> > > matching. As you can see, the expected principal is created on 

[jira] [Comment Edited] (KAFKA-4765) org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)

2017-02-15 Thread Armin Braun (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867408#comment-15867408
 ] 

Armin Braun edited comment on KAFKA-4765 at 2/15/17 6:15 PM:
-

-I could add a PR for this if you see it as a valid issue too, already tried 
the `.local` suffix fix out locally with success.-

Added a PR now as this is really trivial and I'm sure that I'm not the only one 
who'd benefit from this trivial fix :)


was (Author: original-brownbear):
I could add a PR for this if you see it as a valid issue too, already tried the 
`.local` suffix fix out locally with success.

> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
>  and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)
> -
>
> Key: KAFKA-4765
> URL: https://issues.apache.org/jira/browse/KAFKA-4765
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 0.10.1.1
> Environment: All DNS environments that properly forward 127.0.53.53
>Reporter: Armin Braun
>
> The test
> {code}
> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
> {code}
> fails on some systems because this below snippet from 
> {code}
> org.apache.kafka.clients.ClientUtils#parseAndValidateAddresses
> {code}
> {code}
> InetSocketAddress address = new InetSocketAddress(host, 
> port);
> if (address.isUnresolved()) {
> log.warn("Removing server {} from {} as DNS 
> resolution failed for {}", url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, 
> host);
> } else {
> addresses.add(address);
> }
> {code}
> will add the address *some.invalid.hostname.foo.bar* to the addresses list 
> without error since it is resolved to *127.0.53.53* to indicate potential 
> future collision of the _.bar_ tld.
> The same issue applies to a few other test cases that try to intentionally 
> run into broken hostnames.
> This can (and should) be fixed by using broken hostname examples that do not 
> collide. I would suggest just putting a ".local" suffix on all that are 
> currently affected by this.
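
For anyone reproducing this locally, a small standalone check (an illustration,
not part of the project or the patch) makes the behaviour easy to see; on
resolvers that implement the ICANN name-collision answer, the address below
comes back as 127.0.53.53 instead of failing to resolve:

{code}
import java.net.InetSocketAddress;

public class UnresolvedCheck {
    public static void main(String[] args) {
        InetSocketAddress address = new InetSocketAddress("some.invalid.hostname.foo.bar", 9999);
        // The affected tests assume isUnresolved() == true; a collision answer breaks that assumption.
        System.out.println(address.isUnresolved()
                ? "unresolved (test assumption holds)"
                : "resolved to " + address.getAddress() + " (collision answer)");
    }
}
{code}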



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4765) org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)

2017-02-15 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated KAFKA-4765:
---
Status: Patch Available  (was: Open)

> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
>  and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)
> -
>
> Key: KAFKA-4765
> URL: https://issues.apache.org/jira/browse/KAFKA-4765
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 0.10.1.1
> Environment: All DNS environments that properly forward 127.0.53.53
>Reporter: Armin Braun
>
> The test
> {code}
> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
> {code}
> fails on some systems because this below snippet from 
> {code}
> org.apache.kafka.clients.ClientUtils#parseAndValidateAddresses
> {code}
> {code}
> InetSocketAddress address = new InetSocketAddress(host, 
> port);
> if (address.isUnresolved()) {
> log.warn("Removing server {} from {} as DNS 
> resolution failed for {}", url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, 
> host);
> } else {
> addresses.add(address);
> }
> {code}
> will add the address *some.invalid.hostname.foo.bar* to the addresses list 
> without error since it is resolved to *127.0.53.53* to indicate potential 
> future collision of the _.bar_ tld.
> The same issue applies to a few other test cases that try to intentionally 
> run into broken hostnames.
> This can (and should) be fixed by using broken hostname examples that do not 
> collide. I would suggest just putting a ".local" suffix on all that are 
> currently affected by this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-84: Support SASL/SCRAM mechanisms

2017-02-15 Thread Roger Hoover
Yes.  Thank you, Ismael.
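
(For readers landing on this thread: the mechanisms the KIP includes, as noted
below, are SCRAM-SHA-256 and SCRAM-SHA-512. As a reference point, here is a
minimal client-side sketch of enabling one of them; the config keys are the ones
documented for 0.10.2, and the username/password are placeholders.)

import java.util.Properties;

public class ScramClientConfigSketch {
    public static Properties scramProps() {
        Properties props = new Properties();
        props.put("security.protocol", "SASL_SSL");   // or SASL_PLAINTEXT
        props.put("sasl.mechanism", "SCRAM-SHA-256"); // or SCRAM-SHA-512
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
                + "username=\"alice\" password=\"alice-secret\";");
        // bootstrap.servers, serializers/deserializers etc. would be added as usual
        // before handing the Properties to a KafkaProducer or KafkaConsumer.
        return props;
    }
}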

On Wed, Feb 8, 2017 at 2:30 AM, Ismael Juma  wrote:

> Hi Roger,
>
> Sorry for the delay. SCRAM is specified by:
>
> https://tools.ietf.org/html/rfc5802
>
> The following quote is relevant:
>
> A SCRAM mechanism name is a string "SCRAM-" followed by the
> > uppercased name of the underlying hash function taken from the IANA
> > "Hash Function Textual Names" registry (see http://www.iana.org),
> > optionally followed by the suffix "-PLUS" (see below)
>
>
> And:
>
> "md2" 1.2.840.113549.2.2 [RFC3279]
> > "md5" 1.2.840.113549.2.5 [RFC3279]
> > "sha-1" 1.3.14.3.2.26 [RFC3279]
> > "sha-224" 2.16.840.1.101.3.4.2.4 [RFC4055]
> > "sha-256" 2.16.840.1.101.3.4.2.1 [RFC4055]
> > "sha-384" 2.16.840.1.101.3.4.2.2 [RFC4055]
> > "sha-512" 2.16.840.1.101.3.4.2.3 [RFC4055]
>
>
> https://www.iana.org/assignments/hash-function-
> text-names/hash-function-text-names.xhtml
>
> As you see, bcrypt is not an option for the current spec. The naming of the
> mechanisms would be a bit misleading if support for bcrypt was added
> (SCRAM-PKBDF2-SHA512, SCRAM-BCRYPT*, etc. would be better).
>
> Does that make sense?
>
> Ismael
>
> On Tue, Jan 24, 2017 at 7:26 PM, Roger Hoover 
> wrote:
>
> > Thanks, Ismael.  Just curious, why does it not make sense to do bcrypt
> > in the context of SCRAM?
> >
> > On Mon, Jan 23, 2017 at 3:54 PM, Ismael Juma  wrote:
> >
> > > Hi Roger,
> > >
> > > SCRAM uses the PBKDF2 mechanism, here's a comparison between PBKDF2 and
> > > bcrypt:
> > >
> > > http://security.stackexchange.com/questions/4781/do-any-secu
> > > rity-experts-recommend-bcrypt-for-password-storage/6415#6415
> > >
> > > It may be worth supporting bcrypt, but not sure it would make sense to
> do
> > > it in the context of SCRAM.
> > >
> > > A minor correction: the KIP includes SCRAM-SHA-256 and SCRAM-SHA-512
> (not
> > > SCRAM-SHA-1).
> > >
> > > Ismael
> > >
> > > On Mon, Jan 23, 2017 at 10:49 PM, Roger Hoover  >
> > > wrote:
> > >
> > > > Sorry for the late question but is there a reason to choose SHA-1 and
> > > > SHA-256 instead of bcrypt?
> > > >
> > > > https://codahale.com/how-to-safely-store-a-password/
> > > >
> > > > On Fri, Nov 11, 2016 at 5:30 AM, Rajini Sivaram <
> > > > rajinisiva...@googlemail.com> wrote:
> > > >
> > > > > I think all the comments and suggestions on this thread have now
> been
> > > > > incorporated into the KIP. If there are no objections, I will start
> > the
> > > > > voting process on Monday.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Tue, Nov 8, 2016 at 9:20 PM, Rajini Sivaram <
> > > > > rajinisiva...@googlemail.com
> > > > > > wrote:
> > > > >
> > > > > > Jun,
> > > > > >
> > > > > > Have added a sub-section on delegation token support to the KIP.
> > > > > >
> > > > > > Thank you,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Tue, Nov 8, 2016 at 8:07 PM, Jun Rao 
> wrote:
> > > > > >
> > > > > >> Hi, Rajini,
> > > > > >>
> > > > > >> That makes sense. Could you document this potential future
> > extension
> > > > in
> > > > > >> the
> > > > > >> KIP?
> > > > > >>
> > > > > >> Jun
> > > > > >>
> > > > > >> On Tue, Nov 8, 2016 at 11:17 AM, Rajini Sivaram <
> > > > > >> rajinisiva...@googlemail.com> wrote:
> > > > > >>
> > > > > >> > Jun,
> > > > > >> >
> > > > > >> > 11. SCRAM messages have an optional extensions field which is
> a
> > > list
> > > > > of
> > > > > >> > key=value pairs. We can add an extension key to the first
> client
> > > > > >> message to
> > > > > >> > indicate delegation token. Broker can then obtain credentials
> > and
> > > > > >> principal
> > > > > >> > using a different code path for delegation tokens.
> > > > > >> >
> > > > > >> > On Tue, Nov 8, 2016 at 6:38 PM, Jun Rao 
> > wrote:
> > > > > >> >
> > > > > >> > > Magnus,
> > > > > >> > >
> > > > > >> > > Thanks for the input. If you don't feel strongly the need to
> > > bump
> > > > up
> > > > > >> the
> > > > > >> > > version of SaslHandshake, we can leave the version
> unchanged.
> > > > > >> > >
> > > > > >> > > Rajini,
> > > > > >> > >
> > > > > >> > > 11. Yes, we could send the HMAC as the SCRAM password for
> the
> > > > > >> delegation
> > > > > >> > > token. Do we need something to indicate that this SCRAM
> token
> > is
> > > > > >> special
> > > > > >> > > (i.e., delegation token) so that we can generate the correct
> > > > > >> > > KafkaPrincipal? The delegation token logic can be added
> > later. I
> > > > am
> > > > > >> > asking
> > > > > >> > > just so that we have enough in the design of SCRAM to add
> the
> > > > > >> delegation
> > > > > >> > > token logic later.
> > > > > >> > >
> > > > > >> > > Thanks,
> > > > > >> > >
> > > > > >> > > Jun
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Tue, Nov 8, 2016 at 4:42 AM, Rajini Sivaram <
> > > > > >> > > rajinisiva...@googlemail.com
> > > > > >> > > > wrote:

[jira] [Commented] (KAFKA-4765) org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868284#comment-15868284
 ] 

ASF GitHub Bot commented on KAFKA-4765:
---

GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2552

KAFKA-4765: Fixed Intentionally Broken Hosts Resolving to 127.0.53.53

Fixes https://issues.apache.org/jira/browse/KAFKA-4765 by simply using 
artificially broken hosts that are not resolved as potential collisions 
(127.0.53.53s) by some DNS servers.

This change is the only way to build while using Google's `8.8.8.8` (at 
least in my network).
I can't see a downside to making this change; it simply makes the build more 
stable and portable :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka KAFKA-4765

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2552


commit e62468c43930390fa545d0ff24376d8bd784e691
Author: Armin Braun 
Date:   2017-02-15T09:13:58Z

KAFKA-4765 Fixed Intentionally Broken Hosts Resolving to 127.0.53.53




> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
>  and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)
> -
>
> Key: KAFKA-4765
> URL: https://issues.apache.org/jira/browse/KAFKA-4765
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 0.10.1.1
> Environment: All DNS environments that properly forward 127.0.53.53
>Reporter: Armin Braun
>
> The test
> {code}
> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
> {code}
> fails on some systems because this below snippet from 
> {code}
> org.apache.kafka.clients.ClientUtils#parseAndValidateAddresses
> {code}
> {code}
> InetSocketAddress address = new InetSocketAddress(host, 
> port);
> if (address.isUnresolved()) {
> log.warn("Removing server {} from {} as DNS 
> resolution failed for {}", url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, 
> host);
> } else {
> addresses.add(address);
> }
> {code}
> will add the address *some.invalid.hostname.foo.bar* to the addresses list 
> without error since it is resolved to *127.0.53.53* to indicate potential 
> future collision of the _.bar_ tld.
> The same issue applies to a few other test cases that try to intentionally 
> run into broken hostnames.
> This can (and should) be fixed by using broken hostname examples that do not 
> collide. I would suggest just putting a ".local" suffix on all that are 
> currently affected by this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2552: KAFKA-4765: Fixed Intentionally Broken Hosts Resol...

2017-02-15 Thread original-brownbear
GitHub user original-brownbear opened a pull request:

https://github.com/apache/kafka/pull/2552

KAFKA-4765: Fixed Intentionally Broken Hosts Resolving to 127.0.53.53

Fixes https://issues.apache.org/jira/browse/KAFKA-4765 by simply using 
artificially broken hosts that are not resolved as potential collisions 
(127.0.53.53s) by some DNS servers.

This change is the only way to build while using Google's `8.8.8.8` (at 
least in my network).
I can't see a downside to making this change; it simply makes the build more 
stable and portable :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/original-brownbear/kafka KAFKA-4765

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2552


commit e62468c43930390fa545d0ff24376d8bd784e691
Author: Armin Braun 
Date:   2017-02-15T09:13:58Z

KAFKA-4765 Fixed Intentionally Broken Hosts Resolving to 127.0.53.53




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Cannot start up Kafka Server within Java

2017-02-15 Thread radai
kafka publishes a "test" artifact:
http://search.maven.org/#search|ga|1|a%3A%22kafka_2.12%22
if you introduce a test-scoped dep. on it, which in maven would look like:

<dependency>
   <groupId>org.apache.kafka</groupId>
   <artifactId>kafka_2.12</artifactId>
   <version>0.10.1.1</version>
   <classifier>test</classifier>
   <scope>test</scope>
</dependency>

you would get access to the trait KafkaServerTestHarness, which you can base
your unit tests on (warning - it's Scala)

On Wed, Feb 15, 2017 at 8:33 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hi Stefan,
>
> The mailing list suppressed your attachment, so, it's hard to offer any
> advice without the error.  I'd suggest trying to post the error in a GitHub
> Gist, or some similar format.
>
> I dug up this mailing list archive about starting a Kafka server in-process
> that might help:
> http://mail-archives.apache.org/mod_mbox/kafka-users/201608.mbox/%
> 3CCAJTjOQG5MdboQ9ciBrvZNLQzaDAvKhyQeW-mmz-br4bWy32B7A%40mail.gmail.com%3E
>  The whole thread (Automated Testing w/ Kafka Streams) has a few other
> approaches as well that you might be interested in trying.
>
> Mathieu
>
>
> On Wed, Feb 15, 2017 at 8:27 AM, Stefan Kölbl  wrote:
>
> > Dear Kafka dev team!
> >
> >
> >
> > I’m implementing a data stream processor with Apache Kafka for my
> > bachelor’s thesis and right now I’m struggling with trying to get the
> Kafka
> > Server started with Java.
> >
> >
> >
> > A short summary of what I’m doing currently:
> >
> >1. Start ZooKeeper (manually via terminal).
> >2. Start Kafka Server (manually via terminal).
> >3. Create Kafka Topic (manually via terminal, although it would be
> >auto-created by the producer anyway).
> >4. Run Kafka Producer written with the Kafka Clients API in Java.
> >5. Run Kafka Consumer written with the Kafka Clients API in Java.
> >
> >
> >
> > To further automate the testing process with JUnit, I’d like to be able
> > to start ZooKeeper and the Kafka Server with Java too, as I do with the
> > producer and the consumer.
> >
> > This should be possible, according to some examples I found online, using
> > the Scala source of Kafka (not Kafka Clients!):
> > http://www.programcreek.com/java-api-examples/index.php?
> > api=kafka.server.KafkaServer
> >
> > But whenever I try to create a new KafkaServer object, I get an error
> > (please see attached error.png).
> >
> >
> >
> > Here are my maven dependencies:
> >
> > <dependency>
> >     <groupId>org.apache.kafka</groupId>
> >     <artifactId>kafka-clients</artifactId>
> >     <version>0.10.1.1</version>
> > </dependency>
> > <dependency>
> >     <groupId>org.apache.kafka</groupId>
> >     <artifactId>kafka-streams</artifactId>
> >     <version>0.10.1.1</version>
> > </dependency>
> > <dependency>
> >     <groupId>org.apache.kafka</groupId>
> >     <artifactId>kafka_2.11</artifactId>
> >     <version>0.10.1.1</version>
> > </dependency>
> >
> >
> >
> > I somehow have the feeling I’m missing something crucial and easy-to-fix.
> > Neither a lot of google searches nor playing around with the parameters
> > (time, threadNamePrefix, kafkaMetricsReporters) provided to the
> KafkaServer
> > constructor could resolve my issue.
> >
> > Could you please help me? I’m stuck and don’t know what to do anymore.
> >
> > Thank you in advance.
> >
> >
> >
> > Best regards,
> >
> > Stefan Kölbl
> >
>


Re: [VOTE] 0.10.2.0 RC2

2017-02-15 Thread Tom Crayford
Heroku tested this with our usual round of performance benchmarks, and
there seem to be no notable regressions in this RC that we can see (for a
sample on earlier regressions we found using these benchmarks during the
0.10.0.0 release,
https://engineering.heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-performance-in-distributed-systems/
is a decent writeup)

On Tue, Feb 14, 2017 at 6:39 PM, Ewen Cheslack-Postava 
wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 0.10.2.0.
>
> This is a minor version release of Apache Kafka. It includes 19 new KIPs.
> See the release notes and release plan (https://cwiki.apache.org/conf
> luence/display/KAFKA/Release+Plan+0.10.2.0) for more details. A few
> feature
> highlights: SASL-SCRAM support, improved client compatibility to allow use
> of clients newer than the broker, session windows and global tables in the
> Kafka Streams API, single message transforms in the Kafka Connect
> framework.
>
> Important note: in addition to the artifacts generated using JDK7 for Scala
> 2.10 and 2.11, this release also includes experimental artifacts built
> using JDK8 for Scala 2.12.
>
> Important code changes since RC1 (non-docs, non system tests):
>
> KAFKA-4756; The auto-generated broker id should be passed to
> MetricReporter.configure
> KAFKA-4761; Fix producer regression handling small or zero batch size
>
> Release notes for the 0.10.2.0 release:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by February 17th 5pm ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/
>
> * Javadoc:
> http://home.apache.org/~ewencp/kafka-0.10.2.0-rc2/javadoc/
>
> * Tag to be voted upon (off 0.10.2 branch) is the 0.10.2.0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 5712b489038b71ed8d5a679856d1dfaa925eadc1
>
>
> * Documentation:
> http://kafka.apache.org/0102/documentation.html
>
> * Protocol:
> http://kafka.apache.org/0102/protocol.html
>
> * Successful Jenkins builds for the 0.10.2 branch:
> Unit/integration tests: https://builds.apache.org/job/
> kafka-0.10.2-jdk7/77/
> System tests: https://jenkins.confluent.io/job/system-test-kafka-0.10.2/
> 29/
>
> /**
>
> Thanks,
> Ewen
>


Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-15 Thread radai
@jun:
"Currently kafka-acl.sh just creates an ACL path in ZK with the principal
name string" - yes, but not directly. all it actually does is spin up the
Authorizer and call Authorizer.addAcl() on it.
the vanilla Authorizer goes to ZK.
but generally speaking, users can plug in their own Authorizers (that can
store/load ACLs to/from wherever).

it would be nice if users who customize Authorizers (and PrincipalBuilders)
did not immediately lose the ability to use kafka-acl.sh with their new
Authorizers.

On Wed, Feb 15, 2017 at 5:50 AM, Manikumar 
wrote:

> Sorry, I am late to this discussion.
>
> PrincipalBuilder is only used for SSL Protocol.
> For SASL, we use "sasl.kerberos.principal.to.local.rules" config to map
> SASL principal names to short names. To make it consistent,
> Do we also need to pass the SASL full principal name to authorizer ?
> We may need to use PrincipalBuilder for mapping SASL names.
>
> Related JIRA is here:
> https://issues.apache.org/jira/browse/KAFKA-2854
>
>
> On Wed, Feb 15, 2017 at 7:47 AM, Jun Rao  wrote:
>
> > Hi, Radai,
> >
> > Currently kafka-acl.sh just creates an ACL path in ZK with the principal
> > name string. The authorizer module in the broker reads the principal name
> > string from the acl path in ZK and creates the expected KafkaPrincipal
> for
> > matching. As you can see, the expected principal is created on the broker
> > side, not by the kafka-acl.sh tool. The broker already has the ability to
> > configure PrincipalBuilder. That's why I am not sure if there is a need
> for
> > kafka-acl.sh to customize PrincipalBuilder.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Mon, Feb 13, 2017 at 7:01 PM, radai 
> wrote:
> >
> > > if i understand correctly, kafka-acls.sh spins up an instance of (the
> > > custom, in our case) Authorizer, and calls things like addAcls(acls:
> > > Set[Acl], resource: Resource) on it, which are defined in the
> interface,
> > > hence expected to be "extensible".
> > >
> > > (side note: if Authorizer and PrincipalBuilder are defined as
> extensible
> > > interfaces, why doesnt class Acl, which is in the signature for
> > Authorizer
> > > calls, use java.security.Principal?)
> > >
> > > we would like to be able to use the standard kafka-acl command line for
> > > defining ACLs even when replacing the vanilla Authorizer and
> > > PrincipalBuilder (even though we have a management UI for these
> > operations
> > > within linkedin) - simply because thats the correct thing to do from an
> > > extensibility point of view.
> > >
> > > On Mon, Feb 13, 2017 at 1:39 PM, Jun Rao  wrote:
> > >
> > > > Hi, Mayuresh,
> > > >
> > > > I seems to me that there are two common use cases of authorizer. (1)
> > Use
> > > > the default SimpleAuthorizer and the kafka-acl to do authorization.
> (2)
> > > Use
> > > > a customized authorizer and an external tool for authorization. Do
> you
> > > > think there is a use case for a customized authorizer and kafka-acl
> at
> > > the
> > > > same time? If not, it's better not to complicate the kafka-acl api.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > >
> > > > On Mon, Feb 13, 2017 at 10:35 AM, Mayuresh Gharat <
> > > > gharatmayures...@gmail.com> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the review and comments. Please find the replies inline
> :
> > > > >
> > > > > This is so that in the future, we can extend to types like group.
> > > > > ---> Yep, I did think the same. But since the SocketServer was
> always
> > > > > creating User type, it wasn't actually used. If we go ahead with
> > > changes
> > > > in
> > > > > this KIP, we will give this power of creating different Principal
> > types
> > > > to
> > > > > the PrincipalBuilder (which users can define there own). In that
> way
> > > > Kafka
> > > > > will not have to deal with handling this. So the Principal building
> > and
> > > > > Authorization will be opaque to Kafka which seems like an expected
> > > > > behavior.
> > > > >
> > > > >
> > > > > Hmm, normally, the configurations you specify for plug-ins refer to
> > > those
> > > > > needed to construct the plug-in object. So, it's kind of weird to
> use
> > > > that
> > > > > to call a method. For example, why can't
> > principalBuilderService.rest.
> > > > url
> > > > > be passed in through the configure() method and the implementation
> > can
> > > > use
> > > > > that to build principal. This way, there is only a single method to
> > > > compute
> > > > > the principal in a consistent way in the broker and in the
> kafka-acl
> > > > tool.
> > > > > > We can do that as well. But since the rest url is not related
> > to
> > > > the
> > > > > Principal, it seems out of place to me to pass it every time we
> have
> > to
> > > > > create a Principal. I should replace "principalConfigs" with
> > > > > "principalProperties".
> > > > > I was trying to differentiate the 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-15 Thread radai
I've trimmed the inline contents as this mail is getting too big for the
apache mailing list software to deliver :-(

1. the important thing for interoperability is for different "interested
parties" (plugins, infra layers/wrappers, user-code) to be able to stick
pieces of metadata onto msgs without getting in each other's way. a common
key scheme (Strings, as of the time of this writing?) is all thats required
for that. it is assumed that the other end interested in any such piece of
metadata knows the encoding, and byte[] provides for the most flexibility.
i believe this is the same logic behind core kafka being byte[]/byte[] -
Strings are more "usable" but bytes are flexible and so were chosen.
Also - core kafka doesnt even do that good of a job on usability of the
payload (example - i have to specify the nop byte[] "decoders" explicitly
in conf), and again sacrifices usability for the sake of performance (no
convenient single-record processing as poll is a batch, lots of obscure
little config details exposing internals of the batching mechanism, etc)

this is also why i really dislike the idea of a "type system" for header
values, it further degrades the usability, adds complexity and will
eventually get in people's way, also, it would be the 2nd/3rd home-group
serialization mechanism in core kafka (counting 2 iterations of the "type
definition DSL")

2. this is an implementation detail, and not even a very "user facing" one?
to the best of my understanding the vote process is on proposed
API/behaviour. also - since we're willing to go with strings just serialize
a 0-sized header blob and IIUC you don't need any optionals anymore.

3. yes, we can :-)
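
To ground point 1 above, a rough sketch of the producer-side usage pattern being
debated (String keys, opaque byte[] values); the accessor names follow the KIP
draft and are not final:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HeaderSketch {
    public static void main(String[] args) {
        ProducerRecord<String, String> record = new ProducerRecord<>("topic", "key", "value");

        // Each interested party (infra wrapper, plugin, user code) attaches its own
        // namespaced String key; the value stays a byte[] whose encoding only the
        // writer and the reader of that particular header need to agree on.
        byte[] spanId = new byte[] {0x01, 0x02, 0x03, 0x04};
        record.headers().add("tracing.span-id", spanId);
        record.headers().add("myorg.audit.origin", "billing-service".getBytes(StandardCharsets.UTF_8));
    }
}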

On Tue, Feb 14, 2017 at 11:56 PM, Michael Pearce 
wrote:

> Hi Jay,
>
> 1) There was some initial debate on the value part, as youll note String,
> String headers were discounted early on. The reason for this is flexibility
> and keeping in line with the flexibility of key, value of the message
> object itself. I don’t think it takes away from an ecosystem as each plugin
> will care for their own key; this way ints, booleans, exotic custom binaries
> can all be catered for.
> a. If you really wanted to push for a typed value interface, I wouldn’t
> want just String values supported, but the the primatives plus string and
> also still keeping the ability to have a binary for custom binaries that
> some organisations may have.
> i. I have written this slight alternative here, https://cwiki.apache.org/
> confluence/display/KAFKA/KIP-82+-+Add+Record+Headers+-+Typed
> ii. Essentially the value bytes, has a leading byte overhead.
> 1.  This tells you what type the value is, before reading the rest of the
> bytes, allowing serialisation/deserialization to and from the primitives,
> string and byte[]. This is akin to some other messaging systems.
> 2) We are making it optional, so that for those not wanting headers have 0
> bytes overhead (think of it as a feature flag), I don’t think this is
> complex, especially if comparing to changes proposed in other kips like
> kip-98.
> a. If you really really don’t like this, we can drop it, but it would mean
> buying into 4 bytes extra overhead for users who do not want to use headers.
> 3) In the summary yes, it is at a higher level, but I think this is well
> documented in the proposed changes section.
> a. Added getHeaders method to Producer/Consumer record (that is it)
> b. We’ve also detailed the new Headers class that this method returns that
> encapsulates the headers protocol and logic.
>
> Best,
> Mike
>
> ==Original questions from the vote thread from Jay.==
>
> Couple of things I think we still need to work out:
>
>1. I think we agree about the key, but I think we haven't talked about
>the value yet. I think if our goal is an open ecosystem of these header
>spread across many plugins from many systems we should consider making
> this
>a string as well so it can be printed, set via a UI, set in config, etc.
>Basically encouraging pluggable serialization formats here will lead to
> a
>bit of a tower of babel.
>2. This proposal still includes a pretty big change to our serialization
>and protocol definition layer. Essentially it is introducing an optional
>type, where the format is data dependent. I think this is actually a big
>change though it doesn't seem like it. It means you can no longer
> specify
>this type with our type definition DSL, and likewise it requires custom
>handling in client libs. This isn't a huge thing, since the Record
>definition is custom anyway, but I think this kind of protocol
>inconsistency is very non-desirable and ties you to hand-coding things.
> I
>think the type should instead by [Key Value] in our BNF, where key and
>value are both short strings as used elsewhere. This brings it in line
> with
>the rest of the protocol.
>3. Could we get more specific about the exact Java API change to

Re: Cannot start up Kafka Server within Java

2017-02-15 Thread Mathieu Fenniak
Hi Stefan,

The mailing list suppressed your attachment, so, it's hard to offer any
advice without the error.  I'd suggest trying to post the error in a GitHub
Gist, or some similar format.

I dug up this mailing list archive about starting a Kafka server in-process
that might help:
http://mail-archives.apache.org/mod_mbox/kafka-users/201608.mbox/%3CCAJTjOQG5MdboQ9ciBrvZNLQzaDAvKhyQeW-mmz-br4bWy32B7A%40mail.gmail.com%3E
 The whole thread (Automated Testing w/ Kafka Streams) has a few other
approaches as well that you might be interested in trying.

Mathieu
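
For anyone following along, here is a minimal sketch of starting a broker in-process from Java, assuming the kafka_2.11 core jar is on the classpath and a ZooKeeper instance is already reachable at localhost:2181; constructor signatures differ slightly between versions, so treat this as a starting point rather than a verified recipe.

import java.nio.file.Files;
import java.util.Properties;

import kafka.server.KafkaConfig;
import kafka.server.KafkaServerStartable;

public class EmbeddedKafkaExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");                      // assumes ZooKeeper is running
        props.put("broker.id", "0");
        props.put("listeners", "PLAINTEXT://localhost:9092");
        props.put("log.dirs", Files.createTempDirectory("kafka-logs").toString());
        props.put("offsets.topic.replication.factor", "1");

        // KafkaServerStartable wraps KafkaServer and supplies the optional
        // constructor arguments (time, thread name prefix, metrics reporters)
        // that are awkward to pass from Java.
        KafkaServerStartable broker = new KafkaServerStartable(new KafkaConfig(props));
        broker.startup();

        // ... run producer/consumer tests against localhost:9092 here ...

        broker.shutdown();
        broker.awaitShutdown();
    }
}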


On Wed, Feb 15, 2017 at 8:27 AM, Stefan Kölbl  wrote:

> Dear Kafka dev team!
>
>
>
> I’m implementing a data stream processor with Apache Kafka for my
> bachelor’s thesis and right now I’m struggling with trying to get the Kafka
> Server started with Java.
>
>
>
> A short summary of what I’m doing currently:
>
>1. Start ZooKeeper (manually via terminal).
>2. Start Kafka Server (manually via terminal).
>3. Create Kafka Topic (manually via terminal, although it would be
>auto-created by the producer anyway).
>4. Run Kafka Producer written with the Kafka Clients API in Java.
>5. Run Kafka Consumer written with the Kafka Clients API in Java.
>
>
>
> To further automate the testing process with JUnit, I’d like to be able
> to start ZooKeeper and the Kafka Server with Java too, as I do with the
> producer and the consumer.
>
> This should be possible, according to some examples I found online, using
> the Scala source of Kafka (not Kafka Clients!):
> http://www.programcreek.com/java-api-examples/index.php?api=kafka.server.KafkaServer
>
> But whenever I try to create a new KafkaServer object, I get an error
> (please see attached error.png).
>
>
>
> Here are my maven dependencies:
>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka-clients</artifactId>
>     <version>0.10.1.1</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka-streams</artifactId>
>     <version>0.10.1.1</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka_2.11</artifactId>
>     <version>0.10.1.1</version>
> </dependency>
>
>
>
> I somehow have the feeling I’m missing something crucial and easy to fix.
> Neither a lot of google searches nor playing around with the parameters
> (time, threadNamePrefix, kafkaMetricsReporters) provided to the KafkaServer
> constructor could resolve my issue.
>
> Could you please help me? I’m stuck and don’t know what to do anymore.
>
> Thank you in advance.
>
>
>
> Best regards,
>
> Stefan Kölbl
>


Cannot start up Kafka Server within Java

2017-02-15 Thread Stefan Kölbl
Dear Kafka dev team!

 

I’m implementing a data stream processor with Apache Kafka for my bachelor’s
thesis and right now I’m struggling with trying to get the Kafka Server
started with Java.

 

A short summary of what I’m doing currently:

1.  Start ZooKeeper (manually via terminal).
2.  Start Kafka Server (manually via terminal).
3.  Create Kafka Topic (manually via terminal, although it would be
auto-created by the producer anyway).
4.  Run Kafka Producer written with the Kafka Clients API in Java.
5.  Run Kafka Consumer written with the Kafka Clients API in Java.

 

To further automate the testing process with JUnit, I’d like to be able to
start ZooKeeper and the Kafka Server with Java too, as I do with the
producer and the consumer.

This should be possible, according to some examples I found online, using
the Scala source of Kafka (not Kafka Clients!):
http://www.programcreek.com/java-api-examples/index.php?api=kafka.server.KafkaServer

But whenever I try to create a new KafkaServer object, I get an error
(please see attached error.png).

 

Here are my maven dependencies:


<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.10.1.1</version>
</dependency>


 

I somehow have the feeling I’m missing something crucial and easy to fix.
Neither a lot of google searches nor playing around with the parameters
(time, threadNamePrefix, kafkaMetricsReporters) provided to the KafkaServer
constructor could resolve my issue.

Could you please help me? I’m stuck and don’t know what to do anymore.

Thank you in advance.

 

Best regards,

Stefan Kölbl



[jira] [Commented] (KAFKA-4768) Single-request latency for streams can be very high

2017-02-15 Thread Eno Thereska (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867928#comment-15867928
 ] 

Eno Thereska commented on KAFKA-4768:
-

This is actually not a real problem, since it is caused by caching. The default 
commit interval is 30 seconds and it can be dialed down.
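
For anyone who wants lower single-record latency in a benchmark or test, a minimal sketch of dialing the interval down (and optionally disabling the record cache) via StreamsConfig; the application id, broker address and concrete values are examples only.

{code}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public final class LowLatencyStreamsProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "simple-benchmark");   // example id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // example broker
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);              // default is 30000 ms
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);         // disable caching (0.10.1+)
        return props;
    }
}
{code}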

> Single-request latency for streams can be very high
> ---
>
> Key: KAFKA-4768
> URL: https://issues.apache.org/jira/browse/KAFKA-4768
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
> Fix For: 0.10.3.0
>
>
> When running SimpleBenchmark.java with just 1 request and measuring the 
> latency, we observe that it can be very high for some topologies, like 
> count(). Latency can be as high as 30 seconds. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-4768) Single-request latency for streams can be very high

2017-02-15 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska resolved KAFKA-4768.
-
Resolution: Not A Problem

> Single-request latency for streams can be very high
> ---
>
> Key: KAFKA-4768
> URL: https://issues.apache.org/jira/browse/KAFKA-4768
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
> Fix For: 0.10.3.0
>
>
> When running SimpleBenchmark.java with just 1 request and measuring the 
> latency, we observe that it can be very high for some topologies, like 
> count(). Latency can be as high as 30 seconds. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4768) Single-request latency for streams can be very high

2017-02-15 Thread Eno Thereska (JIRA)
Eno Thereska created KAFKA-4768:
---

 Summary: Single-request latency for streams can be very high
 Key: KAFKA-4768
 URL: https://issues.apache.org/jira/browse/KAFKA-4768
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Eno Thereska
 Fix For: 0.10.3.0


When running SimpleBenchmark.java with just 1 request and measuring the 
latency, we observe that it can be very high for some topologies, like count(). 
Latency can be as high as 30 seconds. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-82 Add Record Headers

2017-02-15 Thread Jeroen van Disseldorp
+1 on introducing the concept of headers, neutral on specific 
implementation.



On 14/02/2017 22:34, Jay Kreps wrote:

Couple of things I think we still need to work out:

1. I think we agree about the key, but I think we haven't talked about
the value yet. I think if our goal is an open ecosystem of these headers
spread across many plugins from many systems we should consider making this
a string as well so it can be printed, set via a UI, set in config, etc.
Basically encouraging pluggable serialization formats here will lead to a
bit of a tower of babel.
2. This proposal still includes a pretty big change to our serialization
and protocol definition layer. Essentially it is introducing an optional
type, where the format is data dependent. I think this is actually a big
change though it doesn't seem like it. It means you can no longer specify
this type with our type definition DSL, and likewise it requires custom
handling in client libs. This isn't a huge thing, since the Record
definition is custom anyway, but I think this kind of protocol
inconsistency is very non-desirable and ties you to hand-coding things. I
think the type should instead be [Key Value] in our BNF, where key and
value are both short strings as used elsewhere. This brings it in line with
the rest of the protocol.
3. Could we get more specific about the exact Java API change to
ProducerRecord, ConsumerRecord, Record, etc?

-Jay

On Tue, Feb 14, 2017 at 9:42 AM, Michael Pearce 
wrote:


Hi all,

We would like to start the voting process for KIP-82 – Add record headers.
The KIP can be found at:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers

Discussion thread(s) can be found here:

http://search-hadoop.com/m/Kafka/uyzND1nSTOHTvj81?subj=Re+DISCUSS+KIP+82+Add+Record+Headers
http://search-hadoop.com/m/Kafka/uyzND1Arxt22Tvj81?subj=Re+DISCUSS+KIP+82+Add+Record+Headers
http://search-hadoop.com/?project=Kafka=KIP-82



Thanks,
Mike






Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-15 Thread Manikumar
Sorry, I am late to this discussion.

PrincipalBuilder is only used for the SSL protocol.
For SASL, we use the "sasl.kerberos.principal.to.local.rules" config to map
SASL principal names to short names. To make it consistent,
do we also need to pass the full SASL principal name to the authorizer?
We may need to use PrincipalBuilder for mapping SASL names.

Related JIRA is here:
https://issues.apache.org/jira/browse/KAFKA-2854


On Wed, Feb 15, 2017 at 7:47 AM, Jun Rao  wrote:

> Hi, Radai,
>
> Currently kafka-acl.sh just creates an ACL path in ZK with the principal
> name string. The authorizer module in the broker reads the principal name
> string from the acl path in ZK and creates the expected KafkaPrincipal for
> matching. As you can see, the expected principal is created on the broker
> side, not by the kafka-acl.sh tool. The broker already has the ability to
> configure PrincipalBuilder. That's why I am not sure if there is a need for
> kafka-acl.sh to customize PrincipalBuilder.
>
> Thanks,
>
> Jun
>
>
> On Mon, Feb 13, 2017 at 7:01 PM, radai  wrote:
>
> > if i understand correctly, kafka-acls.sh spins up an instance of (the
> > custom, in our case) Authorizer, and calls things like addAcls(acls:
> > Set[Acl], resource: Resource) on it, which are defined in the interface,
> > hence expected to be "extensible".
> >
> > (side note: if Authorizer and PrincipalBuilder are defined as extensible
> > interfaces, why doesn't class Acl, which is in the signature for
> Authorizer
> > calls, use java.security.Principal?)
> >
> > we would like to be able to use the standard kafka-acl command line for
> > defining ACLs even when replacing the vanilla Authorizer and
> > PrincipalBuilder (even though we have a management UI for these
> operations
> > within linkedin) - simply because that's the correct thing to do from an
> > extensibility point of view.
> >
> > On Mon, Feb 13, 2017 at 1:39 PM, Jun Rao  wrote:
> >
> > > Hi, Mayuresh,
> > >
> > > It seems to me that there are two common use cases of authorizer. (1)
> Use
> > > the default SimpleAuthorizer and the kafka-acl to do authorization. (2)
> > Use
> > > a customized authorizer and an external tool for authorization. Do you
> > > think there is a use case for a customized authorizer and kafka-acl at
> > the
> > > same time? If not, it's better not to complicate the kafka-acl api.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > >
> > > On Mon, Feb 13, 2017 at 10:35 AM, Mayuresh Gharat <
> > > gharatmayures...@gmail.com> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the review and comments. Please find the replies inline :
> > > >
> > > > This is so that in the future, we can extend to types like group.
> > > > ---> Yep, I did think the same. But since the SocketServer was always
> > > > creating User type, it wasn't actually used. If we go ahead with
> > changes
> > > in
> > > > this KIP, we will give this power of creating different Principal
> types
> > > to
> > > > the PrincipalBuilder (which users can define their own). In that way
> > > Kafka
> > > > will not have to deal with handling this. So the Principal building
> and
> > > > Authorization will be opaque to Kafka which seems like an expected
> > > > behavior.
> > > >
> > > >
> > > > Hmm, normally, the configurations you specify for plug-ins refer to
> > those
> > > > needed to construct the plug-in object. So, it's kind of weird to use
> > > that
> > > > to call a method. For example, why can't
> principalBuilderService.rest.
> > > url
> > > > be passed in through the configure() method and the implementation
> can
> > > use
> > > > that to build principal. This way, there is only a single method to
> > > compute
> > > > the principal in a consistent way in the broker and in the kafka-acl
> > > tool.
> > > > > We can do that as well. But since the rest url is not related
> to
> > > the
> > > > Principal, it seems out of place to me to pass it every time we have
> to
> > > > create a Principal. I should replace "principalConfigs" with
> > > > "principalProperties".
> > > > I was trying to differentiate the configs/properties that are used to
> > > > create the PrincipalBuilder class and the Principal/Principals
> itself.
> > > >
> > > >
> > > > For LinkedIn's use case, do you actually use the kafka-acl tool? My
> > > > understanding is that LinkedIn does authorization through an external
> > > tool.
> > > > > For Linkedin's use case we don't actually use the kafka-acl
> tool
> > > > right now. As per the discussion that we had on
> > > > https://issues.apache.org/jira/browse/KAFKA-4454, we thought that it
> > > would
> > > > be good to make kafka-acl tool changes, to make it flexible and we
> > might
> > > be
> > > > even able to use it in future.
> > > >
> > > > It seems it's simpler if kafka-acl doesn't need to understand the
> > > > principal builder. The tool does authorization based on a string
> name,
> > > > which is 
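
Since the thread keeps coming back to what a custom PrincipalBuilder does, here is a rough sketch against the 0.10.x interface (package names and signatures recalled from memory, so double-check them against your client version); the DN-to-short-name rule is a made-up example, not LinkedIn's or Kafka's.

import java.security.Principal;
import java.util.Map;

import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.network.Authenticator;
import org.apache.kafka.common.network.TransportLayer;
import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.common.security.auth.PrincipalBuilder;

// Maps the SSL peer principal (a full DN) onto a short user name so that the
// Authorizer sees "User:alice" instead of the whole certificate subject.
public class ShortNamePrincipalBuilder implements PrincipalBuilder {

    @Override
    public void configure(Map<String, ?> configs) {
        // a real implementation could read its own settings here
    }

    @Override
    public Principal buildPrincipal(TransportLayer transportLayer, Authenticator authenticator)
            throws KafkaException {
        try {
            Principal peer = transportLayer.peerPrincipal();
            String name = peer.getName();                 // e.g. "CN=alice,OU=eng,O=example"
            for (String part : name.split(",")) {
                if (part.trim().startsWith("CN=")) {
                    name = part.trim().substring(3);
                    break;
                }
            }
            return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, name);
        } catch (Exception e) {
            throw new KafkaException("Failed to build principal", e);
        }
    }

    @Override
    public void close() throws KafkaException {
        // nothing to clean up in this sketch
    }
}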

[GitHub] kafka pull request #2551: KAFKA-4752: Fixed bandwidth calculation

2017-02-15 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2551

KAFKA-4752: Fixed bandwidth calculation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-4752-join-bw

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2551.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2551


commit f743d5e21714088f663f3bc0b105f6afe882335d
Author: Eno Thereska 
Date:   2017-02-15T13:44:34Z

Fixed byte calculation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4752) Streams Simple Benchmark MB/sec calculation is not correct for Join operations

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867853#comment-15867853
 ] 

ASF GitHub Bot commented on KAFKA-4752:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2551

KAFKA-4752: Fixed bandwidth calculation



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-4752-join-bw

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2551.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2551


commit f743d5e21714088f663f3bc0b105f6afe882335d
Author: Eno Thereska 
Date:   2017-02-15T13:44:34Z

Fixed byte calculation




> Streams Simple Benchmark MB/sec calculation is not correct for Join operations
> --
>
> Key: KAFKA-4752
> URL: https://issues.apache.org/jira/browse/KAFKA-4752
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.3.0
>Reporter: Damian Guy
>Assignee: Eno Thereska
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> When benchmarking join operations via the {{SimpleBenchmark}} we produce 2 * 
> {{numRecords}}, i.e., {{numRecords}} to topic1 and {{numRecords}} to topic2.
> The {{CountDownLatch}} in the {{foreach}} keeps processing until it has 
> received {{numRecords}}. As this is the {{numRecords}} produced from the 
> join, the size of each record will either be {{VALUE_SIZE}} or {{2 * 
> VALUE_SIZE}}. The MB/s calculation doesn't take this into account.
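
As a rough illustration of the correction being described (the authoritative change is the pull request above), a hypothetical helper that sums the bytes actually seen per joined record instead of assuming a fixed record size:

{code}
public final class JoinThroughput {
    // valueSizesSeen holds the value size of each record the foreach observed,
    // which for join output is either VALUE_SIZE or 2 * VALUE_SIZE.
    public static double megabytesPerSecond(long[] valueSizesSeen, int keySizeBytes, long elapsedMillis) {
        long totalBytes = 0;
        for (long valueSize : valueSizesSeen) {
            totalBytes += keySizeBytes + valueSize;
        }
        double megabytes = totalBytes / (1024.0 * 1024.0);
        return megabytes / (elapsedMillis / 1000.0);
    }
}
{code}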



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-116 - Add State Store Checkpoint Interval Configuration

2017-02-15 Thread Damian Guy
OK, let's close this KIP off then, as it isn't needed at the moment. We can
revive it later if needed.

On Tue, 14 Feb 2017 at 04:16 Eno Thereska  wrote:

> Even if users commit on every record, the expensive part will not be the
> checkpointing proposed in this KIP, but the rest of the commit.
>
> Eno
>
>
> > On 13 Feb 2017, at 23:46, Guozhang Wang  wrote:
> >
> > I think I'm OK to always enable checkpointing, but I'm not sure if we
> want
> > to checkpoint on every commit, since in the extreme case users can commit
> > after processing each record. So I think it is still valuable to
> > have a checkpoint interval config in this KIP, which can be ignored if
> EOS
> > is turned on. That being said, if most people are favoring checkpointing
> on
> > each commit we can try that with this as well, since it won't change any
> > public APIs and we can still add this config in the future if we do
> observe
> > some users reporting it has huge perf impacts.
> >
> >
> >
> > Guozhang
> >
> > On Fri, Feb 10, 2017 at 12:20 PM, Damian Guy 
> wrote:
> >
> >> I'm fine with that. Gouzhang?
> >> On Fri, 10 Feb 2017 at 19:45, Matthias J. Sax 
> >> wrote:
> >>
> >>> I am actually supporting Eno's view: checkpoint on every commit.
> >>>
> >>> @Dhwani: I understand your view and did raise the same question about
> >>> performance trade-off with checkpointing enabled/disabled etc. However,
> >>> it seems that writing the checkpoint file is super cheap -- thus, there
> >>> is nothing to gain performance-wise by disabling it.
> >>>
> >>> For Streams EoS we do not need the checkpoint file -- but we should
> have
> >>> a switch for EoS anyway and can disable the checkpoint file for this
> >>> case. And even if there is no switch and we enable EoS all the time, we
> >>> can get rid of the checkpoint file overall (making the parameter
> >> obsolete).
> >>>
> >>> IMHO, if the config parameter is not really useful, we should not have
> >> it.
> >>>
> >>>
> >>> -Matthias
> >>>
> >>>
> >>> On 2/10/17 9:27 AM, Damian Guy wrote:
>  Gouzhang, Thanks for the clarification. Understood.
> 
>  Eno, you are correct: if we just used the commit interval then we wouldn't
> >>> need
>  a KIP. But then we'd have no way of turning it off.
> 
>  On Fri, 10 Feb 2017 at 17:14 Eno Thereska 
> >>> wrote:
> 
> > A quick check: the checkpoint file is not new, we're just exposing a
> >>> knob
> > on when to set it, right? Would turning it off still do what it does
> >>> today
> > (i.e., write the checkpoint at the end when the user quits?) So it's
> >>> not a
> > new feature as such, I was only recommending we dial up the frequency
> >> by
> > default. With that option arguably we don't even need a KIP.
> >
> > Eno
> >
> >
> >
> >> On 10 Feb 2017, at 17:02, Guozhang Wang  wrote:
> >>
> >> Damian,
> >>
> >> I was wondering whether it is a new failure scenario, but as Eno pointed
> >> out
> >>> it
> >> was not.
> >>
> >> Another thing I was considering is whether it has any impact on
> >>> incorporating
> >> KIP-98 to avoid duplicates: if there is a failure in the middle of a
> >> transaction, then upon recovery we cannot rely on the local state
> >> store
> >> file even if the checkpoint file exists, since the local state store
> >>> file
> >> may not be at the transaction boundaries. But since Streams will
> >> likely
> > to
> >> have EOS as an opt-in I think it is still worthwhile to add this
> >>> feature,
> >> just keeping in mind that when EOS is turned on it may cease to be
> >> effective.
> >>
> >> And yes, I'd suggest we leave the config value to be possibly
> > non-positive
> >> to indicate not turning on this feature for the reason above: if it
> >>> will
> >> not be effective then we want to leave it as an option to be turned
> >>> off.
> >>
> >> Guozhang
> >>
> >>
> >> On Fri, Feb 10, 2017 at 8:06 AM, Eno Thereska <
> >> eno.there...@gmail.com>
> >> wrote:
> >>
> >>> The overhead of writing to the checkpoint file should be much, much
> >>> smaller than the overall overhead of doing a commit, so I think
> >> tuning
> > the
> >>> commit time is sufficient to guide performance tradeoffs.
> >>>
> >>> Eno
> >>>
>  On 10 Feb 2017, at 13:08, Dhwani Katagade <
> > dhwani_katag...@persistent.co
> >>> .in> wrote:
> 
>  May be for fine tuning the performance.
>  Say we don't need the checkpointing and would like to gain the lil
> >>> bit
> >>> of performance improvement by turning it off.
>  The trade off is between giving people control knobs vs
> >> complicating
> > the
> >>> complete set of knobs.
> 
>  -dk
> 
>  On Friday 10 February 2017 
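
For readers wondering what the checkpoint file under discussion contains, here is a minimal, hypothetical sketch of the idea (one changelog offset per partition, written and then renamed into place); Kafka Streams has its own internal OffsetCheckpoint class, and this is not that code.

import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;

public final class CheckpointSketch {
    // Persists the changelog offset per partition so that a restart can skip
    // replaying records that are already reflected in the local state store.
    public static void write(File file, Map<TopicPartition, Long> offsets) throws IOException {
        File tmp = new File(file.getAbsolutePath() + ".tmp");
        try (BufferedWriter out = Files.newBufferedWriter(tmp.toPath(), StandardCharsets.UTF_8)) {
            for (Map.Entry<TopicPartition, Long> entry : offsets.entrySet()) {
                out.write(entry.getKey().topic() + " " + entry.getKey().partition() + " " + entry.getValue());
                out.newLine();
            }
        }
        // Rename so that readers never observe a half-written file.
        Files.move(tmp.toPath(), file.toPath(),
                StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    }
}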

[jira] [Work started] (KAFKA-4752) Streams Simple Benchmark MB/sec calculation is not correct for Join operations

2017-02-15 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4752 started by Eno Thereska.
---
> Streams Simple Benchmark MB/sec calculation is not correct for Join operations
> --
>
> Key: KAFKA-4752
> URL: https://issues.apache.org/jira/browse/KAFKA-4752
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.3.0
>Reporter: Damian Guy
>Assignee: Eno Thereska
>Priority: Minor
> Fix For: 0.10.3.0
>
>
> When benchmarking join operations via the {{SimpleBenchmark}} we produce 2 * 
> {{numRecords}}, i.e., {{numRecords}} to topic1 and {{numRecords}} to topic2.
> The {{CountDownLatch}} in the {{foreach}} keeps processing until it has 
> received {{numRecords}}. As this is the {{numRecords}} produced from the 
> join, the size of each record will either be {{VALUE_SIZE}} or {{2 * 
> VALUE_SIZE}}. The MB/s calculation doesn't take this into account.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4767) KafkaProducer is not joining its IO thread properly

2017-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buğra Gedik updated KAFKA-4767:
---
Summary: KafkaProducer is not joining its IO thread properly  (was: 
KafkaProducer is not joining the thread properly)

> KafkaProducer is not joining its IO thread properly
> ---
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt
> this.ioThread.interrupt();
> try { 
>  // join again (if you want to be more accurate, you can re-adjust 
> the timeout)
>  this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4767) KafkaProducer is not joining the thread properly

2017-02-15 Thread JIRA
Buğra Gedik created KAFKA-4767:
--

 Summary: KafkaProducer is not joining the thread properly
 Key: KAFKA-4767
 URL: https://issues.apache.org/jira/browse/KAFKA-4767
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.11.0.0
Reporter: Buğra Gedik
Priority: Minor


The {{KafkaProducer}} is not properly joining the thread it creates. The code 
is like this:

{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
}
{code}

If the code is interrupted while performing the join, it will end up leaving 
the IO thread running. The correct way of handling this is as follows:
{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
// propagate the interrupt
this.ioThread.interrupt();
try { 
 // join again (if you want to be more accurate, you can re-adjust the 
time)
 this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
} finally {
// make sure we maintain the interrupted status
Thread.currentThread().interrupt();
}
}
{code}





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4767) KafkaProducer is not joining the thread properly

2017-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Buğra Gedik updated KAFKA-4767:
---
Description: 
The {{KafkaProducer}} is not properly joining the thread it creates. The code 
is like this:

{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
}
{code}

If the code is interrupted while performing the join, it will end up leaving 
the IO thread running. The correct way of handling this is as follows:
{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
// propagate the interrupt
this.ioThread.interrupt();
try { 
 // join again (if you want to be more accurate, you can re-adjust the 
timeout)
 this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
} finally {
// make sure we maintain the interrupted status
Thread.currentThread().interrupt();
}
}
{code}



  was:
The {{KafkaProducer}} is not properly joining the thread it creates. The code 
is like this:

{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
}
{code}

If the code is interrupted while performing the join, it will end up leaving 
the IO thread running. The correct way of handling this is as follows:
{code}
try {
this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
// propagate the interrupt
this.ioThread.interrupt();
try { 
 // join again (if you want to be more accurate, you can re-adjust the 
time)
 this.ioThread.join(timeUnit.toMillis(timeout));
} catch (InterruptedException t) {
firstException.compareAndSet(null, t);
log.error("Interrupted while joining ioThread", t);
} finally {
// make sure we maintain the interrupted status
Thread.currentThread().interrupt();
}
}
{code}




> KafkaProducer is not joining the thread properly
> 
>
> Key: KAFKA-4767
> URL: https://issues.apache.org/jira/browse/KAFKA-4767
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Buğra Gedik
>Priority: Minor
>
> The {{KafkaProducer}} is not properly joining the thread it creates. The code 
> is like this:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> }
> {code}
> If the code is interrupted while performing the join, it will end up leaving 
> the IO thread running. The correct way of handling this is as follows:
> {code}
> try {
> this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> // propagate the interrupt
> this.ioThread.interrupt();
> try { 
>  // join again (if you want to be more accurate, you can re-adjust 
> the timeout)
>  this.ioThread.join(timeUnit.toMillis(timeout));
> } catch (InterruptedException t) {
> firstException.compareAndSet(null, t);
> log.error("Interrupted while joining ioThread", t);
> } finally {
> // make sure we maintain the interrupted status
> Thread.currentThread().interrupt();
> }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4766) Document lz4 and lz4hc in confluence

2017-02-15 Thread Daniel Pinyol (JIRA)
Daniel Pinyol created KAFKA-4766:


 Summary: Document lz4 and lz4hc in confluence
 Key: KAFKA-4766
 URL: https://issues.apache.org/jira/browse/KAFKA-4766
 Project: Kafka
  Issue Type: Sub-task
  Components: documentation
Affects Versions: 0.8.2.0
Reporter: Daniel Pinyol


https://cwiki.apache.org/confluence/display/KAFKA/Compression does not mention 
that lz4 and lz4hc compressions are supported 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4277) creating ephemeral node already exist

2017-02-15 Thread Steven Aerts (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867671#comment-15867671
 ] 

Steven Aerts commented on KAFKA-4277:
-

Reproduced on our side with Scala 2.11 kafka_2.11-0.10.1.1, zookeeper v. 3.4.8.

We saw this problem happening on all our instances simultaneously (3ms apart).  
Some of them were already up for a long time, some of them were rather new.  We 
have the impression that it was triggered by a network issue.  On the ZooKeeper 
side, however, nothing is visible.  (Maybe it did not even notice that the 
client decided to stop the session, due to some strange effect like a 
half-closed TCP socket?)

The zkclient retries instantaneously and triggers this exception, so it does 
not wait for a ZooKeeper timeout or anything similar.

I am a fan of re-introducing the workaround which was present in 0.8.1.
Let me know if you are interested in more traces.
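
To make the 0.8.1-era workaround concrete, here is a rough, hypothetical sketch of retrying the ephemeral-node creation until the stale node from the expired session goes away; the back-off and retry limit are arbitrary, and this is not the removed ZkUtils code.

{code}
import org.I0Itec.zkclient.ZkClient;
import org.I0Itec.zkclient.exception.ZkNodeExistsException;

public final class EphemeralRegistration {
    // After a session expiry the old ephemeral node can linger briefly on the
    // ZooKeeper server, so retry instead of failing on the first NODEEXISTS.
    public static void registerWithRetry(ZkClient zkClient, String path, String data)
            throws InterruptedException {
        int attempts = 0;
        while (true) {
            try {
                zkClient.createEphemeral(path, data);
                return;                          // registered successfully
            } catch (ZkNodeExistsException e) {
                if (++attempts > 30) {
                    throw e;                     // the node is genuinely owned by someone else
                }
                Thread.sleep(1000L);             // wait for the stale node to disappear
            }
        }
    }
}
{code}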


> creating ephemeral node already exist
> -
>
> Key: KAFKA-4277
> URL: https://issues.apache.org/jira/browse/KAFKA-4277
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Feixiang Yan
>
> I use zookeeper 3.4.6.
> The ZooKeeper session timed out and the zkClient reconnect attempt failed. Then, 
> when re-establishing the session and re-registering the broker info in ZK, it 
> throws a NODEEXISTS exception.
> I think this is because the ephemeral node created by the old session has 
> not been removed. 
> I read the 
> [ZkUtils.scala|https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/utils/ZkUtils.scala]
>  of 0.8.1, createEphemeralPathExpectConflictHandleZKBug tries to create the node in a 
> while loop until creation succeeds. This can solve the issue. But in 
> [ZkUtils.scala|https://github.com/apache/kafka/blob/0.10.0.1/core/src/main/scala/kafka/utils/ZkUtils.scala]
>  of 0.10.1 the function was removed.
> {noformat}
> [2016-10-07 19:00:32,562] INFO Socket connection established to 
> 10.191.155.238/10.191.155.238:21819, initiating session 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,563] INFO zookeeper state changed (Expired) 
> (org.I0Itec.zkclient.ZkClient)
> [2016-10-07 19:00:32,564] INFO Unable to reconnect to ZooKeeper service, 
> session 0x1576b11f9b201bd has expired, closing socket connection 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,564] INFO Initiating client connection, 
> connectString=10.191.155.237:21819,10.191.155.238:21819,10.191.155.239:21819/cluster2
>  sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@ae71be2 
> (org.apache.zookeeper.ZooKeeper)
> [2016-10-07 19:00:32,566] INFO Opening socket connection to server 
> 10.191.155.237/10.191.155.237:21819. Will not attempt to authenticate using 
> SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,566] INFO Socket connection established to 
> 10.191.155.237/10.191.155.237:21819, initiating session 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,566] INFO EventThread shut down 
> (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,567] INFO Session establishment complete on server 
> 10.191.155.237/10.191.155.237:21819, sessionid = 0x1579ecd39c20006, 
> negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
> [2016-10-07 19:00:32,567] INFO zookeeper state changed (SyncConnected) 
> (org.I0Itec.zkclient.ZkClient)
> [2016-10-07 19:00:32,608] INFO re-registering broker info in ZK for broker 3 
> (kafka.server.KafkaHealthcheck$SessionExpireListener)
> [2016-10-07 19:00:32,610] INFO Creating /brokers/ids/3 (is it secure? false) 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-10-07 19:00:32,611] INFO Result of znode creation is: NODEEXISTS 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-10-07 19:00:32,614] ERROR Error handling event ZkEvent[New session 
> event sent to kafka.server.KafkaHealthcheck$SessionExpireListener@324f1bc] 
> (org.I0Itec.zkclient.ZkEventThread)
> java.lang.RuntimeException: A broker is already registered on the path 
> /brokers/ids/3. This probably indicates that you either have configured a 
> brokerid that is already in use, or else you have shutdown this broker and 
> restarted it faster than the zookeeper timeout so it appears to be 
> re-registering.
> at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:305)
> at kafka.utils.ZkUtils.registerBrokerInZk(ZkUtils.scala:291)
> at kafka.server.KafkaHealthcheck.register(KafkaHealthcheck.scala:70)
> at 
> kafka.server.KafkaHealthcheck$SessionExpireListener.handleNewSession(KafkaHealthcheck.scala:104)
> at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Correct prefetching of data to KTable-like structure on application startup

2017-02-15 Thread Jan Lukavský
Hi Matthias,
yes, that's exactly what I was looking for. I wasn't aware of the 
possibility to get the starting offset of a partition. My bad, thanks a lot.
Cheers,
 Jan


-- Original message --
From: Matthias J. Sax 
To: dev@kafka.apache.org
Date: 15. 2. 2017 2:57:54
Subject: Re: Correct prefetching of data to KTable-like structure on 
application startup

"Jan, 

If I understand you problem correctly, you do something like this on 
startup (I simplify to single partition) 

endOffset = consumer.endOffset(...) 

while (!done) { 
for (ConsumerRecord r : consumer.poll()) { 
// do some processing 
if (r.offset == endOffset) { 
done = true; 
break; 
} 
} 
} 

If your partition is empty, poll() never returns anything and thus you 
loop forever. 

However, to solve this problem, you can simply check the "start offset" 
of the partition before the loop. If the start and end offsets are the same, 
the partition is empty and you never call poll. 

startOffset = consumer.beginningOffset(...) 
endOffset = consumer.endOffset(...) 

if(startOffset < endOffset) { 
while (!done) { 
for (ConsumerRecord r : consumer.poll()) { 
// do some processing 
if (r.offset == endOffset) { 
done = true; 
break; 
} 
} 
} 
} 


Does this help? 


-Matthias 
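
In case a runnable version is useful, here is a sketch for a single-partition topic using the 0.10.1+ consumer API (beginningOffsets/endOffsets); the topic name, group id and the use of partition 0 are illustrative.

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public final class BootstrapExample {
    public static void bootstrap(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", "bootstrap-example");                       // example group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(topic, 0);
            consumer.assign(Collections.singletonList(tp));

            long start = consumer.beginningOffsets(Collections.singletonList(tp)).get(tp);
            long end = consumer.endOffsets(Collections.singletonList(tp)).get(tp);
            if (start >= end) {
                return;                                                   // empty partition, nothing to load
            }

            consumer.seekToBeginning(Collections.singletonList(tp));
            while (consumer.position(tp) < end) {
                for (ConsumerRecord<String, String> record : consumer.poll(100)) {
                    // materialize record.key() -> record.value() into the local store
                }
            }
        }
    }
}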

On 2/14/17 12:31 AM, Jan Lukavský wrote: 
> Hi Matthias, 
> 
> I understand that the local cache will not be automatically cleared, but 
> that is not an issue for me now. 
> 
> The problem I see is still the same as at the beginning - even caching 
> data to RocksDB in the KafkaStreams implementation might (I would say) 
> suffer from this issue. When using time-based data retention (for 
> whatever reason, maybe in combination with log compaction, but the 
> issue is there irrespective of whether log compaction is used or 
> not), it is possible that some partition will report nonzero "next" 
> offset, but will not be able to deliver any message to the KafkaConsumer 
> (because the partition is emptied by the data retention) and therefore 
> the consumer will not be able to finish the materialization of the topic 
> to local store (either RocksDB or any other cache) and therefore will 
> not be able to start processing the KStream. If I understand the problem 
> right, then using timestamps will not help either, because there must be 
> some sort of vector clock with a time dimension for each input 
> partition, and the empty partition will not be able to move the 
> timestamp any further, and therefore the whole system will remain 
> blocked at timestamp 0, because the vector clock usually calculates 
> minimum from all time dimensions. 
> 
> Does that make any sense, or am I doing something fundamentally wrong? :) 
> 
> Thanks again for any thoughts, 
> 
> Jan 
> 
> 
> On 02/13/2017 06:37 PM, Matthias J. Sax wrote: 
>> Jan, 
>> 
>> brokers with version 0.10.1 or higher allow to set both topic cleanup 
>> policies in combination: 
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+
compaction+and+deletion+to+co-exist 
>> 
>> 
>> However, this will only delete data in you changelog topic but not in 
>> your RocksDB -- if you want to get data delete in RocksDB, you would 
>> need to send tombstone messages for those keys. It's kinda tricky to get 
>> this done. 
>> 
>> An "brute force" alternative would be, stop the application, delete the 
>> local state directory, and restart. This will force Streams to recreate 
>> the RocksDB files from the changelog and thus only loading keys that got 
>> not deleted. But this is of course a quite expensive approach and you 
>> should be very careful about using it. 
>> 
>> 
>> -Matthias 
>> 
>> 
>> On 2/13/17 12:25 AM, Jan Lukavský wrote: 
>>> Hi Michael, 
>>> 
>>> sorry for my late answer. Configuring the topic as you suggest is one 
>>> option (and I will configure it that way), but I wanted to combine the 
>>> two data retention mechanisms (if possible). I would like to use log 
>>> compaction, so that I will always get at least the last message for 
>>> given key, but I would also like to use the classical temporal data 
>>> retention, which would function as a sort of TTL for the keys - if a key

>>> doesn't get an update for the configured period of time, it could be 
>>> removed. That way I could ensure that out-dated keys could be removed. 
>>> 
>>> Is there any other option for this? And can kafka be configured this 
>>> way? 
>>> 
>>> Best, 
>>> 
>>> Jan 
>>> 
>>> On 02/09/2017 12:08 PM, Michael Noll wrote: 
 Jan, 
 
> - if I don't send any data to a kafka partition for a period longer 
> then 
 the data retention interval, then all data from the partition is wiped 
 out 
 
 If I interpret your first and second message in this email thread 
 correctly, then you are talking only about your "state topic" here, 
 i.e. 
 the topic that you read into a KTable. You should configure this 
 topic to 
 use log compaction, 

[jira] [Comment Edited] (KAFKA-4762) Consumer throwing RecordTooLargeException even when messages are not that large

2017-02-15 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864954#comment-15864954
 ] 

huxi edited comment on KAFKA-4762 at 2/15/17 9:17 AM:
--

Logs show that you are using 0.10.x (or before), where max.partition.fetch.bytes 
is a hard limit even when compression is enabled. In your case, it seems that 
you have enabled compression on the producer side. 
`max.partition.fetch.bytes` also applies to the whole compressed message, which 
is often much larger than a single record. That's why you run into 
RecordTooLargeException.

0.10.1 which completes 
[KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes]
 already 'fixes' your problem by making the `max.partition.fetch.bytes` field in 
the fetch request much less useful, so you can try with a 0.10.1 build.
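
For reference, a minimal sketch of the consumer settings involved; the broker address, group id and byte limits are examples only, and fetch.max.bytes exists only on 0.10.1+ clients and brokers.

{code}
import java.util.Properties;

public final class FetchSizeProps {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // example broker
        props.put("group.id", "example-group");             // example group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Pre-0.10.1: a hard per-partition cap that must exceed the largest
        // (possibly compressed) message, which is what triggers the error above.
        props.put("max.partition.fetch.bytes", "524288");
        // 0.10.1+ (KIP-74): bounds the whole fetch response instead, and the
        // per-partition setting is no longer a hard limit.
        props.put("fetch.max.bytes", "52428800");
        return props;
    }
}
{code}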



was (Author: huxi_2b):
Logs show that you are using 0.10.x where max.partition.fetch.bytes is a hard 
limit even when you enable the compression. In your case, seems that you have 
enabled the compression on the producer side. `max.partition.fetch.bytes` also 
applies to the whole compressed message which is often much larger than a 
single one. That's why you run into RecordTooLargeException.

0.10.1 which completes 
[KIP-74|https://cwiki.apache.org/confluence/display/KAFKA/KIP-74:+Add+Fetch+Response+Size+Limit+in+Bytes]
 already 'fixes' your problem by making  `max.partition.fetch.bytes` field in 
the fetch request much less useful, so you can try with an 0.10.1 build.


> Consumer throwing RecordTooLargeException even when messages are not that 
> large
> ---
>
> Key: KAFKA-4762
> URL: https://issues.apache.org/jira/browse/KAFKA-4762
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Vipul Singh
>
> We were just recently hit by a weird error. 
> Before going in any further, explaining of our service setup. we have a 
> producer which produces messages not larger than 256 kb of messages( we have 
> an explicit check about this on the producer side) and on the client side we 
> have a fetch limit of 512kb(max.partition.fetch.bytes is set to 524288 bytes) 
> Recently our client started to see this error:
> {quote}
> org.apache.kafka.common.errors.RecordTooLargeException: There are some 
> messages at [Partition=Offset]: {topic_name-0=9925056036} whose size is 
> larger than the fetch size 524288 and hence cannot be ever returned. Increase 
> the fetch size, or decrease the maximum message size the broker will allow.
> {quote}
> We tried consuming messages with another consumer, without any 
> max.partition.fetch.bytes limit, and it consumed fine. The messages were 
> small, and did not seem to be greater than 256 kb
> We took a log dump, and the log size looked fine.
> {quote}
> mpresscodec: NoCompressionCodec crc: 2473548911 keysize: 8
> offset: 9925056032 position: 191380053 isvalid: true payloadsize: 539 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1656420267 keysize: 8
> offset: 9925056033 position: 191380053 isvalid: true payloadsize: 1551 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2398479758 keysize: 8
> offset: 9925056034 position: 191380053 isvalid: true payloadsize: 1307 magic: 
> 0 compresscodec: NoCompressionCodec crc: 2845554215 keysize: 8
> offset: 9925056035 position: 191380053 isvalid: true payloadsize: 1520 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3106984195 keysize: 8
> offset: 9925056036 position: 191713371 isvalid: true payloadsize: 1207 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3462154435 keysize: 8
> offset: 9925056037 position: 191713371 isvalid: true payloadsize: 418 magic: 
> 0 compresscodec: NoCompressionCodec crc: 1536701802 keysize: 8
> offset: 9925056038 position: 191713371 isvalid: true payloadsize: 299 magic: 
> 0 compresscodec: NoCompressionCodec crc: 4112567543 keysize: 8
> offset: 9925056039 position: 191713371 isvalid: true payloadsize: 1571 magic: 
> 0 compresscodec: NoCompressionCodec crc: 3696994307 keysize: 8
> {quote}
> Has anyone seen something similar? or any points to troubleshoot this further
> Please Note: To overcome this issue, we deployed a new consumer, without this 
> limit of max.partition.fetch.bytes, and it worked fine.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4765) org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)

2017-02-15 Thread Armin Braun (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Armin Braun updated KAFKA-4765:
---
Priority: Major  (was: Critical)

> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
>  and Similar Tests are Failing on some Systems (127.0.53.53 Collision Warning)
> -
>
> Key: KAFKA-4765
> URL: https://issues.apache.org/jira/browse/KAFKA-4765
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 0.10.1.1
> Environment: All DNS environments that properly forward 127.0.53.53
>Reporter: Armin Braun
>
> The test
> {code}
> org.apache.kafka.clients.producer.KafkaProducerTest#testConstructorFailureCloseResource
> {code}
> fails on some systems because the snippet below from 
> {code}
> org.apache.kafka.clients.ClientUtils#parseAndValidateAddresses
> {code}
> {code}
> InetSocketAddress address = new InetSocketAddress(host, 
> port);
> if (address.isUnresolved()) {
> log.warn("Removing server {} from {} as DNS 
> resolution failed for {}", url, CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, 
> host);
> } else {
> addresses.add(address);
> }
> {code}
> will add the address *some.invalid.hostname.foo.bar* to the addresses list 
> without error since it is resolved to *127.0.53.53* to indicate potential 
> future collision of the _.bar_ tld.
> The same issue applies to a few other test cases that try to intentionally 
> run into broken hostnames.
> This can (and should) be fixed by using broken hostname examples that do not 
> collide. I would suggest just putting a ".local" suffix on all that are 
> currently affected by this.
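
A tiny, hypothetical probe of the behaviour described above; the output depends on the DNS environment, which is exactly why the test is flaky, while a reserved suffix such as .invalid (or the suggested .local) never resolves.

{code}
import java.net.InetSocketAddress;

public class DnsCollisionCheck {
    public static void main(String[] args) {
        for (String host : new String[]{"some.invalid.hostname.foo.bar", "some.broken.hostname.invalid"}) {
            InetSocketAddress address = new InetSocketAddress(host, 9092);
            // On affected resolvers the first name resolves to 127.0.53.53 (the
            // ICANN name-collision marker), so isUnresolved() is false and the
            // address would be accepted into the bootstrap list.
            System.out.println(host + " unresolved=" + address.isUnresolved()
                    + " address=" + (address.isUnresolved() ? "n/a" : address.getAddress().getHostAddress()));
        }
    }
}
{code}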



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)