date:20141016

[jira] [Updated] (KAFKA-1506) Cancel "kafka-reassign-partitions" Job

2014-10-16 Thread Neha Narkhede (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1506:
-
Labels: newbie++  (was: )

> Cancel "kafka-reassign-partitions" Job
> --
>
> Key: KAFKA-1506
> URL: https://issues.apache.org/jira/browse/KAFKA-1506
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication, tools
>Affects Versions: 0.8.1, 0.8.1.1
>Reporter: Paul Lung
>Assignee: Neha Narkhede
>  Labels: newbie++
>
> I started a reassignment, and for some reason it just takes forever. However, 
> it won¹t let me start another reassignment job while this one is running. So 
> a tool to cancel a reassignment job is needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (KAFKA-1506) Cancel "kafka-reassign-partitions" Job

2014-10-16 Thread Neha Narkhede (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neha Narkhede updated KAFKA-1506:
-
Reviewer: Neha Narkhede
Assignee: (was: Neha Narkhede)

> Cancel "kafka-reassign-partitions" Job
> --
>
> Key: KAFKA-1506
> URL: https://issues.apache.org/jira/browse/KAFKA-1506
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication, tools
>Affects Versions: 0.8.1, 0.8.1.1
>Reporter: Paul Lung
>  Labels: newbie++
>
> I started a reassignment, and for some reason it just takes forever. However, 
> it won¹t let me start another reassignment job while this one is running. So 
> a tool to cancel a reassignment job is needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1481) Stop using dashes AND underscores as separators in MBean names

2014-10-16 Thread Jun Rao (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173792#comment-14173792
 ] 

Jun Rao commented on KAFKA-1481:


4. removeAllMetricsInList() will be called when a producer/consumer instance is 
closed to remove metrics related to a specific client id.

> Stop using dashes AND underscores as separators in MBean names
> --
>
> Key: KAFKA-1481
> URL: https://issues.apache.org/jira/browse/KAFKA-1481
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.1.1
>Reporter: Otis Gospodnetic
>Priority: Critical
>  Labels: patch
> Fix For: 0.8.2
>
> Attachments: KAFKA-1481_2014-06-06_13-06-35.patch, 
> KAFKA-1481_2014-10-13_18-23-35.patch, KAFKA-1481_2014-10-14_21-53-35.patch, 
> KAFKA-1481_2014-10-15_10-23-35.patch, 
> KAFKA-1481_IDEA_IDE_2014-10-14_21-53-35.patch, 
> KAFKA-1481_IDEA_IDE_2014-10-15_10-23-35.patch
>
>
> MBeans should not use dashes or underscores as separators because these 
> characters are allowed in hostnames, topics, group and consumer IDs, etc., 
> and these are embedded in MBeans names making it impossible to parse out 
> individual bits from MBeans.
> Perhaps a pipe character should be used to avoid the conflict. 
> This looks like a major blocker because it means nobody can write Kafka 0.8.x 
> monitoring tools unless they are doing it for themselves AND do not use 
> dashes AND do not use underscores.
> See: http://search-hadoop.com/m/4TaT4lonIW



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Re: [DISCUSSION] Message Metadata

2014-10-16 Thread Joel Koshy

I think the tags are a useful concept to have in that they do for
applications, what the additional metadata does for brokers. i.e.,
avoiding decompression and recompression of an entire message-set. I
agree that we should not place any "core" fields (i.e., those used
internally by Kafka) in tags and those should be first-class fields in
the message header.  E.g., if we intend to support in-built end-to-end
audit in Kafka then fields for auditing (server, timestamps, etc.)
should be first-class fields in the message header.  However, tags are
useful for application-level features that can avoid a full
decompression.

Although Avro has the ability to just deserialize select fields (say a
header) we then limit the optimization to avro-like formats. Also,
that will remain an application-specific thing and not an intrinsic
part of the wire protocol. i.e., brokers will continue to have to
decompress and recompress messages to assign offsets.

Joel

On Wed, Oct 15, 2014 at 09:04:55PM +, Todd Palino wrote:
> Let me add my view on #2 in less delicate terms than Guozhang did :)
> 
> When you¹re trying to run Kafka as a service, having to care about the
> format of the message sucks. I have plenty of users who are just fine
> using the Avro standard and play nice. Then I have a bunch of users who
> don¹t want to use Avro and want to do something else (json, some plain
> text, whatever). Then I have a bunch of users who use Avro but don¹t
> properly register their schemas. Then I have a bunch of users who do
> whatever they want and don¹t tell us.
> 
> What this means is that I can¹t have standard tooling, like auditing, that
> works on the entire system. I either have to whitelist or blacklist
> topics, and then I run into problems when someone adds something new
> either way. It would be preferable if I could monitor and maintain the
> health of the system without having to worry about the message format.
> 
> -Todd
> 
> 
> On 10/15/14, 10:50 AM, "Guozhang Wang"  wrote:
> 
> >Thanks Joe,
> >
> >I think we now have a few open questions to discuss around this topic:
> >
> >1. Shall we make core Kafka properties as first class fields in message
> >header or put them as tags?
> >
> >The pros of the first approach is more compacted format and hence less
> >message header overhead; the cons are that any future message header
> >change
> >needs protocol bump and possible multi-versioned handling on the server
> >side.
> >
> >Vice versa for the second approach.
> >
> >2. Shall we leave app properties still in message content and enforce
> >schema based topics or make them as extensible tags?
> >
> >The pros of the first approach is again saving message header overhead for
> >apps properties; and the cons are that it enforce schema usage for message
> >content to be partially de-serialized only for app header. At LinkedIn we
> >enforce Avro schemas for auditing purposes, and as a result the Kafka team
> >has to manage the schema registration process / schema repository as well.
> >
> >3. Which properties should be core KAFKA and which should be app
> >properties? For example, shall we make properties that only MM cares about
> >as app properties or Kafka properties?
> >
> >Guozhang
> >
> >On Tue, Oct 14, 2014 at 5:10 AM, Joe Stein  wrote:
> >
> >> I think we could add schemaId(binary) to the MessageAndMetaData
> >>
> >> With the schemaId you can implement different downstream software
> >>pattern
> >> on the messages reliably. I wrote up more thoughts on this use
> >> https://cwiki.apache.org/confluence/display/KAFKA/Schema+based+topics it
> >> should strive to encompass all implementation needs for producer,
> >>broker,
> >> consumer hooks.
> >>
> >> So if the application and tagged fields are important you can package
> >>that
> >> into a specific Kafka topic plug-in and assign it to topic(s).  Kafka
> >> server should be able to validate your expected formats (like
> >> encoders/decoders but in broker by topic regardless of producer) to the
> >> topics that have it enabled. We should have these maintained in the
> >>project
> >> under contrib.
> >>
> >> =- Joestein
> >>
> >> On Mon, Oct 13, 2014 at 11:02 PM, Guozhang Wang 
> >> wrote:
> >>
> >> > Hi Jay,
> >> >
> >> > Thanks for the comments. Replied inline.
> >> >
> >> > Guozhang
> >> >
> >> > On Mon, Oct 13, 2014 at 11:11 AM, Jay Kreps 
> >>wrote:
> >> >
> >> > > I need to take more time to think about this. Here are a few
> >> off-the-cuff
> >> > > remarks:
> >> > >
> >> > > - To date we have tried really, really hard to keep the data model
> >>for
> >> > > message simple since after all you can always add whatever you like
> >> > inside
> >> > > the message body.
> >> > >
> >> > > - For system tags, why not just make these fields first class
> >>fields in
> >> > > message? The purpose of a system tag is presumably that Why have a
> >> bunch
> >> > of
> >> > > key-value pairs versus first-class fields?
> >> > >
> >> >
> >> > Yes, we can alternatively make system t

Review Request 26811: Patch for KAFKA-1196

2014-10-16 Thread Ewen Cheslack-Postava


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26811/
---

Review request for kafka.


Bugs: KAFKA-1196
https://issues.apache.org/jira/browse/KAFKA-1196


Repository: kafka


Description
---

KAFKA-1196 WIP Ensure FetchResponses don't exceed 2GB limit.


Diffs
-

  core/src/main/scala/kafka/api/FetchResponse.scala 
8d085a1f18f803b3cebae4739ad8f58f95a6c600 
  core/src/main/scala/kafka/server/KafkaApis.scala 
85498b4a1368d3506f19c4cfc64934e4d0ac4c90 
  core/src/test/scala/unit/kafka/integration/PrimitiveApiTest.scala 
a5386a03b62956bc440b40783247c8cdf7432315 

Diff: https://reviews.apache.org/r/26811/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava

Re: [DISCUSS] Release 0.8.2-beta before 0.8.2?

2014-10-16 Thread Neha Narkhede

Another JIRA that will be nice to include as part of 0.8.2-beta is
https://issues.apache.org/jira/browse/KAFKA-1481 that fixes the mbean
naming. Looking for people's thoughts on 2 things here -

1. How do folks feel about doing a 0.8.2-beta release right now and 0.8.2
final 4-5 weeks later?
2. Do people want to include any JIRAs (other than the ones mentioned
above) in 0.8.2-beta? If so, it will be great to know now so it will allow
us to move forward with the beta release quickly.

Thanks,
Neha

On Wed, Oct 15, 2014 at 4:46 PM, Neha Narkhede 
wrote:

> Hi,
>
> We have accumulated an impressive list of pretty major features in 0.8.2 -
> Delete topic
> Automated leader rebalancing
> Controlled shutdown
> Offset management
> Parallel recovery
> min.isr and
> clean leader election
>
> In the past, what has worked for major feature releases is a beta release
> prior to a final release. I'm proposing we do the same for 0.8.2. The only
> blockers for 0.8.2-beta, that I know of are -
>
> https://issues.apache.org/jira/browse/KAFKA-1493 (Is a major change and
> requires some thinking about the new dependency. Since it is not fully
> ready and there are things to think about, I suggest we take it out, think
> it end to end and then include it in 0.8.3.)
> https://issues.apache.org/jira/browse/KAFKA-1634 (This has an owner:
> Guozhang Wang)
> https://issues.apache.org/jira/browse/KAFKA-1671 (Has a patch and is
> waiting on a review by Joe Stein)
>
> It seems that 1634 and 1671 can get wrapped up in a week. Do people think
> we should cut 0.8.2-beta by next week?
>
> Thanks,
> Neha
>

[jira] [Updated] (KAFKA-1196) java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33

2014-10-16 Thread Ewen Cheslack-Postava (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1196:
-
Attachment: KAFKA-1196.patch

> java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33
> ---
>
> Key: KAFKA-1196
> URL: https://issues.apache.org/jira/browse/KAFKA-1196
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
> Environment: running java 1.7, linux and kafka compiled against scala 
> 2.9.2
>Reporter: Gerrit Jansen van Vuuren
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1196.patch
>
>
> I have 6 topics each with 8 partitions spread over 4 kafka servers.
> the servers are 24 core 72 gig ram.
> While consuming from the topics I get an IlegalArgumentException and all 
> consumption stops, the error keeps on throwing.
> I've tracked it down to FectchResponse.scala line 33
> The error happens when the FetchResponsePartitionData object's readFrom 
> method calls:
> messageSetBuffer.limit(messageSetSize)
> I put in some debug code the the messageSetSize is 671758648, while the 
> buffer.capacity() gives 155733313, for some reason the buffer is smaller than 
> the required message size.
> I don't know the consumer code enough to debug this. It doesn't matter if 
> compression is used or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1196) java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33

2014-10-16 Thread Ewen Cheslack-Postava (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173941#comment-14173941
 ] 

Ewen Cheslack-Postava commented on KAFKA-1196:
--

Created reviewboard https://reviews.apache.org/r/26811/diff/
 against branch origin/trunk

> java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33
> ---
>
> Key: KAFKA-1196
> URL: https://issues.apache.org/jira/browse/KAFKA-1196
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
> Environment: running java 1.7, linux and kafka compiled against scala 
> 2.9.2
>Reporter: Gerrit Jansen van Vuuren
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1196.patch
>
>
> I have 6 topics each with 8 partitions spread over 4 kafka servers.
> the servers are 24 core 72 gig ram.
> While consuming from the topics I get an IlegalArgumentException and all 
> consumption stops, the error keeps on throwing.
> I've tracked it down to FectchResponse.scala line 33
> The error happens when the FetchResponsePartitionData object's readFrom 
> method calls:
> messageSetBuffer.limit(messageSetSize)
> I put in some debug code the the messageSetSize is 671758648, while the 
> buffer.capacity() gives 155733313, for some reason the buffer is smaller than 
> the required message size.
> I don't know the consumer code enough to debug this. It doesn't matter if 
> compression is used or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (KAFKA-1196) java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33

2014-10-16 Thread Ewen Cheslack-Postava (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-1196:
-
Assignee: Ewen Cheslack-Postava
  Status: Patch Available  (was: Open)

> java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33
> ---
>
> Key: KAFKA-1196
> URL: https://issues.apache.org/jira/browse/KAFKA-1196
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.8.0
> Environment: running java 1.7, linux and kafka compiled against scala 
> 2.9.2
>Reporter: Gerrit Jansen van Vuuren
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
>  Labels: newbie
> Fix For: 0.9.0
>
> Attachments: KAFKA-1196.patch
>
>
> I have 6 topics each with 8 partitions spread over 4 kafka servers.
> the servers are 24 core 72 gig ram.
> While consuming from the topics I get an IlegalArgumentException and all 
> consumption stops, the error keeps on throwing.
> I've tracked it down to FectchResponse.scala line 33
> The error happens when the FetchResponsePartitionData object's readFrom 
> method calls:
> messageSetBuffer.limit(messageSetSize)
> I put in some debug code the the messageSetSize is 671758648, while the 
> buffer.capacity() gives 155733313, for some reason the buffer is smaller than 
> the required message size.
> I don't know the consumer code enough to debug this. It doesn't matter if 
> compression is used or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KAFKA-1196) java.lang.IllegalArgumentException Buffer.limit on FetchResponse.scala + 33

2014-10-16 Thread Ewen Cheslack-Postava (JIRA)

[
https://issues.apache.org/jira/browse/KAFKA-1196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173943#comment-14173943
]

Ewen Cheslack-Postava commented on KAFKA-1196:
--

This is a wip patch to fix this issue, which previous discussion suggests was
due to the FetchResponse exceeding 2GB. My approach to triggering the issue,
however, doesn't exhibit exactly the same issue but does cause an unrecoverable
error that causes the consumer connection to terminate. (For reference, it
causes the server to fail when FetchResponseSend.writeTo calls expectIncomplete
and sendSize is negative due to overflow. This confuses the server since it
looks like the message is already done sending and the server forcibly closes
the consumer's connection.)

The patch addresses the core issue by ensuring the returned message doesn't
exceed 2GB by dropping parts of it in a way that otherwise shouldn't affect the
consumer. But there are a lot of points that still need to be addressed:

* I started by building an integration test to trigger the issue, included in
PrimitiveApiTest. However, since we necessarily need to have > 2GB data to
trigger the issue, it's probably too expensive to include in this way. Offline
discussion suggests maybe a system test would be a better place to include
this. It's still included here for completeness.
* The implementation filters to a subset of the data in FetchResponse. The main
reason for this is that this process needs to know the exact (or at least
conservative estimate) size of serialized data, which only FetchResponse knows.
But it's also a bit weird compared to other message classes, which are case
classes and don't modify those inputs.
* Algorithm for choosing subset to return: initial approach is to remove random
elements until we get below the limit. This is simple to understand and avoids
starvation of specific TopicAndPartitions. Any concerns with this basic
approach?
* I'm pretty sure I've managed to keep the < 2GB case to effectively the same
computational cost (computing the serialized size, grouped data, etc. exactly
once as before). However, for the > 2GB case I've only ensured correctness. In
particular, the progressive removal and reevaluation of serialized size could
potentially be very bad for very large data sets (e.g. starting a mirror maker
against a large data set with large # of partitions from scratch).
* Note that the algorithm never deals with the actual message data, only
metadata about what messages are available. This is relevant since this is what
suggested the approach in the patch could still be performant --
ReplicaManager.readMessageSets processes the entire FetchRequest and filters it
down because the metadata involved is relatively small.
* Based on the previous two points, this really needs some more realistic large
scale system tests to make sure this approach is not only correct, but provides
reasonable performance (or indicates we need to revise the algorithm for
selecting a subset of the data).
* Testing isn't really complete -- I triggered the issue with 4 topics * 600
MB/topic, which is > 2GB. Another obvious case to check is when some partitions
contain > 2GB on their own.
* I'd like someone to help sanity check the exact maximum FetchResponse
serialized size we limit messages to. It's not Int.MaxValue because the
FetchResponseSend class adds 4 + FetchResponse.sizeInBytes for it's own size.
I'd like a sanity check that the extra 4 bytes is enough -- is there any
additional wrapping we might need to account for? Getting a test to hit exactly
that narrow range could be tricky.
* The tests include both immediate-response and purgatory paths, but the
purgatory version requires a timeout in the test, which could end up being
flaky + wasting time, but it doesn't look like there's a great way to mock that
right now. Maybe this doesn't matter if it moves to a system test?
* One case this doesn't handle yet is when the data reaches > 2GB after it's in
the purgatory. The result is correct, but the response is not sent as soon as
that condition is satisfied. This is because it looks like evaluating this
exactly would require calling readMessageSets and evaluating the size of the
message for every DelayedFetch.isSatisifed call. This sounds like it could end
up being pretty expensive. Maybe there's a better way, perhaps an approximate
scheme?
* The test requires some extra bytes in the fetchSize for each partition,
presumably for overhead in encoding. I haven't tracked down exactly how big
that should be, but I'm guessing it could end up affecting the results of more
comprehensive tests.

70 matches

Mail list logo