[GitHub] spark pull request: [SPARK-4229][Streaming] consistent hadoop conf...

2015-07-29 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/7772 [SPARK-4229][Streaming] consistent hadoop configuration, streaming only You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125977592 If we're talking about this issue https://issues.apache.org/jira/browse/HADOOP-11209 unless there's something arcane about hadoop's jira, it looks like that was only

[GitHub] spark pull request: SPARK-9059 Update Direct Kafka Word count exam...

2015-07-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7467#issuecomment-122551255 I thought the idea for this ticket was to have separate examples for accessing offset ranges. The complexity of offsets doesn't really have anything to do

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122010820 Just to be clear, are we talking about removing just the one-line changes to SQLContext and JavaSQLContext? Everything else in the PR I think is necessary

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122076899 Except that those streaming changes call into SparkHadoopUtil, which was changed in that PR for thread safety reasons. HadoopRDD was changed so there was only 1 lock

[GitHub] spark pull request: [SPARK-8865][STREAMING] FIX BUG: check key in ...

2015-07-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7254#issuecomment-120150880 In case it wasn't clear, LGTM I don't think it needs a test case, because it would be testing behavior that spark doesn't care about (we don't deal with group

[GitHub] spark pull request: [SPARK-8865][STREAMING] FIX BUG: check key in ...

2015-07-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7254#issuecomment-120150575 zookeeper.connect and group.id aren't necessary for anything in the kafka direct stream. But they're expected to be present in a kafka consumer config

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-08 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7185#issuecomment-119735193 Yes, you're understanding things correctly. Yes, scala works that way as well. On Wed, Jul 8, 2015 at 4:20 PM, amit-ramesh notificati...@github.com

[GitHub] spark pull request: [SPARK-8833][STREAMING][WIP] Kafka Direct API ...

2015-07-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7235#issuecomment-118864216 The containsKey thing is a good catch. But this basic idea has been discussed before and rejected: https://issues.apache.org/jira/browse/SPARK-6249

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7185#issuecomment-118987704 In general, I feel like this is solving the problem of too many wrappers by adding more wrappers. I don't know that it's worth it just to get an instance method

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/7185#discussion_r33977640 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -670,4 +670,17 @@ private class KafkaUtilsPythonHelper

[GitHub] spark pull request: [SPARK-8833][STREAMING][WIP] Kafka Direct API ...

2015-07-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7235#issuecomment-118871369 @guowei2 do you want to open a separate jira / pull request for that containsKey fix? --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/7185#discussion_r33779900 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -670,4 +670,9 @@ private class KafkaUtilsPythonHelper

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-02 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7185#issuecomment-118188461 I don't have a problem with the static method, especially if it prevents yet more wrapper classes. My concern was more about the difference in semantics

[GitHub] spark pull request: [SPARK-8127][Streaming][Kafka] KafkaRDD optimi...

2015-06-24 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-114982933 Cheers :) On Wed, Jun 24, 2015 at 2:06 PM, Tathagata Das notificati...@github.com wrote: I forgot to say, thanks Cody! :)

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32872490 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113632591 The word count examples don't have any need of accessing offsets. Wouldn't it be better to have separate examples? I don't want someone thinking they need

[GitHub] spark pull request: [SPARK-8390][Streaming][Kafka] fix docs relate...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113663382 Title should be right now On Jun 19, 2015 4:26 PM, Tathagata Das notificati...@github.com wrote: Could you update title to get the ordering right

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6846#issuecomment-113685902 My original pr was against master On Fri, Jun 19, 2015 at 7:27 PM, Tathagata Das notificati...@github.com wrote: Was this committed to branch-1.4

[GitHub] spark pull request: [SPARK-8127][Streaming][Kafka] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113690742 fixed title

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6846#issuecomment-113526956 Thanks. Not sure if the python side of things is going to continue on SPARK-8389 or on SPARK-8337, but I think this takes care of the java side for now

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113562935 @tdas is there anything else you feel needs to be done on this?

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32859841 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -399,7 +418,7 @@ object KafkaUtils { val kc

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32859740 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -158,15 +158,30 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...

2015-06-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6862#issuecomment-113184801 I'm not a python programmer, but isn't the direct translation of that kafkaStreams = map(lambda _:KafkaUtils.createStream(...), range(0, numStreams
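
The pattern suggested in this comment (build several streams by mapping over a range) can be sketched as below. This is only an illustration: `create_stream` is a hypothetical stand-in for `KafkaUtils.createStream(...)`, whose arguments are elided in the comment and are left elided here, and `num_streams` is an assumed name. One point worth noting is that on Python 3 `map` returns a lazy iterator, so the result is wrapped in `list` (or written as a comprehension) when an actual list of streams is wanted.

```python
def create_stream(index):
    # Hypothetical stand-in for KafkaUtils.createStream(...); the real call
    # needs a StreamingContext and Kafka settings that the comment elides.
    return "stream-%d" % index


num_streams = 3  # assumed value, for illustration only

# Direct translation of the map/lambda form from the comment.
# list(...) forces evaluation, since Python 3's map is lazy.
kafka_streams = list(map(lambda i: create_stream(i), range(0, num_streams)))

# Equivalent list comprehension, usually considered more idiomatic.
kafka_streams_alt = [create_stream(i) for i in range(num_streams)]

print(kafka_streams)
```

Both forms produce one stream object per index; which one to use is mostly a matter of style.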

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-06-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-113335277 As far as I know, it's still an issue - by default, any checkpoint that relies on hdfs config (e.g. s3 password) won't recover On Jun 18, 2015 6:55 PM

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-112978856 I don't think a doc change caused a python test failure. On Jun 17, 2015 5:27 PM, UCB AMPLab notificati...@github.com wrote: Merged build finished. Test

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-17 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/6863 [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113022865 Yeah, the spacing in that document in general is a mess (mix of tabs and spaces, some 2 spaces between sentences, etc). I cleaned it up somewhat. Also further fixed

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-17 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6846#discussion_r32620989 --- Diff: external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaDirectKafkaStreamSuite.java --- @@ -89,6 +90,16 @@ public void testKafkaStream

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-16 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/6846 [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out of the existing java direct stream api You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r31868393 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -60,6 +62,49 @@ class KafkaRDD[ }.toArray

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-109378718 I sort of doubt that wait timeout was related to the merge, the only conflict was the single line of MimaExcludes

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-04 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-108903935 The only thing I can think of mima complaining about is that this patch is removing a method... even though it's a method in a private class that is only used

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-108709503 Another ping on this, even if it misses 1.4 Seeing waitUntilLeaderOffset all over the place in test code I'm working on right now made me sad :(

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-05-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-101802768 @tdas just pinging on this to make sure it doesn't get lost in the shuffle, lmk if there's more explanation needed.

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-05-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-100741752 Is there anything you need to do that couldn't be accomplished by reading from / writing to ZK yourself? Is this just a question of convenient api

[GitHub] spark pull request: [SPARK-7385][Core] Add RDD.foreachPartitionWit...

2015-05-08 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5927#issuecomment-100237401 I think if you're going to decide you really don't like withContext/withIndex etc they should be marked as deprecated, in addition to having a scaladoc reference

[GitHub] spark pull request: [SPARK-7396][Streaming][Example] Update KafkaW...

2015-05-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5936#issuecomment-99482055 FWIW I ran the new version, and it works as well, I'm just concerned about how that exception was caused.

[GitHub] spark pull request: [SPARK-7396][Streaming][Example] Update KafkaW...

2015-05-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5936#issuecomment-99478975 Not that there's necessarily anything wrong with updating the producer api being used... but I don't see how that exception has anything to do with the version

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-05-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-99081192 I just want to re-iterate from the jira discussion that I am thumbs-down to the idea of overloading the meaning of group.id, and thumbs-down to adding options

[GitHub] spark pull request: [SPARK-7385][Core] Add RDD.foreachPartitionWit...

2015-05-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5927#issuecomment-99271021 @tdas yeah, Kafka transactional output was why I originally wanted to add it. Although that usage of taskcontext shown above is better than my

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-05-05 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/5921 [Streaming][Kafka] cleanup tests from SPARK-2808 see if requiring producer acks eliminates the need for waitUntilLeaderOffset calls in tests You can merge this pull request into a Git repository

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-05-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-99229134 Pretty sure this error [error] oro#oro;2.0.8!oro.jar origin location must be absolute: isn't related to the commit

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-05-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-98228003 At any rate, I tested it out locally against 0.8.1.1 server install, with the IdempotentExample job from my blogpost. Seemed to work ok.

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-05-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-98212160 scalastyle passes locally for me...

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-05-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4537#discussion_r29523447 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -53,14 +53,16 @@ class KafkaRDDSuite extends FunSuite

[GitHub] spark pull request: [SPARK-7255] [Streaming] [Documentation] Added...

2015-04-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5808#issuecomment-97907134 Yknow, I just looked at the code for other reasons, and it is the number of retries, not tries (so the default setting of 1 means a max of 2 consecutive attempts

[GitHub] spark pull request: [SPARK-7255] [Streaming] [Documentation] Added...

2015-04-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5808#issuecomment-97883079 Somewhat nitpicky, but I'd say something like maximum number of consecutive trials, just to make it clear it's not a limit for the lifetime of the job. Again

[GitHub] spark pull request: [SPARK-7255] [Streaming] [Documentation] Added...

2015-04-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5808#issuecomment-97930903 I think the code, the name of the conf, and the default setting are all correct. 1 retry means try initially, retry once, then give up, for a total of 2
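
The retry semantics described in this comment (a setting of 1 means one initial try plus one retry) can be illustrated with a minimal sketch. This is not Spark's implementation: `call_with_retries`, `max_retries`, and `always_fails` are hypothetical names used only to show how "number of retries" differs from "number of tries".

```python
def call_with_retries(action, max_retries=1):
    """Run action once, then retry up to max_retries more times on failure."""
    last_error = None
    for _ in range(max_retries + 1):  # 1 initial try + max_retries retries
        try:
            return action()
        except Exception as err:
            last_error = err
    raise last_error


calls = []


def always_fails():
    # Records each attempt so the total can be inspected afterwards.
    calls.append(1)
    raise RuntimeError("simulated failure")


try:
    call_with_retries(always_fails, max_retries=1)
except RuntimeError:
    pass

print(len(calls))  # prints 2: the initial try plus one retry
```

The counter resets on every call to `call_with_retries`, which matches the point made in the thread that the limit applies to consecutive attempts rather than to the lifetime of the job.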

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97547514 At this point, it looks like the java and scala tests are passing, but the python tests are timing out. Console output looks like

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-97578847 I have a branch of the directStream api that caches consumers. It had no noticeable impact on processing time. Even at 100 partitions / 200ms batches

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97580085 No, it works fine locally: Running PySpark tests. Output is in python/unit-tests.log. Testing with Python version: Python 2.6.6 Run

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97453300 I can do some basic testing against a non-embedded 0.8.1 kafka install. We're not likely to upgrade for our production jobs, so if some of the people who

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4537#discussion_r29341062 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -220,12 +220,22 @@ class KafkaCluster(val kafkaParams

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4537#discussion_r29342107 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -53,14 +53,16 @@ class KafkaRDDSuite extends FunSuite

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-28 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97085929 @zzcclp the 1.4.0 deadline is Friday, I believe. I fixed the merge conflicts and resolved the MiMa issue (as far as I know), it still passes local tests for me

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-28 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97123137 @srowen I imagine things are pretty busy, but can you verify this to test on jenkins? @luisobo you could build and test it out

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-95180356 Glad to hear it works for you. Fixing the default arguments for mima was straightforward, but there's a lot that has changed in the test code. Work's been busy

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-93478250 That maybe makes a certain amount of sense... I'll try replacing the default arguments with multiple overloaded methods, see if that passes mima.

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-93472230 As far as I can tell from jenkins output, it failed during MiMa checks. But if I run sbt '; project streaming-kafka; mima-report-binary-issues

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-89301901 @zzcclp if you want to help try and figure out how to reproduce this test failure outside of Jenkins, go for it.

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-03-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-82923505 I think it's mostly a question of whether committers are comfortable with a PR that changes all of the uses of new Configuration. At this point it'd probably

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-03-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-83021357 Those tests have passed locally for me 3 times in a row... if I get time later I'll try to dig in

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26120488 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -118,6 +123,7 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-78053440 I am now confused about what the purpose of this PR is. The jira seemed to indicate that the problem was several third-party offset monitoring tools fail to monitor

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26120607 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -84,6 +83,11 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26150162 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -118,6 +123,7 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-77882744 As it stands now, no offsets are stored by spark unless you're checkpointing. Does it really make sense to have an option to automatically store offsets

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26048829 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -84,6 +83,11 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26048624 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -118,6 +123,7 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26012160 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25991083 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990683 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990950 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990568 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -82,8 +83,12 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990850 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/SparkKafkaUtils.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990774 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala --- @@ -239,21 +239,7 @@ class ReliableKafkaReceiver

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990749 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990695 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-75187343 Spark 1.3.0 was already supposed to be frozen a while back as far as I know. My personal gut feeling is that it'd be better to wait for kafka 0.8.2.1

[GitHub] spark pull request: [SPARK-5731][Streaming][Test] Fix incorrect te...

2015-02-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4597#issuecomment-74341376 LGTM

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-74199642 I believe that it will not be merged into 1.3 On Feb 12, 2015 8:13 PM, zzcclp notificati...@github.com wrote: Will this PR be merged into 1.3.0

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-11 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73959340 @helena I updated it, pr is at https://github.com/apache/spark/pull/4537

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-11 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/4537 [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2 I don't think this should be merged until after 1.3.0 is final. You can merge this pull request into a Git repository by running: $ git

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73728226 This will need some changes to KafkaCluster and possibly other things related to the new api... let me know if you want a hand.

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24472248 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -40,43 +41,70 @@ class KafkaRDDSuite extends

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/4511 [SPARK-4964] [Streaming] refactor createRDD to take leaders via map instead of array You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24459792 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -154,6 +154,19 @@ object KafkaUtils { jssc.ssc

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24459700 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -211,12 +220,17 @@ object KafkaUtils { sc

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24268035 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73269084 Thanks for adding the java friendly kafka utils methods. Your original reason for wanting Array[Leader] rather than Map[TopicAndPartition, Broker] was for java

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24252680 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24252952 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24253794 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -19,16 +19,35 @@ package

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24251761 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -36,14 +36,12 @@ import kafka.utils.VerifiableProperties

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-04 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72893647 Here's a solution for subclassing ConsumerConfig while still silencing the warning. My son is doing ok(ish) now, thanks for the concern.

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24142354 --- Diff: examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72780349 High level consumers connect to ZK. Simple consumers (which is what this is using) connect to brokers directly instead. See https://cwiki.apache.org

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72779615 Yeah, there's a weird distinction in Kafka between simple consumers and high level consumers in that they have a lot of common configuration parameters, but one
