[jira] [Created] (SPARK-30570) Update scalafmt to 1.0.3 with onlyChangedFiles feature

2020-01-19 Thread Cody Koeninger (Jira)
Cody Koeninger created SPARK-30570: -- Summary: Update scalafmt to 1.0.3 with onlyChangedFiles feature Key: SPARK-30570 URL: https://issues.apache.org/jira/browse/SPARK-30570 Project: Spark

Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Thu, Oct 31, 2019 at 4:30 PM Sean Owen wrote: > > . But it'd be cooler to call these major > releases! Maybe this is just semantics, but my point is the Scala project already does call 2.12 to 2.13 a major release e.g. from https://www.scala-lang.org/download/ "Note that different *major*

Re: Packages to release in 3.0.0-preview

2019-10-31 Thread Cody Koeninger
On Wed, Oct 30, 2019 at 5:57 PM Sean Owen wrote: > Or, frankly, maybe Scala should reconsider the mutual incompatibility > between minor releases. These are basically major releases, and > indeed, it causes exactly this kind of headache. > Not saying binary incompatibility is fun, but 2.12 to

Re: Structured streaming from Kafka by timestamp

2019-02-05 Thread Cody Koeninger
To be more explicit, the easiest thing to do in the short term is use your own instance of KafkaConsumer to get the offsets for the timestamps you're interested in, using offsetsForTimes, and use those for the start / end offsets. See
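
A minimal sketch of that approach (assumes kafka-clients 0.10.1+ for offsetsForTimes, a SparkSession named spark, and placeholder broker/topic names; not the only way to do it):

    import java.{util => ju}
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    // Standalone consumer used only to translate a timestamp into per-partition offsets.
    val props = new ju.Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)

    val startTs = 1549324800000L  // epoch millis you want to start from
    val partitions = consumer.partitionsFor("mytopic").asScala
      .map(pi => new TopicPartition(pi.topic, pi.partition))
    val query = partitions.map(tp => tp -> java.lang.Long.valueOf(startTs)).toMap.asJava

    // offsetsForTimes returns the earliest offset whose timestamp is >= startTs (null if none).
    val byTime = consumer.offsetsForTimes(query).asScala
    consumer.close()

    // Render as the startingOffsets JSON the kafka source expects, e.g. {"mytopic":{"0":42,"1":17}}
    val startingOffsets = byTime.collect { case (tp, ot) if ot != null =>
      s""""${tp.partition}":${ot.offset}"""
    }.mkString("""{"mytopic":{""", ",", "}}")

    spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "mytopic")
      .option("startingOffsets", startingOffsets)
      .load()

The same JSON shape works for endingOffsets on a batch read.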

Re: Back pressure not working on streaming

2019-02-05 Thread Cody Koeninger
That article is pretty old, If you click through the link to the jira mentioned in it, https://issues.apache.org/jira/browse/SPARK-18580 , it's been resolved. On Wed, Jan 2, 2019 at 12:42 AM JF Chen wrote: > > yes, 10 is a very low value for testing initial rate. > And from this article >

Re: Ask for reviewing on Structured Streaming PRs

2019-01-14 Thread Cody Koeninger
I feel like I've already said my piece on https://github.com/apache/spark/pull/22138 let me know if you have more questions. As for SS in general, I don't have a production SS deployment, so I'm less comfortable with reviewing large changes to it. But if no other committers are working on it...

Re: Automated formatting

2018-11-26 Thread Cody Koeninger
22, 2018 at 7:32 PM Matei Zaharia wrote: > > Can we start by just recommending to contributors that they do this manually? > Then if it seems to work fine, we can try to automate it. > > > On Nov 22, 2018, at 4:40 PM, Cody Koeninger wrote: > > > > I believe scalaf

[jira] [Created] (SPARK-26177) Automated formatting for Scala code

2018-11-26 Thread Cody Koeninger (JIRA)
Cody Koeninger created SPARK-26177: -- Summary: Automated formatting for Scala code Key: SPARK-26177 URL: https://issues.apache.org/jira/browse/SPARK-26177 Project: Spark Issue Type

[jira] [Resolved] (SPARK-26121) [Structured Streaming] Allow users to define prefix of Kafka's consumer group (group.id)

2018-11-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-26121. Resolution: Fixed Assignee: Anastasios Zouzias Fix Version/s: 3.0.0

Re: Automated formatting

2018-11-22 Thread Cody Koeninger
hu, Nov 22, 2018 at 9:11 AM Cody Koeninger wrote: >> >> Plugin invocation is ./build/mvn mvn-scalafmt_2.12:format >> >> It takes about 5 seconds, and errors out on the first different file >> that doesn't match formatting. >> >> I made a shell

Re: Automated formatting

2018-11-22 Thread Cody Koeninger
worth a shot. What's the invocation that Shane > could add (after this change goes in) > On Wed, Nov 21, 2018 at 3:27 PM Cody Koeninger wrote: > > > > There's a mvn plugin (sbt as well, but it requires sbt 1.0+) so it > > should be runnable from the PR builder > > > >

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
trokes but not in the details. > Is this something that can be just run in the PR builder? if the rules > are simple and not too hard to maintain, seems like a win. > On Wed, Nov 21, 2018 at 2:26 PM Cody Koeninger wrote: > > > > Definitely not suggesting a mass reformat, just on a per

Re: Automated formatting

2018-11-21 Thread Cody Koeninger
> Is there a way to just check style on PR changes? that's fine. > On Wed, Nov 21, 2018 at 11:40 AM Cody Koeninger wrote: > > > > Is there any appetite for revisiting automating formatting? > > > > I know over the years various people have expressed opposition to

Automated formatting

2018-11-21 Thread Cody Koeninger
Is there any appetite for revisiting automating formatting? I know over the years various people have expressed opposition to it as unnecessary churn in diffs, but having every new contributor greeted with "nit: 4 space indentation for argument lists" isn't very welcoming.

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-19 Thread Cody Koeninger
Anastasios it looks like you already identified the two lines that need to change, the string interpolation that depends on UUID.randomUUID and metadataPath.hashCode. I'd factor that out into a function that returns the group id. That function would also need to take the "parameters" variable
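
Roughly the shape of the change being suggested, as a sketch only (the option name "groupidprefix" and the default format here are placeholders, not necessarily what ended up in Spark):

    import java.util.UUID

    // Hypothetical helper: build the consumer group id from an optional user-supplied
    // prefix, falling back to the existing spark-kafka-source-<uuid>-<hash> style.
    def consumerGroupId(parameters: Map[String, String], metadataPath: String): String = {
      val prefix = parameters.getOrElse("groupidprefix", "spark-kafka-source")
      s"$prefix-${UUID.randomUUID}-${metadataPath.hashCode}"
    }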

Re: DataSourceV2 sync tomorrow

2018-11-13 Thread Cody Koeninger
Am I the only one for whom the livestream link didn't work last time? Would like to be able to at least watch the discussion this time around. On Tue, Nov 13, 2018 at 6:01 PM Ryan Blue wrote: > > Hi everyone, > I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at > 17:00

[jira] [Commented] (SPARK-25983) spark-sql-kafka-0-10 no longer works with Kafka 0.10.0

2018-11-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682464#comment-16682464 ] Cody Koeninger commented on SPARK-25983: Looks like we need an equivalent warning in http

[jira] [Commented] (SPARK-25983) spark-sql-kafka-0-10 no longer works with Kafka 0.10.0

2018-11-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682461#comment-16682461 ] Cody Koeninger commented on SPARK-25983: Documentation already explicitly says not to link

Re: [Structured Streaming] Kafka group.id is fixed

2018-11-09 Thread Cody Koeninger
That sounds reasonable to me On Fri, Nov 9, 2018 at 2:26 AM Anastasios Zouzias wrote: > > Hi all, > > I run in the following situation with Spark Structure Streaming (SS) using > Kafka. > > In a project that I work on, there is already a secured Kafka setup where ops > can issue an SSL

Re: Nightly Builds in the docs (in spark-nightly/spark-master-bin/latest? Can't seem to find it)

2018-08-31 Thread Cody Koeninger
Just got a question about this on the user list as well. Worth removing that link to pwendell's directory from the docs? On Sun, Jan 21, 2018 at 12:13 PM, Jacek Laskowski wrote: > Hi, > > http://spark.apache.org/developer-tools.html#nightly-builds reads: > >> Spark nightly packages are

Re: [discuss] replacing SPIP template with Heilmeier's Catechism?

2018-08-31 Thread Cody Koeninger
+1 to Sean's comment On Fri, Aug 31, 2018 at 2:48 PM, Reynold Xin wrote: > Yup all good points. One way I've done it in the past is to have an appendix > section for design sketch, as an expansion to the question "- What is new in > your approach and why do you think it will be successful?" > >

Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-31 Thread Cody Koeninger
id, as well > as simply moving to use only a single one of our clusters. Neither of these > were successful. I am not able to run a test against master now. > > Regards, > > Bryan Jeffrey > > > > > On Thu, Aug 30, 2018 at 2:56 PM Cody Koeninger wrote: >

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-30 Thread Cody Koeninger
Ortiz Fernández wrote: > I can't... do you think that it's a possible bug of this version?? from > Spark or Kafka? > > El mié., 29 ago. 2018 a las 22:28, Cody Koeninger () > escribió: >> >> Are you able to try a recent version of spark? >> >> On Wed, Aug 29, 201

Re: ConcurrentModificationExceptions with CachedKafkaConsumers

2018-08-30 Thread Cody Koeninger
I doubt that fix will get backported to 2.3.x. Are you able to test against master? 2.4 with the fix you linked to is likely to hit code freeze soon. From a quick look at your code, I'm not sure why you're mapping over an array of brokers. It seems like that would result in different streams

[jira] [Resolved] (SPARK-25233) Give the user the option of specifying a fixed minimum message per partition per batch when using kafka direct API with backpressure

2018-08-30 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-25233. Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 3 [https

[jira] [Assigned] (SPARK-25233) Give the user the option of specifying a fixed minimum message per partition per batch when using kafka direct API with backpressure

2018-08-30 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-25233: -- Assignee: Reza Safi > Give the user the option of specifying a fixed minimum mess

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Cody Koeninger
Are you able to try a recent version of spark? On Wed, Aug 29, 2018 at 2:10 AM, Guillermo Ortiz Fernández wrote: > I'm using Spark Streaming 2.0.1 with Kafka 0.10, sometimes I get this > exception and Spark dies. > > I couldn't see any error or problem among the machines, anybody has the >

[jira] [Commented] (SPARK-24987) Kafka Cached Consumer Leaking File Descriptors

2018-08-05 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569491#comment-16569491 ] Cody Koeninger commented on SPARK-24987: It was merged to branch-2.3 https://github.com/apache

Re: Viewing transactional markers in client

2018-08-04 Thread Cody Koeninger
Here's one reason I might want to be able to tell whether a given offset is a transactional marker: https://issues.apache.org/jira/browse/SPARK-24720 Alternatively, is there any efficient way to tell what the offset of the last actual record in a topicpartition is (i.e. like endOffsets) On Thu,

[jira] [Commented] (SPARK-25026) Binary releases don't contain Kafka integration modules

2018-08-04 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569283#comment-16569283 ] Cody Koeninger commented on SPARK-25026: I don't think distributions have ever included

[jira] [Resolved] (SPARK-24987) Kafka Cached Consumer Leaking File Descriptors

2018-08-04 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-24987. Resolution: Fixed Fix Version/s: 2.4.0 > Kafka Cached Consumer Leaking F

[jira] [Assigned] (SPARK-24987) Kafka Cached Consumer Leaking File Descriptors

2018-08-04 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-24987: -- Assignee: Yuval Itzchakov > Kafka Cached Consumer Leaking File Descript

Re: Migrating from kafka08 client to kafka010

2018-08-02 Thread Cody Koeninger
Short answer is it isn't necessary. Long answer is that you aren't just changing from 08 to 10, you're changing from the receiver based implementation to the direct stream. Read these: https://github.com/koeninger/kafka-exactly-once

Re: [DISCUSS] SPIP: Standardize SQL logical plans

2018-07-17 Thread Cody Koeninger
According to http://spark.apache.org/improvement-proposals.html the shepherd should be a PMC member, not necessarily the person who proposed the SPIP On Tue, Jul 17, 2018 at 9:13 AM, Wenchen Fan wrote: > I don't know an official answer, but conventionally people who propose the > SPIP would

[jira] [Resolved] (SPARK-24713) AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed

2018-07-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-24713. Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21690 [https

[jira] [Assigned] (SPARK-24713) AppMaster of spark streaming kafka OOM if there are hundreds of topics consumed

2018-07-13 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-24713: -- Assignee: Yuanbo Liu > AppMaster of spark streaming kafka OOM if there are hundr

[jira] [Commented] (SPARK-19680) Offsets out of range with no configured reset policy for partitions

2018-07-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540138#comment-16540138 ] Cody Koeninger commented on SPARK-19680: A new consumer group is the easiest thing to do

[jira] [Commented] (SPARK-24720) kafka transaction creates Non-consecutive Offsets (due to transaction offset) making streaming fail when failOnDataLoss=true

2018-07-06 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534797#comment-16534797 ] Cody Koeninger commented on SPARK-24720: What's your plan to tell the difference between gaps

[jira] [Resolved] (SPARK-24743) Update the JavaDirectKafkaWordCount example to support the new API of Kafka

2018-07-05 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-24743. Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21717 [https

[jira] [Commented] (SPARK-18258) Sinks need access to offset representation

2018-06-27 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16525137#comment-16525137 ] Cody Koeninger commented on SPARK-18258: [~Yohan123] This ticket is about giving implementors

[jira] [Commented] (SPARK-24507) Description in "Level of Parallelism in Data Receiving" section of Spark Streaming Programming Guide is not relevant for the recent Kafka direct approach

2018-06-20 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518287#comment-16518287 ] Cody Koeninger commented on SPARK-24507: You're welcome to submit a doc PR that clarifies

Re: Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Cody Koeninger
at 3:33 PM, Bryan Jeffrey wrote: > Cody, > > Where is that called in the driver? The only call I see from Subscribe is to > load the offset from checkpoint. > > Get Outlook for Android > > > From: Cody Koeninger > Sent: Thursday, June 14,

Re: Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Cody Koeninger
m checkpoint. > > Thank you! > > Bryan > > Get Outlook for Android > > ____ > From: Cody Koeninger > Sent: Thursday, June 14, 2018 4:00:31 PM > To: Bryan Jeffrey > Cc: user > Subject: Re: Kafka Offset Storage: Fetching Off

Re: Kafka Offset Storage: Fetching Offsets

2018-06-14 Thread Cody Koeninger
The expectation is that you shouldn't have to manually load offsets from kafka, because the underlying kafka consumer on the driver will start at the offsets associated with the given group id. That's the behavior I see with this example:
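
For reference, a sketch of the pattern being described against the spark-streaming-kafka-0-10 API (broker, topic, and group id are placeholders; ssc is an existing StreamingContext):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010._

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "my-stream-group",      // offsets are tracked per group id
      "auto.offset.reset" -> "earliest",    // only used when the group has no committed offsets
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("mytopic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process rdd ...
      // Commit after processing, so the next run of this group id resumes from here.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
    }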

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-05-31 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496633#comment-16496633 ] Cody Koeninger commented on SPARK-18057: I'd just modify KafkaTestUtils to match the way things

[jira] [Commented] (SPARK-24067) Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))

2018-05-14 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474549#comment-16474549 ] Cody Koeninger commented on SPARK-24067: [~zsxwing] even in situations where users weren't

[jira] [Resolved] (SPARK-24067) Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))

2018-05-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-24067. Resolution: Fixed Fix Version/s: 2.3.1 Issue resolved by pull request 21300 [https

Re: Time for 2.3.1?

2018-05-11 Thread Cody Koeninger
Sounds good, I'd like to add SPARK-24067 today assuming there's no objections On Thu, May 10, 2018 at 1:22 PM, Henry Robinson wrote: > +1, I'd like to get a release out with SPARK-23852 fixed. The Parquet > community are about to release 1.8.3 - the voting period closes

Re: [Structured-Streaming][Beginner] Out of order messages with Spark kafka readstream from a specific partition

2018-05-10 Thread Cody Koeninger
As long as you aren't doing any spark operations that involve a shuffle, the order you see in spark should be the same as the order in the partition. Can you link to a minimal code example that reproduces the issue? On Wed, May 9, 2018 at 7:05 PM, karthikjay wrote: > On the

[jira] [Commented] (SPARK-24067) Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))

2018-04-25 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452603#comment-16452603 ] Cody Koeninger commented on SPARK-24067: The original PR [https://github.com/apache/spark/pull

[jira] [Commented] (SPARK-24067) Backport SPARK-17147 to 2.3 (Spark Streaming Kafka 0.10 Consumer Can't Handle Non-consecutive Offsets (i.e. Log Compaction))

2018-04-25 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452337#comment-16452337 ] Cody Koeninger commented on SPARK-24067: Given the response on the dev list about criteria

Process for backports?

2018-04-24 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-24067 is asking to backport a change to the 2.3 branch. My questions - In general are there any concerns about what qualifies for backporting? This adds a configuration variable but shouldn't change default behavior. - Is a separate jira + pr

[jira] [Resolved] (SPARK-21168) KafkaRDD should always set kafka clientId.

2018-04-23 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-21168. Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 19887 [https

[jira] [Assigned] (SPARK-21168) KafkaRDD should always set kafka clientId.

2018-04-23 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-21168: -- Assignee: liuzhaokun > KafkaRDD should always set kafka clien

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-04-18 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442838#comment-16442838 ] Cody Koeninger commented on SPARK-18057: [~cricket007] here's a branch with spark 2.1.1 / kafka

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-04-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441818#comment-16441818 ] Cody Koeninger commented on SPARK-18057: Ok, if you can figure out what version of spark it is I

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-04-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441793#comment-16441793 ] Cody Koeninger commented on SPARK-18057: Just adding the extra dependency on 0.11 probably won't

[jira] [Resolved] (SPARK-22968) java.lang.IllegalStateException: No current assignment for partition kssh-2

2018-04-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-22968. Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21038 [https

[jira] [Assigned] (SPARK-22968) java.lang.IllegalStateException: No current assignment for partition kssh-2

2018-04-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-22968: -- Assignee: Saisai Shao > java.lang.IllegalStateException: No current assignm

Re: Structured streaming: Tried to fetch $offset but the returned record offset was ${record.offset}"

2018-04-17 Thread Cody Koeninger
Is this possibly related to the recent post on https://issues.apache.org/jira/browse/SPARK-18057 ? On Mon, Apr 16, 2018 at 11:57 AM, ARAVIND SETHURATHNAM < asethurath...@homeaway.com.invalid> wrote: > Hi, > > We have several structured streaming jobs (spark version 2.2.0) consuming > from kafka

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-04-17 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441606#comment-16441606 ] Cody Koeninger commented on SPARK-18057: Out of curiosity, was that a compacted topic

[jira] [Commented] (SPARK-19680) Offsets out of range with no configured reset policy for partitions

2018-04-10 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433039#comment-16433039 ] Cody Koeninger commented on SPARK-19680: [~nerdynick]  If you submit a PR to add documentation

Re: Welcome Zhenhua Wang as a Spark committer

2018-04-02 Thread Cody Koeninger
Congrats! On Mon, Apr 2, 2018 at 12:28 AM, Wenchen Fan wrote: > Hi all, > > The Spark PMC recently added Zhenhua Wang as a committer on the project. > Zhenhua is the major contributor of the CBO project, and has been > contributing across several areas of Spark for a while,

[jira] [Commented] (SPARK-23739) Spark structured streaming long running problem

2018-03-26 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16414326#comment-16414326 ] Cody Koeninger commented on SPARK-23739: Ok, the OutOfMemoryError is probably a separate

[jira] [Commented] (SPARK-23739) Spark structured streaming long running problem

2018-03-23 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411503#comment-16411503 ] Cody Koeninger commented on SPARK-23739: I meant the version of the org.apache.kafka kafka

[jira] [Commented] (SPARK-23739) Spark structured streaming long running problem

2018-03-23 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411396#comment-16411396 ] Cody Koeninger commented on SPARK-23739: What version of the org.apache.kafka artifact

[jira] [Resolved] (SPARK-18580) Use spark.streaming.backpressure.initialRate in DirectKafkaInputDStream

2018-03-21 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-18580. Resolution: Fixed Target Version/s: 2.4.0 >

[jira] [Assigned] (SPARK-18580) Use spark.streaming.backpressure.initialRate in DirectKafkaInputDStream

2018-03-21 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-18580: -- Assignee: Oleksandr Konopko > Use spark.streaming.backpressure.initialR

Re: is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1?

2018-03-16 Thread Cody Koeninger
Should be able to use the 0.8 kafka dstreams with a kafka 0.9 broker On Fri, Mar 16, 2018 at 7:52 AM, kant kodali wrote: > Hi All, > > is it possible to use Spark 2.3.0 along with Kafka 0.9.0.1? > > Thanks, > kant

[jira] [Resolved] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2018-03-16 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-18371. Resolution: Fixed Fix Version/s: 2.4.0 > Spark Streaming backpressure

[jira] [Assigned] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2018-03-16 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger reassigned SPARK-18371: -- Assignee: Sebastian Arzt > Spark Streaming backpressure bug - generates a ba

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 1.1.0

2018-03-05 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387245#comment-16387245 ] Cody Koeninger commented on SPARK-18057: It's probably easiest to keep the KIP discussion

[jira] [Commented] (SPARK-19767) API Doc pages for Streaming with Kafka 0.10 not current

2018-03-05 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387217#comment-16387217 ] Cody Koeninger commented on SPARK-19767: [~nafshartous] I think at this point people are more

Re: Welcoming some new committers

2018-03-02 Thread Cody Koeninger
tions to Spark 2.3 and other past work: > > - Anirudh Ramanathan (contributor to Kubernetes support) > - Bryan Cutler (contributor to PySpark and Arrow support) > - Cody Koeninger (contributor to streaming and Kafka support) > - Erik Erlandson (contributor to Kubernetes support) > - M

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2018-02-20 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370924#comment-16370924 ] Cody Koeninger commented on SPARK-18057: Just doing the upgrade is probably a good starting point

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 10.0.1 to 10.2.0

2018-02-20 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370460#comment-16370460 ] Cody Koeninger commented on SPARK-18057: My guess is that DStream based integrations aren't

Re: KafkaUtils.createStream(..) is removed for API

2018-02-19 Thread Cody Koeninger
I can't speak for committers, but my guess is it's more likely for DStreams in general to stop being supported before that particular integration is removed. On Sun, Feb 18, 2018 at 9:34 PM, naresh Goud wrote: > Thanks Ted. > > I see createDirectStream is

Re: org.apache.kafka.clients.consumer.OffsetOutOfRangeException

2018-02-12 Thread Cody Koeninger
https://issues.apache.org/jira/browse/SPARK-19680 and https://issues.apache.org/jira/browse/KAFKA-3370 has a good explanation. Verify that it works correctly with auto offset set to latest, to rule out other issues. Then try providing explicit starting offsets reasonably near the beginning of
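
A sketch of that second step (topic, partitions, and offsets are placeholders; kafkaParams and ssc assumed from the usual direct stream setup):

    import org.apache.kafka.common.TopicPartition
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Pin explicit starting offsets a little past the beginning of each partition,
    // so the job isn't racing retention deleting the oldest segments.
    val fromOffsets = Map(
      new TopicPartition("mytopic", 0) -> 1000L,
      new TopicPartition("mytopic", 1) -> 1000L
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("mytopic"), kafkaParams, fromOffsets)
    )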

Re: Providing Kafka configuration as Map of Strings

2018-01-24 Thread Cody Koeninger
Have you tried passing in a Map that happens to have string for all the values? I haven't tested this, but the underlying kafka consumer constructor is documented to take either strings or objects as values, despite the static type. On Wed, Jan 24, 2018 at 2:48 PM, Tecno Brain
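
Untested, as noted above, but the idea is just that an all-string map already satisfies the Map[String, Object] signature (values here are examples):

    // Every value is a String; the underlying consumer parses strings for numeric/boolean configs.
    val stringParams: Map[String, String] = Map(
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer" -> "org.apache.kafka.common.serialization.StringDeserializer",
      "group.id" -> "example-group",
      "enable.auto.commit" -> "false",
      "max.poll.records" -> "500"
    )

    // Scala's immutable Map is covariant in its value type, so this is already a Map[String, Object].
    val kafkaParams: Map[String, Object] = stringParams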

Re: uncontinuous offset in kafka will cause the spark streamingfailure

2018-01-24 Thread Cody Koeninger
wise.com> > Cc: user<user@spark.apache.org>; Cody Koeninger<c...@koeninger.org> > Sent: Wednesday, January 24, 2018 14:45 > Subject: Re: uncontinuous offset in kafka will cause the spark streamingfailure > > Yes. My spark streaming application works with uncompacted topic. I will >

Re: Contiguous Offsets on non-compacted topics

2018-01-24 Thread Cody Koeninger
Can anyone clarify what (other than the known cases of compaction or transactions) could be causing non-contiguous offsets? That sounds like a potential defect, given that I ran billions of messages a day through kafka 0.8.x series for years without seeing that. On Tue, Jan 23, 2018 at 3:35 PM,

Re: "Got wrong record after seeking to offset" issue

2018-01-18 Thread Cody Koeninger
nd I can run to > check compaction I’m happy to give that a shot too. > > I’ll try consuming from the failed offset if/when the problem manifests > itself again. > > Thanks! > Justin > > > On Wednesday, January 17, 2018, Cody Koeninger <c...@koeninger.org> wrote: >

Re: "Got wrong record after seeking to offset" issue

2018-01-17 Thread Cody Koeninger
That means the consumer on the executor tried to seek to the specified offset, but the message that was returned did not have a matching offset. If the executor can't get the messages the driver told it to get, something's generally wrong. What happens when you try to consume the particular
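
One way to try that with a plain consumer, as a debugging sketch (broker, topic, partition, and offset are placeholders taken from the error message):

    import java.{util => ju}
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    val props = new ju.Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
    props.put("group.id", "debug-" + System.currentTimeMillis)  // throwaway group

    val tp = new TopicPartition("mytopic", 3)   // partition from the error
    val failedOffset = 123456L                  // offset from the error
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
    consumer.assign(ju.Arrays.asList(tp))
    consumer.seek(tp, failedOffset)

    // If the first record that comes back has an offset greater than failedOffset,
    // that offset no longer exists in the log (compaction, retention, or a txn marker).
    consumer.poll(5000L).asScala.foreach { r =>
      println(s"offset=${r.offset} timestamp=${r.timestamp}")
    }
    consumer.close()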

Re: Which kafka client to use with spark streaming

2017-12-26 Thread Cody Koeninger
Do not add a dependency on kafka-clients, the spark-streaming-kafka library has appropriate transitive dependencies. Either version of the spark-streaming-kafka library should work with 1.0 brokers; what problems were you having? On Mon, Dec 25, 2017 at 7:58 PM, Diogo Munaro Vieira

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Cody Koeninger
You can't create a network connection to kafka on the driver and then serialize it to send it to the executor. That's likely why you're getting serialization errors. Kafka producers are thread safe and designed for use as a singleton. Use a lazy singleton instance of the producer on the executor,
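
A minimal sketch of that pattern (topic, broker, and the two-string-column DataFrame df are placeholders):

    import java.{util => ju}
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // One producer per executor JVM, created lazily on first use instead of on the driver.
    object ProducerHolder {
      lazy val producer: KafkaProducer[String, String] = {
        val props = new ju.Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        new KafkaProducer[String, String](props)
      }
    }

    df.rdd.foreachPartition { rows =>
      rows.foreach { row =>
        // The holder is initialized on the executor, so no connection is serialized from the driver.
        ProducerHolder.producer.send(new ProducerRecord("output-topic", row.getString(0), row.getString(1)))
      }
    }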

Re: bulk upsert data batch from Kafka dstream into Postgres db

2017-12-14 Thread Cody Koeninger
Modern versions of postgres have upsert, ie insert into ... on conflict ... do update On Thu, Dec 14, 2017 at 11:26 AM, salemi wrote: > Thank you for your respond. > The approach loads just the data into the DB. I am looking for an approach > that allows me to update

Re: bulk upsert data batch from Kafka dstream into Postgres db

2017-12-14 Thread Cody Koeninger
use foreachPartition(), get a connection from a jdbc connection pool, and insert the data the same way you would in a non-spark program. If you're only doing inserts, postgres COPY will be faster (e.g. https://discuss.pivotal.io/hc/en-us/articles/204237003), but if you're doing updates that's not
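
A rough sketch of both pieces together, assuming an rdd of (id, payload) pairs and placeholder table/connection details (a real job would take connections from a pool rather than DriverManager):

    import java.sql.DriverManager

    rdd.foreachPartition { rows =>
      val conn = DriverManager.getConnection("jdbc:postgresql://dbhost:5432/mydb", "user", "pass")
      conn.setAutoCommit(false)
      // Postgres upsert: insert, or update the existing row on primary key conflict.
      val upsert = conn.prepareStatement(
        "insert into events (id, payload) values (?, ?) " +
        "on conflict (id) do update set payload = excluded.payload")
      try {
        rows.foreach { case (id: Long, payload: String) =>
          upsert.setLong(1, id)
          upsert.setString(2, payload)
          upsert.addBatch()
        }
        upsert.executeBatch()
        conn.commit()
      } finally {
        upsert.close()
        conn.close()
      }
    }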

Re: [Spark streaming] No assigned partition error during seek

2017-12-01 Thread Cody Koeninger
d the spark async commit useful for our needs”, do you > mean to say the code like below? > > kafkaDStream.asInstanceOf[CanCommitOffsets].commitAsync(ranges) > > > > > > Best Regards > > Richard > > > > > > From: venkat <meven...@gma

Re: Kafka version support

2017-11-30 Thread Cody Koeninger
Are you talking about the broker version, or the kafka-clients artifact version? On Thu, Nov 30, 2017 at 12:17 AM, Raghavendra Pandey wrote: > Just wondering if anyone has tried spark structured streaming kafka > connector (2.2) with Kafka 0.11 or Kafka 1.0 version

Re: [Spark streaming] No assigned partition error during seek

2017-11-30 Thread Cody Koeninger
You mentioned 0.11 version; the latest version of org.apache.kafka kafka-clients artifact supported by DStreams is 0.10.0.1, for which it has an appropriate dependency. Are you manually depending on a different version of the kafka-clients artifact? On Fri, Nov 24, 2017 at 7:39 PM, venks61176

[jira] [Resolved] (SPARK-22561) Dynamically update topics list for spark kafka consumer

2017-11-23 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cody Koeninger resolved SPARK-22561. Resolution: Not A Problem > Dynamically update topics list for spark kafka consu

[jira] [Commented] (SPARK-22561) Dynamically update topics list for spark kafka consumer

2017-11-23 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16264643#comment-16264643 ] Cody Koeninger commented on SPARK-22561: See SubscribePattern http://spark.apache.org/docs

Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-17 Thread Cody Koeninger
eam ? > > Thanks > Jagadish > > On Wed, Nov 15, 2017 at 11:23 AM, Cody Koeninger <c...@koeninger.org> wrote: >> >> spark.streaming.kafka.consumer.poll.ms is a spark configuration, not >> a kafka parameter. >> >> see http://spark.apache.org/docs/l

Re: Spark Streaming fails with unable to get records after polling for 512 ms

2017-11-15 Thread Cody Koeninger
spark.streaming.kafka.consumer.poll.ms is a spark configuration, not a kafka parameter. see http://spark.apache.org/docs/latest/configuration.html On Tue, Nov 14, 2017 at 8:56 PM, jkagitala wrote: > Hi, > > I'm trying to add spark-streaming to our kafka topic. But, I keep
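
In other words, it goes on the SparkConf (or --conf on spark-submit), not in the kafkaParams map; a quick sketch:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("kafka-stream")
      // How long the cached consumer will poll before giving up, in milliseconds.
      .set("spark.streaming.kafka.consumer.poll.ms", "10000")

    val ssc = new StreamingContext(conf, Seconds(5))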

[jira] [Commented] (SPARK-22486) Support synchronous offset commits for Kafka

2017-11-12 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16248891#comment-16248891 ] Cody Koeninger commented on SPARK-22486: Can you identify a clear use case for this, given

[jira] [Commented] (SPARK-19680) Offsets out of range with no configured reset policy for partitions

2017-11-07 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16242438#comment-16242438 ] Cody Koeninger commented on SPARK-19680: If you got an OffsetOutOfRangeException after a job had

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Cody Koeninger
Was there any answer to my question around the effect of changes to the sink api regarding access to underlying offsets? On Wed, Nov 1, 2017 at 11:32 AM, Reynold Xin wrote: > Most of those should be answered by the attached design sketch in the JIRA > ticket. > > On Wed, Nov

Re: FW: Kafka Direct Stream - dynamic topic subscription

2017-10-29 Thread Cody Koeninger
As it says in SPARK-10320 and in the docs at http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#consumerstrategies , you can use SubscribePattern On Sun, Oct 29, 2017 at 3:56 PM, Ramanan, Buvana (Nokia - US/Murray Hill) wrote: > Hello
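
A short sketch of SubscribePattern (the regex is a placeholder; kafkaParams and ssc assumed from the usual direct stream setup):

    import java.util.regex.Pattern
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    // Subscribes to every topic matching the pattern; newly created matching topics are
    // picked up as the underlying consumer refreshes its metadata.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.SubscribePattern[String, String](Pattern.compile("events-.*"), kafkaParams)
    )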
