Yes, it's against master: https://github.com/apache/spark/pull/10256
I'll push the KCL version bump after my local tests finish. On Fri, Dec 11, 2015 at 10:42 AM Nick Pentreath <nick.pentre...@gmail.com> wrote: > Is that PR against master branch? > > S3 read comes from Hadoop / jet3t afaik > > — > Sent from Mailbox <https://www.dropbox.com/mailbox> > > > On Fri, Dec 11, 2015 at 5:38 PM, Brian London <brianmlon...@gmail.com> > wrote: > >> That's good news I've got a PR in to up the SDK version to 1.10.40 and >> the KCL to 1.6.1 which I'm running tests on locally now. >> >> Is the AWS SDK not used for reading/writing from S3 or do we get that for >> free from the Hadoop dependencies? >> >> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <nick.pentre...@gmail.com> >> wrote: >> >>> cc'ing dev list >>> >>> Ok, looks like when the KCL version was updated in >>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not, >>> probably leading to dependency conflict, though as Burak mentions its hard >>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally >>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up >>> in driver or worker logs, so any exception is getting swallowed somewhere. >>> >>> Run starting. Expected test count is: 4 >>> KinesisStreamSuite: >>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating >>> Kinesis streams for tests. >>> - KinesisUtils API >>> - RDD generation >>> - basic operation *** FAILED *** >>> The code passed to eventually never returned normally. Attempted 13 >>> times over 2.047777 minutes. Last failure message: Set() did not equal >>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4) >>> Data received does not match data sent. (KinesisStreamSuite.scala:188) >>> - failure recovery *** FAILED *** >>> The code passed to eventually never returned normally. Attempted 63 >>> times over 2.0286383166666666 minutes. Last failure message: >>> isCheckpointPresent was true, but 0 was not greater than 10. >>> (KinesisStreamSuite.scala:228) >>> Run completed in 5 minutes, 0 seconds. >>> Total number of tests run: 4 >>> Suites: completed 1, aborted 0 >>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0 >>> *** 2 TESTS FAILED *** >>> [INFO] >>> ------------------------------------------------------------------------ >>> [INFO] BUILD FAILURE >>> [INFO] >>> ------------------------------------------------------------------------ >>> >>> >>> KCL 1.3.0 depends on *1.9.37* SDK ( >>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26) >>> while the Spark Kinesis dependency was kept at *1.9.16.* >>> >>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS >>> SDK 1.9.37 and everything works. >>> >>> Run starting. Expected test count is: 28 >>> KinesisBackedBlockRDDSuite: >>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating >>> Kinesis streams for tests. >>> - Basic reading from Kinesis >>> - Read data available in both block manager and Kinesis >>> - Read data available only in block manager, not in Kinesis >>> - Read data available only in Kinesis, not in block manager >>> - Read data available partially in block manager, rest in Kinesis >>> - Test isBlockValid skips block fetching from block manager >>> - Test whether RDD is valid after removing blocks from block anager >>> KinesisStreamSuite: >>> - KinesisUtils API >>> - RDD generation >>> - basic operation >>> - failure recovery >>> KinesisReceiverSuite: >>> - check serializability of SerializableAWSCredentials >>> - process records including store and checkpoint >>> - shouldn't store and checkpoint when receiver is stopped >>> - shouldn't checkpoint when exception occurs during store >>> - should set checkpoint time to currentTime + checkpoint interval upon >>> instantiation >>> - should checkpoint if we have exceeded the checkpoint interval >>> - shouldn't checkpoint if we have not exceeded the checkpoint interval >>> - should add to time when advancing checkpoint >>> - shutdown should checkpoint if the reason is TERMINATE >>> - shutdown should not checkpoint if the reason is something other than >>> TERMINATE >>> - retry success on first attempt >>> - retry success on second attempt after a Kinesis throttling exception >>> - retry success on second attempt after a Kinesis dependency exception >>> - retry failed after a shutdown exception >>> - retry failed after an invalid state exception >>> - retry failed after unexpected exception >>> - retry failed after exhausing all retries >>> Run completed in 3 minutes, 28 seconds. >>> Total number of tests run: 28 >>> Suites: completed 4, aborted 0 >>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0 >>> All tests passed. >>> >>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can >>> you file a JIRA for this? >>> >>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it >>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis >>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS >>> SDKs are brought in, making things a bit leaner (this can be upgraded in >>> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with >>> removing the explicit dependency and only depending on KCL. Is aws-java-sdk >>> used anywhere else (AFAIK it is not, but in case I missed something let me >>> know any good reason to keep the explicit dependency)? >>> >>> N >>> >>> >>> >>> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath < >>> nick.pentre...@gmail.com> wrote: >>> >>>> Yeah also the integration tests need to be specifically run - I would >>>> have thought the contributor would have run those tests and also tested the >>>> change themselves using live Kinesis :( >>>> >>>> — >>>> Sent from Mailbox <https://www.dropbox.com/mailbox> >>>> >>>> >>>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <brk...@gmail.com> wrote: >>>> >>>>> I don't think the Kinesis tests specifically ran when that was merged >>>>> into 1.5.2 :( >>>>> https://github.com/apache/spark/pull/8957 >>>>> >>>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3 >>>>> >>>>> AFAIK pom changes don't trigger the Kinesis tests. >>>>> >>>>> Burak >>>>> >>>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath < >>>>> nick.pentre...@gmail.com> wrote: >>>>> >>>>>> Yup also works for me on master branch as I've been testing DynamoDB >>>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was >>>>>> using. >>>>>> >>>>>> So theKCL version does seem like it could be the issue - somewhere >>>>>> along the line an exception must be getting swallowed. Though the tests >>>>>> should have picked this up? Will dig deeper. >>>>>> >>>>>> — >>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox> >>>>>> >>>>>> >>>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London < >>>>>> brianmlon...@gmail.com> wrote: >>>>>> >>>>>>> Yes, it worked in the 1.6 branch as of commit >>>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed. That makes it much less >>>>>>> serious of an issue, although it would be nice to know what the root >>>>>>> cause >>>>>>> is to avoid a regression. >>>>>>> >>>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <brk...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> I've noticed this happening when there was some dependency >>>>>>>> conflicts, and it is super hard to debug. >>>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is >>>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1. >>>>>>>> I feel like that seems to be the problem... >>>>>>>> >>>>>>>> Brian, did you verify that it works with the 1.6.0 branch? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Burak >>>>>>>> >>>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London < >>>>>>>> brianmlon...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Nick's symptoms sound identical to mine. I should mention that I >>>>>>>>> just pulled the latest version from github and it seems to be working >>>>>>>>> there. To reproduce: >>>>>>>>> >>>>>>>>> >>>>>>>>> 1. Download spark 1.5.2 from >>>>>>>>> http://spark.apache.org/downloads.html >>>>>>>>> 2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 >>>>>>>>> -DskipTests clean package >>>>>>>>> 3. build/mvn -Pkinesis-asl -DskipTests clean package >>>>>>>>> 4. Then run simultaneously: >>>>>>>>> 1. bin/run-example streaming.KinesisWordCountASL [Kinesis app >>>>>>>>> name] [Kinesis stream name] [endpoint URL] >>>>>>>>> 2. bin/run-example streaming.KinesisWordProducerASL >>>>>>>>> [Kinesis stream name] [endpoint URL] 100 10 >>>>>>>>> >>>>>>>>> >>>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré < >>>>>>>>> j...@nanthrax.net> wrote: >>>>>>>>> >>>>>>>>>> Hi Nick, >>>>>>>>>> >>>>>>>>>> Just to be sure: don't you see some ClassCastException in the log >>>>>>>>>> ? >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Regards >>>>>>>>>> JB >>>>>>>>>> >>>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote: >>>>>>>>>> > Could you provide an example / test case and more detail on >>>>>>>>>> what issue >>>>>>>>>> > you're facing? >>>>>>>>>> > >>>>>>>>>> > I've just tested a simple program reading from a dev Kinesis >>>>>>>>>> stream and >>>>>>>>>> > using stream.print() to show the records, and it works under >>>>>>>>>> 1.5.1 but >>>>>>>>>> > doesn't appear to be working under 1.5.2. >>>>>>>>>> > >>>>>>>>>> > UI for 1.5.2: >>>>>>>>>> > >>>>>>>>>> > Inline image 1 >>>>>>>>>> > >>>>>>>>>> > UI for 1.5.1: >>>>>>>>>> > >>>>>>>>>> > Inline image 2 >>>>>>>>>> > >>>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London < >>>>>>>>>> brianmlon...@gmail.com >>>>>>>>>> > <mailto:brianmlon...@gmail.com>> wrote: >>>>>>>>>> > >>>>>>>>>> > Has anyone managed to run the Kinesis demo in Spark 1.5.2? >>>>>>>>>> The >>>>>>>>>> > Kinesis ASL that ships with 1.5.2 appears to not work for me >>>>>>>>>> > although 1.5.1 is fine. I spent some time with Amazon >>>>>>>>>> earlier in the >>>>>>>>>> > week and the only thing we could do to make it work is to >>>>>>>>>> change the >>>>>>>>>> > version to 1.5.1. Can someone please attempt to reproduce >>>>>>>>>> before I >>>>>>>>>> > open a JIRA issue for it? >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Jean-Baptiste Onofré >>>>>>>>>> jbono...@apache.org >>>>>>>>>> http://blog.nanthrax.net >>>>>>>>>> Talend - http://www.talend.com >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >>> >