That's good news. I've got a PR in to bump the SDK version to 1.10.40 and the KCL to 1.6.1, which I'm running tests on locally now.
Is the AWS SDK not used for reading/writing from S3, or do we get that for free from the Hadoop dependencies?

On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:

> cc'ing dev list
>
> Ok, looks like when the KCL version was updated in
> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
> probably leading to dependency conflict, though as Burak mentions it's hard
> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
> in driver or worker logs, so any exception is getting swallowed somewhere.
>
> Run starting. Expected test count is: 4
> KinesisStreamSuite:
> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating Kinesis streams for tests.
> - KinesisUtils API
> - RDD generation
> - basic operation *** FAILED ***
> The code passed to eventually never returned normally. Attempted 13 times over 2.047777 minutes. Last failure message: Set() did not equal Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
> Data received does not match data sent. (KinesisStreamSuite.scala:188)
> - failure recovery *** FAILED ***
> The code passed to eventually never returned normally. Attempted 63 times over 2.0286383166666666 minutes. Last failure message: isCheckpointPresent was true, but 0 was not greater than 10. (KinesisStreamSuite.scala:228)
> Run completed in 5 minutes, 0 seconds.
> Total number of tests run: 4
> Suites: completed 1, aborted 0
> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
> *** 2 TESTS FAILED ***
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD FAILURE
> [INFO] ------------------------------------------------------------------------
>
> KCL 1.3.0 depends on *1.9.37* SDK
> (https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
> while the Spark Kinesis dependency was kept at *1.9.16*.
>
> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
> 1.9.37 and everything works.
>
> Run starting. Expected test count is: 28
> KinesisBackedBlockRDDSuite:
> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating Kinesis streams for tests.
> - Basic reading from Kinesis
> - Read data available in both block manager and Kinesis
> - Read data available only in block manager, not in Kinesis
> - Read data available only in Kinesis, not in block manager
> - Read data available partially in block manager, rest in Kinesis
> - Test isBlockValid skips block fetching from block manager
> - Test whether RDD is valid after removing blocks from block anager
> KinesisStreamSuite:
> - KinesisUtils API
> - RDD generation
> - basic operation
> - failure recovery
> KinesisReceiverSuite:
> - check serializability of SerializableAWSCredentials
> - process records including store and checkpoint
> - shouldn't store and checkpoint when receiver is stopped
> - shouldn't checkpoint when exception occurs during store
> - should set checkpoint time to currentTime + checkpoint interval upon instantiation
> - should checkpoint if we have exceeded the checkpoint interval
> - shouldn't checkpoint if we have not exceeded the checkpoint interval
> - should add to time when advancing checkpoint
> - shutdown should checkpoint if the reason is TERMINATE
> - shutdown should not checkpoint if the reason is something other than TERMINATE
> - retry success on first attempt
> - retry success on second attempt after a Kinesis throttling exception
> - retry success on second attempt after a Kinesis dependency exception
> - retry failed after a shutdown exception
> - retry failed after an invalid state exception
> - retry failed after unexpected exception
> - retry failed after exhausing all retries
> Run completed in 3 minutes, 28 seconds.
> Total number of tests run: 28
> Suites: completed 4, aborted 0
> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
> All tests passed.
>
> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
> file a JIRA for this?
>
> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
> SDKs are brought in, making things a bit leaner (this can be upgraded in
> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
> used anywhere else (AFAIK it is not, but in case I missed something let me
> know any good reason to keep the explicit dependency)?
>
> N
>
> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>
>> Yeah also the integration tests need to be specifically run - I would
>> have thought the contributor would have run those tests and also tested the
>> change themselves using live Kinesis :(
>>
>> —
>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>
>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <brk...@gmail.com> wrote:
>>
>>> I don't think the Kinesis tests specifically ran when that was merged
>>> into 1.5.2 :(
>>> https://github.com/apache/spark/pull/8957
>>>
>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>
>>> AFAIK pom changes don't trigger the Kinesis tests.
>>>
>>> Burak
>>>
>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>
>>>> Yup also works for me on master branch as I've been testing DynamoDB
>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>>> using.
>>>>
>>>> So the KCL version does seem like it could be the issue - somewhere
>>>> along the line an exception must be getting swallowed. Though the tests
>>>> should have picked this up? Will dig deeper.
>>>>
>>>> —
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <brianmlon...@gmail.com> wrote:
>>>>
>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed. That makes it much less
>>>>> serious of an issue, although it would be nice to know what the root cause
>>>>> is to avoid a regression.
>>>>>
>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <brk...@gmail.com> wrote:
>>>>>
>>>>>> I've noticed this happening when there were some dependency conflicts,
>>>>>> and it is super hard to debug.
>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>> I feel like that seems to be the problem...
>>>>>>
>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>
>>>>>> Thanks,
>>>>>> Burak
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <brianmlon...@gmail.com> wrote:
>>>>>>
>>>>>>> Nick's symptoms sound identical to mine. I should mention that I
>>>>>>> just pulled the latest version from github and it seems to be working
>>>>>>> there. To reproduce:
>>>>>>>
>>>>>>> 1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
>>>>>>> 2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
>>>>>>> 3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>> 4. Then run simultaneously:
>>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name] [Kinesis stream name] [endpoint URL]
>>>>>>>    2. bin/run-example streaming.KinesisWordProducerASL [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>>>>
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> Just to be sure: don't you see some ClassCastException in the log?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Regards
>>>>>>>> JB
>>>>>>>>
>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>> > Could you provide an example / test case and more detail on what issue
>>>>>>>> > you're facing?
>>>>>>>> >
>>>>>>>> > I've just tested a simple program reading from a dev Kinesis stream and
>>>>>>>> > using stream.print() to show the records, and it works under 1.5.1 but
>>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>>> >
>>>>>>>> > UI for 1.5.2:
>>>>>>>> >
>>>>>>>> > Inline image 1
>>>>>>>> >
>>>>>>>> > UI for 1.5.1:
>>>>>>>> >
>>>>>>>> > Inline image 2
>>>>>>>> >
>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <brianmlon...@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> > Has anyone managed to run the Kinesis demo in Spark 1.5.2? The
>>>>>>>> > Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>>> > although 1.5.1 is fine. I spent some time with Amazon earlier in the
>>>>>>>> > week and the only thing we could do to make it work is to change the
>>>>>>>> > version to 1.5.1. Can someone please attempt to reproduce before I
>>>>>>>> > open a JIRA issue for it?
>>>>>>>> >
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jean-Baptiste Onofré
>>>>>>>> jbono...@apache.org
>>>>>>>> http://blog.nanthrax.net
>>>>>>>> Talend - http://www.talend.com
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
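
The mismatch discussed in this thread (KCL 1.3.0 declaring AWS SDK 1.9.37 while the Spark Kinesis module kept 1.9.16) is the kind of thing a small version comparison can flag before anything is run. The following is only a rough sketch: the helper names parse_version and sdk_mismatch are hypothetical and not part of Spark, Maven, or the KCL, and the versions are the ones quoted in the messages above. It compares versions as integer tuples, since a plain string comparison would rank 1.10.40 below 1.9.37.

```python
# Sketch only: parse_version and sdk_mismatch are hypothetical helpers,
# not part of Spark, Maven, or the Kinesis Client Library.

def parse_version(version):
    """Turn a dotted version string such as '1.9.37' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def sdk_mismatch(pinned_sdk, sdk_expected_by_kcl):
    """Return a warning string if the pinned AWS SDK is older than the
    version the KCL declares, otherwise None."""
    if parse_version(pinned_sdk) < parse_version(sdk_expected_by_kcl):
        return ("AWS SDK %s is pinned, but the KCL declares %s"
                % (pinned_sdk, sdk_expected_by_kcl))
    return None


# Versions quoted in this thread: KCL 1.3.0 declares SDK 1.9.37,
# while the Spark 1.5.2 Kinesis module kept 1.9.16.
print(sdk_mismatch("1.9.16", "1.9.37"))

# With the SDK aligned to what the KCL declares, nothing is flagged.
print(sdk_mismatch("1.9.37", "1.9.37"))
```

This only checks the numbers by hand; it does not inspect a real build. The resolved versions for an actual checkout would have to come from the build tool itself.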