Is that PR against master branch?
S3 read comes from Hadoop / jet3t afaik — Sent from Mailbox On Fri, Dec 11, 2015 at 5:38 PM, Brian London <brianmlon...@gmail.com> wrote: > That's good news I've got a PR in to up the SDK version to 1.10.40 and the > KCL to 1.6.1 which I'm running tests on locally now. > Is the AWS SDK not used for reading/writing from S3 or do we get that for > free from the Hadoop dependencies? > On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <nick.pentre...@gmail.com> > wrote: >> cc'ing dev list >> >> Ok, looks like when the KCL version was updated in >> https://github.com/apache/spark/pull/8957, the AWS SDK version was not, >> probably leading to dependency conflict, though as Burak mentions its hard >> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally >> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up >> in driver or worker logs, so any exception is getting swallowed somewhere. >> >> Run starting. Expected test count is: 4 >> KinesisStreamSuite: >> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating >> Kinesis streams for tests. >> - KinesisUtils API >> - RDD generation >> - basic operation *** FAILED *** >> The code passed to eventually never returned normally. Attempted 13 >> times over 2.047777 minutes. Last failure message: Set() did not equal >> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4) >> Data received does not match data sent. (KinesisStreamSuite.scala:188) >> - failure recovery *** FAILED *** >> The code passed to eventually never returned normally. Attempted 63 >> times over 2.0286383166666666 minutes. Last failure message: >> isCheckpointPresent was true, but 0 was not greater than 10. >> (KinesisStreamSuite.scala:228) >> Run completed in 5 minutes, 0 seconds. >> Total number of tests run: 4 >> Suites: completed 1, aborted 0 >> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0 >> *** 2 TESTS FAILED *** >> [INFO] >> ------------------------------------------------------------------------ >> [INFO] BUILD FAILURE >> [INFO] >> ------------------------------------------------------------------------ >> >> >> KCL 1.3.0 depends on *1.9.37* SDK ( >> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26) >> while the Spark Kinesis dependency was kept at *1.9.16.* >> >> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK >> 1.9.37 and everything works. >> >> Run starting. Expected test count is: 28 >> KinesisBackedBlockRDDSuite: >> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating >> Kinesis streams for tests. >> - Basic reading from Kinesis >> - Read data available in both block manager and Kinesis >> - Read data available only in block manager, not in Kinesis >> - Read data available only in Kinesis, not in block manager >> - Read data available partially in block manager, rest in Kinesis >> - Test isBlockValid skips block fetching from block manager >> - Test whether RDD is valid after removing blocks from block anager >> KinesisStreamSuite: >> - KinesisUtils API >> - RDD generation >> - basic operation >> - failure recovery >> KinesisReceiverSuite: >> - check serializability of SerializableAWSCredentials >> - process records including store and checkpoint >> - shouldn't store and checkpoint when receiver is stopped >> - shouldn't checkpoint when exception occurs during store >> - should set checkpoint time to currentTime + checkpoint interval upon >> instantiation >> - should checkpoint if we have exceeded the checkpoint interval >> - shouldn't checkpoint if we have not exceeded the checkpoint interval >> - should add to time when advancing checkpoint >> - shutdown should checkpoint if the reason is TERMINATE >> - shutdown should not checkpoint if the reason is something other than >> TERMINATE >> - retry success on first attempt >> - retry success on second attempt after a Kinesis throttling exception >> - retry success on second attempt after a Kinesis dependency exception >> - retry failed after a shutdown exception >> - retry failed after an invalid state exception >> - retry failed after unexpected exception >> - retry failed after exhausing all retries >> Run completed in 3 minutes, 28 seconds. >> Total number of tests run: 28 >> Suites: completed 4, aborted 0 >> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0 >> All tests passed. >> >> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you >> file a JIRA for this? >> >> @dev-list, since KCL brings in AWS SDK dependencies itself, is it >> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis >> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS >> SDKs are brought in, making things a bit leaner (this can be upgraded in >> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with >> removing the explicit dependency and only depending on KCL. Is aws-java-sdk >> used anywhere else (AFAIK it is not, but in case I missed something let me >> know any good reason to keep the explicit dependency)? >> >> N >> >> >> >> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <nick.pentre...@gmail.com> >> wrote: >> >>> Yeah also the integration tests need to be specifically run - I would >>> have thought the contributor would have run those tests and also tested the >>> change themselves using live Kinesis :( >>> >>> — >>> Sent from Mailbox <https://www.dropbox.com/mailbox> >>> >>> >>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <brk...@gmail.com> wrote: >>> >>>> I don't think the Kinesis tests specifically ran when that was merged >>>> into 1.5.2 :( >>>> https://github.com/apache/spark/pull/8957 >>>> >>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3 >>>> >>>> AFAIK pom changes don't trigger the Kinesis tests. >>>> >>>> Burak >>>> >>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath < >>>> nick.pentre...@gmail.com> wrote: >>>> >>>>> Yup also works for me on master branch as I've been testing DynamoDB >>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was >>>>> using. >>>>> >>>>> So theKCL version does seem like it could be the issue - somewhere >>>>> along the line an exception must be getting swallowed. Though the tests >>>>> should have picked this up? Will dig deeper. >>>>> >>>>> — >>>>> Sent from Mailbox <https://www.dropbox.com/mailbox> >>>>> >>>>> >>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <brianmlon...@gmail.com> >>>>> wrote: >>>>> >>>>>> Yes, it worked in the 1.6 branch as of commit >>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed. That makes it much less >>>>>> serious of an issue, although it would be nice to know what the root >>>>>> cause >>>>>> is to avoid a regression. >>>>>> >>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <brk...@gmail.com> wrote: >>>>>> >>>>>>> I've noticed this happening when there was some dependency conflicts, >>>>>>> and it is super hard to debug. >>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is >>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1. >>>>>>> I feel like that seems to be the problem... >>>>>>> >>>>>>> Brian, did you verify that it works with the 1.6.0 branch? >>>>>>> >>>>>>> Thanks, >>>>>>> Burak >>>>>>> >>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London < >>>>>>> brianmlon...@gmail.com> wrote: >>>>>>> >>>>>>>> Nick's symptoms sound identical to mine. I should mention that I >>>>>>>> just pulled the latest version from github and it seems to be working >>>>>>>> there. To reproduce: >>>>>>>> >>>>>>>> >>>>>>>> 1. Download spark 1.5.2 from >>>>>>>> http://spark.apache.org/downloads.html >>>>>>>> 2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 >>>>>>>> -DskipTests clean package >>>>>>>> 3. build/mvn -Pkinesis-asl -DskipTests clean package >>>>>>>> 4. Then run simultaneously: >>>>>>>> 1. bin/run-example streaming.KinesisWordCountASL [Kinesis app >>>>>>>> name] [Kinesis stream name] [endpoint URL] >>>>>>>> 2. bin/run-example streaming.KinesisWordProducerASL >>>>>>>> [Kinesis stream name] [endpoint URL] 100 10 >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré < >>>>>>>> j...@nanthrax.net> wrote: >>>>>>>> >>>>>>>>> Hi Nick, >>>>>>>>> >>>>>>>>> Just to be sure: don't you see some ClassCastException in the log ? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Regards >>>>>>>>> JB >>>>>>>>> >>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote: >>>>>>>>> > Could you provide an example / test case and more detail on what >>>>>>>>> issue >>>>>>>>> > you're facing? >>>>>>>>> > >>>>>>>>> > I've just tested a simple program reading from a dev Kinesis >>>>>>>>> stream and >>>>>>>>> > using stream.print() to show the records, and it works under >>>>>>>>> 1.5.1 but >>>>>>>>> > doesn't appear to be working under 1.5.2. >>>>>>>>> > >>>>>>>>> > UI for 1.5.2: >>>>>>>>> > >>>>>>>>> > Inline image 1 >>>>>>>>> > >>>>>>>>> > UI for 1.5.1: >>>>>>>>> > >>>>>>>>> > Inline image 2 >>>>>>>>> > >>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London < >>>>>>>>> brianmlon...@gmail.com >>>>>>>>> > <mailto:brianmlon...@gmail.com>> wrote: >>>>>>>>> > >>>>>>>>> > Has anyone managed to run the Kinesis demo in Spark 1.5.2? >>>>>>>>> The >>>>>>>>> > Kinesis ASL that ships with 1.5.2 appears to not work for me >>>>>>>>> > although 1.5.1 is fine. I spent some time with Amazon earlier >>>>>>>>> in the >>>>>>>>> > week and the only thing we could do to make it work is to >>>>>>>>> change the >>>>>>>>> > version to 1.5.1. Can someone please attempt to reproduce >>>>>>>>> before I >>>>>>>>> > open a JIRA issue for it? >>>>>>>>> > >>>>>>>>> > >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Jean-Baptiste Onofré >>>>>>>>> jbono...@apache.org >>>>>>>>> http://blog.nanthrax.net >>>>>>>>> Talend - http://www.talend.com >>>>>>>>> >>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org >>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> >>> >>