Yes, it's against master: https://github.com/apache/spark/pull/10256

I'll push the KCL version bump after my local tests finish.

On Fri, Dec 11, 2015 at 10:42 AM Nick Pentreath <nick.pentre...@gmail.com>
wrote:

> Is that PR against master branch?
>
> S3 read comes from Hadoop / jet3t afaik
>
> —
> Sent from Mailbox <https://www.dropbox.com/mailbox>
>
>
> On Fri, Dec 11, 2015 at 5:38 PM, Brian London <brianmlon...@gmail.com>
> wrote:
>
>> That's good news  I've got a PR in to up the SDK version to 1.10.40 and
>> the KCL to 1.6.1 which I'm running tests on locally now.
>>
>> Is the AWS SDK not used for reading/writing from S3 or do we get that for
>> free from the Hadoop dependencies?
>>
>> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>>> cc'ing dev list
>>>
>>> Ok, looks like when the KCL version was updated in
>>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>>> probably leading to dependency conflict, though as Burak mentions its hard
>>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
>>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
>>> in driver or worker logs, so any exception is getting swallowed somewhere.
>>>
>>> Run starting. Expected test count is: 4
>>> KinesisStreamSuite:
>>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>>> Kinesis streams for tests.
>>> - KinesisUtils API
>>> - RDD generation
>>> - basic operation *** FAILED ***
>>>   The code passed to eventually never returned normally. Attempted 13
>>> times over 2.047777 minutes. Last failure message: Set() did not equal
>>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>>> - failure recovery *** FAILED ***
>>>   The code passed to eventually never returned normally. Attempted 63
>>> times over 2.0286383166666666 minutes. Last failure message:
>>> isCheckpointPresent was true, but 0 was not greater than 10.
>>> (KinesisStreamSuite.scala:228)
>>> Run completed in 5 minutes, 0 seconds.
>>> Total number of tests run: 4
>>> Suites: completed 1, aborted 0
>>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>>> *** 2 TESTS FAILED ***
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] BUILD FAILURE
>>> [INFO]
>>> ------------------------------------------------------------------------
>>>
>>>
>>> KCL 1.3.0 depends on *1.9.37* SDK (
>>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
>>> while the Spark Kinesis dependency was kept at *1.9.16.*
>>>
>>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS
>>> SDK 1.9.37 and everything works.
>>>
>>> Run starting. Expected test count is: 28
>>> KinesisBackedBlockRDDSuite:
>>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>>> Kinesis streams for tests.
>>> - Basic reading from Kinesis
>>> - Read data available in both block manager and Kinesis
>>> - Read data available only in block manager, not in Kinesis
>>> - Read data available only in Kinesis, not in block manager
>>> - Read data available partially in block manager, rest in Kinesis
>>> - Test isBlockValid skips block fetching from block manager
>>> - Test whether RDD is valid after removing blocks from block anager
>>> KinesisStreamSuite:
>>> - KinesisUtils API
>>> - RDD generation
>>> - basic operation
>>> - failure recovery
>>> KinesisReceiverSuite:
>>> - check serializability of SerializableAWSCredentials
>>> - process records including store and checkpoint
>>> - shouldn't store and checkpoint when receiver is stopped
>>> - shouldn't checkpoint when exception occurs during store
>>> - should set checkpoint time to currentTime + checkpoint interval upon
>>> instantiation
>>> - should checkpoint if we have exceeded the checkpoint interval
>>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>>> - should add to time when advancing checkpoint
>>> - shutdown should checkpoint if the reason is TERMINATE
>>> - shutdown should not checkpoint if the reason is something other than
>>> TERMINATE
>>> - retry success on first attempt
>>> - retry success on second attempt after a Kinesis throttling exception
>>> - retry success on second attempt after a Kinesis dependency exception
>>> - retry failed after a shutdown exception
>>> - retry failed after an invalid state exception
>>> - retry failed after unexpected exception
>>> - retry failed after exhausing all retries
>>> Run completed in 3 minutes, 28 seconds.
>>> Total number of tests run: 28
>>> Suites: completed 4, aborted 0
>>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>>> All tests passed.
>>>
>>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can
>>> you file a JIRA for this?
>>>
>>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
>>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
>>> SDKs are brought in, making things a bit leaner (this can be upgraded in
>>> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
>>> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
>>> used anywhere else (AFAIK it is not, but in case I missed something let me
>>> know any good reason to keep the explicit dependency)?
>>>
>>> N
>>>
>>>
>>>
>>> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <
>>> nick.pentre...@gmail.com> wrote:
>>>
>>>> Yeah also the integration tests need to be specifically run - I would
>>>> have thought the contributor would have run those tests and also tested the
>>>> change themselves using live Kinesis :(
>>>>
>>>> —
>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>
>>>>
>>>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <brk...@gmail.com> wrote:
>>>>
>>>>> I don't think the Kinesis tests specifically ran when that was merged
>>>>> into 1.5.2 :(
>>>>> https://github.com/apache/spark/pull/8957
>>>>>
>>>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>>>
>>>>> AFAIK pom changes don't trigger the Kinesis tests.
>>>>>
>>>>> Burak
>>>>>
>>>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <
>>>>> nick.pentre...@gmail.com> wrote:
>>>>>
>>>>>> Yup also works for me on master branch as I've been testing DynamoDB
>>>>>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>>>>>> using.
>>>>>>
>>>>>> So theKCL version does seem like it could be the issue - somewhere
>>>>>> along the line an exception must be getting swallowed. Though the tests
>>>>>> should have picked this up? Will dig deeper.
>>>>>>
>>>>>> —
>>>>>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <
>>>>>> brianmlon...@gmail.com> wrote:
>>>>>>
>>>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>>>>>> serious of an issue, although it would be nice to know what the root 
>>>>>>> cause
>>>>>>> is to avoid a regression.
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <brk...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I've noticed this happening when there was some dependency
>>>>>>>> conflicts, and it is super hard to debug.
>>>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>>>> I feel like that seems to be the problem...
>>>>>>>>
>>>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Burak
>>>>>>>>
>>>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <
>>>>>>>> brianmlon...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Nick's symptoms sound identical to mine.  I should mention that I
>>>>>>>>> just pulled the latest version from github and it seems to be working
>>>>>>>>> there.  To reproduce:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    1. Download spark 1.5.2 from
>>>>>>>>>    http://spark.apache.org/downloads.html
>>>>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0
>>>>>>>>>    -DskipTests clean package
>>>>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>>>>    4. Then run simultaneously:
>>>>>>>>>    1. bin/run-example streaming.KinesisWordCountASL [Kinesis app
>>>>>>>>>       name] [Kinesis stream name] [endpoint URL]
>>>>>>>>>       2.   bin/run-example streaming.KinesisWordProducerASL
>>>>>>>>>       [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <
>>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Nick,
>>>>>>>>>>
>>>>>>>>>> Just to be sure: don't you see some ClassCastException in the log
>>>>>>>>>> ?
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Regards
>>>>>>>>>> JB
>>>>>>>>>>
>>>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>>>> > Could you provide an example / test case and more detail on
>>>>>>>>>> what issue
>>>>>>>>>> > you're facing?
>>>>>>>>>> >
>>>>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>>>>> stream and
>>>>>>>>>> > using stream.print() to show the records, and it works under
>>>>>>>>>> 1.5.1 but
>>>>>>>>>> > doesn't appear to be working under 1.5.2.
>>>>>>>>>> >
>>>>>>>>>> > UI for 1.5.2:
>>>>>>>>>> >
>>>>>>>>>> > Inline image 1
>>>>>>>>>> >
>>>>>>>>>> > UI for 1.5.1:
>>>>>>>>>> >
>>>>>>>>>> > Inline image 2
>>>>>>>>>> >
>>>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>>>>>>>>>> brianmlon...@gmail.com
>>>>>>>>>> > <mailto:brianmlon...@gmail.com>> wrote:
>>>>>>>>>> >
>>>>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2?
>>>>>>>>>> The
>>>>>>>>>> >     Kinesis ASL that ships with 1.5.2 appears to not work for me
>>>>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon
>>>>>>>>>> earlier in the
>>>>>>>>>> >     week and the only thing we could do to make it work is to
>>>>>>>>>> change the
>>>>>>>>>> >     version to 1.5.1.  Can someone please attempt to reproduce
>>>>>>>>>> before I
>>>>>>>>>> >     open a JIRA issue for it?
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>>> jbono...@apache.org
>>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>

Reply via email to