Is that PR against master branch?



S3 reads come from Hadoop / JetS3t, AFAIK.



—
Sent from Mailbox

On Fri, Dec 11, 2015 at 5:38 PM, Brian London <brianmlon...@gmail.com>
wrote:

> That's good news. I've got a PR in to bump the SDK version to 1.10.40 and the
> KCL to 1.6.1, which I'm running tests on locally now.
> Is the AWS SDK not used for reading/writing from S3 or do we get that for
> free from the Hadoop dependencies?
> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath <nick.pentre...@gmail.com>
> wrote:
>> cc'ing dev list
>>
>> OK, it looks like when the KCL version was updated in
>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>> probably leading to a dependency conflict, though as Burak mentions it's
>> hard to debug as no exceptions seem to get thrown... I've tested 1.5.2
>> locally and on my 1.5.2 EC2 cluster, and no data is received, and nothing
>> shows up in the driver or worker logs, so any exception is getting
>> swallowed somewhere.
>>
>> Run starting. Expected test count is: 4
>> KinesisStreamSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - KinesisUtils API
>> - RDD generation
>> - basic operation *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 13
>> times over 2.047777 minutes. Last failure message: Set() did not equal
>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>> - failure recovery *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 63
>> times over 2.0286383166666666 minutes. Last failure message:
>> isCheckpointPresent was true, but 0 was not greater than 10.
>> (KinesisStreamSuite.scala:228)
>> Run completed in 5 minutes, 0 seconds.
>> Total number of tests run: 4
>> Suites: completed 1, aborted 0
>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>> *** 2 TESTS FAILED ***
>> [INFO] ------------------------------------------------------------------------
>> [INFO] BUILD FAILURE
>> [INFO] ------------------------------------------------------------------------
>>
>>
>> KCL 1.3.0 depends on AWS SDK *1.9.37* (
>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26),
>> while the Spark Kinesis dependency was kept at *1.9.16*.
>>
>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
>> 1.9.37 and everything works.
>>
>> Run starting. Expected test count is: 28
>> KinesisBackedBlockRDDSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - Basic reading from Kinesis
>> - Read data available in both block manager and Kinesis
>> - Read data available only in block manager, not in Kinesis
>> - Read data available only in Kinesis, not in block manager
>> - Read data available partially in block manager, rest in Kinesis
>> - Test isBlockValid skips block fetching from block manager
>> - Test whether RDD is valid after removing blocks from block anager
>> KinesisStreamSuite:
>> - KinesisUtils API
>> - RDD generation
>> - basic operation
>> - failure recovery
>> KinesisReceiverSuite:
>> - check serializability of SerializableAWSCredentials
>> - process records including store and checkpoint
>> - shouldn't store and checkpoint when receiver is stopped
>> - shouldn't checkpoint when exception occurs during store
>> - should set checkpoint time to currentTime + checkpoint interval upon
>> instantiation
>> - should checkpoint if we have exceeded the checkpoint interval
>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>> - should add to time when advancing checkpoint
>> - shutdown should checkpoint if the reason is TERMINATE
>> - shutdown should not checkpoint if the reason is something other than
>> TERMINATE
>> - retry success on first attempt
>> - retry success on second attempt after a Kinesis throttling exception
>> - retry success on second attempt after a Kinesis dependency exception
>> - retry failed after a shutdown exception
>> - retry failed after an invalid state exception
>> - retry failed after unexpected exception
>> - retry failed after exhausing all retries
>> Run completed in 3 minutes, 28 seconds.
>> Total number of tests run: 28
>> Suites: completed 4, aborted 0
>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>> All tests passed.
>>
>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
>> file a JIRA for this?
>>
>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
>> SDK are brought in, making things a bit leaner (this could perhaps be
>> upgraded in Spark 1.7/2.0). All local tests (and integration tests) pass
>> after removing the explicit dependency and depending only on KCL. Is
>> aws-java-sdk used anywhere else? AFAIK it is not, but in case I missed
>> something, let me know if there is any good reason to keep the explicit
>> dependency.
>>
>> N
>>
>>
>>
>> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath <nick.pentre...@gmail.com>
>> wrote:
>>
>>> Yeah, also the integration tests need to be run specifically. I would
>>> have thought the contributor would have run those tests and also tested the
>>> change themselves against live Kinesis :(
>>>
>>>
>>> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz <brk...@gmail.com> wrote:
>>>
>>>> I don't think the Kinesis tests specifically ran when that was merged
>>>> into 1.5.2 :(
>>>> https://github.com/apache/spark/pull/8957
>>>>
>>>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>>>
>>>> AFAIK, POM changes don't trigger the Kinesis tests.
>>>>
>>>> Burak
>>>>
>>>> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath <
>>>> nick.pentre...@gmail.com> wrote:
>>>>
>>>>> Yup, it also works for me on the master branch, as I've been testing
>>>>> DynamoDB Streams integration. In fact, it also works with the latest KCL
>>>>> 1.6.1, which I was using.
>>>>>
>>>>> So the KCL version does seem like it could be the issue; somewhere
>>>>> along the line an exception must be getting swallowed. Though shouldn't
>>>>> the tests have picked this up? Will dig deeper.
>>>>>
>>>>>
>>>>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London <brianmlon...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes, it worked in the 1.6 branch as of commit
>>>>>> db5165246f2888537dd0f3d4c5a515875c7358ed. That makes it a much less
>>>>>> serious issue, although it would be nice to know what the root cause
>>>>>> is to avoid a regression.
>>>>>>
>>>>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz <brk...@gmail.com> wrote:
>>>>>>
>>>>>>> I've noticed this happening when there were dependency conflicts,
>>>>>>> and it is super hard to debug.
>>>>>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is
>>>>>>> 1.3.0, but it is 1.2.1 in Spark 1.5.1.
>>>>>>> I suspect that is the problem...
>>>>>>>
>>>>>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Burak
>>>>>>>
>>>>>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London <
>>>>>>> brianmlon...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Nick's symptoms sound identical to mine. I should mention that I
>>>>>>>> just pulled the latest version from GitHub and it seems to be working
>>>>>>>> there. To reproduce:
>>>>>>>>
>>>>>>>>
>>>>>>>>    1. Download Spark 1.5.2 from http://spark.apache.org/downloads.html
>>>>>>>>    2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
>>>>>>>>    3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>>>>>>    4. Then run simultaneously:
>>>>>>>>       1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name] [Kinesis stream name] [endpoint URL]
>>>>>>>>       2. bin/run-example streaming.KinesisWordProducerASL [Kinesis stream name] [endpoint URL] 100 10
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré <
>>>>>>>> j...@nanthrax.net> wrote:
>>>>>>>>
>>>>>>>>> Hi Nick,
>>>>>>>>>
>>>>>>>>> Just to be sure: don't you see a ClassCastException in the log?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Regards
>>>>>>>>> JB
>>>>>>>>>
>>>>>>>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>>>>>>>> > Could you provide an example / test case and more detail on what
>>>>>>>>> > issue you're facing?
>>>>>>>>> >
>>>>>>>>> > I've just tested a simple program reading from a dev Kinesis
>>>>>>>>> > stream and using stream.print() to show the records, and it works
>>>>>>>>> > under 1.5.1 but doesn't appear to be working under 1.5.2.
>>>>>>>>> >
>>>>>>>>> > UI for 1.5.2: [inline image 1, not preserved]
>>>>>>>>> >
>>>>>>>>> > UI for 1.5.1: [inline image 2, not preserved]
>>>>>>>>> >
>>>>>>>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London
>>>>>>>>> > <brianmlon...@gmail.com <mailto:brianmlon...@gmail.com>> wrote:
>>>>>>>>> >
>>>>>>>>> >     Has anyone managed to run the Kinesis demo in Spark 1.5.2? The
>>>>>>>>> >     Kinesis ASL that ships with 1.5.2 does not appear to work for me,
>>>>>>>>> >     although 1.5.1 is fine. I spent some time with Amazon earlier in
>>>>>>>>> >     the week, and the only thing we could do to make it work was to
>>>>>>>>> >     change the version to 1.5.1. Can someone please attempt to
>>>>>>>>> >     reproduce before I open a JIRA issue for it?
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jean-Baptiste Onofré
>>>>>>>>> jbono...@apache.org
>>>>>>>>> http://blog.nanthrax.net
>>>>>>>>> Talend - http://www.talend.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
