Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Nick Pentreath
cc'ing dev list

Ok, looks like when the KCL version was updated in
https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
probably leading to dependency conflict, though as Burak mentions its hard
to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
in driver or worker logs, so any exception is getting swallowed somewhere.

Run starting. Expected test count is: 4
KinesisStreamSuite:
Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
Kinesis streams for tests.
- KinesisUtils API
- RDD generation
- basic operation *** FAILED ***
  The code passed to eventually never returned normally. Attempted 13 times
over 2.04 minutes. Last failure message: Set() did not equal Set(5, 10,
1, 6, 9, 2, 7, 3, 8, 4)
  Data received does not match data sent. (KinesisStreamSuite.scala:188)
- failure recovery *** FAILED ***
  The code passed to eventually never returned normally. Attempted 63 times
over 2.02863831 minutes. Last failure message: isCheckpointPresent
was true, but 0 was not greater than 10. (KinesisStreamSuite.scala:228)
Run completed in 5 minutes, 0 seconds.
Total number of tests run: 4
Suites: completed 1, aborted 0
Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
*** 2 TESTS FAILED ***
[INFO]

[INFO] BUILD FAILURE
[INFO]



KCL 1.3.0 depends on *1.9.37* SDK (
https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
while the Spark Kinesis dependency was kept at *1.9.16.*

I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
1.9.37 and everything works.

Run starting. Expected test count is: 28
KinesisBackedBlockRDDSuite:
Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
Kinesis streams for tests.
- Basic reading from Kinesis
- Read data available in both block manager and Kinesis
- Read data available only in block manager, not in Kinesis
- Read data available only in Kinesis, not in block manager
- Read data available partially in block manager, rest in Kinesis
- Test isBlockValid skips block fetching from block manager
- Test whether RDD is valid after removing blocks from block anager
KinesisStreamSuite:
- KinesisUtils API
- RDD generation
- basic operation
- failure recovery
KinesisReceiverSuite:
- check serializability of SerializableAWSCredentials
- process records including store and checkpoint
- shouldn't store and checkpoint when receiver is stopped
- shouldn't checkpoint when exception occurs during store
- should set checkpoint time to currentTime + checkpoint interval upon
instantiation
- should checkpoint if we have exceeded the checkpoint interval
- shouldn't checkpoint if we have not exceeded the checkpoint interval
- should add to time when advancing checkpoint
- shutdown should checkpoint if the reason is TERMINATE
- shutdown should not checkpoint if the reason is something other than
TERMINATE
- retry success on first attempt
- retry success on second attempt after a Kinesis throttling exception
- retry success on second attempt after a Kinesis dependency exception
- retry failed after a shutdown exception
- retry failed after an invalid state exception
- retry failed after unexpected exception
- retry failed after exhausing all retries
Run completed in 3 minutes, 28 seconds.
Total number of tests run: 28
Suites: completed 4, aborted 0
Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
All tests passed.

So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
file a JIRA for this?

@dev-list, since KCL brings in AWS SDK dependencies itself, is it necessary
to declare an explicit dependency on aws-java-sdk in the Kinesis POM? Also,
from KCL 1.5.0+, only the relevant components used from the AWS SDKs are
brought in, making things a bit leaner (this can be upgraded in Spark
1.7/2.0 perhaps). All local tests (and integration tests) pass with
removing the explicit dependency and only depending on KCL. Is aws-java-sdk
used anywhere else (AFAIK it is not, but in case I missed something let me
know any good reason to keep the explicit dependency)?

N



On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath 
wrote:

> Yeah also the integration tests need to be specifically run - I would have
> thought the contributor would have run those tests and also tested the
> change themselves using live Kinesis :(
>
> —
> Sent from Mailbox 
>
>
> On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz  wrote:
>
>> I don't think the Kinesis tests specifically ran when that was merged
>> into 1.5.2 :(
>> https://github.com/apache/spark/pull/8957
>>
>> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
>>
>> AFAIK pom changes don't trigger 

Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Brian London
Yes, it's against master: https://github.com/apache/spark/pull/10256

I'll push the KCL version bump after my local tests finish.

On Fri, Dec 11, 2015 at 10:42 AM Nick Pentreath 
wrote:

> Is that PR against master branch?
>
> S3 read comes from Hadoop / jet3t afaik
>
> —
> Sent from Mailbox 
>
>
> On Fri, Dec 11, 2015 at 5:38 PM, Brian London 
> wrote:
>
>> That's good news  I've got a PR in to up the SDK version to 1.10.40 and
>> the KCL to 1.6.1 which I'm running tests on locally now.
>>
>> Is the AWS SDK not used for reading/writing from S3 or do we get that for
>> free from the Hadoop dependencies?
>>
>> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath 
>> wrote:
>>
>>> cc'ing dev list
>>>
>>> Ok, looks like when the KCL version was updated in
>>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>>> probably leading to dependency conflict, though as Burak mentions its hard
>>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
>>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
>>> in driver or worker logs, so any exception is getting swallowed somewhere.
>>>
>>> Run starting. Expected test count is: 4
>>> KinesisStreamSuite:
>>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>>> Kinesis streams for tests.
>>> - KinesisUtils API
>>> - RDD generation
>>> - basic operation *** FAILED ***
>>>   The code passed to eventually never returned normally. Attempted 13
>>> times over 2.04 minutes. Last failure message: Set() did not equal
>>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>>> - failure recovery *** FAILED ***
>>>   The code passed to eventually never returned normally. Attempted 63
>>> times over 2.02863831 minutes. Last failure message:
>>> isCheckpointPresent was true, but 0 was not greater than 10.
>>> (KinesisStreamSuite.scala:228)
>>> Run completed in 5 minutes, 0 seconds.
>>> Total number of tests run: 4
>>> Suites: completed 1, aborted 0
>>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>>> *** 2 TESTS FAILED ***
>>> [INFO]
>>> 
>>> [INFO] BUILD FAILURE
>>> [INFO]
>>> 
>>>
>>>
>>> KCL 1.3.0 depends on *1.9.37* SDK (
>>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
>>> while the Spark Kinesis dependency was kept at *1.9.16.*
>>>
>>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS
>>> SDK 1.9.37 and everything works.
>>>
>>> Run starting. Expected test count is: 28
>>> KinesisBackedBlockRDDSuite:
>>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>>> Kinesis streams for tests.
>>> - Basic reading from Kinesis
>>> - Read data available in both block manager and Kinesis
>>> - Read data available only in block manager, not in Kinesis
>>> - Read data available only in Kinesis, not in block manager
>>> - Read data available partially in block manager, rest in Kinesis
>>> - Test isBlockValid skips block fetching from block manager
>>> - Test whether RDD is valid after removing blocks from block anager
>>> KinesisStreamSuite:
>>> - KinesisUtils API
>>> - RDD generation
>>> - basic operation
>>> - failure recovery
>>> KinesisReceiverSuite:
>>> - check serializability of SerializableAWSCredentials
>>> - process records including store and checkpoint
>>> - shouldn't store and checkpoint when receiver is stopped
>>> - shouldn't checkpoint when exception occurs during store
>>> - should set checkpoint time to currentTime + checkpoint interval upon
>>> instantiation
>>> - should checkpoint if we have exceeded the checkpoint interval
>>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>>> - should add to time when advancing checkpoint
>>> - shutdown should checkpoint if the reason is TERMINATE
>>> - shutdown should not checkpoint if the reason is something other than
>>> TERMINATE
>>> - retry success on first attempt
>>> - retry success on second attempt after a Kinesis throttling exception
>>> - retry success on second attempt after a Kinesis dependency exception
>>> - retry failed after a shutdown exception
>>> - retry failed after an invalid state exception
>>> - retry failed after unexpected exception
>>> - retry failed after exhausing all retries
>>> Run completed in 3 minutes, 28 seconds.
>>> Total number of tests run: 28
>>> Suites: completed 4, aborted 0
>>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>>> All tests passed.
>>>
>>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can
>>> you file a JIRA for this?
>>>
>>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>>> necessary to declare an explicit 

Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Brian London
That's good news  I've got a PR in to up the SDK version to 1.10.40 and the
KCL to 1.6.1 which I'm running tests on locally now.

Is the AWS SDK not used for reading/writing from S3 or do we get that for
free from the Hadoop dependencies?

On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath 
wrote:

> cc'ing dev list
>
> Ok, looks like when the KCL version was updated in
> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
> probably leading to dependency conflict, though as Burak mentions its hard
> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
> in driver or worker logs, so any exception is getting swallowed somewhere.
>
> Run starting. Expected test count is: 4
> KinesisStreamSuite:
> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
> Kinesis streams for tests.
> - KinesisUtils API
> - RDD generation
> - basic operation *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 13
> times over 2.04 minutes. Last failure message: Set() did not equal
> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
> - failure recovery *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 63
> times over 2.02863831 minutes. Last failure message:
> isCheckpointPresent was true, but 0 was not greater than 10.
> (KinesisStreamSuite.scala:228)
> Run completed in 5 minutes, 0 seconds.
> Total number of tests run: 4
> Suites: completed 1, aborted 0
> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
> *** 2 TESTS FAILED ***
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
>
>
> KCL 1.3.0 depends on *1.9.37* SDK (
> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
> while the Spark Kinesis dependency was kept at *1.9.16.*
>
> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
> 1.9.37 and everything works.
>
> Run starting. Expected test count is: 28
> KinesisBackedBlockRDDSuite:
> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
> Kinesis streams for tests.
> - Basic reading from Kinesis
> - Read data available in both block manager and Kinesis
> - Read data available only in block manager, not in Kinesis
> - Read data available only in Kinesis, not in block manager
> - Read data available partially in block manager, rest in Kinesis
> - Test isBlockValid skips block fetching from block manager
> - Test whether RDD is valid after removing blocks from block anager
> KinesisStreamSuite:
> - KinesisUtils API
> - RDD generation
> - basic operation
> - failure recovery
> KinesisReceiverSuite:
> - check serializability of SerializableAWSCredentials
> - process records including store and checkpoint
> - shouldn't store and checkpoint when receiver is stopped
> - shouldn't checkpoint when exception occurs during store
> - should set checkpoint time to currentTime + checkpoint interval upon
> instantiation
> - should checkpoint if we have exceeded the checkpoint interval
> - shouldn't checkpoint if we have not exceeded the checkpoint interval
> - should add to time when advancing checkpoint
> - shutdown should checkpoint if the reason is TERMINATE
> - shutdown should not checkpoint if the reason is something other than
> TERMINATE
> - retry success on first attempt
> - retry success on second attempt after a Kinesis throttling exception
> - retry success on second attempt after a Kinesis dependency exception
> - retry failed after a shutdown exception
> - retry failed after an invalid state exception
> - retry failed after unexpected exception
> - retry failed after exhausing all retries
> Run completed in 3 minutes, 28 seconds.
> Total number of tests run: 28
> Suites: completed 4, aborted 0
> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
> All tests passed.
>
> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
> file a JIRA for this?
>
> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
> SDKs are brought in, making things a bit leaner (this can be upgraded in
> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
> used anywhere else (AFAIK it is not, but in case I missed something let me
> know any good reason to keep the explicit dependency)?
>
> N
>
>
>
> On Fri, Dec 11, 2015 at 6:55 AM, Nick Pentreath 
> wrote:
>
>> Yeah also the integration tests need to be specifically run - I 

Re: Spark streaming with Kinesis broken?

2015-12-11 Thread Nick Pentreath
Is that PR against master branch?




S3 read comes from Hadoop / jet3t afaik



—
Sent from Mailbox

On Fri, Dec 11, 2015 at 5:38 PM, Brian London 
wrote:

> That's good news  I've got a PR in to up the SDK version to 1.10.40 and the
> KCL to 1.6.1 which I'm running tests on locally now.
> Is the AWS SDK not used for reading/writing from S3 or do we get that for
> free from the Hadoop dependencies?
> On Fri, Dec 11, 2015 at 5:07 AM Nick Pentreath 
> wrote:
>> cc'ing dev list
>>
>> Ok, looks like when the KCL version was updated in
>> https://github.com/apache/spark/pull/8957, the AWS SDK version was not,
>> probably leading to dependency conflict, though as Burak mentions its hard
>> to debug as no exceptions seem to get thrown... I've tested 1.5.2 locally
>> and on my 1.5.2 EC2 cluster, and no data is received, and nothing shows up
>> in driver or worker logs, so any exception is getting swallowed somewhere.
>>
>> Run starting. Expected test count is: 4
>> KinesisStreamSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - KinesisUtils API
>> - RDD generation
>> - basic operation *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 13
>> times over 2.04 minutes. Last failure message: Set() did not equal
>> Set(5, 10, 1, 6, 9, 2, 7, 3, 8, 4)
>>   Data received does not match data sent. (KinesisStreamSuite.scala:188)
>> - failure recovery *** FAILED ***
>>   The code passed to eventually never returned normally. Attempted 63
>> times over 2.02863831 minutes. Last failure message:
>> isCheckpointPresent was true, but 0 was not greater than 10.
>> (KinesisStreamSuite.scala:228)
>> Run completed in 5 minutes, 0 seconds.
>> Total number of tests run: 4
>> Suites: completed 1, aborted 0
>> Tests: succeeded 2, failed 2, canceled 0, ignored 0, pending 0
>> *** 2 TESTS FAILED ***
>> [INFO]
>> 
>> [INFO] BUILD FAILURE
>> [INFO]
>> 
>>
>>
>> KCL 1.3.0 depends on *1.9.37* SDK (
>> https://github.com/awslabs/amazon-kinesis-client/blob/1.3.0/pom.xml#L26)
>> while the Spark Kinesis dependency was kept at *1.9.16.*
>>
>> I've run the integration tests on branch-1.5 (1.5.3-SNAPSHOT) with AWS SDK
>> 1.9.37 and everything works.
>>
>> Run starting. Expected test count is: 28
>> KinesisBackedBlockRDDSuite:
>> Using endpoint URL https://kinesis.eu-west-1.amazonaws.com for creating
>> Kinesis streams for tests.
>> - Basic reading from Kinesis
>> - Read data available in both block manager and Kinesis
>> - Read data available only in block manager, not in Kinesis
>> - Read data available only in Kinesis, not in block manager
>> - Read data available partially in block manager, rest in Kinesis
>> - Test isBlockValid skips block fetching from block manager
>> - Test whether RDD is valid after removing blocks from block anager
>> KinesisStreamSuite:
>> - KinesisUtils API
>> - RDD generation
>> - basic operation
>> - failure recovery
>> KinesisReceiverSuite:
>> - check serializability of SerializableAWSCredentials
>> - process records including store and checkpoint
>> - shouldn't store and checkpoint when receiver is stopped
>> - shouldn't checkpoint when exception occurs during store
>> - should set checkpoint time to currentTime + checkpoint interval upon
>> instantiation
>> - should checkpoint if we have exceeded the checkpoint interval
>> - shouldn't checkpoint if we have not exceeded the checkpoint interval
>> - should add to time when advancing checkpoint
>> - shutdown should checkpoint if the reason is TERMINATE
>> - shutdown should not checkpoint if the reason is something other than
>> TERMINATE
>> - retry success on first attempt
>> - retry success on second attempt after a Kinesis throttling exception
>> - retry success on second attempt after a Kinesis dependency exception
>> - retry failed after a shutdown exception
>> - retry failed after an invalid state exception
>> - retry failed after unexpected exception
>> - retry failed after exhausing all retries
>> Run completed in 3 minutes, 28 seconds.
>> Total number of tests run: 28
>> Suites: completed 4, aborted 0
>> Tests: succeeded 28, failed 0, canceled 0, ignored 0, pending 0
>> All tests passed.
>>
>> So this is a regression in Spark Streaming Kinesis 1.5.2 - @Brian can you
>> file a JIRA for this?
>>
>> @dev-list, since KCL brings in AWS SDK dependencies itself, is it
>> necessary to declare an explicit dependency on aws-java-sdk in the Kinesis
>> POM? Also, from KCL 1.5.0+, only the relevant components used from the AWS
>> SDKs are brought in, making things a bit leaner (this can be upgraded in
>> Spark 1.7/2.0 perhaps). All local tests (and integration tests) pass with
>> removing the explicit dependency and only depending on KCL. Is aws-java-sdk
>> used anywhere 

Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Brian London
Nick's symptoms sound identical to mine.  I should mention that I just
pulled the latest version from github and it seems to be working there.  To
reproduce:


   1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
   2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
   clean package
   3. build/mvn -Pkinesis-asl -DskipTests clean package
   4. Then run simultaneously:
   1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name]
  [Kinesis stream name] [endpoint URL]
  2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis stream
  name] [endpoint URL] 100 10


On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré 
wrote:

> Hi Nick,
>
> Just to be sure: don't you see some ClassCastException in the log ?
>
> Thanks,
> Regards
> JB
>
> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
> > Could you provide an example / test case and more detail on what issue
> > you're facing?
> >
> > I've just tested a simple program reading from a dev Kinesis stream and
> > using stream.print() to show the records, and it works under 1.5.1 but
> > doesn't appear to be working under 1.5.2.
> >
> > UI for 1.5.2:
> >
> > Inline image 1
> >
> > UI for 1.5.1:
> >
> > Inline image 2
> >
> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London  > > wrote:
> >
> > Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
> > Kinesis ASL that ships with 1.5.2 appears to not work for me
> > although 1.5.1 is fine. I spent some time with Amazon earlier in the
> > week and the only thing we could do to make it work is to change the
> > version to 1.5.1.  Can someone please attempt to reproduce before I
> > open a JIRA issue for it?
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Jean-Baptiste Onofré

Hi Nick,

Just to be sure: don't you see some ClassCastException in the log ?

Thanks,
Regards
JB

On 12/10/2015 07:56 PM, Nick Pentreath wrote:

Could you provide an example / test case and more detail on what issue
you're facing?

I've just tested a simple program reading from a dev Kinesis stream and
using stream.print() to show the records, and it works under 1.5.1 but
doesn't appear to be working under 1.5.2.

UI for 1.5.2:

Inline image 1

UI for 1.5.1:

Inline image 2

On Thu, Dec 10, 2015 at 5:50 PM, Brian London > wrote:

Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
Kinesis ASL that ships with 1.5.2 appears to not work for me
although 1.5.1 is fine. I spent some time with Amazon earlier in the
week and the only thing we could do to make it work is to change the
version to 1.5.1.  Can someone please attempt to reproduce before I
open a JIRA issue for it?




--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Brian London
Yes, it worked in the 1.6 branch as of commit
db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less serious
of an issue, although it would be nice to know what the root cause is to
avoid a regression.

On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz  wrote:

> I've noticed this happening when there was some dependency conflicts, and
> it is super hard to debug.
> It seems that the KinesisClientLibrary version in Spark 1.5.2 is 1.3.0,
> but it is 1.2.1 in Spark 1.5.1.
> I feel like that seems to be the problem...
>
> Brian, did you verify that it works with the 1.6.0 branch?
>
> Thanks,
> Burak
>
> On Thu, Dec 10, 2015 at 11:45 AM, Brian London 
> wrote:
>
>> Nick's symptoms sound identical to mine.  I should mention that I just
>> pulled the latest version from github and it seems to be working there.  To
>> reproduce:
>>
>>
>>1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
>>2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
>>clean package
>>3. build/mvn -Pkinesis-asl -DskipTests clean package
>>4. Then run simultaneously:
>>1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name]
>>   [Kinesis stream name] [endpoint URL]
>>   2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis
>>   stream name] [endpoint URL] 100 10
>>
>>
>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré 
>> wrote:
>>
>>> Hi Nick,
>>>
>>> Just to be sure: don't you see some ClassCastException in the log ?
>>>
>>> Thanks,
>>> Regards
>>> JB
>>>
>>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>>> > Could you provide an example / test case and more detail on what issue
>>> > you're facing?
>>> >
>>> > I've just tested a simple program reading from a dev Kinesis stream and
>>> > using stream.print() to show the records, and it works under 1.5.1 but
>>> > doesn't appear to be working under 1.5.2.
>>> >
>>> > UI for 1.5.2:
>>> >
>>> > Inline image 1
>>> >
>>> > UI for 1.5.1:
>>> >
>>> > Inline image 2
>>> >
>>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London >> > > wrote:
>>> >
>>> > Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
>>> > Kinesis ASL that ships with 1.5.2 appears to not work for me
>>> > although 1.5.1 is fine. I spent some time with Amazon earlier in
>>> the
>>> > week and the only thing we could do to make it work is to change
>>> the
>>> > version to 1.5.1.  Can someone please attempt to reproduce before I
>>> > open a JIRA issue for it?
>>> >
>>> >
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>
>>>
>


Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Burak Yavuz
I've noticed this happening when there was some dependency conflicts, and
it is super hard to debug.
It seems that the KinesisClientLibrary version in Spark 1.5.2 is 1.3.0, but
it is 1.2.1 in Spark 1.5.1.
I feel like that seems to be the problem...

Brian, did you verify that it works with the 1.6.0 branch?

Thanks,
Burak

On Thu, Dec 10, 2015 at 11:45 AM, Brian London 
wrote:

> Nick's symptoms sound identical to mine.  I should mention that I just
> pulled the latest version from github and it seems to be working there.  To
> reproduce:
>
>
>1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
>2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
>clean package
>3. build/mvn -Pkinesis-asl -DskipTests clean package
>4. Then run simultaneously:
>1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name]
>   [Kinesis stream name] [endpoint URL]
>   2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis
>   stream name] [endpoint URL] 100 10
>
>
> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Nick,
>>
>> Just to be sure: don't you see some ClassCastException in the log ?
>>
>> Thanks,
>> Regards
>> JB
>>
>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>> > Could you provide an example / test case and more detail on what issue
>> > you're facing?
>> >
>> > I've just tested a simple program reading from a dev Kinesis stream and
>> > using stream.print() to show the records, and it works under 1.5.1 but
>> > doesn't appear to be working under 1.5.2.
>> >
>> > UI for 1.5.2:
>> >
>> > Inline image 1
>> >
>> > UI for 1.5.1:
>> >
>> > Inline image 2
>> >
>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London > > > wrote:
>> >
>> > Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
>> > Kinesis ASL that ships with 1.5.2 appears to not work for me
>> > although 1.5.1 is fine. I spent some time with Amazon earlier in the
>> > week and the only thing we could do to make it work is to change the
>> > version to 1.5.1.  Can someone please attempt to reproduce before I
>> > open a JIRA issue for it?
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>


Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Nick Pentreath
Yup also works for me on master branch as I've been testing DynamoDB Streams 
integration. In fact works with latest KCL 1.6.1 also which I was using.




So theKCL version does seem like it could be the issue - somewhere along the 
line an exception must be getting swallowed. Though the tests should have 
picked this up? Will dig deeper.




—
Sent from Mailbox

On Thu, Dec 10, 2015 at 11:07 PM, Brian London 
wrote:

> Yes, it worked in the 1.6 branch as of commit
> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less serious
> of an issue, although it would be nice to know what the root cause is to
> avoid a regression.
> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz  wrote:
>> I've noticed this happening when there was some dependency conflicts, and
>> it is super hard to debug.
>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is 1.3.0,
>> but it is 1.2.1 in Spark 1.5.1.
>> I feel like that seems to be the problem...
>>
>> Brian, did you verify that it works with the 1.6.0 branch?
>>
>> Thanks,
>> Burak
>>
>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London 
>> wrote:
>>
>>> Nick's symptoms sound identical to mine.  I should mention that I just
>>> pulled the latest version from github and it seems to be working there.  To
>>> reproduce:
>>>
>>>
>>>1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
>>>2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
>>>clean package
>>>3. build/mvn -Pkinesis-asl -DskipTests clean package
>>>4. Then run simultaneously:
>>>1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name]
>>>   [Kinesis stream name] [endpoint URL]
>>>   2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis
>>>   stream name] [endpoint URL] 100 10
>>>
>>>
>>> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré 
>>> wrote:
>>>
 Hi Nick,

 Just to be sure: don't you see some ClassCastException in the log ?

 Thanks,
 Regards
 JB

 On 12/10/2015 07:56 PM, Nick Pentreath wrote:
 > Could you provide an example / test case and more detail on what issue
 > you're facing?
 >
 > I've just tested a simple program reading from a dev Kinesis stream and
 > using stream.print() to show the records, and it works under 1.5.1 but
 > doesn't appear to be working under 1.5.2.
 >
 > UI for 1.5.2:
 >
 > Inline image 1
 >
 > UI for 1.5.1:
 >
 > Inline image 2
 >
 > On Thu, Dec 10, 2015 at 5:50 PM, Brian London  > wrote:
 >
 > Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
 > Kinesis ASL that ships with 1.5.2 appears to not work for me
 > although 1.5.1 is fine. I spent some time with Amazon earlier in
 the
 > week and the only thing we could do to make it work is to change
 the
 > version to 1.5.1.  Can someone please attempt to reproduce before I
 > open a JIRA issue for it?
 >
 >

 --
 Jean-Baptiste Onofré
 jbono...@apache.org
 http://blog.nanthrax.net
 Talend - http://www.talend.com

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org


>>

Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Burak Yavuz
I don't think the Kinesis tests specifically ran when that was merged into
1.5.2 :(
https://github.com/apache/spark/pull/8957
https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3

AFAIK pom changes don't trigger the Kinesis tests.

Burak

On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath 
wrote:

> Yup also works for me on master branch as I've been testing DynamoDB
> Streams integration. In fact works with latest KCL 1.6.1 also which I was
> using.
>
> So theKCL version does seem like it could be the issue - somewhere along
> the line an exception must be getting swallowed. Though the tests should
> have picked this up? Will dig deeper.
>
> —
> Sent from Mailbox 
>
>
> On Thu, Dec 10, 2015 at 11:07 PM, Brian London 
> wrote:
>
>> Yes, it worked in the 1.6 branch as of commit
>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>> serious of an issue, although it would be nice to know what the root cause
>> is to avoid a regression.
>>
>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz  wrote:
>>
>>> I've noticed this happening when there was some dependency conflicts,
>>> and it is super hard to debug.
>>> It seems that the KinesisClientLibrary version in Spark 1.5.2 is 1.3.0,
>>> but it is 1.2.1 in Spark 1.5.1.
>>> I feel like that seems to be the problem...
>>>
>>> Brian, did you verify that it works with the 1.6.0 branch?
>>>
>>> Thanks,
>>> Burak
>>>
>>> On Thu, Dec 10, 2015 at 11:45 AM, Brian London 
>>> wrote:
>>>
 Nick's symptoms sound identical to mine.  I should mention that I just
 pulled the latest version from github and it seems to be working there.  To
 reproduce:


1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
clean package
3. build/mvn -Pkinesis-asl -DskipTests clean package
4. Then run simultaneously:
1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name]
   [Kinesis stream name] [endpoint URL]
   2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis
   stream name] [endpoint URL] 100 10


 On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré 
 wrote:

> Hi Nick,
>
> Just to be sure: don't you see some ClassCastException in the log ?
>
> Thanks,
> Regards
> JB
>
> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
> > Could you provide an example / test case and more detail on what
> issue
> > you're facing?
> >
> > I've just tested a simple program reading from a dev Kinesis stream
> and
> > using stream.print() to show the records, and it works under 1.5.1
> but
> > doesn't appear to be working under 1.5.2.
> >
> > UI for 1.5.2:
> >
> > Inline image 1
> >
> > UI for 1.5.1:
> >
> > Inline image 2
> >
> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
> brianmlon...@gmail.com
> > > wrote:
> >
> > Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
> > Kinesis ASL that ships with 1.5.2 appears to not work for me
> > although 1.5.1 is fine. I spent some time with Amazon earlier in
> the
> > week and the only thing we could do to make it work is to change
> the
> > version to 1.5.1.  Can someone please attempt to reproduce
> before I
> > open a JIRA issue for it?
> >
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>
>>>
>


Spark streaming with Kinesis broken?

2015-12-10 Thread Brian London
Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The Kinesis ASL
that ships with 1.5.2 appears to not work for me although 1.5.1 is fine. I
spent some time with Amazon earlier in the week and the only thing we could
do to make it work is to change the version to 1.5.1.  Can someone please
attempt to reproduce before I open a JIRA issue for it?


Re: Spark streaming with Kinesis broken?

2015-12-10 Thread Nick Pentreath
Yeah also the integration tests need to be specifically run - I would have 
thought the contributor would have run those tests and also tested the change 
themselves using live Kinesis :(



—
Sent from Mailbox

On Fri, Dec 11, 2015 at 6:18 AM, Burak Yavuz  wrote:

> I don't think the Kinesis tests specifically ran when that was merged into
> 1.5.2 :(
> https://github.com/apache/spark/pull/8957
> https://github.com/apache/spark/commit/883bd8fccf83aae7a2a847c9a6ca129fac86e6a3
> AFAIK pom changes don't trigger the Kinesis tests.
> Burak
> On Thu, Dec 10, 2015 at 8:09 PM, Nick Pentreath 
> wrote:
>> Yup also works for me on master branch as I've been testing DynamoDB
>> Streams integration. In fact works with latest KCL 1.6.1 also which I was
>> using.
>>
>> So theKCL version does seem like it could be the issue - somewhere along
>> the line an exception must be getting swallowed. Though the tests should
>> have picked this up? Will dig deeper.
>>
>> —
>> Sent from Mailbox 
>>
>>
>> On Thu, Dec 10, 2015 at 11:07 PM, Brian London 
>> wrote:
>>
>>> Yes, it worked in the 1.6 branch as of commit
>>> db5165246f2888537dd0f3d4c5a515875c7358ed.  That makes it much less
>>> serious of an issue, although it would be nice to know what the root cause
>>> is to avoid a regression.
>>>
>>> On Thu, Dec 10, 2015 at 4:03 PM Burak Yavuz  wrote:
>>>
 I've noticed this happening when there was some dependency conflicts,
 and it is super hard to debug.
 It seems that the KinesisClientLibrary version in Spark 1.5.2 is 1.3.0,
 but it is 1.2.1 in Spark 1.5.1.
 I feel like that seems to be the problem...

 Brian, did you verify that it works with the 1.6.0 branch?

 Thanks,
 Burak

 On Thu, Dec 10, 2015 at 11:45 AM, Brian London 
 wrote:

> Nick's symptoms sound identical to mine.  I should mention that I just
> pulled the latest version from github and it seems to be working there.  
> To
> reproduce:
>
>
>1. Download spark 1.5.2 from http://spark.apache.org/downloads.html
>2. build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests
>clean package
>3. build/mvn -Pkinesis-asl -DskipTests clean package
>4. Then run simultaneously:
>1. bin/run-example streaming.KinesisWordCountASL [Kinesis app name]
>   [Kinesis stream name] [endpoint URL]
>   2.   bin/run-example streaming.KinesisWordProducerASL [Kinesis
>   stream name] [endpoint URL] 100 10
>
>
> On Thu, Dec 10, 2015 at 2:05 PM Jean-Baptiste Onofré 
> wrote:
>
>> Hi Nick,
>>
>> Just to be sure: don't you see some ClassCastException in the log ?
>>
>> Thanks,
>> Regards
>> JB
>>
>> On 12/10/2015 07:56 PM, Nick Pentreath wrote:
>> > Could you provide an example / test case and more detail on what
>> issue
>> > you're facing?
>> >
>> > I've just tested a simple program reading from a dev Kinesis stream
>> and
>> > using stream.print() to show the records, and it works under 1.5.1
>> but
>> > doesn't appear to be working under 1.5.2.
>> >
>> > UI for 1.5.2:
>> >
>> > Inline image 1
>> >
>> > UI for 1.5.1:
>> >
>> > Inline image 2
>> >
>> > On Thu, Dec 10, 2015 at 5:50 PM, Brian London <
>> brianmlon...@gmail.com
>> > > wrote:
>> >
>> > Has anyone managed to run the Kinesis demo in Spark 1.5.2?  The
>> > Kinesis ASL that ships with 1.5.2 appears to not work for me
>> > although 1.5.1 is fine. I spent some time with Amazon earlier in
>> the
>> > week and the only thing we could do to make it work is to change
>> the
>> > version to 1.5.1.  Can someone please attempt to reproduce
>> before I
>> > open a JIRA issue for it?
>> >
>> >
>>
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>>

>>