Wow, great job. We will take a look and try our application again with your
patch.
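
For anyone else following the thread, here is a minimal sketch of how a
CancelledKeyException can come out of a selector loop. It uses plain Java NIO
only and is not Spark's ConnectionManager code or the change in the pull
request. The class CancelledKeyDemo and the helper safeSetInterestOps are
hypothetical names made up for illustration, and the cancel() call merely
stands in for another thread closing the connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.util.function.Predicate;

// Hypothetical demo class; not part of Spark.
final class CancelledKeyDemo {

    // Registers an unbound, non-blocking channel with a fresh selector,
    // optionally cancels the key, then applies `action` to the key.
    static boolean withKey(boolean cancelFirst, Predicate<SelectionKey> action) {
        try (Selector selector = Selector.open();
             ServerSocketChannel channel = ServerSocketChannel.open()) {
            channel.configureBlocking(false);
            SelectionKey key = channel.register(selector, SelectionKey.OP_ACCEPT);
            if (cancelFirst) {
                key.cancel(); // stands in for another thread closing the connection
            }
            return action.test(key);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unguarded access: returns true if reading interestOps on a
    // cancelled key raised CancelledKeyException.
    static boolean unguardedAccessThrows() {
        return withKey(true, key -> {
            try {
                key.interestOps();
                return false;
            } catch (CancelledKeyException e) {
                return true;
            }
        });
    }

    // Guarded access: check isValid() first, and still catch the exception,
    // because the key can be cancelled between the check and the call.
    static boolean safeSetInterestOps(SelectionKey key, int ops) {
        if (!key.isValid()) {
            return false;
        }
        try {
            key.interestOps(ops);
            return true;
        } catch (CancelledKeyException e) {
            return false;
        }
    }

    static boolean guardedAccessOnCancelledKey() {
        return withKey(true, key -> safeSetInterestOps(key, SelectionKey.OP_ACCEPT));
    }

    static boolean guardedAccessOnLiveKey() {
        return withKey(false, key -> safeSetInterestOps(key, SelectionKey.OP_ACCEPT));
    }

    public static void main(String[] args) {
        System.out.println("unguarded access throws: " + unguardedAccessThrows());
        System.out.println("guarded, cancelled key:  " + guardedAccessOnCancelledKey());
        System.out.println("guarded, live key:       " + guardedAccessOnLiveKey());
    }
}
```

The isValid() check alone cannot close the race, which is why the guarded
helper also catches the exception. Whether this pattern applies to our failure
is still an assumption until we retest with the patch.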


On Tue, Aug 26, 2014 at 5:31 AM, Kousuke Saruta <saru...@oss.nttdata.co.jp>
wrote:

> Hi Shengzhe
>
> I faced the same situation.
>
> I think Connection and ConnectionManager have some race condition issues,
> and the error you mentioned may be caused by them.
> I'm now trying to resolve the issue in
> https://github.com/apache/spark/pull/2019.
> Please check it out.
>
> - Kousuke
>
>
> (2014/08/26 8:53), yao wrote:
>
>> Hi Folks,
>>
>> We are testing our home-made KMeans algorithm using Spark on Yarn.
>> Recently, we've found that the application fails frequently when
>> clustering over 300,000,000 users (each user is represented by a feature
>> vector and the whole data set is around 600,000,000). After digging into
>> the job log, we found many CancelledKeyExceptions thrown by
>> ConnectionManager, but observed no other exceptions. We suspect that the
>> frequent CancelledKeyExceptions bring the whole application down, since the
>> application often fails on the third or fourth iteration for large
>> datasets. Any suggestions on where to look are welcome.
>>
>> *Errors in job log*:
>>
>> java.nio.channels.CancelledKeyException
>>          at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
>>          at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
>> 14/08/25 19:04:32 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(lsv-289.rfiserve.net,43199)
>> 14/08/25 19:04:32 ERROR ConnectionManager: Corresponding SendingConnectionManagerId not found
>> 14/08/25 19:04:32 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@2570cd62
>> 14/08/25 19:04:32 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@2570cd62
>> java.nio.channels.CancelledKeyException
>>          at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:363)
>>          at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
>> 14/08/25 19:04:32 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(lsv-289.rfiserve.net,56727)
>> 14/08/25 19:04:32 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(lsv-289.rfiserve.net,56727)
>> 14/08/25 19:04:32 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(lsv-289.rfiserve.net,56727)
>> 14/08/25 19:04:32 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@37c8b85a
>> 14/08/25 19:04:32 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@37c8b85a
>> java.nio.channels.CancelledKeyException
>>          at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:287)
>>          at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:116)
>> 14/08/25 19:04:32 INFO ConnectionManager: Removing SendingConnection to ConnectionManagerId(lsv-668.rfiserve.net,41913)
>> 14/08/25 19:04:32 INFO ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(lsv-668.rfiserve.net,41913)
>> 14/08/25 19:04:32 INFO ConnectionManager: Key not valid ? sun.nio.ch.SelectionKeyImpl@fcea3a4
>> 14/08/25 19:04:32 ERROR ConnectionManager: Corresponding SendingConnectionManagerId not found
>> 14/08/25 19:04:32 INFO ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@fcea3a4
>>
>>
>> Best
>> Shengzhe
>>
>>
>
