What is the relationship between reduceByKey and spark.driver.maxResultSize?

2015-12-11 Thread Tom Seddon
I have a job that is running into intermittent [SparkDriver]
java.lang.OutOfMemoryError: Java heap space errors.  Before I was getting this
error I was getting errors saying the result size exceeded
spark.driver.maxResultSize.
This does not make any sense to me, as there are no actions in my job that
send data to the driver - just a pull of data from S3, a map and
reduceByKey, and then a conversion to a DataFrame and a saveAsTable action
that puts the results back on S3.

I've found a few references to reduceByKey and spark.driver.maxResultSize
having some importance, but cannot fathom how this setting could be related.
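
For reference, spark.driver.maxResultSize caps the total size of serialized
results returned to the driver for a single action (default 1g; 0 means
unlimited).  A minimal PySpark sketch of raising it, with an illustrative app
name and limit rather than recommended values:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("my-job")                      # illustrative app name
        .set("spark.driver.maxResultSize", "2g"))  # default is 1g; "0" disables the limit
sc = SparkContext(conf=conf)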

Would greatly appreciate any advice.

Thanks in advance,

Tom


java.lang.NoSuchMethodError and yarn-client mode

2015-09-09 Thread Tom Seddon
Hi,

I have a problem trying to get a fairly simple app working that makes use
of the native Avro libraries.  The app runs fine on my local machine and in
yarn-cluster mode, but when I try to run it on EMR in yarn-client mode I get
the error below.  I'm aware this is a version problem, as EMR runs an
earlier version of Avro, and I am trying to use avro-1.7.7.

What's confusing me a great deal is the fact that this runs fine in
yarn-cluster mode.

What is it about yarn-cluster mode that means the application has access to
the correct version of the avro library?  I need to run in yarn-client mode
as I will be caching data to the driver machine in between batches.  I
think in yarn-cluster mode the driver can run on any machine in the cluster
so this would not work.

Grateful for any advice as I'm really stuck on this.  AWS support are
trying but they don't seem to know why this is happening either!

Just to note, I'm aware of the Databricks spark-avro project and have used it.
This is an investigation to see if I can use RDDs instead of DataFrames.

java.lang.NoSuchMethodError:
org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/avro/Schema;
at ophan.thrift.event.Event.(Event.java:10)
at SimpleApp$.main(SimpleApp.scala:25)
at SimpleApp.main(SimpleApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Thanks,

Tom


Re: java.lang.NoSuchMethodError and yarn-client mode

2015-09-09 Thread Tom Seddon
Thanks for your reply Aniket.

OK, I've done this and I'm still confused.  Output from running locally
shows:

file:/home/tom/spark-avro/target/scala-2.10/simpleapp.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/conf/
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/spark-assembly-1.4.0-hadoop2.4.0.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar
file:/home/tom/spark-1.4.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/sunjce_provider.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/zipfs.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/localedata.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/dnsns.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/sunec.jar
file:/usr/lib/jvm/java-7-oracle/jre/lib/ext/sunpkcs11.jar
saving text file...
done!

In yarn-client mode:

file:/home/hadoop/simpleapp.jar
file:/usr/lib/hadoop/hadoop-auth-2.6.0-amzn-0.jar
...
*file:/usr/lib/hadoop-mapreduce/avro-1.7.4.jar*
...

And in yarn-cluster mode:
file:/mnt/yarn/usercache/hadoop/appcache/application_1441787021820_0004/container_1441787021820_0004_01_01/__app__.jar
...
*file:/usr/lib/hadoop/lib/avro-1.7.4.jar*
...
saving text file...
done!

In yarn-cluster mode it doesn't appear to have sight of the fat jar
(simpleapp.jar), and it can see avro-1.7.4, yet it runs fine!

Thanks,

Tom


On Wed, Sep 9, 2015 at 9:49 AM Aniket Bhatnagar <aniket.bhatna...@gmail.com>
wrote:

> Hi Tom
>
> There has to be a difference in classpaths in yarn-client and yarn-cluster
> mode. Perhaps a good starting point would be to print classpath as a first
> thing in SimpleApp.main. It should give clues around why it works in
> yarn-cluster mode.
>
> Thanks,
> Aniket
>


Re: SparkSQL DF.explode with Nulls

2015-06-05 Thread Tom Seddon
I figured it out.  Needed a block style map and a check for null.  The case
class is just to name the transformed columns.

case class Component(name: String, loadTimeMs: Long)

avroFile.filter($"lazyComponents.components".isNotNull)
  .explode($"lazyComponents.components") {
    case Row(lazyComponents: Seq[Row]) => lazyComponents.map { x =>
      val name = x.getString(0)
      val loadTimeMs = if (x.isNullAt(1)) 0L else x.getLong(1)
      Component(name, loadTimeMs)
    }
  }
  .select('pageViewId, 'name, 'loadTimeMs)
  .take(20)
  .foreach(println)


SparkSQL DF.explode with Nulls

2015-06-04 Thread Tom Seddon
Hi,

I've worked out how to use explode on my input avro dataset with the
following structure:
root
 |-- pageViewId: string (nullable = false)
 |-- components: array (nullable = true)
 |    |-- element: struct (containsNull = false)
 |    |    |-- name: string (nullable = false)
 |    |    |-- loadTimeMs: long (nullable = true)


I'm trying to turn this into this layout with repeated pageViewIds for each
row of my components:
root
 |-- pageViewId: string (nullable = false)
 |-- name: string (nullable = false)
 |-- loadTimeMs: long (nullable = true)

Explode works fine for the first 10 records using this bit of code, but my
big problem is that loadTimeMs has nulls in it, which I think is causing
the error.  Any ideas how I can trap those nulls?  Perhaps by converting them
to zeros so I can deal with them later?  I tried writing a UDF that just
takes the loadTimeMs column and swaps nulls for zeros, but this separates
the struct and then I don't know how to use explode.

avroFile.filter($"lazyComponents.components".isNotNull)
  .explode($"lazyComponents.components") {
    case Row(lazyComponents: Seq[Row]) =>
      lazyComponents.map(x => x.getString(0) -> x.getLong(1))
  }
  .select('pageViewId, '_1, '_2)
  .take(10)
  .foreach(println)

15/06/04 12:01:21 ERROR Executor: Exception in task 0.0 in stage 19.0 (TID
65)
java.lang.RuntimeException: Failed to check null bit for primitive long
value.
at scala.sys.package$.error(package.scala:27)
at
org.apache.spark.sql.catalyst.expressions.GenericRow.getLong(rows.scala:87)
at
$line127.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1$$anonfun$apply$1.apply(console:33)
at
$line127.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1$$anonfun$apply$1.apply(console:33)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at
$line127.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(console:33)
at
$line127.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(console:33)
at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:55)
at
org.apache.spark.sql.catalyst.expressions.UserDefinedGenerator.eval(generators.scala:89)
at
org.apache.spark.sql.execution.Generate$$anonfun$2$$anonfun$apply$1.apply(Generate.scala:71)
at
org.apache.spark.sql.execution.Generate$$anonfun$2$$anonfun$apply$1.apply(Generate.scala:70)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:308)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:122)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:122)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
at
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)


PySpark saveAsTextFile gzip

2015-01-15 Thread Tom Seddon
Hi,

I've searched but can't seem to find a PySpark example.  How do I write
compressed text file output to S3 using PySpark saveAsTextFile?
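
A minimal sketch of one way to do this, assuming a Spark version whose
saveAsTextFile accepts a compressionCodecClass argument and that the Hadoop
gzip codec is available on the cluster; the bucket and path are illustrative:

from pyspark import SparkContext

sc = SparkContext(appName="gzip-output")        # illustrative app name
rdd = sc.parallelize(["line one", "line two"])  # stand-in for the real dataset

# Each output part file is written gzip-compressed.
rdd.saveAsTextFile(
    "s3n://my-bucket/output/",
    compressionCodecClass="org.apache.hadoop.io.compress.GzipCodec")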

Thanks,

Tom


Efficient way to split an input data set into different output files

2014-11-19 Thread Tom Seddon
I'm trying to set up a PySpark ETL job that takes in JSON log files and
spits out fact table files for upload to Redshift.  Is there an efficient
way to send different event types to different outputs without having to
just read the same cached RDD twice?  I have my first RDD which is just a
json parsed version of the input data, and I need to create a flattened
page views dataset off this based on eventType = 'INITIAL', and then a page
events dataset from the same RDD based on eventType  = 'ADDITIONAL'.
Ideally I'd like the output files for both these tables to be written at
the same time, so I'm picturing a function with one input RDD in and two
RDDs out, or a function utilising two CSV writers.  I'm using mapPartitions
at the moment to write to files like this:

import StringIO

def write_records(records):
    output = StringIO.StringIO()
    writer = vlad.CsvUnicodeWriter(output, dialect='excel')  # CSV writer from the poster's own module
    for record in records:
        writer.writerow(record)
    return [output.getvalue()]

and I use this in the call to write the file as follows (pageviews and
events get created off the same json parsed RDD by filtering on INITIAL or
ADDITIONAL respectively):

pageviews.mapPartitions(write_records).saveAsTextFile('s3n://output/pageviews/')
events.mapPartitions(write_records).saveAsTextFile('s3n://output/events/')
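
For context, the two-pass pattern described above, with the parsed RDD cached
so the input is only read and parsed once, looks roughly like this.  The input
path, the eventType field access and the flattening lambdas are assumptions,
sc is the existing SparkContext, and write_records is the function defined
above:

import json

# Parse the JSON once and cache it so both outputs reuse the same RDD.
parsed = sc.textFile("s3n://input/logs/").map(json.loads).cache()

pageviews = (parsed
             .filter(lambda r: r.get("eventType") == "INITIAL")
             .map(lambda r: (r.get("pageViewId"), r.get("url"))))     # illustrative flattening
events = (parsed
          .filter(lambda r: r.get("eventType") == "ADDITIONAL")
          .map(lambda r: (r.get("pageViewId"), r.get("eventName"))))  # illustrative flattening

pageviews.mapPartitions(write_records).saveAsTextFile('s3n://output/pageviews/')
events.mapPartitions(write_records).saveAsTextFile('s3n://output/events/')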

Is there a way to change this so that both are written in the same process?


Re: Broadcast failure with variable size of ~ 500mb with key already cancelled ?

2014-11-11 Thread Tom Seddon
Hi,

Just wondering if anyone has any advice about this issue, as I am
experiencing the same thing.  I'm working with multiple broadcast variables
in PySpark, most of which are small, but one of around 4.5GB, using 10
workers at 31GB memory each and driver with same spec.  It's not running
out of memory as far as I can see, but definitely only happens when I add
the large broadcast.  Would be most grateful for advice.

I tried playing around with the last 3 conf settings below, but no luck:

SparkConf().set("spark.master.memory", "26")
    .set("spark.executor.memory", "26")
    .set("spark.worker.memory", "26")
    .set("spark.driver.memory", "26")
    .set("spark.storage.memoryFraction", "1")
    .set("spark.core.connection.ack.wait.timeout", "6000")
    .set("spark.akka.frameSize", "50")

Thanks,

Tom


On 24 October 2014 12:31, htailor hemant.tai...@live.co.uk wrote:

 Hi All,

 I am relatively new to Spark and am currently having trouble with
 broadcasting large variables ~500mb in size.  The broadcast fails with an
 error shown below and the memory usage on the hosts also blows up.

 Our hardware consists of 8 hosts (1 x 64gb (driver) and 7 x 32gb (workers))
 and we are using Spark 1.1 (Python) via Cloudera CDH 5.2.

 We have managed to replicate the error using a test script shown below.  I
 would be interested to know if anyone has seen this before with
 broadcasting or knows of a fix.

 === ERROR ===

 14/10/24 08:20:04 INFO BlockManager: Found block rdd_11_31 locally
 14/10/24 08:20:08 INFO ConnectionManager: Key not valid ?
 sun.nio.ch.SelectionKeyImpl@fbc6caf
 14/10/24 08:20:08 INFO ConnectionManager: Removing ReceivingConnection to
 ConnectionManagerId(pigeon3.ldn.ebs.io,55316)
 14/10/24 08:20:08 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(pigeon3.ldn.ebs.io,55316)
 14/10/24 08:20:08 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(pigeon3.ldn.ebs.io,55316)
 14/10/24 08:20:08 INFO ConnectionManager: key already cancelled ?
 sun.nio.ch.SelectionKeyImpl@fbc6caf
 java.nio.channels.CancelledKeyException
 at
 org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
 at

 org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
 14/10/24 08:20:13 INFO ConnectionManager: Removing ReceivingConnection to
 ConnectionManagerId(pigeon7.ldn.ebs.io,52524)
 14/10/24 08:20:13 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(pigeon7.ldn.ebs.io,52524)
 14/10/24 08:20:13 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(pigeon7.ldn.ebs.io,52524)
 14/10/24 08:20:15 INFO ConnectionManager: Removing ReceivingConnection to
 ConnectionManagerId(toppigeon.ldn.ebs.io,34370)
 14/10/24 08:20:15 INFO ConnectionManager: Key not valid ?
 sun.nio.ch.SelectionKeyImpl@3ecfdb7e
 14/10/24 08:20:15 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(toppigeon.ldn.ebs.io,34370)
 14/10/24 08:20:15 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(toppigeon.ldn.ebs.io,34370)
 14/10/24 08:20:15 INFO ConnectionManager: key already cancelled ?
 sun.nio.ch.SelectionKeyImpl@3ecfdb7e
 java.nio.channels.CancelledKeyException
 at
 org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:386)
 at

 org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
 14/10/24 08:20:19 INFO ConnectionManager: Removing ReceivingConnection to
 ConnectionManagerId(pigeon8.ldn.ebs.io,48628)
 14/10/24 08:20:19 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(pigeon8.ldn.ebs.io,48628)
 14/10/24 08:20:19 INFO ConnectionManager: Removing ReceivingConnection to
 ConnectionManagerId(pigeon6.ldn.ebs.io,44996)
 14/10/24 08:20:19 INFO ConnectionManager: Removing SendingConnection to
 ConnectionManagerId(pigeon6.ldn.ebs.io,44996)
 14/10/24 08:20:27 ERROR SendingConnection: Exception while reading
 SendingConnection to ConnectionManagerId(pigeon8.ldn.ebs.io,48628)
 java.nio.channels.ClosedChannelException
 at
 sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
 at
 org.apache.spark.network.SendingConnection.read(Connection.scala:390)
 at

 org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199)
 at

 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at

 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 14/10/24 08:20:27 ERROR SendingConnection: Exception while reading
 SendingConnection to ConnectionManagerId(pigeon6.ldn.ebs.io,44996)
 java.nio.channels.ClosedChannelException
 at
 sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
   

Re: ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId

2014-11-11 Thread Tom Seddon
Yes please, can you share?  I am getting this error after expanding my
application to include a large broadcast variable.  It would be good to know
if it can be fixed with configuration.

On 23 October 2014 18:04, Michael Campbell michael.campb...@gmail.com
wrote:

 Can you list what your fix was so others can benefit?

 On Wed, Oct 22, 2014 at 8:15 PM, arthur.hk.c...@gmail.com 
 arthur.hk.c...@gmail.com wrote:

 Hi,

 I have managed to resolve it; it was caused by a wrong setting.  Please
 ignore this.

 Regards
 Arthur

 On 23 Oct, 2014, at 5:14 am, arthur.hk.c...@gmail.com 
 arthur.hk.c...@gmail.com wrote:


 14/10/23 05:09:04 WARN ConnectionManager: All connections not cleaned up