Re: Using Spark Context as an attribute of a class cannot be used
That's an interesting question for which I do not know the answer. Probably a question for someone with more knowledge of the internals of the shell interpreter...

On Mon, Nov 24, 2014 at 2:19 PM, aecc wrote:
> However, I still don't understand why this object should be serialized
> and shipped. aaa.s and sc are both the same object,
> org.apache.spark.SparkContext@1f222881. However, this:
>
>     aaa.s.parallelize(1 to 10).filter(_ == myNumber).count
>
> needs to be serialized, and this:
>
>     sc.parallelize(1 to 10).filter(_ == myNumber).count
>
> does not.

-- Marcelo
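The puzzle above can be reproduced with plain Java serialization, no Spark involved, since a Spark task closure is shipped with the same JVM machinery. A minimal sketch, with all names (Holder, SerIntPredicate) invented for illustration: a lambda that reads a field through an instance drags the whole instance into its captured state, while a lambda over a local copy of the value captures only the value itself. Judging from the stack trace shown further down the thread, this is what happens in the shell: the task closure ends up referencing the repl line object whose fields include "aaa", so the non-serializable wrapper rides along.

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.IntPredicate;

public class CaptureDemo {

    // Deliberately NOT serializable: a stand-in for the AAA wrapper.
    static class Holder {
        final int n;
        Holder(int n) { this.n = n; }
    }

    // A lambda type that, like a Spark task closure, must be serializable.
    interface SerIntPredicate extends IntPredicate, Serializable {}

    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Reading holder.n inside the lambda captures the whole Holder instance,
    // so serializing the closure fails with NotSerializableException.
    static boolean fieldAccessSerializes() {
        Holder holder = new Holder(5);
        SerIntPredicate viaField = x -> x == holder.n;
        return serializes(viaField);
    }

    // Copying the value into a local first means only an int is captured.
    static boolean localCopySerializes() {
        Holder holder = new Holder(5);
        int n = holder.n;
        SerIntPredicate viaLocal = x -> x == n;
        return serializes(viaLocal);
    }

    public static void main(String[] args) {
        System.out.println("via field: " + fieldAccessSerializes());
        System.out.println("via local: " + localCopySerializes());
    }
}
```

The same trick works in the shell: copying the value into a fresh local val right before the transformation keeps the wrapper out of the closure.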
Re: Using Spark Context as an attribute of a class cannot be used
Ok, great, I'm going to do it that way, thanks :). However, I still don't understand why this object should be serialized and shipped.

aaa.s and sc are both the same object, org.apache.spark.SparkContext@1f222881. However, this:

    aaa.s.parallelize(1 to 10).filter(_ == myNumber).count

needs to be serialized, and this:

    sc.parallelize(1 to 10).filter(_ == myNumber).count

does not.

2014-11-24 23:13 GMT+01:00 Marcelo Vanzin:
> If you want to do the same thing, your "AAA" needs to be serializable
> and you need to mark all non-serializable fields as "@transient". [...]

-- Alessandro Chacón
Re: Using Spark Context as an attribute of a class cannot be used
On Mon, Nov 24, 2014 at 1:56 PM, aecc wrote:
> I checked sqlContext, they use it in the same way I would like to use my
> class, they make the class Serializable with transient. [...] will I get
> performance issues when doing this because now the class will be
> serialized for some reason that I still don't understand?

If you want to do the same thing, your "AAA" needs to be serializable, and you need to mark all non-serializable fields as "@transient". The only performance penalty you'll pay is the serialization / deserialization of the "AAA" instance, which will most probably be really small compared to the actual work the task will be doing.

Unless your class is holding a whole lot of data, in which case you should start thinking about using a broadcast instead.

-- Marcelo
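The @transient approach can be sketched without Spark: Scala's @transient annotation compiles down to Java's transient keyword, so plain Java serialization shows the behavior. All names below (FakeContext, Wrapper) are hypothetical stand-ins for the SparkContext and the AAA wrapper, not Spark APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for SparkContext: not serializable.
    static class FakeContext {}

    // AAA-style wrapper: serializable as a whole, with the heavyweight field
    // marked transient so serialization simply skips it.
    static class Wrapper implements Serializable {
        transient FakeContext ctx;
        final String tag;
        Wrapper(FakeContext ctx, String tag) { this.ctx = ctx; this.tag = tag; }
    }

    static Wrapper roundTrip(Wrapper w) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(w);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Wrapper) in.readObject();
        }
    }

    // After a round trip, the transient field is dropped (comes back null)
    // while the rest of the wrapper survives intact.
    static boolean contextDroppedOnRoundTrip() {
        try {
            Wrapper copy = roundTrip(new Wrapper(new FakeContext(), "driver-only"));
            return copy.ctx == null && "driver-only".equals(copy.tag);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("context dropped: " + contextDroppedOnRoundTrip());
    }
}
```

This also illustrates the cost Marcelo describes: the serialized wrapper is tiny because the heavyweight field is dropped. The flip side is that code running on an executor must never touch the transient field, since it deserializes as null.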
Re: Using Spark Context as an attribute of a class cannot be used
Yes, I'm running this in the shell. In my compiled jar it works perfectly; the issue is that I need to do this in the shell. Are there any workarounds available?

I checked sqlContext: they use it in the same way I would like to use my class, making the class Serializable with transient fields. Does this somehow affect the whole pipeline of data movement? I mean, will I get performance issues when doing this, because now the class will be serialized for some reason that I still don't understand?

2014-11-24 22:33 GMT+01:00 Marcelo Vanzin:
> Try compiling your code and running it outside the shell to see how it
> goes. I'm not sure whether there's a workaround for this when trying
> things out in the shell - maybe declare an `object` to hold your
> constants? [...]

-- Alessandro Chacón
Re: Using Spark Context as an attribute of a class cannot be used
Hello,

On Mon, Nov 24, 2014 at 12:07 PM, aecc wrote:
> This is the stacktrace:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task not
> serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
> - field (class "$iwC$$iwC$$iwC$$iwC", name: "aaa", type: "class
> $iwC$$iwC$$iwC$$iwC$AAA")

Ah. Looks to me like you're trying to run this in spark-shell, right?

I'm not 100% sure of how it works internally, but I think the Scala repl works a little differently than regular Scala code in this regard. When you declare a "val" in the shell, it behaves differently than a "val" inside a method in a compiled Scala class: the former behaves like an instance variable, the latter like a local variable. So this is probably why you're running into this.

Try compiling your code and running it outside the shell to see how it goes. I'm not sure whether there's a workaround for this when trying things out in the shell - maybe declare an `object` to hold your constants? Never really tried, so YMMV.

-- Marcelo
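The "object to hold your constants" idea can likewise be sketched in plain Java: a static member is reached without capturing any enclosing instance, which is the same reason a Scala object can sidestep the repl's wrapper chain. A hedged sketch under those assumptions, with ConstantsDemo and its members invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.IntPredicate;

public class ConstantsDemo {

    // Static members are reached without capturing any enclosing instance,
    // so a closure that uses them stays trivially serializable.
    static final int MY_NUMBER = 5;

    // A lambda type that, like a Spark task closure, must be serializable.
    interface SerIntPredicate extends IntPredicate, Serializable {}

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    // Serialize a closure over the static constant, deserialize it, and apply
    // it to a probe value. Nothing but the constant reference is captured.
    static boolean matches(int probe) {
        try {
            SerIntPredicate revived = roundTrip((SerIntPredicate) (x -> x == MY_NUMBER));
            return revived.test(probe);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("matches 5: " + matches(5));
        System.out.println("matches 4: " + matches(4));
    }
}
```

In shell terms the analogue would be keeping constants in a Scala object (marked Serializable to be safe) instead of in top-level vals, so the closure no longer drags in the repl line object.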
Re: Using Spark Context as an attribute of a class cannot be used
If, instead of myNumber, I actually use the literal value 5, the exception is not thrown. E.g.:

    aaa.s.parallelize(1 to 10).filter(_ == 5).count

works perfectly.
Re: Using Spark Context as an attribute of a class cannot be used
Marcelo Vanzin wrote:
> Do you expect to be able to use the spark context on the remote task?

Not at all. What I want to create is a wrapper around the SparkContext, to be used only on the driver node. I would like this "AAA" wrapper to have several attributes, such as the SparkContext and other configuration for my project.

I tested with -Dsun.io.serialization.extendedDebugInfo=true. This is the stacktrace:

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
    - field (class "$iwC$$iwC$$iwC$$iwC", name: "aaa", type: "class $iwC$$iwC$$iwC$$iwC$AAA")
    - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@24e57dcb)
    - field (class "$iwC$$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC$$iwC")
    - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@178cc62b)
    - field (class "$iwC$$iwC", name: "$iw", type: "class $iwC$$iwC$$iwC")
    - object (class "$iwC$$iwC", $iwC$$iwC@1e9f5eeb)
    - field (class "$iwC", name: "$iw", type: "class $iwC$$iwC")
    - object (class "$iwC", $iwC@37d8e87e)
    - field (class "$line18.$read", name: "$iw", type: "class $iwC")
    - object (class "$line18.$read", $line18.$read@124551f)
    - field (class "$iwC$$iwC$$iwC", name: "$VAL15", type: "class $line18.$read")
    - object (class "$iwC$$iwC$$iwC", $iwC$$iwC$$iwC@2e846e6b)
    - field (class "$iwC$$iwC$$iwC$$iwC", name: "$outer", type: "class $iwC$$iwC$$iwC")
    - object (class "$iwC$$iwC$$iwC$$iwC", $iwC$$iwC$$iwC$$iwC@4b31ba1b)
    - field (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", name: "$outer", type: "class $iwC$$iwC$$iwC$$iwC")
    - object (class "$iwC$$iwC$$iwC$$iwC$$anonfun$1", <function1>)
    - field (class "org.apache.spark.rdd.FilteredRDD", name: "f", type: "interface scala.Function1")
    - root object (class "org.apache.spark.rdd.FilteredRDD", FilteredRDD[3] at filter at <console>:20)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)

I actually don't understand much about this stack trace. If you can help me, I would appreciate it. Using @transient didn't work either.

Thanks a lot.
Re: Using Spark Context as an attribute of a class cannot be used
Do you expect to be able to use the spark context on the remote task? If you do, that won't work. You'll need to rethink what it is you're trying to do, since SparkContext is not serializable and it doesn't make sense to make it so.

If you don't, you could mark the field as @transient. But the two examples you posted shouldn't be creating a reference to the "aaa" variable in the serialized task. You could use -Dsun.io.serialization.extendedDebugInfo=true to debug these things.

On Mon, Nov 24, 2014 at 10:15 AM, aecc wrote:
> Hello guys,
>
> I'm using Spark 1.0.0 and Kryo serialization. In the Spark Shell, when I
> create a class that contains as an attribute the SparkContext, in this way:
>
>     class AAA(val s: SparkContext) { }
>     val aaa = new AAA(sc)
>
> and I execute any action using that attribute [...] it returns a
> NotSerializableException. [...]

-- Marcelo
Using Spark Context as an attribute of a class cannot be used
Hello guys,

I'm using Spark 1.0.0 and Kryo serialization.

In the Spark shell, when I create a class that contains the SparkContext as an attribute, in this way:

    class AAA(val s: SparkContext) { }
    val aaa = new AAA(sc)

and I execute any action using that attribute, like:

    val myNumber = 5
    aaa.s.textFile("FILE").filter(_ == myNumber.toString).count

or

    aaa.s.parallelize(1 to 10).filter(_ == myNumber).count

it returns a NotSerializableException:

org.apache.spark.SparkException: Job aborted due to stage failure: Task not serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$AAA
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:770)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:713)
        at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1176)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Any thoughts on how to solve this issue, or a workaround for it? I'm developing an API that will need to use this SparkContext several times in different places, so it needs to be accessible.

Thanks a lot for the cooperation.