I am trying to find the code that cleans up uncached RDDs.
Thanks,
Nasrulla
From: Charoes
Sent: Tuesday, May 21, 2019 5:10 PM
To: Nasrulla Khan Haris
Cc: Wenchen Fan ; dev@spark.apache.org
Subject: Re: RDD object Out of scope.
If you cached an RDD and hold a reference to that RDD in your code, then
your RDD will NOT be cleaned up.
There is a ReferenceQueue in ContextCleaner, which is used to keep track of
references to RDDs, Broadcasts, Accumulators, etc.
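For illustration, here is a minimal Scala sketch (not Spark's actual ContextCleaner code; the names are made up) of how a ReferenceQueue plus WeakReferences can detect that an object has become unreachable and then trigger a registered cleanup task:

    import java.lang.ref.{Reference, ReferenceQueue, WeakReference}
    import java.util.concurrent.ConcurrentHashMap

    object CleanerSketch {
      private val queue = new ReferenceQueue[AnyRef]
      // The WeakReferences themselves must stay strongly reachable,
      // otherwise they would never be enqueued.
      private val pending = new ConcurrentHashMap[WeakReference[AnyRef], () => Unit]()

      def register(obj: AnyRef)(cleanup: () => Unit): Unit =
        pending.put(new WeakReference[AnyRef](obj, queue), cleanup)

      // Once a referent is garbage collected, the JVM enqueues its reference;
      // draining the queue tells us which cleanup tasks to run.
      def pollOnce(): Unit = {
        var ref: Reference[_ <: AnyRef] = queue.poll()
        while (ref != null) {
          Option(pending.remove(ref)).foreach(_.apply())
          ref = queue.poll()
        }
      }
    }

In the real ContextCleaner the registered task for an RDD ends up unpersisting it by id (and it polls the queue on a daemon thread), which is why only persisted RDDs, broadcasts, accumulators, etc. need this machinery at all.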
On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris
wrote:
>
Thanks Sean, that makes sense.
Regards,
Nasrulla
-----Original Message-----
From: Sean Owen
Sent: Tuesday, May 21, 2019 6:24 PM
To: Nasrulla Khan Haris
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.
Thanks, I'll check it out.
Arun Mahadevan wrote on Tuesday, May 21, 2019 at 01:31:
> Here's the proposal for supporting it in "append" mode -
> https://github.com/apache/spark/pull/23576. You could see if it addresses
> your requirement and post your feedback in the PR.
> For "update" mode it's going to be much
I'm not clear what you're asking. An RDD itself is just an object in
the JVM. It will be garbage collected if there are no references. What
else would there be to clean up in your case? ContextCleaner handles
cleanup of persisted RDDs, etc.
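To make that distinction concrete, a small local-mode sketch (the numbers and variable names are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("rdd-scope"))

    // Uncached: once this method returns, the RDD object is unreachable and is
    // simply garbage collected; there are no blocks anywhere to clean up.
    def countUncached(): Long = sc.parallelize(1 to 1000).count()

    // Persisted: the RDD now owns blocks in the block manager, which are released
    // either explicitly via unpersist() or asynchronously by the ContextCleaner
    // after the RDD object itself has been garbage collected.
    val persisted = sc.parallelize(1 to 1000).cache()
    persisted.count()
    persisted.unpersist()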
On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris
When you cache a DataFrame, you actually cache its logical plan. That's why
re-creating the DataFrame doesn't work: Spark finds that the logical plan is
cached and picks up the cached data.
You need to uncache the DataFrame, or go back to the SQL way:
df.createTempView("abc")
spark.table("abc").cache()
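A minimal local sketch of that point (the range/filter pipeline and the view name are made up for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("cache-demo").getOrCreate()
    import spark.implicits._

    val df = spark.range(100).filter($"id" % 2 === 0)
    df.cache()   // the cache entry is keyed by the analyzed logical plan
    df.count()   // materializes the cached data

    // Re-creating the "same" DataFrame produces an equivalent logical plan,
    // so it still reads from the cache rather than from the source.
    val df2 = spark.range(100).filter($"id" % 2 === 0)
    df2.count()

    // Dropping the cache: either through the original reference...
    df.unpersist()
    // ...or through a temp view and the catalog, as in the snippet above.
    df2.createOrReplaceTempView("abc")
    spark.table("abc").cache()
    spark.catalog.uncacheTable("abc")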
An RDD is kind of a pointer to the actual data. Unless it's cached, we don't
need to clean up the RDD.
On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris
wrote:
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I
> found the ContextCleaner
>
Thanks for the reply, Wenchen. I am curious what happens when an RDD goes out
of scope when it is not cached.
Nasrulla
From: Wenchen Fan
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.
Tough one. Yes it's because Hive is still 'included' with the
no-Hadoop build. I think the avro scope is on purpose in that it's
meant to use the version in the larger Hadoop installation it will run
on. But, I suspect you'll find 1.7 doesn't work. Yes, there's a rat's
nest of compatibility
The scopes for avro-1.8.2.jar and avro-mapred-1.8.2-hadoop2.jar are different:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>${avro.version}</version>
      <scope>${hadoop.deps.scope}</scope>
    </dependency>
    ...
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <classifier>${avro.mapred.classifier}</classifier>
      <scope>${hive.deps.scope}</scope>
    </dependency>
What needs to be done then? At a minimum,
Hello,
I'm developing a DataSourceV2 reader for the ROOT (https://root.cern/)
file format to replace a previously used DSV1 source.
I have a bare skeleton of the reader, which can properly load the
files and pass their schema into Spark 2.4.3, but any operation on the
resulting
Are you sure that your schema conversion is correct? If you're running with
a recent Spark version, then that line is probably `name.hashCode()`. That
file was last updated 6 months ago, so I think it is likely that `name` is
null in your version.
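For reference, a bare-bones sketch of the Spark 2.4 DataSourceV2 read path in Scala (generic, not the actual ROOT reader; the class names and schema are made up). The StructType returned from readSchema() is what the schema conversion works on, so a null field name there could lead to the kind of problem described above (a null `name` hitting `name.hashCode()`):

    import java.util.{Collections, List => JList}

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, InputPartitionReader}
    import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    // Entry point: spark.read.format("<fully qualified class name>").load(...)
    class ExampleSource extends DataSourceV2 with ReadSupport {
      override def createReader(options: DataSourceOptions): DataSourceReader = new ExampleReader
    }

    class ExampleReader extends DataSourceReader {
      // The schema handed back to Spark; every field name must be non-null.
      override def readSchema(): StructType =
        StructType(Seq(StructField("energy", DoubleType, nullable = false)))

      // A single empty partition, just to make the skeleton complete.
      override def planInputPartitions(): JList[InputPartition[InternalRow]] =
        Collections.singletonList(new ExamplePartition: InputPartition[InternalRow])
    }

    class ExamplePartition extends InputPartition[InternalRow] {
      override def createPartitionReader(): InputPartitionReader[InternalRow] =
        new InputPartitionReader[InternalRow] {
          override def next(): Boolean = false
          override def get(): InternalRow = throw new NoSuchElementException("no rows in this sketch")
          override def close(): Unit = ()
        }
    }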
On Tue, May 21, 2019 at 11:39 AM Andrew Melo
Hi Ryan,
On Tue, May 21, 2019 at 2:48 PM Ryan Blue wrote:
>
> Are you sure that your schema conversion is correct? If you're running with a
> recent Spark version, then that line is probably `name.hashCode()`. That file
> was last updated 6 months ago so I think it is likely that `name` is the