I am trying to find the code that cleans up uncached RDDs.
Thanks,
Nasrulla
From: Charoes
Sent: Tuesday, May 21, 2019 5:10 PM
To: Nasrulla Khan Haris
Cc: Wenchen Fan ; dev@spark.apache.org
Subject: Re: RDD object Out of scope.
If you cached an RDD and hold a reference to that RDD in your code, then
your RDD will NOT be cleaned up.
There is a ReferenceQueue in ContextCleaner, which is used to keep track of
references to RDDs, Broadcasts, Accumulators, etc.
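For illustration, here is a minimal Scala sketch (not Spark's actual ContextCleaner code; the names are made up) of how a ReferenceQueue plus WeakReferences can detect that an object has become unreachable and then trigger a registered cleanup task:

    import java.lang.ref.{Reference, ReferenceQueue, WeakReference}
    import java.util.concurrent.ConcurrentHashMap

    object CleanerSketch {
      private val queue = new ReferenceQueue[AnyRef]
      // The WeakReferences themselves must stay strongly reachable,
      // otherwise they would never be enqueued.
      private val pending = new ConcurrentHashMap[WeakReference[AnyRef], () => Unit]()

      def register(obj: AnyRef)(cleanup: () => Unit): Unit =
        pending.put(new WeakReference[AnyRef](obj, queue), cleanup)

      // Once a referent is garbage collected, the JVM enqueues its reference;
      // draining the queue tells us which cleanup tasks to run.
      def pollOnce(): Unit = {
        var ref: Reference[_ <: AnyRef] = queue.poll()
        while (ref != null) {
          Option(pending.remove(ref)).foreach(_.apply())
          ref = queue.poll()
        }
      }
    }

In the real ContextCleaner the registered task for an RDD ends up unpersisting it by id (and it polls the queue on a daemon thread), which is why only persisted RDDs, broadcasts, accumulators, etc. need this machinery at all.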
On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris
wrote:
>
Thanks Sean, that makes sense.
Regards,
Nasrulla
-----Original Message-----
From: Sean Owen
Sent: Tuesday, May 21, 2019 6:24 PM
To: Nasrulla Khan Haris
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.
Thanks, I'll check it out.
Arun Mahadevan wrote on Tuesday, May 21, 2019 at 01:31:
> Here's the proposal for supporting it in "append" mode -
> https://github.com/apache/spark/pull/23576. You could see if it addresses
> your requirement and post your feedback in the PR.
> For "update" mode it's going to be much
I'm not clear what you're asking. An RDD itself is just an object in
the JVM. It will be garbage collected if there are no references. What
else would there be to clean up in your case? ContextCleaner handles
cleanup of persisted RDDs, etc.
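To make that distinction concrete, a small local-mode sketch (the numbers and variable names are made up):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("rdd-scope"))

    // Uncached: once this method returns, the RDD object is unreachable and is
    // simply garbage collected; there are no blocks anywhere to clean up.
    def countUncached(): Long = sc.parallelize(1 to 1000).count()

    // Persisted: the RDD now owns blocks in the block manager, which are released
    // either explicitly via unpersist() or asynchronously by the ContextCleaner
    // after the RDD object itself has been garbage collected.
    val persisted = sc.parallelize(1 to 1000).cache()
    persisted.count()
    persisted.unpersist()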
On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris
When you cache a DataFrame, you actually cache its logical plan. That's why
re-creating the DataFrame doesn't work: Spark finds that the logical plan is
cached and picks up the cached data.
You need to uncache the DataFrame, or go back to the SQL way:
df.createTempView("abc")
spark.table("abc").cache()
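A minimal local sketch of that point (the range/filter pipeline and the view name are made up for illustration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("cache-demo").getOrCreate()
    import spark.implicits._

    val df = spark.range(100).filter($"id" % 2 === 0)
    df.cache()   // the cache entry is keyed by the analyzed logical plan
    df.count()   // materializes the cached data

    // Re-creating the "same" DataFrame produces an equivalent logical plan,
    // so it still reads from the cache rather than from the source.
    val df2 = spark.range(100).filter($"id" % 2 === 0)
    df2.count()

    // Dropping the cache: either through the original reference...
    df.unpersist()
    // ...or through a temp view and the catalog, as in the snippet above.
    df2.createOrReplaceTempView("abc")
    spark.table("abc").cache()
    spark.catalog.uncacheTable("abc")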
An RDD is kind of a pointer to the actual data. Unless it's cached, we don't
need to clean up the RDD.
On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris
wrote:
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I
> found the ContextCleaner
>
Thanks for the reply, Wenchen. I am curious what happens when an RDD goes out
of scope when it is not cached.
Nasrulla
From: Wenchen Fan
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.
Tough one. Yes it's because Hive is still 'included' with the
no-Hadoop build. I think the avro scope is on purpose in that it's
meant to use the version in the larger Hadoop installation it will run
on. But, I suspect you'll find 1.7 doesn't work. Yes, there's a rat's
nest of compatibility
The scopes for avro-1.8.2.jar and avro-mapred-1.8.2-hadoop2.jar are different:

    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>${avro.version}</version>
      <scope>${hadoop.deps.scope}</scope>
    </dependency>
    ...
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro-mapred</artifactId>
      <version>${avro.version}</version>
      <classifier>${avro.mapred.classifier}</classifier>
      <scope>${hive.deps.scope}</scope>
    </dependency>
What needs to be done then? At a minimum,
Hello,
I'm developing a DataSourceV2 reader for the ROOT (https://root.cern/)
file format to replace a previously used DSV1 source.
I have a bare skeleton of the reader, which can properly load the
files and pass their schema into Spark 2.4.3, but any operation on the
resulting
Are you sure that your schema conversion is correct? If you're running with
a recent Spark version, then that line is probably `name.hashCode()`. That
file was last updated 6 months ago, so I think it is likely that `name` is
null in your version.
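For reference, a bare-bones sketch of the Spark 2.4 DataSourceV2 read path in Scala (generic, not the actual ROOT reader; the class names and schema are made up). The StructType returned from readSchema() is what the schema conversion works on, so a null field name there could lead to the kind of problem described above (a null `name` hitting `name.hashCode()`):

    import java.util.{Collections, List => JList}

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, InputPartitionReader}
    import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
    import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

    // Entry point: spark.read.format("<fully qualified class name>").load(...)
    class ExampleSource extends DataSourceV2 with ReadSupport {
      override def createReader(options: DataSourceOptions): DataSourceReader = new ExampleReader
    }

    class ExampleReader extends DataSourceReader {
      // The schema handed back to Spark; every field name must be non-null.
      override def readSchema(): StructType =
        StructType(Seq(StructField("energy", DoubleType, nullable = false)))

      // A single empty partition, just to make the skeleton complete.
      override def planInputPartitions(): JList[InputPartition[InternalRow]] =
        Collections.singletonList(new ExamplePartition: InputPartition[InternalRow])
    }

    class ExamplePartition extends InputPartition[InternalRow] {
      override def createPartitionReader(): InputPartitionReader[InternalRow] =
        new InputPartitionReader[InternalRow] {
          override def next(): Boolean = false
          override def get(): InternalRow = throw new NoSuchElementException("no rows in this sketch")
          override def close(): Unit = ()
        }
    }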
On Tue, May 21, 2019 at 11:39 AM Andrew Melo
Hi Ryan,
On Tue, May 21, 2019 at 2:48 PM Ryan Blue wrote:
>
> Are you sure that your schema conversion is correct? If you're running with a
> recent Spark version, then that line is probably `name.hashCode()`. That file
> was last updated 6 months ago so I think it is likely that `name` is the