RE: RDD object Out of scope.

2019-05-21 Thread Nasrulla Khan Haris
Thanks Sean, that makes sense. Regards, Nasrulla -Original Message- From: Sean Owen Sent: Tuesday, May 21, 2019 6:24 PM To: Nasrulla Khan Haris Cc: dev@spark.apache.org Subject: Re: RDD object Out of scope. I'm not clear what you're asking. An RDD itself is just an object in the

Re: What's the root cause of not supporting multiple aggregations in structured streaming?

2019-05-21 Thread 张万新
Thanks, I'll check it out. Arun Mahadevan wrote on Tue, May 21, 2019 at 01:31: > Here's the proposal for supporting it in "append" mode - > https://github.com/apache/spark/pull/23576. You could see if it addresses > your requirement and post your feedback in the PR. > For "update" mode it's going to be much
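
For readers following the thread, a minimal Scala sketch (built on the built-in rate source; all names here are illustrative) of the kind of chained aggregation in question: a single streaming aggregation is fine, but stacking a second one on top is what Spark 2.4 rejects and what the PR above targets for "append" mode.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiAggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("multi-agg-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Any streaming source works; the rate source gives (timestamp, value).
    val events = spark.readStream.format("rate").load()

    // One streaming aggregation per query is supported.
    val counts = events.groupBy(window($"timestamp", "10 seconds")).count()

    // Chaining a second aggregation on top of the first is not supported for
    // streaming Datasets in Spark 2.4: start() below fails analysis because
    // multiple streaming aggregations are not allowed in a single query.
    val countsOfCounts = counts.groupBy($"count").count()

    countsOfCounts.writeStream
      .outputMode("complete")
      .format("console")
      .start()            // AnalysisException raised here
      .awaitTermination()
  }
}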

Re: RDD object Out of scope.

2019-05-21 Thread Sean Owen
I'm not clear what you're asking. An RDD itself is just an object in the JVM. It will be garbage collected if there are no references. What else would there be to clean up in your case? ContextCleaner handles cleanup of persisted RDDs, etc. On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris
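
To make the distinction concrete, a small Scala sketch (names are illustrative) of an uncached RDD that needs nothing beyond ordinary garbage collection, versus a persisted RDD whose cached blocks are released either explicitly or, once unreachable, by ContextCleaner.

import org.apache.spark.sql.SparkSession

object RddScopeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-scope-sketch").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    // An uncached RDD is just a JVM object describing a computation. Once the
    // last reference is dropped it is garbage collected like any other object;
    // there is no executor-side state to release.
    def transientCount(): Long = {
      val rdd = sc.parallelize(1 to 1000)
      rdd.count()
    } // `rdd` goes out of scope here; nothing further to clean up.
    transientCount()

    // A persisted RDD does have executor-side state (cached blocks). Release it
    // explicitly with unpersist(), or let ContextCleaner do it asynchronously
    // after the RDD object becomes unreachable and is collected.
    val cached = sc.parallelize(1 to 1000).cache()
    cached.count()      // materializes the cached blocks
    cached.unpersist()  // explicit cleanup of the cached blocks

    spark.stop()
  }
}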

https://github.com/google/zetasql

2019-05-21 Thread kant kodali
https://github.com/google/zetasql

RE: RDD object Out of scope.

2019-05-21 Thread Nasrulla Khan Haris
I am trying to find the code that cleans up uncached RDDs. Thanks, Nasrulla From: Charoes Sent: Tuesday, May 21, 2019 5:10 PM To: Nasrulla Khan Haris Cc: Wenchen Fan; dev@spark.apache.org Subject: Re: RDD object Out of scope. If you cached an RDD and hold a reference to that RDD in your code,

Re: RDD object Out of scope.

2019-05-21 Thread Charoes
If you cache an RDD and hold a reference to that RDD in your code, then your RDD will NOT be cleaned up. There is a ReferenceQueue in ContextCleaner, which is used to keep track of references to RDDs, Broadcasts, Accumulators, etc. On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris wrote: >
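
As a rough illustration of that mechanism, a simplified Scala sketch of the weak-reference/ReferenceQueue pattern; this is a toy stand-in with made-up names, not Spark's actual ContextCleaner code.

import java.lang.ref.{ReferenceQueue, WeakReference}

// Toy stand-in for an RDD that owns some cleanable resource, identified by id.
final class FakeRdd(val id: Int)

object ReferenceQueueSketch {
  def main(args: Array[String]): Unit = {
    val queue = new ReferenceQueue[FakeRdd]()

    // The cleaner registers a weak reference per tracked object (RDD, Broadcast,
    // Accumulator), remembering only the id it needs for cleanup.
    var rdd: FakeRdd = new FakeRdd(42)
    val ref = new WeakReference[FakeRdd](rdd, queue)
    val idToClean = rdd.id

    // Once the application drops its last strong reference...
    rdd = null
    System.gc() // encourage collection in this toy example

    // ...the weak reference is eventually enqueued, and a background thread can
    // perform the cleanup (e.g. dropping cached blocks for that RDD id).
    val enqueued = queue.remove(1000) // wait up to 1 second
    if (enqueued != null) println(s"cleaning up resources for RDD $idToClean")
    else println("nothing enqueued yet; GC timing is nondeterministic")
  }
}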

Re: DataSourceV2Reader Q

2019-05-21 Thread Andrew Melo
Hi Ryan, On Tue, May 21, 2019 at 2:48 PM Ryan Blue wrote: > > Are you sure that your schema conversion is correct? If you're running with a > recent Spark version, then that line is probably `name.hashCode()`. That file > was last updated 6 months ago so I think it is likely that `name` is the

Re: DataSourceV2Reader Q

2019-05-21 Thread Ryan Blue
Are you sure that your schema conversion is correct? If you're running with a recent Spark version, then that line is probably `name.hashCode()`. That file was last updated 6 months ago, so I think it is likely that `name` is what is null in your version. On Tue, May 21, 2019 at 11:39 AM Andrew Melo
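
A minimal Scala sketch of the failure mode being suggested (the field name and type are made up): StructField does not validate its name, so a null name slips through schema construction and only blows up later, wherever Spark hashes or compares field names, which is why the stack trace points at `name.hashCode()`.

import org.apache.spark.sql.types._

object NullFieldNameSketch {
  def main(args: Array[String]): Unit = {
    // Imagine a schema conversion that fails to populate the field name,
    // e.g. a lookup that quietly returned null.
    val badName: String = null
    val field = StructField(badName, IntegerType)

    // Construction succeeds; the problem surfaces only when the name is used.
    println(field.name.hashCode()) // NullPointerException here
  }
}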

Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-21 Thread Sean Owen
Tough one. Yes, it's because Hive is still 'included' with the no-Hadoop build. I think the avro scope is intentional: it's meant to use the version in the larger Hadoop installation it will run on. But I suspect you'll find 1.7 doesn't work. Yes, there's a rat's nest of compatibility

DataSourceV2Reader Q

2019-05-21 Thread Andrew Melo
Hello, I'm developing a DataSourceV2 reader for the ROOT (https://root.cern/) file format, to replace the DSV1 source that was in use before. I have a bare skeleton of the reader, which can properly load the files and pass their schema into Spark 2.4.3, but any operation on the resulting
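
For anyone less familiar with the 2.4-era API, a bare-bones Scala sketch of the DataSourceV2 read path; class names and the single string column are illustrative, and this is not the ROOT reader itself.

import java.util.{List => JList}
import scala.collection.JavaConverters._
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, ReadSupport}
import org.apache.spark.sql.sources.v2.reader.{DataSourceReader, InputPartition, InputPartitionReader}
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Entry point, e.g. spark.read.format("com.example.ExampleSource").load()
class ExampleSource extends DataSourceV2 with ReadSupport {
  override def createReader(options: DataSourceOptions): DataSourceReader = new ExampleReader
}

class ExampleReader extends DataSourceReader {
  // The schema returned here is what every later operation resolves against;
  // each field name must be non-null (see the hashCode discussion above).
  override def readSchema(): StructType =
    StructType(Seq(StructField("value", StringType, nullable = false)))

  override def planInputPartitions(): JList[InputPartition[InternalRow]] =
    Seq[InputPartition[InternalRow]](new ExamplePartition).asJava
}

class ExamplePartition extends InputPartition[InternalRow] {
  override def createPartitionReader(): InputPartitionReader[InternalRow] =
    new ExamplePartitionReader
}

class ExamplePartitionReader extends InputPartitionReader[InternalRow] {
  private val values = Iterator("a", "b", "c")
  private var current: String = _
  override def next(): Boolean = { if (values.hasNext) { current = values.next(); true } else false }
  override def get(): InternalRow = InternalRow(UTF8String.fromString(current))
  override def close(): Unit = ()
}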

Re: Hadoop version(s) compatible with spark-2.4.3-bin-without-hadoop-scala-2.12

2019-05-21 Thread Michael Heuer
The scopes for avro-1.8.2.jar and avro-mapred-1.8.2-hadoop2.jar are different: in the pom, org.apache.avro:avro (version ${avro.version}) is declared with scope ${hadoop.deps.scope}, while org.apache.avro:avro-mapred (version ${avro.version}, classifier ${avro.mapred.classifier}) is declared with scope ${hive.deps.scope}. What needs to be done then? At a minimum,

RE: RDD object Out of scope.

2019-05-21 Thread Nasrulla Khan Haris
Thanks for the reply, Wenchen. I am curious what happens when an RDD goes out of scope when it is not cached. Nasrulla From: Wenchen Fan Sent: Tuesday, May 21, 2019 6:28 AM To: Nasrulla Khan Haris Cc: dev@spark.apache.org Subject: Re: RDD object Out of scope. RDD is kind of a pointer to the

Re: Access to live data of cached dataFrame

2019-05-21 Thread Wenchen Fan
When you cache a dataframe, you actually cache a logical plan. That's why re-creating the dataframe doesn't work: Spark finds out the logical plan is cached and picks the cached data. You need to uncache the dataframe, or go back to the SQL way: df.createTempView("abc") spark.table("abc").cache()
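
A short Scala sketch of the behavior described and of the two workarounds; the parquet path and view name are made up for illustration.

import org.apache.spark.sql.SparkSession

object CachedPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cached-plan-sketch").master("local[*]").getOrCreate()

    // Hypothetical source whose contents may change underneath Spark.
    val df = spark.read.parquet("/tmp/events")

    // cache() registers the DataFrame's *logical plan* with the cache manager.
    df.cache()
    df.count() // materializes the cache

    // Re-creating an equivalent DataFrame produces the same logical plan, so it
    // resolves to the cached data instead of re-reading the (possibly newer) files.
    val rebuilt = spark.read.parquet("/tmp/events")
    rebuilt.count() // served from the cache

    // Workaround 1: uncache, then read again to see live data.
    df.unpersist()

    // Workaround 2, the "SQL way" above: cache through a temp view, so the cached
    // entry can also be managed by name (e.g. spark.catalog.uncacheTable("abc")).
    df.createTempView("abc")
    spark.table("abc").cache()

    spark.stop()
  }
}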

Re: RDD object Out of scope.

2019-05-21 Thread Wenchen Fan
RDD is kind of a pointer to the actual data. Unless it's cached, we don't need to clean up the RDD. On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris wrote: > Hi Spark developers, > > > > Can someone point out the code where RDD objects go out of scope? I > found the ContextCleaner >