Re: Why hint does not traverse down subquery alias

2019-06-11 Thread John Zhuge
A meaningful error message would be great!

On Tue, Jun 11, 2019 at 6:15 PM Maryann Xue  wrote:

> BTW, I've actually just done some work on hint error handling, which might
> be helpful to what you mentioned:
>
> https://github.com/apache/spark/pull/24653
>
> On Tue, Jun 11, 2019 at 8:04 PM Maryann Xue 
> wrote:
>
>> I believe in the SQL standard, the original name cannot be accessed once
>> it’s aliased.
>>
>> On Tue, Jun 11, 2019 at 7:54 PM John Zhuge  wrote:
>>
>>> Yeah, it is a tough scenario.
>>>
>>> I actually have much simpler cases:
>>>
>>> 1) select /*+ broadcast(t1) */ * from db.t1 join db.t2 on t1.id = t2.id;
>>> 2) select /*+ broadcast(t1) */ * from db.t1 a1 join db.t2 a2 on a1.id =
>>> a2.id;
>>>
>>> 2) is the same as 1) but with aliases. Many users were surprised that 2)
>>> stopped working.
>>>
>>> Thanks,
>>> John
>>>
>>>
>>> On Tue, Jun 11, 2019 at 4:38 PM Maryann Xue 
>>> wrote:
>>>
>>>> Yes, and for a good reason: the hint relation has exactly the same
>>>> scope as other elements of queries/sub-queries.
>>>>
>>>> Suppose there's a query like:
>>>>
>>>> select /*+ broadcast(s) */ * from (select a, b from s) t join (select a,
>>>> b from t) s on t.a = s.b
>>>>
>>>> If we allowed the hint resolving to "cross" the scopes, we'd end up
>>>> with a really confusing spec.
>>>>
>>>>
>>>> Thanks,
>>>> Maryann
>>>>
>>>> On Tue, Jun 11, 2019 at 5:26 PM John Zhuge wrote:
>>>>
>>>>> Hi Reynold and Maryann,
>>>>>
>>>>> ResolveHints javadoc indicates the traversal does not go past subquery
>>>>> alias. Is there any specific reason?
>>>>>
>>>>> Thanks,
>>>>> John Zhuge
>>>>>
>>>>
>>>
>>> --
>>> John Zhuge
>>>
>>

-- 
John Zhuge


Re: Why hint does not traverse down subquery alias

2019-06-11 Thread Maryann Xue
BTW, I've actually just done some work on hint error handling, which might
be helpful to what you mentioned:

https://github.com/apache/spark/pull/24653

On Tue, Jun 11, 2019 at 8:04 PM Maryann Xue 
wrote:

> I believe in the SQL standard, the original name cannot be accessed once
> it’s aliased.
>
> On Tue, Jun 11, 2019 at 7:54 PM John Zhuge  wrote:
>
>> Yeah, it is a tough scenario.
>>
>> I actually have much simpler cases:
>>
>> 1) select /*+ broadcast(t1) */ * from db.t1 join db.t2 on t1.id = t2.id;
>> 2) select /*+ broadcast(t1) */ * from db.t1 a1 join db.t2 a2 on a1.id =
>> a2.id;
>>
>> 2) is the same as 1) but with aliases. Many users were surprised that 2)
>> stopped working.
>>
>> Thanks,
>> John
>>
>>
>> On Tue, Jun 11, 2019 at 4:38 PM Maryann Xue 
>> wrote:
>>
>>> Yes, and for a good reason: the hint relation has exactly the same scope
>>> as other elements of queries/sub-queries.
>>>
>>> Suppose there's a query like:
>>>
>>> select /*+ broadcast(s) */ * from (select a, b from s) t join (select a, b
>>> from t) s on t.a = s.b
>>>
>>> If we allowed the hint resolving to "cross" the scopes, we'd end up with
>>> a really confusing spec.
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Tue, Jun 11, 2019 at 5:26 PM John Zhuge  wrote:
>>>
>>>> Hi Reynold and Maryann,
>>>>
>>>> ResolveHints javadoc indicates the traversal does not go past subquery
>>>> alias. Is there any specific reason?
>>>>
>>>> Thanks,
>>>> John Zhuge
>>>>
>>>
>>
>> --
>> John Zhuge
>>
>


Re: Why hint does not traverse down subquery alias

2019-06-11 Thread Maryann Xue
I believe in the SQL standard, the original name cannot be accessed once
it’s aliased.

On Tue, Jun 11, 2019 at 7:54 PM John Zhuge  wrote:

> Yeah, it is a tough scenario.
>
> I actually have much simpler cases:
>
> 1) select /*+ broadcast(t1) */ * from db.t1 join db.t2 on t1.id = t2.id;
> 2) select /*+ broadcast(t1) */ * from db.t1 a1 join db.t2 a2 on a1.id =
> a2.id;
>
> 2) is the same as 1) but with aliases. Many users were surprised that 2)
> stopped working.
>
> Thanks,
> John
>
>
> On Tue, Jun 11, 2019 at 4:38 PM Maryann Xue  wrote:
>
>> Yes, and for a good reason: the hint relation has exactly the same scope
>> as other elements of queries/sub-queries.
>>
>> Suppose there's a query like:
>>
>> select /*+ broadcast(s) */ * from (select a, b from s) t join (select a, b
>> from t) s on t.a = s.b
>>
>> If we allowed the hint resolving to "cross" the scopes, we'd end up with
>> a really confusing spec.
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, Jun 11, 2019 at 5:26 PM John Zhuge  wrote:
>>
>>> Hi Reynold and Maryann,
>>>
>>> ResolveHints javadoc indicates the traversal does not go past subquery
>>> alias. Is there any specific reason?
>>>
>>> Thanks,
>>> John Zhuge
>>>
>>
>
> --
> John Zhuge
>


FlatMapGroupsInPandasExec with multiple record batches

2019-06-11 Thread Terry Kim
Hi,

I see the following comment in FlatMapGroupsInPandasExec.scala:
"It's possible to further split one group into multiple record batches to
reduce the memory footprint on the Java side, this is left as future work."

I checked the JIRA but could not find anything related to this. Is there a
plan to support this scenario?

Thanks,
Terry


Re: Why hint does not traverse down subquery alias

2019-06-11 Thread John Zhuge
Yeah, it is a tough scenario.

I actually have much simpler cases:

1) select /*+ broadcast(t1) */ * from db.t1 join db.t2 on t1.id = t2.id;
2) select /*+ broadcast(t1) */ * from db.t1 a1 join db.t2 a2 on a1.id =
a2.id;

2) is the same as 1) but with aliases. Many users were surprised that 2)
stopped working.
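
To make the difference between 1) and 2) concrete, here is a minimal Python
sketch of the matching behaviour discussed in this thread. It is a toy model,
not Spark's ResolveHints code: the Relation, SubqueryAlias and Join classes and
the hinted() helper are hypothetical names made up for illustration. It assumes
a hint name matches either a bare table name or an alias, and that the search
never looks underneath an alias.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Relation:
    table: str  # table name without the database part, e.g. "t1"


@dataclass(frozen=True)
class SubqueryAlias:
    alias: str
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


Plan = Union[Relation, SubqueryAlias, Join]


def hinted(plan: Plan, name: str) -> list:
    """Return the nodes a hint on `name` would attach to in this toy model."""
    if isinstance(plan, Relation):
        return [plan] if plan.table == name else []
    if isinstance(plan, SubqueryAlias):
        # Assumption: the alias hides whatever is underneath, so match on
        # the alias only and do not look below it.
        return [plan] if plan.alias == name else []
    # Join: search both sides.
    return hinted(plan.left, name) + hinted(plan.right, name)


# 1) from db.t1 join db.t2
query1 = Join(Relation("t1"), Relation("t2"))
# 2) from db.t1 a1 join db.t2 a2
query2 = Join(SubqueryAlias("a1", Relation("t1")),
              SubqueryAlias("a2", Relation("t2")))

print(hinted(query1, "t1"))  # hint on the table name, no aliases
print(hinted(query2, "t1"))  # hint on the table name, table is aliased
print(hinted(query2, "a1"))  # hint on the alias
```

In this model the hint in 2) only takes effect when it names the alias, i.e.
broadcast(a1) instead of broadcast(t1).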

Thanks,
John


On Tue, Jun 11, 2019 at 4:38 PM Maryann Xue  wrote:

> Yes, and for a good reason: the hint relation has exactly the same scope
> as other elements of queries/sub-queries.
>
> Suppose there's a query like:
>
> select /*+ broadcast(s) */ * from (select a, b from s) t join (select a, b
> from t) s on t.a = s.b
>
> If we allowed the hint resolving to "cross" the scopes, we'd end up with a
> really confusing spec.
>
>
> Thanks,
> Maryann
>
> On Tue, Jun 11, 2019 at 5:26 PM John Zhuge  wrote:
>
>> Hi Reynold and Maryann,
>>
>> ResolveHints javadoc indicates the traversal does not go past subquery
>> alias. Is there any specific reason?
>>
>> Thanks,
>> John Zhuge
>>
>

-- 
John Zhuge


Re: Why hint does not traverse down subquery alias

2019-06-11 Thread Maryann Xue
Yes, and for a good reason: the hint relation has exactly the same scope
as other elements of queries/sub-queries.

Suppose there's a query like:

select /*+ broadcast(s) */ * from (select a, b from s) t join (select a, b
from t) s on t.a = s.b

If we allowed the hint resolving to "cross" the scopes, we'd end up with a
really confusing spec.
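
A rough Python sketch of the ambiguity, for illustration only. This is a toy
model, not Spark code: the tuple-based plan encoding and the matches() helper
are hypothetical, and the projections inside the two subqueries are ignored.
It compares resolving broadcast(s) within the current scope against a
hypothetical resolver that also descends below aliases.

```python
def matches(plan, name, cross_scopes=False, path="root"):
    """Return the paths of all nodes that a hint on `name` would match."""
    kind = plan[0]
    if kind == "rel":
        # ("rel", table_name)
        return [path] if plan[1] == name else []
    if kind == "alias":
        # ("alias", alias_name, child)
        found = [path] if plan[1] == name else []
        if cross_scopes:
            # Hypothetical behaviour: keep searching below the alias.
            found += matches(plan[2], name, True, path + "/" + plan[1])
        return found
    # ("join", left, right)
    return (matches(plan[1], name, cross_scopes, path + "/left")
            + matches(plan[2], name, cross_scopes, path + "/right"))


# (select a, b from s) t join (select a, b from t) s
plan = ("join",
        ("alias", "t", ("rel", "s")),
        ("alias", "s", ("rel", "t")))

scoped = matches(plan, "s")                       # stops at aliases
crossing = matches(plan, "s", cross_scopes=True)  # descends below aliases

print(scoped)
print(crossing)
```

With scoped resolution the name s refers to exactly one thing, the right-hand
subquery. Once the search crosses scopes it also picks up the base table s
hidden under alias t, so the same hint name points at both sides of the join.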


Thanks,
Maryann

On Tue, Jun 11, 2019 at 5:26 PM John Zhuge  wrote:

> Hi Reynold and Maryann,
>
> ResolveHints javadoc indicates the traversal does not go past subquery
> alias. Is there any specific reason?
>
> Thanks,
> John Zhuge
>


RE: Adding Custom finalize method to RDDs.

2019-06-11 Thread Nasrulla Khan Haris
I want to delete some files that I created in my data source API as soon as
the RDD is cleaned up.

Thanks,
Nasrulla

From: Vinoo Ganesh 
Sent: Monday, June 10, 2019 1:32 PM
To: Nasrulla Khan Haris; dev@spark.apache.org
Subject: Re: Adding Custom finalize method to RDDs.

Generally, overriding the finalize() method is an antipattern (it has in fact
been deprecated since Java 9; see
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#finalize()).
What’s the use case here?

From: Nasrulla Khan Haris
Date: Monday, June 10, 2019 at 15:44
To: dev@spark.apache.org
Subject: RE: Adding Custom finalize method to RDDs.

Hello Everyone,
Is there a way to do it from user code?

Thanks,
Nasrulla

From: Nasrulla Khan Haris
Sent: Sunday, June 9, 2019 5:30 PM
To: dev@spark.apache.org
Subject: Adding Custom finalize method to RDDs.

Hi All,

Is there a way to add a custom finalize method to RDD objects, so that custom
logic runs when RDDs are garbage collected by the JVM?

Thanks,
Nasrulla



Why hint does not traverse down subquery alias

2019-06-11 Thread John Zhuge
Hi Reynold and Maryann,

ResolveHints javadoc indicates the traversal does not go past subquery
alias. Is there any specific reason?

Thanks,
John Zhuge


Re: [SS] Why EventTimeStatsAccum for event-time watermark not a named accumulator?

2019-06-11 Thread Jacek Laskowski
Hi,

After some thinking about it, I may have found the reason not to expose
EventTimeStatsAccum as a named accumulator: it's an internal part of how
the event-time watermark works and should not be exposed via the web UI as
if it were part of a Spark app (which is what the web UI is meant for).

That said, I'm wondering why EventTimeStatsAccum is not a SQL metric then.
That way it'd be in the web UI, but only in the physical plan of a
streaming query.

WDYT?

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
The Internals of Spark SQL https://bit.ly/spark-sql-internals
The Internals of Spark Structured Streaming
https://bit.ly/spark-structured-streaming
The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
Follow me at https://twitter.com/jaceklaskowski



On Mon, Jun 10, 2019 at 8:59 PM Jacek Laskowski  wrote:

> Hi,
>
> I'm curious why EventTimeStatsAccum is not a named accumulator (not to
> mention SQLMetric) so the event-time watermark could be monitored in web UI?
>
> I've changed the code for EventTimeWatermarkExec physical operator to
> register EventTimeStatsAccum as a named accumulator and the values are
> properly propagated back to the driver and the web UI. It seems to be
> working fine (and it took just one day of coding).
>
> It was fairly easy to get a very initial prototype working, so I'm
> wondering why it's not been included. Has this been considered? Should I
> file a JIRA ticket and send a pull request for review? Please advise, as I
> found it very helpful (and surprisingly easy to implement, so I'm worried
> I'm missing something important). Thanks.
>
> Regards,
> Jacek Laskowski
> 
> https://about.me/JacekLaskowski
> The Internals of Spark SQL https://bit.ly/spark-sql-internals
> The Internals of Spark Structured Streaming
> https://bit.ly/spark-structured-streaming
> The Internals of Apache Kafka https://bit.ly/apache-kafka-internals
> Follow me at https://twitter.com/jaceklaskowski
>
>