Re: Re: spark+kafka+dynamic resource allocation

2023-01-28 Thread Lingzhe Sun
Thank you for the response, but the reference does not seem to answer any
of those questions.

BS
Lingzhe Sun
 
From: ashok34...@yahoo.com
Date: 2023-01-29 04:01
To: User; Lingzhe Sun
Subject: Re: spark+kafka+dynamic resource allocation
Hi,

Worth checking this link

https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation

On Saturday, 28 January 2023 at 06:18:28 GMT, Lingzhe Sun 
 wrote: 


Hi all,

I'm wondering whether dynamic resource allocation works in Spark+Kafka streaming
applications. Here are some questions:
   - Will structured streaming be supported?
   - Is the number of consumers always equal to the number of partitions of the
subscribed topic (assuming there is only one topic)?
   - If consumers are evenly distributed across executors, will a newly added
executor (through dynamic resource allocation) trigger a consumer reassignment?
   - Would it simply be a bad idea to use dynamic resource allocation in a streaming
app, because there is no way to scale down the number of executors unless no data is
coming in?
Any thoughts are welcome.
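For context, dynamic allocation is driven by configuration along these lines. This is a minimal sketch only: the jar name and executor bounds are placeholders, and shuffle tracking assumes Spark 3.0 or later (it removes the need for the external shuffle service).

```shell
# Sketch: enabling dynamic allocation for a streaming job.
# App jar name and executor counts below are hypothetical.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  my-streaming-app.jar
```

Whether executors actually scale down under a steady Kafka input rate is exactly the open question raised above; the configuration only makes scaling possible, it does not guarantee it helps a streaming workload.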

Lingzhe Sun 
Hirain Technology


Fwd: Spark-submit doesn't load all app classes in the classpath

2023-01-28 Thread Soheil Pourbafrani
Hello all,

I'm using Oozie to manage a Spark application on a YARN cluster, in
yarn-cluster mode.

Recently I made some changes to the application that involved the Hikari
library. Surprisingly, when I started the job I got a ClassNotFoundException
for the Hikari classes. I'm passing a shaded jar file that contains all
dependencies, so how could the Hikari classes be present in the shaded jar
but missing from the driver's classpath?
The issue was fixed after adding the parameter
--driver-class-path with the shaded jar file as its value.

Can anyone help me figure out why the Hikari classes were only loaded in the
classpath after adding this parameter to the spark-submit command?
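For reference, the working invocation looks roughly like the following. This is a sketch: the jar and class names are placeholders, and the comment reflects the general behavior that in yarn-cluster mode the driver runs on a cluster node with its own classpath.

```shell
# Sketch of the spark-submit command that resolved the ClassNotFoundException.
# In yarn-cluster mode the driver runs inside the YARN ApplicationMaster, so
# the shaded jar must be placed on the driver's classpath explicitly;
# jar and main-class names here are hypothetical.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-class-path app-shaded.jar \
  --class com.example.Main \
  app-shaded.jar
```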

Thanks


Re: spark+kafka+dynamic resource allocation

2023-01-28 Thread ashok34...@yahoo.com.INVALID
 Hi,
Worth checking this link
https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation

On Saturday, 28 January 2023 at 06:18:28 GMT, Lingzhe Sun 
 wrote:  
 
Hi all,
I'm wondering whether dynamic resource allocation works in Spark+Kafka streaming
applications. Here are some questions:
   - Will structured streaming be supported?
   - Is the number of consumers always equal to the number of partitions of the
subscribed topic (assuming there is only one topic)?
   - If consumers are evenly distributed across executors, will a newly added
executor (through dynamic resource allocation) trigger a consumer reassignment?
   - Would it simply be a bad idea to use dynamic resource allocation in a
streaming app, because there is no way to scale down the number of executors
unless no data is coming in?
Any thoughts are welcome.
Lingzhe Sun
Hirain Technology

Re: Spark SQL question

2023-01-28 Thread Bjørn Jørgensen
Hi Mich.
This is a Spark user group mailing list where people can ask *any*
questions about Spark.
You know SQL and streaming, but I don't think it's necessary to start a
reply with "*LOL*" to the question that's being asked.
No question is too stupid to be asked.


On Sat, 28 Jan 2023 at 09:22, Mich Talebzadeh <
mich.talebza...@gmail.com> wrote:

> LOL
>
> First one
>
> spark-sql> select 1 as `data.group` from abc group by data.group;
> 1
> Time taken: 0.198 seconds, Fetched 1 row(s)
>
> means that you are assigning the alias data.group in the select list and
> using that alias -> data.group in your group by statement
>
>
> This is equivalent to
>
>
> spark-sql> select 1 as `data.group` from abc group by 1;
>
> 1
>
> With regard to your second sql
>
>
> select 1 as `data.group` from tbl group by `data.group`;
>
>
> will throw an error:
>
>
> spark-sql> select 1 as `data.group` from abc group by `data.group`;
>
> Error in query: cannot resolve '`data.group`' given input columns:
> [spark_catalog.elayer.abc.keyword, spark_catalog.elayer.abc.occurence];
> line 1 pos 43;
>
> 'Aggregate ['`data.group`], [1 AS data.group#225]
>
> +- SubqueryAlias spark_catalog.elayer.abc
>
>    +- HiveTableRelation [`elayer`.`abc`,
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols:
> [keyword#226, occurence#227L], Partition Cols: []]
>
> `data.group` with quotes is neither the name of the column nor its alias
>
>
> HTH
>
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 27 Jan 2023 at 23:36, Kohki Nishio  wrote:
>
>> this SQL works
>>
>> select 1 as `data.group` from tbl group by data.group
>>
>>
>> Since there's no such field as *data*, I thought the SQL had to look
>> like this:
>>
>> select 1 as `data.group` from tbl group by `data.group`
>>
>>
>> But that gives an error (cannot resolve '`data.group`')... I'm no
>> expert in SQL, but it feels like strange behavior... does anybody have a
>> good explanation for it?
>>
>> Thanks
>>
>> --
>> Kohki Nishio
>>
>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


Re: Spark SQL question

2023-01-28 Thread Mich Talebzadeh
LOL

First one

spark-sql> select 1 as `data.group` from abc group by data.group;
1
Time taken: 0.198 seconds, Fetched 1 row(s)

means that you are assigning the alias data.group in the select list and
using that alias -> data.group in your group by statement


This is equivalent to


spark-sql> select 1 as `data.group` from abc group by 1;

1

With regard to your second sql


select 1 as `data.group` from tbl group by `data.group`;


will throw an error:


spark-sql> select 1 as `data.group` from abc group by `data.group`;

Error in query: cannot resolve '`data.group`' given input columns:
[spark_catalog.elayer.abc.keyword, spark_catalog.elayer.abc.occurence];
line 1 pos 43;

'Aggregate ['`data.group`], [1 AS data.group#225]

+- SubqueryAlias spark_catalog.elayer.abc

   +- HiveTableRelation [`elayer`.`abc`,
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols:
[keyword#226, occurence#227L], Partition Cols: []]

`data.group` with quotes is neither the name of the column nor its alias
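To summarize the three variants discussed in this thread (a sketch against the same hypothetical table abc, reflecting the results reported above):

```sql
-- Works: unquoted data.group in GROUP BY resolves to the select-list alias.
SELECT 1 AS `data.group` FROM abc GROUP BY data.group;

-- Also works: GROUP BY by ordinal position in the select list.
SELECT 1 AS `data.group` FROM abc GROUP BY 1;

-- Fails: backticked `data.group` is looked up as a single column literally
-- named "data.group", which does not exist in abc.
SELECT 1 AS `data.group` FROM abc GROUP BY `data.group`;
```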


HTH



   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 27 Jan 2023 at 23:36, Kohki Nishio  wrote:

> this SQL works
>
> select 1 as `data.group` from tbl group by data.group
>
>
> Since there's no such field as *data*, I thought the SQL had to look like
> this:
>
> select 1 as `data.group` from tbl group by `data.group`
>
>
> But that gives an error (cannot resolve '`data.group`')... I'm no
> expert in SQL, but it feels like strange behavior... does anybody have a
> good explanation for it?
>
> Thanks
>
> --
> Kohki Nishio
>