[jira] [Updated] (SPARK-42280) Add an option similar to spark.yarn.archive/jars for Spark on K8s

2023-04-08 Thread Leibniz Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leibniz Hu updated SPARK-42280:
---
Description: 
For Spark on YARN, there are the `spark.yarn.archive` and `spark.yarn.jars` options 
to distribute the Spark runtime jars before the driver/executors start up.

 

I'd like to propose similar functionality for Spark on K8s. The benefits are:
 # accelerating the migration of workloads that use the above feature from YARN to K8s
 # exploring new versions of Spark more easily, without rebuilding the Spark image
 # adding additional/extension jars to executors on K8s before startup; currently 
there is really no other way to do this (a configuration sketch follows below)
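
To make the proposal concrete, here is a hedged sketch of how such an option might sit next to the existing YARN ones; the key name `spark.kubernetes.archive` and the archive paths are illustrative assumptions, not an existing Spark API:

{code:scala}
import org.apache.spark.SparkConf

// What exists today on YARN: ship a pre-built archive of the Spark runtime
// jars so the cluster nodes do not need to bundle them.
val yarnConf = new SparkConf()
  .set("spark.yarn.archive", "hdfs:///spark/spark-libs-3.3.1.zip")

// Hypothetical K8s analogue proposed here -- this option does NOT exist in
// Spark today; the name only illustrates the idea.
val k8sConf = new SparkConf()
  .set("spark.kubernetes.archive", "s3a://my-bucket/spark/spark-libs-3.3.1.zip")
{code}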

  was:
For Spark on YARN, there are the `spark.yarn.archive` and `spark.yarn.jars` options 
to distribute the Spark runtime jars before the driver/executors start up.

 

I'd like to propose similar functionality for Spark on K8s. The benefits are:
 # accelerating the migration of workloads that use the above feature from YARN to K8s
 # exploring new versions of Spark more easily, without rebuilding the Spark image
 # adding additional/extension jars to executors on K8s before startup; currently 
there is really no other way to do this


> Add an option similar to spark.yarn.archive/jars for Spark on K8s
> ---
>
> Key: SPARK-42280
> URL: https://issues.apache.org/jira/browse/SPARK-42280
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.2, 3.3.1
>Reporter: YE
>Priority: Major
>
> For Spark on YARN, there are the `spark.yarn.archive` and `spark.yarn.jars` 
> options to distribute the Spark runtime jars before the driver/executors 
> start up.
>  
> I'd like to propose similar functionality for Spark on K8s. The benefits are:
>  # accelerating the migration of workloads that use the above feature from 
> YARN to K8s
>  # exploring new versions of Spark more easily, without rebuilding the Spark 
> image
>  # adding additional/extension jars to executors on K8s before startup; 
> currently there is really no other way to do this






[jira] [Comment Edited] (SPARK-43017) Connect to multiple Hive metastores using a single SparkContext (i.e. without stopping it)

2023-04-07 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833
 ] 

Leibniz Hu edited comment on SPARK-43017 at 4/8/23 6:44 AM:


[~pravin1406]:

My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

It's about how to load Hive configuration dynamically after the SparkContext has 
been created (without adding hive-site.xml at spark-submit time).

You can read the code block part; a rough sketch of the idea also follows below.
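
For readers who can't follow the Chinese post, here is a minimal sketch of the general idea, assuming Spark 3.x built with Hive support; the metastore hosts, databases, and table names are made-up placeholders, and this is not the blog's exact code:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("two-metastores-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Point the live session at the first metastore (hypothetical host)
// and snapshot the table we need as a temp view.
spark.sparkContext.hadoopConfiguration
  .set("hive.metastore.uris", "thrift://metastore-a:9083")
spark.table("db_a.table_a").createOrReplaceTempView("t_a")

// Repoint at the second metastore. CAVEAT: Spark caches its Hive metastore
// client, so changing the URI alone may silently keep using the old one --
// the blog post's code block covers resetting the cached client so the new
// URI actually takes effect.
spark.sparkContext.hadoopConfiguration
  .set("hive.metastore.uris", "thrift://metastore-b:9083")
spark.table("db_b.table_b").createOrReplaceTempView("t_b")

// Join across the two snapshots.
spark.sql("SELECT * FROM t_a JOIN t_b ON t_a.id = t_b.id").show()
{code}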


was (Author: JIRAUSER299406):
[~pravin1406]  My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

It's about how to load Hive configuration dynamically after the SparkContext has 
been created (without adding hive-site.xml at spark-submit time).

You can read the code block part.

> Connect to multiple Hive metastores using a single SparkContext (i.e. without 
> stopping it)
> ---
>
> Key: SPARK-43017
> URL: https://issues.apache.org/jira/browse/SPARK-43017
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.2.0
>Reporter: Pravin Rathore
>Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive 
> metastores.
> I would like to fetch the first table and create a temp view, then change the 
> Hive metastore URI, connect to that metastore, fetch the second table, create 
> a temp view, and use the two temp views for further processing.
> I launched two separate Hive instances locally; individually I'm able to 
> connect to both, but when I try to connect to them one after the other, the 
> old URI is used the second time.
> Is there any workaround for this?
> Please help out.






[jira] [Comment Edited] (SPARK-43017) Connect to multiple Hive metastores using a single SparkContext (i.e. without stopping it)

2023-04-07 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833
 ] 

Leibniz Hu edited comment on SPARK-43017 at 4/8/23 6:44 AM:


[~pravin1406]  My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

It's about how to load Hive configuration dynamically after the SparkContext has 
been created (without adding hive-site.xml at spark-submit time).

You can read the code block part.


was (Author: JIRAUSER299406):
My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

It's about how to load Hive configuration dynamically after the SparkContext has 
been created (without adding hive-site.xml at spark-submit time).

You can read the code block part.

> Connect to multiple Hive metastores using a single SparkContext (i.e. without 
> stopping it)
> ---
>
> Key: SPARK-43017
> URL: https://issues.apache.org/jira/browse/SPARK-43017
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.2.0
>Reporter: Pravin Rathore
>Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive 
> metastores.
> I would like to fetch the first table and create a temp view, then change the 
> Hive metastore URI, connect to that metastore, fetch the second table, create 
> a temp view, and use the two temp views for further processing.
> I launched two separate Hive instances locally; individually I'm able to 
> connect to both, but when I try to connect to them one after the other, the 
> old URI is used the second time.
> Is there any workaround for this?
> Please help out.






[jira] [Comment Edited] (SPARK-43017) Connect to multiple Hive metastores using a single SparkContext (i.e. without stopping it)

2023-04-07 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833
 ] 

Leibniz Hu edited comment on SPARK-43017 at 4/8/23 6:43 AM:


My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

It's about how to load Hive configuration dynamically after the SparkContext has 
been created (without adding hive-site.xml at spark-submit time).

You can read the code block part.


was (Author: JIRAUSER299406):
My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

You can read the code block part.

> Connect to multiple Hive metastores using a single SparkContext (i.e. without 
> stopping it)
> ---
>
> Key: SPARK-43017
> URL: https://issues.apache.org/jira/browse/SPARK-43017
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.2.0
>Reporter: Pravin Rathore
>Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive 
> metastores.
> I would like to fetch the first table and create a temp view, then change the 
> Hive metastore URI, connect to that metastore, fetch the second table, create 
> a temp view, and use the two temp views for further processing.
> I launched two separate Hive instances locally; individually I'm able to 
> connect to both, but when I try to connect to them one after the other, the 
> old URI is used the second time.
> Is there any workaround for this?
> Please help out.






[jira] [Comment Edited] (SPARK-43017) Connect to multiple Hive metastores using a single SparkContext (i.e. without stopping it)

2023-04-07 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833
 ] 

Leibniz Hu edited comment on SPARK-43017 at 4/8/23 6:40 AM:


My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

You can read the code block part.


was (Author: JIRAUSER299406):
My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

> Connect to multiple Hive metastores using a single SparkContext (i.e. without 
> stopping it)
> ---
>
> Key: SPARK-43017
> URL: https://issues.apache.org/jira/browse/SPARK-43017
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.2.0
>Reporter: Pravin Rathore
>Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive 
> metastores.
> I would like to fetch the first table and create a temp view, then change the 
> Hive metastore URI, connect to that metastore, fetch the second table, create 
> a temp view, and use the two temp views for further processing.
> I launched two separate Hive instances locally; individually I'm able to 
> connect to both, but when I try to connect to them one after the other, the 
> old URI is used the second time.
> Is there any workaround for this?
> Please help out.






[jira] [Commented] (SPARK-43017) Connect to multiple Hive metastores using a single SparkContext (i.e. without stopping it)

2023-04-07 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833
 ] 

Leibniz Hu commented on SPARK-43017:


My blog post may be helpful for you, but it's written in Chinese:

[https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
 

> Connect to multiple Hive metastores using a single SparkContext (i.e. without 
> stopping it)
> ---
>
> Key: SPARK-43017
> URL: https://issues.apache.org/jira/browse/SPARK-43017
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Spark Submit
>Affects Versions: 3.2.0
>Reporter: Pravin Rathore
>Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive 
> metastores.
> I would like to fetch the first table and create a temp view, then change the 
> Hive metastore URI, connect to that metastore, fetch the second table, create 
> a temp view, and use the two temp views for further processing.
> I launched two separate Hive instances locally; individually I'm able to 
> connect to both, but when I try to connect to them one after the other, the 
> old URI is used the second time.
> Is there any workaround for this?
> Please help out.






[jira] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004

2023-04-01 Thread Leibniz Hu (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42840 ]


Leibniz Hu deleted comment on SPARK-42840:


was (Author: JIRAUSER299406):
[~maxgekk]  https://github.com/apache/spark/pull/40634

> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> 
>
> Key: SPARK-42840
> URL: https://issues.apache.org/jira/browse/SPARK-42840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*} (see the 
> sketch after this description). That function checks only the valuable error 
> fields and avoids depending on the error text message. In this way, tech 
> editors can modify the error format in error-classes.json without worrying 
> about Spark's internal tests. Migrate other tests that might trigger the 
> error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose a way for users to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
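
As a rough illustration of the kind of test this task asks for, here is a hedged sketch in the style of Spark's SQL test suites; it assumes a suite that provides checkError() (e.g. a QueryTest subclass), and the error class name, triggering query, and parameters are placeholders that depend on what _LEGACY_ERROR_TEMP_2004 is renamed to:

{code:scala}
import org.apache.spark.SparkRuntimeException

// Illustrative only -- lives inside a Spark test suite that provides
// checkError() and sql(); all names below are hypothetical.
test("renamed _LEGACY_ERROR_TEMP_2004 is raised from user code") {
  checkError(
    exception = intercept[SparkRuntimeException] {
      // A user-space query that triggers the error (placeholder).
      sql("SELECT some_failing_expression()").collect()
    },
    errorClass = "SOME_DESCRIPTIVE_NAME",          // hypothetical new name
    parameters = Map("someParam" -> "someValue"))  // match the message template
}
{code}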






[jira] [Commented] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004

2023-04-01 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707596#comment-17707596
 ] 

Leibniz Hu commented on SPARK-42840:


[~maxgekk]  https://github.com/apache/spark/pull/40634

> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> 
>
> Key: SPARK-42840
> URL: https://issues.apache.org/jira/browse/SPARK-42840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose a way for users to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Commented] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004

2023-03-26 Thread Leibniz Hu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705149#comment-17705149
 ] 

Leibniz Hu commented on SPARK-42840:


[~maxgekk]  I found this issue via the Spark dev mailing list, and I would like 
to take on this task. Please assign it to me, thanks.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> 
>
> Key: SPARK-42840
> URL: https://issues.apache.org/jira/browse/SPARK-42840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear, and propose a way for users to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org