[jira] [Updated] (SPARK-42280) add spark.yarn.archive/jars similar option for spark on K8S
[ https://issues.apache.org/jira/browse/SPARK-42280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Leibniz Hu updated SPARK-42280:
-------------------------------

Description:
For Spark on YARN, there are `spark.yarn.archive` and `spark.yarn.jars` to distribute the Spark runtime jars before the driver/executors start up. I'd like to propose similar functionality for Spark on K8s. The benefits are:
# accelerating the migration of workloads that use the above feature from YARN to K8s
# exploring new Spark versions more easily, without rebuilding the Spark image
# currently, there is really no other way to add additional/extension jars to executors on K8s before startup

> add spark.yarn.archive/jars similar option for spark on K8S
> -----------------------------------------------------------
>
>                 Key: SPARK-42280
>                 URL: https://issues.apache.org/jira/browse/SPARK-42280
>             Project: Spark
>          Issue Type: Improvement
>          Components: Kubernetes
>    Affects Versions: 3.2.2, 3.3.1
>            Reporter: YE
>            Priority: Major
>
> For Spark on YARN, there are `spark.yarn.archive` and `spark.yarn.jars` to distribute the Spark runtime jars before the driver/executors start up.
>
> I'd like to propose similar functionality for Spark on K8s. The benefits are:
> # accelerating the migration of workloads that use the above feature from YARN to K8s
> # exploring new Spark versions more easily, without rebuilding the Spark image
> # currently, there is really no other way to add additional/extension jars to executors on K8s before startup
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
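For reference, the existing YARN-side options mentioned in SPARK-42280 are typically used as below; the archive name and HDFS paths are illustrative, and this follows the pattern from Spark's "Running on YARN" documentation:

```shell
# Bundle the Spark runtime jars into a single archive and publish it to
# HDFS once, so YARN can localize it for every application
# (archive name and HDFS paths here are illustrative).
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .
hdfs dfs -mkdir -p /spark/archives
hdfs dfs -put spark-libs.jar /spark/archives/

# Reference the archive instead of shipping the jars on each submit.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.archive=hdfs:///spark/archives/spark-libs.jar \
  --class org.apache.spark.examples.SparkPi \
  "$SPARK_HOME/examples/jars/spark-examples_2.12-3.3.1.jar" 100
```

The proposal is for an analogous option on K8s that would fetch such an archive before the driver/executor JVMs start; as of the affected versions, no such option exists.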
[jira] [Comment Edited] (SPARK-43017) Connect to multiple hive metastore using single sparkcontext (i.e without stopping it")
[ https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833 ]

Leibniz Hu edited comment on SPARK-43017 at 4/8/23 6:44 AM:
------------------------------------------------------------

[~pravin1406]: my blog may be helpful for you, but it's written in Chinese: [https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]
It's about how to load the Hive configuration dynamically after the SparkContext has been created (without adding hive-site.xml at spark-submit time); see the code block part.

> Connect to multiple hive metastore using single sparkcontext (i.e without stopping it")
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-43017
>                 URL: https://issues.apache.org/jira/browse/SPARK-43017
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, Spark Submit
>    Affects Versions: 3.2.0
>            Reporter: Pravin Rathore
>            Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive metastores.
> I would like to fetch the first table and create a temp view, then change the Hive metastore URI, connect to that metastore, fetch the second table, and create a temp view, and finally use the temp views for further processing.
> I launched two different instances of Hive locally. Individually I'm able to connect to both, but when I try to connect to them one after the other, the old URI is used the second time.
> Is there any workaround for this?
> Please help out :\
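The scenario in this question can be sketched as follows; this is only an illustration of the idea, not the linked blog's solution (metastore URIs and table names are assumptions), and the caveat in the comments is exactly why the naive version fails:

```scala
import org.apache.spark.sql.SparkSession

// Start against the first metastore (URIs below are illustrative).
val spark = SparkSession.builder()
  .appName("two-metastores-sketch")
  .config("hive.metastore.uris", "thrift://metastore-a:9083")
  .enableHiveSupport()
  .getOrCreate()

spark.table("db_a.table1").createOrReplaceTempView("t1")

// Attempt to repoint the Hadoop configuration at the second metastore.
// NOTE: this alone is usually NOT enough -- the HiveExternalCatalog keeps
// a cached Hive client bound to the first URI, which matches the
// behaviour reported in this ticket (the old URI keeps being used).
spark.sparkContext.hadoopConfiguration
  .set("hive.metastore.uris", "thrift://metastore-b:9083")

spark.table("db_b.table2").createOrReplaceTempView("t2")

// Join the two temp views once both sides are materialized.
spark.sql("SELECT * FROM t1 JOIN t2 ON t1.id = t2.id").show()
```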
[jira] [Commented] (SPARK-43017) Connect to multiple hive metastore using single sparkcontext (i.e without stopping it")
[ https://issues.apache.org/jira/browse/SPARK-43017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17709833#comment-17709833 ]

Leibniz Hu commented on SPARK-43017:
------------------------------------

My blog may be helpful for you, but it's written in Chinese: [https://leibnizhu.github.io/p/Spark%E5%8A%A8%E6%80%81%E5%8A%A0%E8%BD%BDhive%E9%85%8D%E7%BD%AE%E7%9A%84%E6%96%B9%E6%A1%88/]

> Connect to multiple hive metastore using single sparkcontext (i.e without stopping it")
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-43017
>                 URL: https://issues.apache.org/jira/browse/SPARK-43017
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core, Spark Submit
>    Affects Versions: 3.2.0
>            Reporter: Pravin Rathore
>            Priority: Critical
>
> I want to join two Hive tables that are stored in two different Hive metastores.
> I would like to fetch the first table and create a temp view, then change the Hive metastore URI, connect to that metastore, fetch the second table, and create a temp view, and finally use the temp views for further processing.
> I launched two different instances of Hive locally. Individually I'm able to connect to both, but when I try to connect to them one after the other, the old URI is used the second time.
> Is there any workaround for this?
> Please help out :\
[jira] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004
[ https://issues.apache.org/jira/browse/SPARK-42840 ]

Leibniz Hu deleted comment on SPARK-42840:
------------------------------------------

was (Author: JIRAUSER299406): [~maxgekk] https://github.com/apache/spark/pull/40634

> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> ---------------------------------------------------------
>
>                 Key: SPARK-42840
>                 URL: https://issues.apache.org/jira/browse/SPARK-42840
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Priority: Minor
>              Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the examples in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't exist yet, and check the exception fields using {*}checkError(){*}. This function checks only the valuable error fields and avoids depending on the error text message; that way, tech editors can modify the error format in error-classes.json without breaking Spark's internal tests. Migrate other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace it with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not clear, and propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
> * [https://github.com/apache/spark/pull/38685]
> * [https://github.com/apache/spark/pull/38656]
> * [https://github.com/apache/spark/pull/38490]
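The checkError()-based testing pattern described in the ticket looks roughly like this in a Spark SQL test suite; the error class name, the triggering query, and the parameters are illustrative assumptions, not the actual fix in the PR:

```scala
// Inside a suite extending QueryTest with SharedSparkSession, which
// provides sql(), intercept and checkError(); the error class name and
// parameters below are illustrative.
test("decimal overflow triggers the renamed error class") {
  checkError(
    exception = intercept[SparkArithmeticException] {
      sql("SELECT CAST(123456.789 AS DECIMAL(4, 2))").collect()
    },
    errorClass = "NUMERIC_VALUE_OUT_OF_RANGE",
    parameters = Map(
      "value" -> "123456.789",
      "precision" -> "4",
      "scale" -> "2"))
}
```

The point of asserting on errorClass and parameters rather than the rendered message is that the templates in error-classes.json can then change without breaking the test.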
[jira] [Commented] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004
[ https://issues.apache.org/jira/browse/SPARK-42840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17707596#comment-17707596 ]

Leibniz Hu commented on SPARK-42840:
------------------------------------

[~maxgekk] https://github.com/apache/spark/pull/40634

> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> ---------------------------------------------------------
>
>                 Key: SPARK-42840
>                 URL: https://issues.apache.org/jira/browse/SPARK-42840
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Priority: Minor
>              Labels: starter
[jira] [Commented] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004
[ https://issues.apache.org/jira/browse/SPARK-42840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705149#comment-17705149 ]

Leibniz Hu commented on SPARK-42840:
------------------------------------

[~maxgekk] I found this issue via the Spark dev mailing list, and I'm trying to complete this task. Please assign it to me, thanks.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> ---------------------------------------------------------
>
>                 Key: SPARK-42840
>                 URL: https://issues.apache.org/jira/browse/SPARK-42840
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Max Gekk
>            Priority: Minor
>              Labels: starter
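As a sketch of what the rename in {*}error-classes.json{*} looks like: the key changes while the message template is kept or improved. Both entries below are hypothetical, not the actual _LEGACY_ERROR_TEMP_2004 definition or the name chosen in the PR above. Before:

```json
{
  "_LEGACY_ERROR_TEMP_2004": {
    "message" : [ "Unscaled value too large for precision." ]
  }
}
```

After choosing a short but complete name and a clearer, parameterized template:

```json
{
  "NUMERIC_VALUE_OUT_OF_RANGE": {
    "message" : [ "<value> cannot be represented as Decimal(<precision>, <scale>)." ],
    "sqlState" : "22003"
  }
}
```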