[jira] [Assigned] (SPARK-33827) Unload State Store asap once it becomes inactive
[ https://issues.apache.org/jira/browse/SPARK-33827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-33827: Assignee: L. C. Hsieh > Unload State Store asap once it becomes inactive > > > Key: SPARK-33827 > URL: https://issues.apache.org/jira/browse/SPARK-33827 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > SS maintains state stores in executors across batches. Due to the nature of > Spark scheduling, a state store might be allocated on another executor in the > next batch, and the state store from the previous batch becomes inactive. > Currently we run a maintenance task periodically to unload inactive state stores, > so there is some delay between a state store becoming inactive and it being unloaded. > Per the discussion on https://github.com/apache/spark/pull/30770 with > [~kabhwan], the preference is to unload inactive state stores asap. > However, we can force Spark to always allocate a state store to the same > executor by using the task locality configuration. This reduces the > possibility of having inactive state stores. > Normally, with the locality configuration, we should rarely see inactive > state stores at all. There is still a chance that an executor fails and is > reallocated, but in that case the inactive state store is lost too, so it is > not an issue. > So unloading inactive stores asap is only useful when we don't use task > locality to force state store locality across batches. > The required change to make state store management bi-directional between the > driver and executors looks non-trivial. If we can already reduce the > possibility of inactive stores, is such a non-trivial change still worth it?
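[Editor's note] For reference, a minimal sketch of the two knobs this discussion relies on; the values shown are illustrative, not recommendations:
{code:scala}
import org.apache.spark.sql.SparkSession

// spark.sql.streaming.stateStore.maintenanceInterval drives the periodic
// maintenance task that (eventually) unloads inactive state stores.
// spark.locality.wait is the generic task-locality knob: a longer wait makes
// the scheduler prefer the executor that already holds the state store.
val spark = SparkSession.builder()
  .appName("state-store-locality-sketch")
  .config("spark.sql.streaming.stateStore.maintenanceInterval", "60s") // the default
  .config("spark.locality.wait", "10s") // illustrative; the default is 3s
  .getOrCreate()
{code}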
[jira] [Resolved] (SPARK-33827) Unload State Store asap once it becomes inactive
[ https://issues.apache.org/jira/browse/SPARK-33827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-33827. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30827 [https://github.com/apache/spark/pull/30827] > Unload State Store asap once it becomes inactive > > > Key: SPARK-33827 > URL: https://issues.apache.org/jira/browse/SPARK-33827 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > SS maintains state stores in executors across batches. Due to the nature of > Spark scheduling, a state store might be allocated on another executor in the > next batch, and the state store from the previous batch becomes inactive. > Currently we run a maintenance task periodically to unload inactive state stores, > so there is some delay between a state store becoming inactive and it being unloaded. > Per the discussion on https://github.com/apache/spark/pull/30770 with > [~kabhwan], the preference is to unload inactive state stores asap. > However, we can force Spark to always allocate a state store to the same > executor by using the task locality configuration. This reduces the > possibility of having inactive state stores. > Normally, with the locality configuration, we should rarely see inactive > state stores at all. There is still a chance that an executor fails and is > reallocated, but in that case the inactive state store is lost too, so it is > not an issue. > So unloading inactive stores asap is only useful when we don't use task > locality to force state store locality across batches. > The required change to make state store management bi-directional between the > driver and executors looks non-trivial. If we can already reduce the > possibility of inactive stores, is such a non-trivial change still worth it?
[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue
[ https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255446#comment-17255446 ] Dongjoon Hyun commented on SPARK-31685: --- Thank you for pinging me, [~Qin Yao]. cc [~viirya] since he is looking at streaming. > Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN > expiration issue > --- > > Key: SPARK-31685 > URL: https://issues.apache.org/jira/browse/SPARK-31685 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4 > Environment: spark-2.4.4-bin-hadoop2.7 >Reporter: Rajeev Kumar >Priority: Major > > I am facing issue for spark-2.4.4-bin-hadoop2.7. I am using spark structured > streaming with Kafka. Reading the stream from Kafka and saving it to HBase. > I get this error on the driver after 24 hours. > > {code:java} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired > at org.apache.hadoop.ipc.Client.call(Client.java:1475) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108) > at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at > org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171) > at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630) > at > org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326) > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142) > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351) > at > 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.
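[Editor's note] For long-running Kerberized applications, the usual mitigation (independent of the specific bug report above) is to give Spark a principal and keytab at submit time so it can periodically re-obtain delegation tokens instead of relying on the initially issued HDFS_DELEGATION_TOKEN, which typically expires after about 24 hours. A hedged sketch; the principal, keytab path, class, and jar are placeholders:
{code:bash}
# Sketch only: --principal/--keytab let Spark log in and refresh delegation
# tokens for a long-running (e.g. streaming) job on YARN.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --principal user@EXAMPLE.COM \
  --keytab /path/to/user.keytab \
  --class com.example.StreamingJob \
  streaming-job.jar
{code}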
[jira] [Updated] (SPARK-33920) We cannot pass a schema to the createDataFrame function in Scala, however we can do this in Python.
[ https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33920: - Component/s: (was: Build) > We cannot pass a schema to the createDataFrame function in Scala, however we > can do this in Python. > --- > > Key: SPARK-33920 > URL: https://issues.apache.org/jira/browse/SPARK-33920 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.1 >Reporter: Abdul Rafay Abdul Rafay >Priority: Major > Original Estimate: 168h > Remaining Estimate: 168h > > {{spark.createDataFrame(data, schema)}} > I am able to pass a schema as a parameter to the createDataFrame function in > Python, but cannot do this in Scala for static data.
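[Editor's note] For what it's worth, Scala does expose a schema-accepting overload; it takes the data as RDD[Row] (or java.util.List[Row]) rather than a Seq of case classes, which may be the source of the confusion. A minimal sketch with illustrative columns and values:
{code:scala}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("schema-sketch").getOrCreate()

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))

// createDataFrame(RDD[Row], StructType): the explicit-schema variant in Scala.
val rows = spark.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))
val df = spark.createDataFrame(rows, schema)
df.printSchema()
{code}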
[jira] [Updated] (SPARK-33920) We cannot pass a schema to the createDataFrame function in Scala, however we can do this in Python.
[ https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33920: - Priority: Major (was: Critical) > We cannot pass a schema to the createDataFrame function in Scala, however we > can do this in Python. > --- > > Key: SPARK-33920 > URL: https://issues.apache.org/jira/browse/SPARK-33920 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.1 >Reporter: Abdul Rafay Abdul Rafay >Priority: Major > Original Estimate: 168h > Remaining Estimate: 168h > > {{spark.createDataFrame(data, schema)}} > I am able to pass a schema as a parameter to the createDataFrame function in > Python, but cannot do this in Scala for static data.
[jira] [Updated] (SPARK-33920) We cannot pass a schema to the createDataFrame function in Scala, however we can do this in Python.
[ https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33920: - Target Version/s: (was: 3.0.1) > We cannot pass a schema to the createDataFrame function in Scala, however we > can do this in Python. > --- > > Key: SPARK-33920 > URL: https://issues.apache.org/jira/browse/SPARK-33920 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.1 >Reporter: Abdul Rafay Abdul Rafay >Priority: Critical > Original Estimate: 168h > Remaining Estimate: 168h > > {{spark.createDataFrame(data, schema)}} > I am able to pass a schema as a parameter to the createDataFrame function in > Python, but cannot do this in Scala for static data.
[jira] [Commented] (SPARK-33922) Fix error test SparkLauncherSuite.testSparkLauncherGetError
[ https://issues.apache.org/jira/browse/SPARK-33922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255445#comment-17255445 ] Hyukjin Kwon commented on SPARK-33922: -- [~dengziming] this passes in CI. Can you elaborate on how you run the tests? > Fix error test SparkLauncherSuite.testSparkLauncherGetError > --- > > Key: SPARK-33922 > URL: https://issues.apache.org/jira/browse/SPARK-33922 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.0.1 >Reporter: dengziming >Priority: Minor > > org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError fails > every time it is executed; note that it is not a flaky test because it fails > every time. > ``` > java.lang.AssertionError at > org.junit.Assert.fail(Assert.java:87) at > org.junit.Assert.assertTrue(Assert.java:42) at > org.junit.Assert.assertTrue(Assert.java:53) at > org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError > ```
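[Editor's note] For context, one way to run the single suite locally, assuming it lives in the core module as its package suggests; the command shape follows the usual Spark developer workflow and is a sketch, not a verified invocation:
{code:bash}
# Sketch only: run just SparkLauncherSuite through the sbt build.
build/sbt "core/testOnly org.apache.spark.launcher.SparkLauncherSuite"
{code}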
[jira] [Commented] (SPARK-33923) Fix some tests with AQE enabled
[ https://issues.apache.org/jira/browse/SPARK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255439#comment-17255439 ] Apache Spark commented on SPARK-33923: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/30941 > Fix some tests with AQE enabled > --- > > Key: SPARK-33923 > URL: https://issues.apache.org/jira/browse/SPARK-33923 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: wuyi >Priority: Major > > e.g., > DataFrameAggregateSuite > DataFrameJoinSuite > JoinSuite > PlannerSuite > BucketedReadSuite
[jira] [Commented] (SPARK-33923) Fix some tests with AQE enabled
[ https://issues.apache.org/jira/browse/SPARK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255438#comment-17255438 ] Apache Spark commented on SPARK-33923: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/30941 > Fix some tests with AQE enabled > --- > > Key: SPARK-33923 > URL: https://issues.apache.org/jira/browse/SPARK-33923 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: wuyi >Priority: Major > > e.g., > DataFrameAggregateSuite > DataFrameJoinSuite > JoinSuite > PlannerSuite > BucketedReadSuite
[jira] [Assigned] (SPARK-33923) Fix some tests with AQE enabled
[ https://issues.apache.org/jira/browse/SPARK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33923: Assignee: Apache Spark > Fix some tests with AQE enabled > --- > > Key: SPARK-33923 > URL: https://issues.apache.org/jira/browse/SPARK-33923 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > e.g., > DataFrameAggregateSuite > DataFrameJoinSuite > JoinSuite > PlannerSuite > BucketedReadSuite
[jira] [Assigned] (SPARK-33923) Fix some tests with AQE enabled
[ https://issues.apache.org/jira/browse/SPARK-33923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33923: Assignee: (was: Apache Spark) > Fix some tests with AQE enabled > --- > > Key: SPARK-33923 > URL: https://issues.apache.org/jira/browse/SPARK-33923 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: wuyi >Priority: Major > > e.g., > DataFrameAggregateSuite > DataFrameJoinSuite > JoinSuite > PlannerSuite > BucketedReadSuite
[jira] [Created] (SPARK-33923) Fix some tests with AQE enabled
wuyi created SPARK-33923: Summary: Fix some tests with AQE enabled Key: SPARK-33923 URL: https://issues.apache.org/jira/browse/SPARK-33923 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0, 3.2.0 Reporter: wuyi e.g., DataFrameAggregateSuite DataFrameJoinSuite JoinSuite PlannerSuite BucketedReadSuite
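[Editor's note] For reference, the flag the ticket refers to; a minimal sketch of enabling adaptive query execution on a session (in Spark's own test suites this is usually toggled per-test with withSQLConf):
{code:scala}
import org.apache.spark.sql.SparkSession

// spark.sql.adaptive.enabled turns on adaptive query execution (AQE), which can
// re-plan joins and shuffles at runtime and thereby change what these suites assert.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.adaptive.enabled", "true")
  .getOrCreate()
{code}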
[jira] [Commented] (SPARK-33907) Only prune columns of from_json if parsing options are empty
[ https://issues.apache.org/jira/browse/SPARK-33907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255436#comment-17255436 ] Apache Spark commented on SPARK-33907: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30944 > Only prune columns of from_json if parsing options are empty > --- > > Key: SPARK-33907 > URL: https://issues.apache.org/jira/browse/SPARK-33907 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0, 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > Fix For: 3.1.0 > > > For safety, we should only prune columns from the from_json expression if the > parsing options are empty.
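[Editor's note] To illustrate the distinction (the schema, JSON, and options here are illustrative): with empty options the optimizer can safely narrow the parsed schema to the referenced field, while non-default options such as FAILFAST mode depend on parsing the full record, so pruning is skipped there.
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val schema = StructType(Seq(StructField("a", IntegerType), StructField("b", StringType)))
val df = Seq("""{"a": 1, "b": "x"}""").toDF("json")

// Empty options: only field "a" is referenced, so the parsed schema may be pruned to it.
val pruned = df.select(from_json($"json", schema).getField("a"))
// Non-empty options (e.g. FAILFAST): pruning could mask parse errors in "b",
// which is why the fix restricts pruning to the empty-options case.
val unpruned = df.select(from_json($"json", schema, Map("mode" -> "FAILFAST")).getField("a"))
{code}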
[jira] [Resolved] (SPARK-33914) Describe the structure of unified v1 and v2 tests
[ https://issues.apache.org/jira/browse/SPARK-33914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33914. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30929 [https://github.com/apache/spark/pull/30929] > Describe the structure of unified v1 and v2 tests > - > > Key: SPARK-33914 > URL: https://issues.apache.org/jira/browse/SPARK-33914 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.2.0 > > > Add comments for unified v1 and v2 tests and describe their structure.
[jira] [Assigned] (SPARK-33914) Describe the structure of unified v1 and v2 tests
[ https://issues.apache.org/jira/browse/SPARK-33914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33914: --- Assignee: Maxim Gekk > Describe the structure of unified v1 and v2 tests > - > > Key: SPARK-33914 > URL: https://issues.apache.org/jira/browse/SPARK-33914 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Add comments for unified v1 and v2 tests and describe their structure.
[jira] [Resolved] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter
[ https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33908. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30922 [https://github.com/apache/spark/pull/30922] > Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter > > > Key: SPARK-33908 > URL: https://issues.apache.org/jira/browse/SPARK-33908 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > > Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374
[jira] [Assigned] (SPARK-33908) Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter
[ https://issues.apache.org/jira/browse/SPARK-33908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33908: Assignee: angerszhu > Refactor SparkSubmitUtils.resolveMavenCoordinates return parameter > > > Key: SPARK-33908 > URL: https://issues.apache.org/jira/browse/SPARK-33908 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > > Per the discussion in https://github.com/apache/spark/pull/29966#discussion_r531917374
[jira] [Resolved] (SPARK-33901) Char and Varchar display error after DDLs
[ https://issues.apache.org/jira/browse/SPARK-33901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33901. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30918 [https://github.com/apache/spark/pull/30918] > Char and Varchar display error after DDLs > - > > Key: SPARK-33901 > URL: https://issues.apache.org/jira/browse/SPARK-33901 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > Fix For: 3.1.0 > > > CTAS / CREATE TABLE LIKE / CVAS / ALTER TABLE ADD COLUMNS
[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue
[ https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255420#comment-17255420 ] Kent Yao commented on SPARK-31685: -- Hi, [~rajeevkumar], does this issue still exist in the latest release 3.0.1, or the master branch? If so I guess this should be fixed as soon as possible for the 2.4 LTS version and the coming 3.1.0. Stability for long-running applications is essential. And I guess it is not that hard to fix it. cc [~cloud_fan] [~hyukjin.kwon] [~dongjoon] > Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN > expiration issue > --- > > Key: SPARK-31685 > URL: https://issues.apache.org/jira/browse/SPARK-31685 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.4.4 > Environment: spark-2.4.4-bin-hadoop2.7 >Reporter: Rajeev Kumar >Priority: Major > > I am facing issue for spark-2.4.4-bin-hadoop2.7. I am using spark structured > streaming with Kafka. Reading the stream from Kafka and saving it to HBase. > I get this error on the driver after 24 hours. > > {code:java} > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): > token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired > at org.apache.hadoop.ipc.Client.call(Client.java:1475) > at org.apache.hadoop.ipc.Client.call(Client.java:1412) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771) > at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108) > at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169) > at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165) > at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90) > at > org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171) > at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630) > at > org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326) > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142) > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337) > at > org.apache.spark.sql.execution.streami
[jira] [Commented] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255418#comment-17255418 ] Apache Spark commented on SPARK-30789: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/30943 > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] >
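[Editor's note] A sketch of the semantics being proposed, using the ticket's syntax against a hypothetical table t(id, v) where v is nullable; this assumes a SparkSession named spark is in scope and a Spark version with the feature implemented:
{code:scala}
// IGNORE NULLS makes LAG skip over null rows; RESPECT NULLS (the default)
// returns whatever the previous row holds, null included.
spark.sql("""
  SELECT id,
         LAG(v) IGNORE NULLS  OVER (ORDER BY id) AS prev_non_null,
         LAG(v) RESPECT NULLS OVER (ORDER BY id) AS prev_raw
  FROM t
""")
{code}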
[jira] [Assigned] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30789: Assignee: Apache Spark > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] >
[jira] [Assigned] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-30789: Assignee: (was: Apache Spark) > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] >
[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-30789: --- Description: All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS | RESPECT NULLS. For example: {code:java} LEAD (value_expr [, offset ]) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} {code:java} LAG (value_expr [, offset ]) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} {code:java} NTH_VALUE (expr, offset) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] [ ORDER BY window_ordering frame_clause ] ){code} *Oracle:* [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] *Redshift* [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] *Presto* [https://prestodb.io/docs/current/functions/window.html] *DB2* [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] *Teradata* [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] *Snowflake* [https://docs.snowflake.com/en/sql-reference/functions/lead.html] [https://docs.snowflake.com/en/sql-reference/functions/lag.html] was: All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE support IGNORE NULLS | RESPECT NULLS. For example: {code:java} LEAD (value_expr [, offset ]) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} {code:java} LAG (value_expr [, offset ]) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} {code:java} NTH_VALUE (expr, offset) [ IGNORE NULLS | RESPECT NULLS ] OVER ( [ PARTITION BY window_partition ] [ ORDER BY window_ordering frame_clause ] ){code} *Oracle:* [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] *Redshift* [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] *Presto* [https://prestodb.io/docs/current/functions/window.html] *DB2* [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] *Teradata* [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] *Snowflake* [https://docs.snowflake.com/en/sql-reference/functions/lead.html] [https://docs.snowflake.com/en/sql-reference/functions/lag.html] > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. 
For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] >
[jira] [Commented] (SPARK-33801) Cleanup "Unicode escapes in triple quoted strings are deprecated" compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255410#comment-17255410 ] Apache Spark commented on SPARK-33801: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/30926 > Cleanup "Unicode escapes in triple quoted strings are deprecated" compilation > warnings > -- > > Key: SPARK-33801 > URL: https://issues.apache.org/jira/browse/SPARK-33801 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0 > > > There are total 15 compilation warnings about this > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2930: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2931: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2932: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2933: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2934: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2935: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2936: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2937: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala:82: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:32: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:79: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:97: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:101: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:76: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > 
/spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:83: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > {code}
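[Editor's note] A minimal sketch of the deprecated pattern and the two usual replacements (string contents are illustrative):
{code:scala}
// Deprecated in newer Scala compilers: a \uXXXX escape inside a triple-quoted string.
val deprecated = """delimiter: \u00e9"""
// Fix 1: use the literal character, as the warning suggests.
val literal = """delimiter: é"""
// Fix 2: move the escape into a normal string literal, where it is still processed.
val escaped = "delimiter: \u00e9"
{code}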
[jira] [Commented] (SPARK-33801) Cleanup "Unicode escapes in triple quoted strings are deprecated" compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255409#comment-17255409 ] Hyukjin Kwon commented on SPARK-33801: -- Fixed in https://github.com/apache/spark/pull/30926 > Cleanup "Unicode escapes in triple quoted strings are deprecated" compilation > warnings > -- > > Key: SPARK-33801 > URL: https://issues.apache.org/jira/browse/SPARK-33801 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Priority: Minor > > There are total 15 compilation warnings about this > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2930: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2931: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2932: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2933: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2934: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2935: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2936: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2937: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala:82: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:32: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:79: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:97: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:101: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:76: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:83: > Unicode escapes in 
triple quoted strings are deprecated, use the literal > character instead > {code}
[jira] [Resolved] (SPARK-33801) Cleanup "Unicode escapes in triple quoted strings are deprecated" compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-33801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33801. -- Fix Version/s: 3.2.0 Assignee: Yang Jie Resolution: Fixed > Cleanup "Unicode escapes in triple quoted strings are deprecated" compilation > warnings > -- > > Key: SPARK-33801 > URL: https://issues.apache.org/jira/browse/SPARK-33801 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.2.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0 > > > There are total 15 compilation warnings about this > {code:java} > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2930: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2931: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2932: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2933: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2934: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2935: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2936: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/core/src/main/scala/org/apache/spark/util/Utils.scala:2937: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtils.scala:82: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:32: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/CSVExprUtilsSuite.scala:79: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:97: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala:101: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:76: > Unicode escapes in triple quoted strings are deprecated, use the literal > character instead > [WARNING] > /spark-source/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala:83: > Unicode escapes in triple 
quoted strings are deprecated, use the literal > character instead > {code}
[jira] [Updated] (SPARK-31168) Upgrade Scala to 2.12.13
[ https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31168: -- Affects Version/s: (was: 3.1.0) 3.2.0 > Upgrade Scala to 2.12.13 > > > Key: SPARK-31168 > URL: https://issues.apache.org/jira/browse/SPARK-31168 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > h2. Highlights > * Performance improvements in the collections library: algorithmic > improvements and changes to avoid unnecessary allocations ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance]) > * Performance improvements in the compiler ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+], > minor [effects in our > benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@]) > * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL > encoding that avoids deadlocks (details on > [#8712|https://github.com/scala/scala/pull/8712]) > * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in > the REPL, which can lead to deteriorating performance in long sessions > ([#8576|https://github.com/scala/scala/pull/8576]) > * Fix some {{toX}} methods that could expose the underlying mutability of a > {{ListBuffer}}-generated collection > ([#8674|https://github.com/scala/scala/pull/8674]) > h3. JDK 9+ support > * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ > ([#8676|https://github.com/scala/scala/pull/8676]) > * {{:javap}} in the REPL now works on JDK 9+ > ([#8400|https://github.com/scala/scala/pull/8400]) > h3. Other changes > * Support new labels for creating durations for consistency: > {{Duration("1m")}}, {{Duration("3 hrs")}} > ([#8325|https://github.com/scala/scala/pull/8325], > [#8450|https://github.com/scala/scala/pull/8450]) > * Fix memory leak in runtime reflection's {{TypeTag}} caches > ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety > issues in runtime reflection > ([#8433|https://github.com/scala/scala/pull/8433]) > * When using compiler plugins, the ordering of compiler phases may change > due to [#8427|https://github.com/scala/scala/pull/8427] > For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11]. >
[jira] [Updated] (SPARK-33884) Simplify CaseWhen clauses with (true and false) and (false and true)
[ https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33884: Description: Simplify CaseWhen clauses with (true and false) and (false and true): ||Expression||After simplify|| |case when cond then true else false end|cond| |case when cond then false else true end|!cond| was: This PR simplifies {{CaseWhen}} when there is only one branch and one clause is null and the other is boolean. This simplification is similar to SPARK-32721. ||Expression||After simplify|| |case when cond then true else false end|cond| |case when cond then false else true end|!cond| > Simplify CaseWhen clauses with (true and false) and (false and true) > --- > > Key: SPARK-33884 > URL: https://issues.apache.org/jira/browse/SPARK-33884 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > Simplify CaseWhen clauses with (true and false) and (false and true): > ||Expression||After simplify|| > |case when cond then true else false end|cond| > |case when cond then false else true end|!cond|
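[Editor's note] Expressed as queries, assuming a SparkSession named spark and a hypothetical table t with column a; null-handling details of the actual optimizer rule are elided:
{code:scala}
// CASE WHEN cond THEN TRUE ELSE FALSE END  simplifies to  cond
spark.sql("SELECT CASE WHEN a > 0 THEN TRUE ELSE FALSE END FROM t") // ~> a > 0
// CASE WHEN cond THEN FALSE ELSE TRUE END  simplifies to  NOT cond
spark.sql("SELECT CASE WHEN a > 0 THEN FALSE ELSE TRUE END FROM t") // ~> NOT (a > 0)
{code}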
[jira] [Updated] (SPARK-33884) Simplify CaseWhen clauses with (true and false) and (false and true)
[ https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33884: Description: This PR simplifies {{CaseWhen}} when there is only one branch and one clause is null and the other is boolean. This simplification is similar to SPARK-32721. ||Expression||After simplify|| |case when cond then true else false end|cond| |case when cond then false else true end|!cond| was: This PR simplifies {{CaseWhen}} when there is only one branch and one clause is null and the other is boolean. This simplification is similar to SPARK-32721. ||Expression||After simplify|| |case when cond then null else false end|and(cond, null)| |case when cond then null else true end|or(not(cond), null)| |case when cond then false else null end|and(not(cond), null)| |case when cond then false end|and(not(cond), null)| |case when cond then true else null end|or(cond, null)| |case when cond then true end|or(cond, null)| > Simplify CaseWhen clauses with (true and false) and (false and true) > --- > > Key: SPARK-33884 > URL: https://issues.apache.org/jira/browse/SPARK-33884 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > This PR simplifies {{CaseWhen}} when there is only one branch and one clause is > null and the other is boolean. This simplification is similar to SPARK-32721. > ||Expression||After simplify|| > |case when cond then true else false end|cond| > |case when cond then false else true end|!cond|
[jira] [Updated] (SPARK-33884) Simplify CaseWhen clauses with (true and false) and (false and true)
[ https://issues.apache.org/jira/browse/SPARK-33884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-33884: Summary: Simplify CaseWhen clauses with (true and false) and (false and true) (was: Simplify conditional if all branches are foldable boolean type) > Simplify CaseWhen clauses with (true and false) and (false and true) > --- > > Key: SPARK-33884 > URL: https://issues.apache.org/jira/browse/SPARK-33884 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > This PR simplifies {{CaseWhen}} when there is only one branch, one clause is null, and the other is boolean. This simplification is similar to SPARK-32721. > ||Expression||After simplification|| > |case when cond then null else false end|and(cond, null)| > |case when cond then null else true end|or(not(cond), null)| > |case when cond then false else null end|and(not(cond), null)| > |case when cond then false end|and(not(cond), null)| > |case when cond then true else null end|or(cond, null)| > |case when cond then true end|or(cond, null)| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33913) Upgrade Kafka to 2.7.0
[ https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255395#comment-17255395 ] L. C. Hsieh commented on SPARK-33913: - It seems we cannot upgrade to Kafka 2.7.0. Because Kafka core inlines the Scala library, you cannot use a different Scala patch version than the one Kafka used to compile its jars: https://github.com/embeddedkafka/embedded-kafka/issues/202 Kafka 2.7.0 uses Scala 2.12.12 while Spark currently uses Scala 2.12.10, so there will be {{java.lang.NoClassDefFoundError: scala/math/Ordering$$anon$7}} errors. Due to an issue in Scala 2.12.12 (https://github.com/scala/bug/issues/12096), Spark will skip Scala 2.12.12 and wait for Scala 2.12.13. So it seems that, for Kafka, Spark needs to wait for the next Kafka version, which should be built with Scala 2.12.13 too. > Upgrade Kafka to 2.7.0 > -- > > Key: SPARK-33913 > URL: https://issues.apache.org/jira/browse/SPARK-33913 > Project: Spark > Issue Type: Improvement > Components: Build, DStreams >Affects Versions: 3.2.0 >Reporter: dengziming >Priority: Major > > > The Apache Kafka community has released Apache Kafka 2.7.0. Some of its features are useful, for example the KAFKA-9893 configurable TCP connection timeout; more details: > https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
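A hedged sketch of the version constraint described in the comment above, expressed as an sbt build fragment; the embedded-kafka coordinates and versions are illustrative, not a tested configuration:
{code:scala}
// build.sbt (illustrative): Kafka's jars inline scala-library internals,
// so the project's Scala patch version must match the one Kafka was
// compiled with, or tests hit NoClassDefFoundError at runtime.
scalaVersion := "2.12.12" // must match Kafka 2.7.0's Scala patch version

libraryDependencies ++= Seq(
  "org.apache.kafka"        %% "kafka"          % "2.7.0" % Test,
  "io.github.embeddedkafka" %% "embedded-kafka" % "2.7.0" % Test
)
{code}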
[jira] [Closed] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-33921. - We tried Scala 2.12.12 at SPARK-31168 already and revised SPARK-31168 to target Scala 2.12.13 to avoid a Scala compiler bug. > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255394#comment-17255394 ] Dongjoon Hyun edited comment on SPARK-33921 at 12/28/20, 5:59 AM: -- We tried Scala 2.12.12 at SPARK-31168 already and revised SPARK-31168 to target Scala 2.12.13 to avoid a Scala compiler bug. was (Author: dongjoon): We tried Scala 2.12.12 at SPARK-33168 already and revised SPARK-33168 to target Scala 2.12.13 to avoid Scala compiler bug. > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33921. --- Assignee: (was: L. C. Hsieh) Resolution: Duplicate > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33918) UnresolvedView should retain SQL text position
[ https://issues.apache.org/jira/browse/SPARK-33918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33918. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30936 [https://github.com/apache/spark/pull/30936] > UnresolvedView should retain SQL text position > -- > > Key: SPARK-33918 > URL: https://issues.apache.org/jira/browse/SPARK-33918 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.2.0 > > > UnresolvedView should retain SQL text position. The following commands will > be handled: > "DROP VIEW v" > "ALTER VIEW v SET TBLPROPERTIES ('k'='v')" > "ALTER VIEW v UNSET TBLPROPERTIES ('k')" > "ALTER VIEW v AS SELECT 1" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column
[ https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255389#comment-17255389 ] Ted Yu commented on SPARK-33915: Here is the plan prior to predicate pushdown: {code} 2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- Filter (get_json_object(phone#37, $.phone) = 1200) +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [] - Requested Columns: [id,address,phone] {code} Here is the plan with pushdown: {code} 2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0 +- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS phone#33] +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person - Cassandra Filters: [["`GetJsonObject(phone#37,$.phone)`" = ?, 1200]] - Requested Columns: [id,address,phone] {code} > Allow json expression to be pushable column > --- > > Key: SPARK-33915 > URL: https://issues.apache.org/jira/browse/SPARK-33915 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.1 >Reporter: Ted Yu >Priority: Major > > Currently PushableColumnBase provides no support for json / jsonb expressions. > An example of a json expression: > {code} > get_json_object(phone, '$.code') = '1200' > {code} > If a non-string literal is part of the expression, the presence of cast() would complicate the situation. > The implication is that implementations of SupportsPushDownFilters don't get a chance to perform pushdown even if the third-party DB engine supports json expression pushdown. > This issue is for discussion and implementation of Spark core changes that would allow a json expression to be recognized as a pushable column. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
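For orientation, a hypothetical Scala reconstruction of the query behind the two plans above; the table and column names come from the plans themselves, while the connector format and options are assumptions:
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Assumed Cassandra connector options matching "Cassandra Scan: test.person".
val person = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "person"))
  .load()

// The json-expression filter is the predicate SPARK-33915 wants to make pushable.
person
  .filter(get_json_object($"phone", "$.phone") === "1200")
  .select($"id", $"address", $"phone", get_json_object($"phone", "$.code").as("phone"))
  .orderBy($"id")
  .explain(true)
{code}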
[jira] [Assigned] (SPARK-33918) UnresolvedView should retain SQL text position
[ https://issues.apache.org/jira/browse/SPARK-33918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33918: --- Assignee: Terry Kim > UnresolvedView should retain SQL text position > -- > > Key: SPARK-33918 > URL: https://issues.apache.org/jira/browse/SPARK-33918 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > UnresolvedView should retain SQL text position. The following commands will > be handled: > "DROP VIEW v" > "ALTER VIEW v SET TBLPROPERTIES ('k'='v')" > "ALTER VIEW v UNSET TBLPROPERTIES ('k')" > "ALTER VIEW v AS SELECT 1" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255383#comment-17255383 ] Apache Spark commented on SPARK-33921: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30939 > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255381#comment-17255381 ] Apache Spark commented on SPARK-33921: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30939 > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33532) Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
[ https://issues.apache.org/jira/browse/SPARK-33532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33532. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30484 [https://github.com/apache/spark/pull/30484] > Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method > -- > > Key: SPARK-33532 > URL: https://issues.apache.org/jira/browse/SPARK-33532 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.2.0 > > > There are two places that call the "SpecificParquetRecordReaderBase.initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext)" method: one is in ParquetFileFormat and the other is in ParquetPartitionReaderFactory. The "inputSplit.rowGroupOffsets" passed in both places is null, so it seems that the "rowGroupOffsets != null" branch is unreachable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33532) Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method
[ https://issues.apache.org/jira/browse/SPARK-33532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33532: Assignee: Yang Jie > Remove unreachable branch in SpecificParquetRecordReaderBase.initialize method > -- > > Key: SPARK-33532 > URL: https://issues.apache.org/jira/browse/SPARK-33532 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > There are two places that call the "SpecificParquetRecordReaderBase.initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptContext)" method: one is in ParquetFileFormat and the other is in ParquetPartitionReaderFactory. The "inputSplit.rowGroupOffsets" passed in both places is null, so it seems that the "rowGroupOffsets != null" branch is unreachable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-24913) Make `AssertTrue` and `AssertNotNull` non-deterministic
[ https://issues.apache.org/jira/browse/SPARK-24913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24913. -- Resolution: Not A Problem > Make `AssertTrue` and `AssertNotNull` non-deterministic > --- > > Key: SPARK-24913 > URL: https://issues.apache.org/jira/browse/SPARK-24913 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: DB Tsai >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32685) Script transform hive serde default field.delimit is '\t'
[ https://issues.apache.org/jira/browse/SPARK-32685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32685: Assignee: Apache Spark > Script transform hive serde default field.delimit is '\t' > - > > Key: SPARK-32685 > URL: https://issues.apache.org/jira/browse/SPARK-32685 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > USING 'cat' > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2","3","\\N"]{code} > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > USING 'cat' > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( >'serialization.last.column.takes.rest' = 'true' > ) > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2","3","\\N"]{code} > > > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > USING 'cat' > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2"] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-32685) Script transform hive serde default field.delimit is '\t'
[ https://issues.apache.org/jira/browse/SPARK-32685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-32685: Assignee: (was: Apache Spark) > Script transform hive serde default field.delimit is '\t' > - > > Key: SPARK-32685 > URL: https://issues.apache.org/jira/browse/SPARK-32685 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > USING 'cat' > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2","3","\\N"]{code} > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > USING 'cat' > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( >'serialization.last.column.takes.rest' = 'true' > ) > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2","3","\\N"]{code} > > > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > USING 'cat' > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2"] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32685) Script transform hive serde default field.delimit is '\t'
[ https://issues.apache.org/jira/browse/SPARK-32685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255365#comment-17255365 ] Apache Spark commented on SPARK-32685: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/30942 > Script transform hive serde default field.delimit is '\t' > - > > Key: SPARK-32685 > URL: https://issues.apache.org/jira/browse/SPARK-32685 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: angerszhu >Priority: Major > > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > USING 'cat' > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2","3","\\N"]{code} > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > USING 'cat' > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > WITH SERDEPROPERTIES ( >'serialization.last.column.takes.rest' = 'true' > ) > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2","3","\\N"]{code} > > > > {code:java} > select split(value, "\t") from ( > SELECT TRANSFORM(a, b, c, null) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > USING 'cat' > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' > FROM (select 1 as a, 2 as b, 3 as c) t > ) temp; > result is : > _c0 > ["2"] > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33680) Fix PrunePartitionSuiteBase/BucketedReadWithHiveSupportSuite not to depend on the default conf
[ https://issues.apache.org/jira/browse/SPARK-33680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255364#comment-17255364 ] Apache Spark commented on SPARK-33680: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/30941 > Fix PrunePartitionSuiteBase/BucketedReadWithHiveSupportSuite not to depend on > the default conf > -- > > Key: SPARK-33680 > URL: https://issues.apache.org/jira/browse/SPARK-33680 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33680) Fix PrunePartitionSuiteBase/BucketedReadWithHiveSupportSuite not to depend on the default conf
[ https://issues.apache.org/jira/browse/SPARK-33680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255362#comment-17255362 ] Apache Spark commented on SPARK-33680: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/30941 > Fix PrunePartitionSuiteBase/BucketedReadWithHiveSupportSuite not to depend on > the default conf > -- > > Key: SPARK-33680 > URL: https://issues.apache.org/jira/browse/SPARK-33680 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33922) Fix failing test SparkLauncherSuite.testSparkLauncherGetError
dengziming created SPARK-33922: -- Summary: Fix failing test SparkLauncherSuite.testSparkLauncherGetError Key: SPARK-33922 URL: https://issues.apache.org/jira/browse/SPARK-33922 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.0.1 Reporter: dengziming org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError fails every time it is executed; note that it is not a flaky test because it fails every time. {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:87) at org.junit.Assert.assertTrue(Assert.java:42) at org.junit.Assert.assertTrue(Assert.java:53) at org.apache.spark.launcher.SparkLauncherSuite.testSparkLauncherGetError {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
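For reference, a minimal Scala sketch of the API this test exercises; the app resource and main class are placeholders, and the polling loop is simplified compared to the suite's actual synchronization:
{code:scala}
import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Launch an application that is expected to fail, then inspect the error;
// SparkAppHandle.getError is what testSparkLauncherGetError asserts on.
val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("/path/to/failing-app.jar") // placeholder
  .setMainClass("example.FailingApp")         // placeholder
  .setMaster("local")
  .startApplication()

while (!handle.getState.isFinal) Thread.sleep(100)
handle.getError.ifPresent(t => println(s"application failed: ${t.getMessage}"))
{code}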
[jira] [Commented] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255346#comment-17255346 ] Apache Spark commented on SPARK-33921: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30940 > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33921: Assignee: Apache Spark (was: L. C. Hsieh) > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33921: Assignee: L. C. Hsieh (was: Apache Spark) > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33921) Upgrade Scala version to 2.12.12
[ https://issues.apache.org/jira/browse/SPARK-33921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255345#comment-17255345 ] Apache Spark commented on SPARK-33921: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30940 > Upgrade Scala version to 2.12.12 > > > Key: SPARK-33921 > URL: https://issues.apache.org/jira/browse/SPARK-33921 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33896) Make Spark DAGScheduler datasource cache aware when scheduling tasks in a multi-replication HDFS
[ https://issues.apache.org/jira/browse/SPARK-33896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255342#comment-17255342 ] Xudingyu commented on SPARK-33896: -- [~sro...@scient.com][~sro...@yahoo.com][~sowen] > Make Spark DAGScheduler datasource cache aware when scheduling tasks in a > multi-replication HDFS > > > Key: SPARK-33896 > URL: https://issues.apache.org/jira/browse/SPARK-33896 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Xudingyu >Priority: Critical > > *Goals:* > • Make the Spark 3.0 scheduler datasource-cache-aware in a multi-replication HDFS cluster > • Performance gain in E2E workloads when enabling this feature > *Problem Statement:* > Spark's DAGScheduler currently schedules tasks according to an RDD's preferredLocations, which respects HDFS BlockLocation. In a multi-replication cluster, HDFS BlockLocation can be returned as an Array[BlockLocation], and Spark chooses one of the BlockLocations to run tasks on. +However, tasks can run faster if scheduled to the nodes holding the datasource cache they need. Currently there is no datasource cache locality provision mechanism in Spark, even if nodes in the cluster have cached data+. > This project aims to add a cache-locality-aware mechanism so that the Spark DAGScheduler can schedule tasks to the nodes with datasource cache according to cache locality in a multi-replication HDFS. > *Basic idea:* > The basic idea is to expose a datasource cache locality provider interface in Spark whose default implementation respects HDFS BlockLocation. Worker nodes' datasource cache metadata (like offset and length) needs to be stored in an external DB like Redis. The Spark driver can look up this cache metadata and customize the task locality algorithm to choose the most efficient node. > *CBL (Cost Based Locality)* > CBL (cost based locality) takes cache size, disk IO, network IO, etc. into account when scheduling tasks. > Say there are 3 nodes A, B, C in a 2-replication HDFS cluster. When Spark schedules task1, nodeB has on disk all the data replicas that task1 needs; at the same time, nodeA has 20% of the data in the datasource cache and 50% of the data replicas on disk. > Then we calculate the cost of scheduling task1 on nodeA, nodeB, and nodeC: > CostA = CalculateCost(20% read from cache) + CalculateCost(50% read from disk) + CalculateCost(30% read from remote) > CostB = CalculateCost(100% read from disk) > CostC = CalculateCost(100% read from remote) > Return the node with minimal cost. > *Modifications:* > A config is needed to decide which cache locality provider to use; it can be as follows > {code:java} > SQLConf.PARTITIONED_FILE_PREFERREDLOC_IMPL > {code} > For Spark 3.0, FilePartition.scala's preferredLocations() needs to be modified, which can be as follows > {code:java} > override def preferredLocations(): Array[String] = { > Utils.classForName(SparkEnv.get.conf.get(SQLConf.PARTITIONED_FILE_PREFERREDLOC_IMPL)) > .getConstructor() > .newInstance() > .getPreferredLocs() > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
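To make the CBL cost comparison above concrete, here is an illustrative-only Scala sketch; the weights are made-up constants, whereas the real proposal would derive costs from measured cache, disk IO, and network IO characteristics:
{code:scala}
// Fractions of a task's input served from local cache and local disk;
// the remainder is assumed to be read from remote nodes.
case class NodeStats(cacheFraction: Double, diskFraction: Double)

def cost(s: NodeStats,
         cacheCost: Double = 1.0,  // assumed relative cost weights,
         diskCost: Double = 5.0,   // not measured values
         remoteCost: Double = 20.0): Double = {
  val remoteFraction = 1.0 - s.cacheFraction - s.diskFraction
  s.cacheFraction * cacheCost + s.diskFraction * diskCost + remoteFraction * remoteCost
}

// NodeA: 20% cached + 50% on disk; NodeB: all on disk; NodeC: all remote.
val nodes = Map("A" -> NodeStats(0.2, 0.5), "B" -> NodeStats(0.0, 1.0), "C" -> NodeStats(0.0, 0.0))
val (best, _) = nodes.minBy { case (_, s) => cost(s) }
// Under these weights nodeB wins (5.0 vs 8.7 vs 20.0): avoiding remote reads
// outweighs nodeA's partial cache hits. The decision is weight-dependent.
println(s"schedule task1 on node $best")
{code}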
[jira] [Created] (SPARK-33921) Upgrade Scala version to 2.12.12
L. C. Hsieh created SPARK-33921: --- Summary: Upgrade Scala version to 2.12.12 Key: SPARK-33921 URL: https://issues.apache.org/jira/browse/SPARK-33921 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.2.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh Upgrade Scala 2.12 patch version. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33913) Upgrade Kafka to 2.7.0
[ https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255324#comment-17255324 ] Apache Spark commented on SPARK-33913: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30939 > Upgrade Kafka to 2.7.0 > -- > > Key: SPARK-33913 > URL: https://issues.apache.org/jira/browse/SPARK-33913 > Project: Spark > Issue Type: Improvement > Components: Build, DStreams >Affects Versions: 3.2.0 >Reporter: dengziming >Priority: Major > > > The Apache Kafka community has released Apache Kafka 2.7.0. Some of its features are useful, for example the KAFKA-9893 configurable TCP connection timeout; more details: > https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33913) Upgrade Kafka to 2.7.0
[ https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33913: Assignee: (was: Apache Spark) > Upgrade Kafka to 2.7.0 > -- > > Key: SPARK-33913 > URL: https://issues.apache.org/jira/browse/SPARK-33913 > Project: Spark > Issue Type: Improvement > Components: Build, DStreams >Affects Versions: 3.2.0 >Reporter: dengziming >Priority: Major > > > The Apache Kafka community has released Apache Kafka 2.7.0. Some of its features are useful, for example the KAFKA-9893 configurable TCP connection timeout; more details: > https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33913) Upgrade Kafka to 2.7.0
[ https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33913: Assignee: Apache Spark > Upgrade Kafka to 2.7.0 > -- > > Key: SPARK-33913 > URL: https://issues.apache.org/jira/browse/SPARK-33913 > Project: Spark > Issue Type: Improvement > Components: Build, DStreams >Affects Versions: 3.2.0 >Reporter: dengziming >Assignee: Apache Spark >Priority: Major > > > The Apache Kafka community has released Apache Kafka 2.7.0. Some of its features are useful, for example the KAFKA-9893 configurable TCP connection timeout; more details: > https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33920) We cannot pass a schema to the createDataFrame function in Scala; however, we can do this in Python.
[ https://issues.apache.org/jira/browse/SPARK-33920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255321#comment-17255321 ] L. C. Hsieh commented on SPARK-33920: - There is {{def createDataFrame(rowRDD: RDD[Row], schema: StructType)}} in the Scala API. If you mean {{def createDataFrame[A <: Product : TypeTag](data: Seq[A])}}, the Scala API uses Scala reflection to infer the schema of the given Product. Why do you need a {{schema}} parameter here? > We cannot pass a schema to the createDataFrame function in Scala; however, we can do this in Python. > --- > > Key: SPARK-33920 > URL: https://issues.apache.org/jira/browse/SPARK-33920 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.1 >Reporter: Abdul Rafay Abdul Rafay >Priority: Critical > Original Estimate: 168h > Remaining Estimate: 168h > > {{spark.createDataFrame(data, schema)}} > I am able to pass a schema as a parameter to the createDataFrame function in Python but cannot do this in Scala for static data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
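To illustrate the overload the comment above points to, a self-contained Scala example using the existing {{createDataFrame(rowRDD: RDD[Row], schema: StructType)}} API; only the sample data is made up:
{code:scala}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder().master("local[*]").appName("schema-demo").getOrCreate()

// Explicit schema, passed to createDataFrame just like in the Python API.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))
val rows = spark.sparkContext.parallelize(Seq(Row("alice", 30), Row("bob", 25)))

val df = spark.createDataFrame(rows, schema)
df.printSchema()
df.show()
{code}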
[jira] [Commented] (SPARK-33824) Restructure and improve Python package management page
[ https://issues.apache.org/jira/browse/SPARK-33824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255229#comment-17255229 ] Apache Spark commented on SPARK-33824: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/30938 > Restructure and improve Python package management page > -- > > Key: SPARK-33824 > URL: https://issues.apache.org/jira/browse/SPARK-33824 > Project: Spark > Issue Type: Sub-task > Components: docs, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > I recently wrote a blog post (to be published soon) about Python dependency management. > This JIRA aims to add some of the contents of the blog post to the PySpark documentation for users. > Please see the linked PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33824) Restructure and improve Python package management page
[ https://issues.apache.org/jira/browse/SPARK-33824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255228#comment-17255228 ] Apache Spark commented on SPARK-33824: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/30938 > Restructure and improve Python package management page > -- > > Key: SPARK-33824 > URL: https://issues.apache.org/jira/browse/SPARK-33824 > Project: Spark > Issue Type: Sub-task > Components: docs, PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.0 > > > I recently wrote a blog post (to be published soon) about Python dependency management. > This JIRA aims to add some of the contents of the blog post to the PySpark documentation for users. > Please see the linked PR. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33920) We cannot pass a schema to the createDataFrame function in Scala; however, we can do this in Python.
Abdul Rafay Abdul Rafay created SPARK-33920: --- Summary: We cannot pass a schema to the createDataFrame function in Scala; however, we can do this in Python. Key: SPARK-33920 URL: https://issues.apache.org/jira/browse/SPARK-33920 Project: Spark Issue Type: Improvement Components: Build, SQL Affects Versions: 3.0.1 Reporter: Abdul Rafay Abdul Rafay {{spark.createDataFrame(data, schema)}} I am able to pass a schema as a parameter to the createDataFrame function in Python but cannot do this in Scala for static data. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl
[ https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-33911: Assignee: Maxim Gekk > Update SQL migration guide about changes in HiveClientImpl > -- > > Key: SPARK-33911 > URL: https://issues.apache.org/jira/browse/SPARK-33911 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > 1. https://github.com/apache/spark/pull/30802 > 2. https://github.com/apache/spark/pull/30711 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33911) Update SQL migration guide about changes in HiveClientImpl
[ https://issues.apache.org/jira/browse/SPARK-33911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33911. -- Fix Version/s: 2.4.8 3.1.0 Resolution: Fixed Issue resolved by pull request 30933 [https://github.com/apache/spark/pull/30933] > Update SQL migration guide about changes in HiveClientImpl > -- > > Key: SPARK-33911 > URL: https://issues.apache.org/jira/browse/SPARK-33911 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.8, 3.0.2, 3.1.0, 3.2.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.1.0, 2.4.8 > > > 1. https://github.com/apache/spark/pull/30802 > 2. https://github.com/apache/spark/pull/30711 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests
[ https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255190#comment-17255190 ] Apache Spark commented on SPARK-33919: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/30937 > Unify v1 and v2 SHOW NAMESPACES tests > - > > Key: SPARK-33919 > URL: https://issues.apache.org/jira/browse/SPARK-33919 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run > for v1 and v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests
[ https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33919: Assignee: (was: Apache Spark) > Unify v1 and v2 SHOW NAMESPACES tests > - > > Key: SPARK-33919 > URL: https://issues.apache.org/jira/browse/SPARK-33919 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run > for v1 and v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests
[ https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33919: Assignee: Apache Spark > Unify v1 and v2 SHOW NAMESPACES tests > - > > Key: SPARK-33919 > URL: https://issues.apache.org/jira/browse/SPARK-33919 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run > for v1 and v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33919) Unify v1 and v2 SHOW NAMESPACES tests
[ https://issues.apache.org/jira/browse/SPARK-33919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255189#comment-17255189 ] Apache Spark commented on SPARK-33919: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/30937 > Unify v1 and v2 SHOW NAMESPACES tests > - > > Key: SPARK-33919 > URL: https://issues.apache.org/jira/browse/SPARK-33919 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > Write unified tests for SHOW DATABASES and SHOW NAMESPACES that can be run > for v1 and v2 catalogs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org