[jira] [Assigned] (SPARK-37343) Implement createIndex and IndexExists in JDBC (Postgres dialect)

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37343:


Assignee: Apache Spark

> Implement createIndex and IndexExists in JDBC (Postgres dialect)
> 
>
> Key: SPARK-37343
> URL: https://issues.apache.org/jira/browse/SPARK-37343
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-37343) Implement createIndex and IndexExists in JDBC (Postgres dialect)

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446754#comment-17446754
 ] 

Apache Spark commented on SPARK-37343:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34673

> Implement createIndex and IndexExists in JDBC (Postgres dialect)
> 
>
> Key: SPARK-37343
> URL: https://issues.apache.org/jira/browse/SPARK-37343
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Assigned] (SPARK-37343) Implement createIndex and IndexExists in JDBC (Postgres dialect)

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37343:


Assignee: (was: Apache Spark)

> Implement createIndex and IndexExists in JDBC (Postgres dialect)
> 
>
> Key: SPARK-37343
> URL: https://issues.apache.org/jira/browse/SPARK-37343
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Priority: Major
>







[jira] [Resolved] (SPARK-37379) Add tree pattern pruning to CTESubstitution rule

2021-11-19 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-37379.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34658
[https://github.com/apache/spark/pull/34658]

> Add tree pattern pruning to CTESubstitution rule
> 
>
> Key: SPARK-37379
> URL: https://issues.apache.org/jira/browse/SPARK-37379
> Project: Spark
>  Issue Type: Sub-task
>  Components: Optimizer
>Affects Versions: 3.1.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
> Fix For: 3.3.0
>
>
> I propose to add tree pattern pruning to the CTESubstitution rule in order to skip tree traversal when the tree does not contain an UnresolvedWith node.
> This is motivated by profiling a job which uses DataFrame APIs to incrementally construct a huge query plan (200k+ nodes): each API call results in eager re-analysis of the plan, of which CTESubstitution accounts for the majority of the analysis time. The query didn't contain CTEs, so skipping CTESubstitution significantly speeds up analysis.
> I plan to submit a patch for this.
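For context, the pruning described above is typically a constant-time bit-set check at the top of the rule, so plans without CTEs pay almost nothing. A minimal sketch in Scala, assuming Spark's TreePattern mechanism (the pattern name UNRESOLVED_WITH and the substituteCTEs helper are illustrative, not taken from the actual patch):
{code:java}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_WITH

object CTESubstitutionSketch {
  def apply(plan: LogicalPlan): LogicalPlan = {
    // Cheap bit-set lookup: bail out before any tree traversal
    // when no UnresolvedWith node exists anywhere in the plan.
    if (!plan.containsPattern(UNRESOLVED_WITH)) {
      plan
    } else {
      substituteCTEs(plan)
    }
  }

  // Stand-in for the real substitution logic.
  private def substituteCTEs(plan: LogicalPlan): LogicalPlan = plan
}
{code}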






[jira] [Updated] (SPARK-37394) Skip registering with ESS if a customized shuffle manager is configured

2021-11-19 Thread Weiwei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiwei Yang updated SPARK-37394:

Summary: Skip registering with ESS if a customized shuffle manager is 
configured  (was: Skip registering to ESS if a customized shuffle manager is 
configured)

> Skip registering with ESS if a customized shuffle manager is configured
> ---
>
> Key: SPARK-37394
> URL: https://issues.apache.org/jira/browse/SPARK-37394
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Major
>
> In order to enable dynamic allocation with a customized remote shuffle 
> service, the following configuration properties are set:
> * spark.dynamicAllocation.enabled=true
> * spark.dynamicAllocation.shuffleTracking.enabled=false 
> * spark.shuffle.service.enabled=true
> * spark.shuffle.manager=org.apache.spark.SomeShuffleManager
> When running a Spark job with the above configurations, the job fails with the following error:
> {code}
> 21/11/19 23:01:51 INFO BlockManager: external shuffle service port = 7337
> 21/11/19 23:01:51 INFO BlockManager: Registering executor with local external shuffle service.
> 21/11/19 23:01:51 ERROR BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
> java.io.IOException: Failed to connect to /10.1.2.75:7337
>   at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>   at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:201)
>   at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:142)
>   at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:294)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:291)
>   at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:265)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:118)
>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>   at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>   at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.1.2.75:7337
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>   at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>   ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>   ... 11 more
> {code}






[jira] [Assigned] (SPARK-37394) Skip registering to ESS if a customized shuffle manager is configured

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37394:


Assignee: Apache Spark

> Skip registering to ESS if a customized shuffle manager is configured
> -
>
> Key: SPARK-37394
> URL: https://issues.apache.org/jira/browse/SPARK-37394
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Assignee: Apache Spark
>Priority: Major
>
> In order to enable dynamic allocation with a customized remote shuffle 
> service, the following configuration properties are set:
> * spark.dynamicAllocation.enabled=true
> * spark.dynamicAllocation.shuffleTracking.enabled=false 
> * spark.shuffle.service.enabled=true
> * spark.shuffle.manager=org.apache.spark.SomeShuffleManager
> When running a Spark job with the above configurations, the job fails with the following error:
> {code}
> 21/11/19 23:01:51 INFO BlockManager: external shuffle service port = 7337
> 21/11/19 23:01:51 INFO BlockManager: Registering executor with local external shuffle service.
> 21/11/19 23:01:51 ERROR BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
> java.io.IOException: Failed to connect to /10.1.2.75:7337
>   at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>   at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:201)
>   at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:142)
>   at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:294)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:291)
>   at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:265)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:118)
>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>   at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>   at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.1.2.75:7337
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>   at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>   ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>   ... 11 more
> {code}






[jira] [Updated] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-11-19 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-36900:
--
Fix Version/s: 3.2.1

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
>
> Execute
>  
> {code:java}
> build/mvn clean install -pl core -am -Dtest=none -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  
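For orientation, a rough sketch of what the over-2GB test does (the constructor signature and the size method are assumptions about Spark's internal API); the roughly 2 GB of on-heap chunks explain the OOM under the default JDK 17 test heap:
{code:java}
import java.nio.ByteBuffer
import org.apache.spark.util.io.ChunkedByteBufferOutputStream

// Assumed internal API: ChunkedByteBufferOutputStream(chunkSize, allocator).
val out = new ChunkedByteBufferOutputStream(1024 * 1024, ByteBuffer.allocate)
val block = new Array[Byte](64 * 1024 * 1024) // 64 MB per write
(1 to 33).foreach(_ => out.write(block))      // ~2.1 GB written in total
out.close()
assert(out.size > 0L) // a running Int counter would have overflowed here
{code}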






[jira] [Assigned] (SPARK-37394) Skip registering to ESS if a customized shuffle manager is configured

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37394:


Assignee: (was: Apache Spark)

> Skip registering to ESS if a customized shuffle manager is configured
> -
>
> Key: SPARK-37394
> URL: https://issues.apache.org/jira/browse/SPARK-37394
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Major
>
> In order to enable dynamic allocation with a customized remote shuffle 
> service, the following configuration properties are set:
> * spark.dynamicAllocation.enabled=true
> * spark.dynamicAllocation.shuffleTracking.enabled=false 
> * spark.shuffle.service.enabled=true
> * spark.shuffle.manager=org.apache.spark.SomeShuffleManager
> When running a Spark job with the above configurations, the job fails with the following error:
> {code}
> 21/11/19 23:01:51 INFO BlockManager: external shuffle service port = 7337
> 21/11/19 23:01:51 INFO BlockManager: Registering executor with local external shuffle service.
> 21/11/19 23:01:51 ERROR BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
> java.io.IOException: Failed to connect to /10.1.2.75:7337
>   at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>   at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:201)
>   at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:142)
>   at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:294)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:291)
>   at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:265)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:118)
>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>   at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>   at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.1.2.75:7337
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>   at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>   ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>   ... 11 more
> {code}






[jira] [Commented] (SPARK-37394) Skip registering to ESS if a customized shuffle manager is configured

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446717#comment-17446717
 ] 

Apache Spark commented on SPARK-37394:
--

User 'yangwwei' has created a pull request for this issue:
https://github.com/apache/spark/pull/34672

> Skip registering to ESS if a customized shuffle manager is configured
> -
>
> Key: SPARK-37394
> URL: https://issues.apache.org/jira/browse/SPARK-37394
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Weiwei Yang
>Priority: Major
>
> In order to enable dynamic allocation with a customized remote shuffle 
> service, the following configuration properties are set:
> * spark.dynamicAllocation.enabled=true
> * spark.dynamicAllocation.shuffleTracking.enabled=false 
> * spark.shuffle.service.enabled=true
> * spark.shuffle.manager=org.apache.spark.SomeShuffleManager
> When running a Spark job with the above configurations, the job fails with the following error:
> {code}
> 21/11/19 23:01:51 INFO BlockManager: external shuffle service port = 7337
> 21/11/19 23:01:51 INFO BlockManager: Registering executor with local external shuffle service.
> 21/11/19 23:01:51 ERROR BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
> java.io.IOException: Failed to connect to /10.1.2.75:7337
>   at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
>   at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:201)
>   at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:142)
>   at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:294)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>   at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:291)
>   at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:265)
>   at org.apache.spark.executor.Executor.<init>(Executor.scala:118)
>   at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
>   at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
>   at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.1.2.75:7337
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>   at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
>   at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
>   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
>   at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>   at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>   ... 1 more
> Caused by: java.net.ConnectException: Connection refused
>   ... 11 more
> {code}






[jira] [Created] (SPARK-37394) Skip registering to ESS if a customized shuffle manager is configured

2021-11-19 Thread Weiwei Yang (Jira)
Weiwei Yang created SPARK-37394:
---

 Summary: Skip registering to ESS if a customized shuffle manager 
is configured
 Key: SPARK-37394
 URL: https://issues.apache.org/jira/browse/SPARK-37394
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Weiwei Yang


In order to enable dynamic allocation with a customized remote shuffle service, 
the following configuration properties are set:
* spark.dynamicAllocation.enabled=true
* spark.dynamicAllocation.shuffleTracking.enabled=false 
* spark.shuffle.service.enabled=true
* spark.shuffle.manager=org.apache.spark.SomeShuffleManager

When running a Spark job with the above configurations, the job fails with the following error:

{code}
21/11/19 23:01:51 INFO BlockManager: external shuffle service port = 7337
21/11/19 23:01:51 INFO BlockManager: Registering executor with local external shuffle service.
21/11/19 23:01:51 ERROR BlockManager: Failed to connect to external shuffle server, will retry 2 more times after waiting 5 seconds...
java.io.IOException: Failed to connect to /10.1.2.75:7337
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
    at org.apache.spark.network.client.TransportClientFactory.createUnmanagedClient(TransportClientFactory.java:201)
    at org.apache.spark.network.shuffle.ExternalShuffleClient.registerWithShuffleServer(ExternalShuffleClient.java:142)
    at org.apache.spark.storage.BlockManager$$anonfun$registerWithExternalShuffleServer$1.apply$mcVI$sp(BlockManager.scala:294)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at org.apache.spark.storage.BlockManager.registerWithExternalShuffleServer(BlockManager.scala:291)
    at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:265)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:118)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /10.1.2.75:7337
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
    ... 1 more
Caused by: java.net.ConnectException: Connection refused
    ... 11 more
{code}
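For reference, a minimal sketch of how these properties might be set when building the session (org.apache.spark.SomeShuffleManager is the placeholder class name from this report, not a real shuffle manager):
{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("dynamic-allocation-with-custom-shuffle")
  // Configuration from the report above.
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "false")
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.shuffle.manager", "org.apache.spark.SomeShuffleManager")
  .getOrCreate()
{code}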






[jira] [Assigned] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37393:


Assignee: Apache Spark

> Inline annotations for {ml, mllib}/common.py
> 
>
> Key: SPARK-37393
> URL: https://issues.apache.org/jira/browse/SPARK-37393
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Nicholas Chammas
>Assignee: Apache Spark
>Priority: Minor
>
> This will allow us to run type checks against those files themselves.






[jira] [Commented] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446673#comment-17446673
 ] 

Apache Spark commented on SPARK-37393:
--

User 'nchammas' has created a pull request for this issue:
https://github.com/apache/spark/pull/34671

> Inline annotations for {ml, mllib}/common.py
> 
>
> Key: SPARK-37393
> URL: https://issues.apache.org/jira/browse/SPARK-37393
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> This will allow us to run type checks against those files themselves.






[jira] [Assigned] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37393:


Assignee: (was: Apache Spark)

> Inline annotations for {ml, mllib}/common.py
> 
>
> Key: SPARK-37393
> URL: https://issues.apache.org/jira/browse/SPARK-37393
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> This will allow us to run type checks against those files themselves.






[jira] [Resolved] (SPARK-36997) Test type hints against examples

2021-11-19 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-36997.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34273
[https://github.com/apache/spark/pull/34273]

> Test type hints against examples
> 
>
> Key: SPARK-36997
> URL: https://issues.apache.org/jira/browse/SPARK-36997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.3.0
>
>
> Next to tests, examples are the largest body of code we have that shows actual usage of the public API.
> Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these tests weren't ported during the initial migration.
> We should add them back and extend coverage to the {{mllib}}, {{streaming}}, and core files.






[jira] [Assigned] (SPARK-36997) Test type hints against examples

2021-11-19 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-36997:
--

Assignee: Maciej Szymkiewicz

> Test type hints against examples
> 
>
> Key: SPARK-36997
> URL: https://issues.apache.org/jira/browse/SPARK-36997
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
>
> Next to tests, examples are the largest body of code we have that shows actual usage of the public API.
> Originally, {{pyspark-stubs}} tested the {{sql}} and {{ml}} examples, but these tests weren't ported during the initial migration.
> We should add them back and extend coverage to the {{mllib}}, {{streaming}}, and core files.






[jira] [Updated] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Nicholas Chammas (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Chammas updated SPARK-37393:
-
Description: This will allow us to run type checks against those files 
themselves.

> Inline annotations for {ml, mllib}/common.py
> 
>
> Key: SPARK-37393
> URL: https://issues.apache.org/jira/browse/SPARK-37393
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.2.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> This will allow us to run type checks against those files themselves.






[jira] [Created] (SPARK-37393) Inline annotations for {ml, mllib}/common.py

2021-11-19 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-37393:


 Summary: Inline annotations for {ml, mllib}/common.py
 Key: SPARK-37393
 URL: https://issues.apache.org/jira/browse/SPARK-37393
 Project: Spark
  Issue Type: Improvement
  Components: ML, MLlib, PySpark
Affects Versions: 3.2.0
Reporter: Nicholas Chammas









[jira] [Updated] (SPARK-37392) Catalyst optimizer very time-consuming and memory-intensive with some "explode(array)"

2021-11-19 Thread Francois MARTIN (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois MARTIN updated SPARK-37392:

Description: 
The problem occurs with the simple code below:
{code:java}
import session.implicits._

Seq(
  (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x")
).toDF()
  .checkpoint() // or save and reload to truncate lineage
  .createOrReplaceTempView("sub")

session.sql("""
  SELECT
    *
  FROM
  (
    SELECT
      EXPLODE( ARRAY( * ) ) result
    FROM
    (
      SELECT
        _1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, _12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
      FROM
        sub
    )
  )
  WHERE
    result != ''
  """).show() {code}
It takes several minutes and consumes a very large amount of Java heap, when it should be immediate.

It does not occur when replacing the unique integer value ({_}1{_}) with a string value ({_}"x"{_}).

All the time is spent in the _PruneFilters_ optimization rule.

Not reproduced in Spark 2.4.1.

  was:
The problem occurs with this simple code below:
{code:java}
import session.implicits._

Seq(
  (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x")
).toDF()
  .checkpoint() // or save and reload to truncate lineage
  .createOrReplaceTempView("sub")

session.sql("""
  SELECT
    *
  FROM
  (
    SELECT
      EXPLODE( ARRAY( * ) ) result
    FROM
    (
      SELECT
        _1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, _12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
      FROM
        sub
    )
  )
  WHERE
    result != ''
  """).show() {code}
It does not occur when replacing the unique integer value ({_}1{_}) with a string value ({_}"x"{_}).

All the time is spent in the _PruneFilters_ optimization rule.

Not reproduced in Spark 2.4.1.


> Catalyst optimizer very time-consuming and memory-intensive with some 
> "explode(array)" 
> ---
>
> Key: SPARK-37392
> URL: https://issues.apache.org/jira/browse/SPARK-37392
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Francois MARTIN
>Priority: Major
>
> The problem occurs with the simple code below:
> {code:java}
> import session.implicits._
> Seq(
>   (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x")
> ).toDF()
>   .checkpoint() // or save and reload to truncate lineage
>   .createOrReplaceTempView("sub")
> session.sql("""
>   SELECT
>     *
>   FROM
>   (
>     SELECT
>       EXPLODE( ARRAY( * ) ) result
>     FROM
>     (
>       SELECT
>         _1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, _12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
>       FROM
>         sub
>     )
>   )
>   WHERE
>     result != ''
>   """).show() {code}
> It takes several minutes and consumes a very large amount of Java heap, when it should be immediate.
> It does not occur when replacing the unique integer value ({_}1{_}) with a string value ({_}"x"{_}).
> All the time is spent in the _PruneFilters_ optimization rule.
> Not reproduced in Spark 2.4.1.






[jira] [Updated] (SPARK-37392) Catalyst optimizer very time-consuming and memory-intensive with some "explode(array)"

2021-11-19 Thread Francois MARTIN (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois MARTIN updated SPARK-37392:

Priority: Major  (was: Minor)

> Catalyst optimizer very time-consuming and memory-intensive with some 
> "explode(array)" 
> ---
>
> Key: SPARK-37392
> URL: https://issues.apache.org/jira/browse/SPARK-37392
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Francois MARTIN
>Priority: Major
>
> The problem occurs with this simple code below:
> {code:java}
> import session.implicits._
> Seq(
>   (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x")
> ).toDF()
>   .checkpoint() // or save and reload to truncate lineage
>   .createOrReplaceTempView("sub")
> session.sql("""
>   SELECT
>     *
>   FROM
>   (
>     SELECT
>       EXPLODE( ARRAY( * ) ) result
>     FROM
>     (
>       SELECT
>         _1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, _12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
>       FROM
>         sub
>     )
>   )
>   WHERE
>     result != ''
>   """).show() {code}
> It does not occur when replacing the unique integer value ({_}1{_}) with a string value ({_}"x"{_}).
> All the time is spent in the _PruneFilters_ optimization rule.
> Not reproduced in Spark 2.4.1.






[jira] [Created] (SPARK-37392) Catalyst optimizer very time-consuming and memory-intensive with some "explode(array)"

2021-11-19 Thread Francois MARTIN (Jira)
Francois MARTIN created SPARK-37392:
---

 Summary: Catalyst optimizer very time-consuming and 
memory-intensive with some "explode(array)" 
 Key: SPARK-37392
 URL: https://issues.apache.org/jira/browse/SPARK-37392
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.1.2
Reporter: Francois MARTIN


The problem occurs with the simple code below:
{code:java}
import session.implicits._

Seq(
  (1, "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x", "x")
).toDF()
  .checkpoint() // or save and reload to truncate lineage
  .createOrReplaceTempView("sub")

session.sql("""
  SELECT
    *
  FROM
  (
    SELECT
      EXPLODE( ARRAY( * ) ) result
    FROM
    (
      SELECT
        _1 a, _2 b, _3 c, _4 d, _5 e, _6 f, _7 g, _8 h, _9 i, _10 j, _11 k, _12 l, _13 m, _14 n, _15 o, _16 p, _17 q, _18 r, _19 s, _20 t, _21 u
      FROM
        sub
    )
  )
  WHERE
    result != ''
  """).show() {code}
It does not occur when replacing the unique integer value ({_}1{_}) with a string value ({_}"x"{_}).

All the time is spent in the _PruneFilters_ optimization rule.

Not reproduced in Spark 2.4.1.
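A possible (untested) mitigation while the bug stands is to exclude the offending rule through Spark's standard excluded-rules conf; whether skipping PruneFilters is acceptable depends on the workload:
{code:java}
// Untested sketch: spark.sql.optimizer.excludedRules is an existing conf,
// but the effect of skipping PruneFilters on a given job is not verified here.
spark.conf.set(
  "spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.PruneFilters")
{code}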






[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-19 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Description: 
The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622]) does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.

 

Thanks in advance for your help!

  was:
The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622]) does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.

 

Thanks in advance for your help!


> SIGNIFICANT bottleneck introduced by fix for SPARK-34497
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622]) does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.
> The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.
> Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!






[jira] [Updated] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-19 Thread Danny Guinther (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Guinther updated SPARK-37391:
---
Description: 
The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622]) does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.

 

Thanks in advance for your help!

  was:
The fix for SPARK-34497 does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.

 

Thanks in advance for your help!


> SIGNIFICANT bottleneck introduced by fix for SPARK-34497
> 
>
> Key: SPARK-37391
> URL: https://issues.apache.org/jira/browse/SPARK-37391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Environment: N/A
>Reporter: Danny Guinther
>Priority: Major
>
> The fix for SPARK-34497 ([https://github.com/apache/spark/pull/31622]) does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.
> The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.
> Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.
>  
> Thanks in advance for your help!






[jira] [Updated] (SPARK-37349) Improve SQL Rest API Parsing

2021-11-19 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-37349:
--
Description: 
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. Currently, values like `"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"` are shown in REST API responses and are not easily digested in their current form. New processing logic is introduced to organize and process the new metric fields in a more user-friendly manner.

  was:https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. This Jira aims to add further enhancements with regard to parsing the incoming data by accounting for the `StageIds` and `TaskIds` fields that came in Spark 3.


> Improve SQL Rest API Parsing
> 
>
> Key: SPARK-37349
> URL: https://issues.apache.org/jira/browse/SPARK-37349
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yian Liou
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. Currently, values like `"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"` are shown in REST API responses and are not easily digested in their current form. New processing logic is introduced to organize and process the new metric fields in a more user-friendly manner.






[jira] [Created] (SPARK-37391) SIGNIFICANT bottleneck introduced by fix for SPARK-34497

2021-11-19 Thread Danny Guinther (Jira)
Danny Guinther created SPARK-37391:
--

 Summary: SIGNIFICANT bottleneck introduced by fix for SPARK-34497
 Key: SPARK-37391
 URL: https://issues.apache.org/jira/browse/SPARK-37391
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.1.2, 3.1.1, 3.1.0
 Environment: N/A
Reporter: Danny Guinther


The fix for SPARK-34497 does not seem to have considered the reality that some apps may rely on being able to establish many JDBC connections simultaneously for performance reasons.

The fix forces concurrency to 1 when establishing database connections, and that strikes me as a *significant* user-impacting change and a *significant* bottleneck.

Can anyone propose a workaround for this? I have an app that makes connections to thousands of databases, and I can't upgrade to any version >3.1.x because of this significant bottleneck.

 

Thanks in advance for your help!






[jira] [Updated] (SPARK-37349) Improve SQL Rest API Parsing

2021-11-19 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-37349:
--
Description: 
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for SQL 
Rest API. Currently, values like
{code:java}
"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 
59.0 B (stage 1.0: task 5))"{code}
 are currently shown from Rest API calls which are not easily digested in its 
current form.New processing logic of the values is introduced to organize and 
process new metric fields in a more user friendly manner.

  was:
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. Currently, values like `"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"` are shown in REST API responses and are not easily digested in their current form. New processing logic is introduced to organize and process the new metric fields in a more user-friendly manner.


> Improve SQL Rest API Parsing
> 
>
> Key: SPARK-37349
> URL: https://issues.apache.org/jira/browse/SPARK-37349
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yian Liou
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. Currently, values like
> {code:java}
> "value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"{code}
> are shown in REST API responses and are not easily digested in their current form. New processing logic is introduced to organize and process the new metric fields in a more user-friendly manner.
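To make the shape of that value concrete, a throwaway Scala sketch splitting it into its label and data parts (illustrative only; this is not the parsing logic being proposed):
{code:java}
// The raw value is a label line and a data line separated by "\n".
val raw = "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"
val Array(label, data) = raw.split("\n", 2)
// label = "total (min, med, max (stageId: taskId))"
// data  = "177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"
{code}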






[jira] [Commented] (SPARK-25642) Add new Metrics in External Shuffle Service to help determine Network performance and Connection Handling capabilities of the Shuffle Service

2021-11-19 Thread Parth Gandhi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446592#comment-17446592
 ] 

Parth Gandhi commented on SPARK-25642:
--

Hello [~yzhangal], ideally you are correct: numActiveConnections should be a subset of numRegisteredConnections. So if you are seeing a different observation, it may need to be investigated. Please let me know if you have any more questions regarding this PR. Thank you.

> Add new Metrics in External Shuffle Service to help determine Network 
> performance and Connection Handling capabilities of the Shuffle Service
> -
>
> Key: SPARK-25642
> URL: https://issues.apache.org/jira/browse/SPARK-25642
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.0
>Reporter: Parth Gandhi
>Assignee: Parth Gandhi
>Priority: Minor
> Fix For: 3.0.0
>
>
> Recently, the ability to expose the metrics for the YARN Shuffle Service was added as part of SPARK-18364 ([https://github.com/apache/spark/pull/22485]). We need to add some metrics to be able to determine the number of active connections as well as open connections to the external shuffle service, to benchmark network and connection issues in large cluster environments.






[jira] [Resolved] (SPARK-37385) Add tests for TimestampNTZ and TimestampLTZ for Parquet data source

2021-11-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-37385.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34663
[https://github.com/apache/spark/pull/34663]

> Add tests for TimestampNTZ and TimestampLTZ for Parquet data source
> ---
>
> Key: SPARK-37385
> URL: https://issues.apache.org/jira/browse/SPARK-37385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
> Fix For: 3.3.0
>
>







[jira] [Assigned] (SPARK-37385) Add tests for TimestampNTZ and TimestampLTZ for Parquet data source

2021-11-19 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-37385:
---

Assignee: Ivan Sadikov

> Add tests for TimestampNTZ and TimestampLTZ for Parquet data source
> ---
>
> Key: SPARK-37385
> URL: https://issues.apache.org/jira/browse/SPARK-37385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>







[jira] [Commented] (SPARK-37259) JDBC read is always going to wrap the query in a select statement

2021-11-19 Thread Peter Toth (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446554#comment-17446554
 ] 

Peter Toth commented on SPARK-37259:


[~KevinAppelBofa], how about adding a new `withClause` to the JDBC options? Do you think you could split your CTE query into "with clause" and "regular query" parts manually and specify something like .option("withClause", withClause).option("query", query)?
Because that way we probably only need a small change to `sqlText` in `compute()` ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L370-L371]):
{noformat}
val sqlText = s"$withClause SELECT $columnList FROM ${options.tableOrQuery} $myTableSampleClause" +
  s" $myWhereClause $getGroupByClause $myLimitClause"
{noformat}
and we could also keep its other functionality.
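If such an option existed, usage might look like the sketch below (hypothetical: `withClause` is only being proposed in this comment and is not an existing JDBC option):
{code:java}
// Hypothetical usage of the proposed option; URL and query are examples.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://host:1433;databaseName=db")
  .option("withClause", "WITH DummyCTE AS (SELECT 1 AS DummyCOL)")
  .option("query", "SELECT DummyCOL FROM DummyCTE")
  .load()
{code}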

Sidenote: technically we could extract the WITH clause in MsSqlServerDialect and assemble a dialect-specific `sqlText` there, but it is not that simple to do...

> JDBC read is always going to wrap the query in a select statement
> -
>
> Key: SPARK-37259
> URL: https://issues.apache.org/jira/browse/SPARK-37259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Kevin Appel
>Priority: Major
>
> The JDBC read wraps the query it sends to the database server inside a SELECT statement, and there is currently no way to override this.
> I initially ran into this issue when trying to run a CTE query against SQL Server and it failed; the details of the failures are in these issues:
> [https://github.com/microsoft/mssql-jdbc/issues/1340]
> [https://github.com/microsoft/mssql-jdbc/issues/1657]
> [https://github.com/microsoft/sql-spark-connector/issues/147]
> https://issues.apache.org/jira/browse/SPARK-32825
> https://issues.apache.org/jira/browse/SPARK-34928
> I started to patch the code to get the query to run and ran into a few different items. If there is a way to add these features to allow this code path to run, it would be extremely helpful for running these types of edge-case queries. These are basic examples; the actual queries are much more complex and would require significant time to rewrite.
> Inside JDBCOptions.scala the query is set to one of the two forms below; the 
> dbtable form allows the query to be passed without modification:
>  
> {code:java}
> name.trim
> or
> s"(${subquery}) SPARK_GEN_SUBQ_${curId.getAndIncrement()}"
> {code}
>  
> Inside JDBCRelation.scala Spark then tries to get the schema for this 
> query, which ends up running dialect.getSchemaQuery, i.e.:
> {code:java}
> s"SELECT * FROM $table WHERE 1=0"{code}
> Overriding the dialect here to initially just pass back the $table gets us 
> past this check and on to the next issue, which is in the compute function 
> in JDBCRDD.scala:
>  
> {code:java}
> val sqlText = s"SELECT $columnList FROM ${options.tableOrQuery} 
> $myTableSampleClause" + s" $myWhereClause $getGroupByClause $myLimitClause"
>  
> {code}
>  
> For these two cases, a CTE query and a query using temp tables, finding out 
> the schema is difficult without actually running the query; and for the temp 
> table case, running the query during the schema check creates the table, so 
> the actual query then fails because the table already exists.
>  
> The way I patched these is by doing these two items:
> JDBCRDD.scala (compute)
>  
> {code:java}
>     val runQueryAsIs = options.parameters.getOrElse("runQueryAsIs", 
> "false").toBoolean
>     val sqlText = if (runQueryAsIs) {
>       s"${options.tableOrQuery}"
>     } else {
>       s"SELECT $columnList FROM ${options.tableOrQuery} $myWhereClause"
>     }
> {code}
> JDBCRelation.scala (getSchema)
> {code:java}
> val useCustomSchema = jdbcOptions.parameters.getOrElse("useCustomSchema", 
> "false").toBoolean
>     if (useCustomSchema) {
>       val myCustomSchema = jdbcOptions.parameters.getOrElse("customSchema", 
> "").toString
>       val newSchema = CatalystSqlParser.parseTableSchema(myCustomSchema)
>       logInfo(s"Going to return the new $newSchema because useCustomSchema is 
> $useCustomSchema and passed in $myCustomSchema")
>       newSchema
>     } else {
>       val tableSchema = JDBCRDD.resolveTable(jdbcOptions)
>       jdbcOptions.customSchema match {
>       case Some(customSchema) => JdbcUtils.getCustomSchema(
>         tableSchema, customSchema, resolver)
>       case None => tableSchema
>       }
>     }{code}
>  
> This allows the query to run as is, by using the dbtable option and then 
> providing a custom schema that bypasses the dialect schema check.
>  
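> For illustration, a minimal sketch of how the patched options above might be 
> used from the reader side (runQueryAsIs and useCustomSchema come from the 
> patch above and are not standard Spark options):
> {code:python}
> # Sketch only: assumes the patched JDBCRDD/JDBCRelation described above.
> cte_query = "WITH DummyCTE AS (SELECT 1 AS DummyCOL) SELECT DummyCOL FROM DummyCTE"
> df = (spark.read.format("jdbc")
>       .option("url", "jdbc:sqlserver://host;databaseName=db")  # hypothetical URL
>       .option("dbtable", cte_query)            # full query, passed through as-is
>       .option("runQueryAsIs", "true")          # skip the SELECT ... FROM wrapper
>       .option("useCustomSchema", "true")       # bypass the dialect schema probe
>       .option("customSchema", "DummyCOL INT")  # schema supplied by the caller
>       .load())
> {code}
>  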
> Test queries
>  
> {code:java}
> query1 = """ 
> SELECT 1 as DummyCOL
> """
> query2 = """ 
> WITH DummyCTE AS
> (

[jira] [Assigned] (SPARK-37388) WidthBucket throws NullPointerException in WholeStageCodegenExec

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37388:


Assignee: Apache Spark

> WidthBucket throws NullPointerException in WholeStageCodegenExec
> 
>
> Key: SPARK-37388
> URL: https://issues.apache.org/jira/browse/SPARK-37388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tom van Bussel
>Assignee: Apache Spark
>Priority: Major
>
> Repro: Disable ConstantFolding and run
> {code:java}
> SELECT width_bucket(3.5, 3.0, 3.0, 888) {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37388) WidthBucket throws NullPointerException in WholeStageCodegenExec

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446524#comment-17446524
 ] 

Apache Spark commented on SPARK-37388:
--

User 'tomvanbussel' has created a pull request for this issue:
https://github.com/apache/spark/pull/34670

> WidthBucket throws NullPointerException in WholeStageCodegenExec
> 
>
> Key: SPARK-37388
> URL: https://issues.apache.org/jira/browse/SPARK-37388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tom van Bussel
>Priority: Major
>
> Repro: Disable ConstantFolding and run
> {code:java}
> SELECT width_bucket(3.5, 3.0, 3.0, 888) {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37388) WidthBucket throws NullPointerException in WholeStageCodegenExec

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37388:


Assignee: (was: Apache Spark)

> WidthBucket throws NullPointerException in WholeStageCodegenExec
> 
>
> Key: SPARK-37388
> URL: https://issues.apache.org/jira/browse/SPARK-37388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tom van Bussel
>Priority: Major
>
> Repro: Disable ConstantFolding and run
> {code:java}
> SELECT width_bucket(3.5, 3.0, 3.0, 888) {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37388) WidthBucket throws NullPointerException in WholeStageCodegenExec

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446525#comment-17446525
 ] 

Apache Spark commented on SPARK-37388:
--

User 'tomvanbussel' has created a pull request for this issue:
https://github.com/apache/spark/pull/34670

> WidthBucket throws NullPointerException in WholeStageCodegenExec
> 
>
> Key: SPARK-37388
> URL: https://issues.apache.org/jira/browse/SPARK-37388
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Tom van Bussel
>Priority: Major
>
> Repro: Disable ConstantFolding and run
> {code:java}
> SELECT width_bucket(3.5, 3.0, 3.0, 888) {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37390) Buggy method retrieval in pyspark.docs.conf.setup

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37390:


Assignee: Apache Spark

> Buggy method retrieval in pyspark.docs.conf.setup
> 
>
> Key: SPARK-37390
> URL: https://issues.apache.org/jira/browse/SPARK-37390
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> [Currently we have this 
> code|https://github.com/apache/spark/blob/04af08e9b6d692ed4b5df5581e39de138067211c/python/docs/source/conf.py#L374-L376]
> {code:python}
> def setup(app):
> # The app.add_javascript() is deprecated.
> getattr(app, "add_js_file", getattr(app, 
> "add_javascript"))('copybutton.js')
> {code}
> where the nested {{getattr}} should handle compatibility issues between 
> different Sphinx versions.
> However, the inner {{getattr(app, "add_javascript")}} is missing a 
> {{default}}, so on the latest Sphinx versions it fails before the outer 
> lookup can ever return {{app.add_js_file}}.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37390) Buggy method retrieval in pyspark.docs.conf.setup

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446504#comment-17446504
 ] 

Apache Spark commented on SPARK-37390:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/34669

> Buggy method retrieval in pyspark.docs.conf.setup
> 
>
> Key: SPARK-37390
> URL: https://issues.apache.org/jira/browse/SPARK-37390
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> [Currently we have this 
> code|https://github.com/apache/spark/blob/04af08e9b6d692ed4b5df5581e39de138067211c/python/docs/source/conf.py#L374-L376]
> {code:python}
> def setup(app):
> # The app.add_javascript() is deprecated.
> getattr(app, "add_js_file", getattr(app, 
> "add_javascript"))('copybutton.js')
> {code}
> where the nested {{getattr}} should handle compatibility issues between 
> different Sphinx versions.
> However, the inner {{getattr(app, "add_javascript")}} is missing a 
> {{default}}, so on the latest Sphinx versions it fails before the outer 
> lookup can ever return {{app.add_js_file}}.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37390) Buggy method retrieval in pyspark.docs.conf.setup

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37390:


Assignee: (was: Apache Spark)

> Buggy method retrieval in pyspark.docs.conf.setup
> 
>
> Key: SPARK-37390
> URL: https://issues.apache.org/jira/browse/SPARK-37390
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 3.1.0, 3.2.0, 3.3.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> [Currently we have this 
> code|https://github.com/apache/spark/blob/04af08e9b6d692ed4b5df5581e39de138067211c/python/docs/source/conf.py#L374-L376]
> {code:python}
> def setup(app):
> # The app.add_javascript() is deprecated.
> getattr(app, "add_js_file", getattr(app, 
> "add_javascript"))('copybutton.js')
> {code}
> where the nested {{getattr}} should handle compatibility issues between 
> different Sphinx versions.
> However, the inner {{getattr(app, "add_javascript")}} is missing a 
> {{default}}, so on the latest Sphinx versions it fails before the outer 
> lookup can ever return {{app.add_js_file}}.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37390) Buggy method retrieval in pyspark.docs.conf.setup

2021-11-19 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-37390:
--

 Summary: Buggy method retrieval in pyspark.docs.conf.setup
 Key: SPARK-37390
 URL: https://issues.apache.org/jira/browse/SPARK-37390
 Project: Spark
  Issue Type: Bug
  Components: Documentation, PySpark
Affects Versions: 3.2.0, 3.1.0, 3.3.0
Reporter: Maciej Szymkiewicz


[Currently we have this 
code|https://github.com/apache/spark/blob/04af08e9b6d692ed4b5df5581e39de138067211c/python/docs/source/conf.py#L374-L376]

{code:python}
def setup(app):
# The app.add_javascript() is deprecated.
getattr(app, "add_js_file", getattr(app, "add_javascript"))('copybutton.js')
{code}

where the nested {{getattr}} should handle compatibility issues between 
different Sphinx versions.

However, the inner {{getattr(app, "add_javascript")}} is missing a 
{{default}}, so on the latest Sphinx versions it fails before the outer 
lookup can ever return {{app.add_js_file}}.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37389) Check unclosed bracketed comments

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37389:


Assignee: (was: Apache Spark)

> Check unclosed bracketed comments
> -
>
> Key: SPARK-37389
> URL: https://issues.apache.org/jira/browse/SPARK-37389
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQL below has an unclosed bracketed comment.
> {code:java}
> /*abc*/
> select 1 as a
> /*
> 2 as b
> /*abc*/
> , 3 as c
> /**/
> ;
> {code}
> But Spark will output:
> a
> 1
> PostgreSQL, which also supports this feature, outputs:
> {code:java}
> SQL Error [42601]: Unterminated block comment started at position 47 in SQL 
> /*abc*/ -- block comment
> select 1 as a
> /*
> 2 as b
> /*abc*/
> , 3 as c
> /**/
> . Expected */ sequence
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37389) Check unclosed bracketed comments

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446495#comment-17446495
 ] 

Apache Spark commented on SPARK-37389:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34668

> Check unclosed bracketed comments
> -
>
> Key: SPARK-37389
> URL: https://issues.apache.org/jira/browse/SPARK-37389
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQL below has an unclosed bracketed comment.
> {code:java}
> /*abc*/
> select 1 as a
> /*
> 2 as b
> /*abc*/
> , 3 as c
> /**/
> ;
> {code}
> But Spark will output:
> a
> 1
> PostgreSQL, which also supports this feature, outputs:
> {code:java}
> SQL Error [42601]: Unterminated block comment started at position 47 in SQL 
> /*abc*/ -- block comment
> select 1 as a
> /*
> 2 as b
> /*abc*/
> , 3 as c
> /**/
> . Expected */ sequence
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37389) Check unclosed bracketed comments

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37389:


Assignee: Apache Spark

> Check unclosed bracketed comments
> -
>
> Key: SPARK-37389
> URL: https://issues.apache.org/jira/browse/SPARK-37389
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> The SQL below has an unclosed bracketed comment.
> {code:java}
> /*abc*/
> select 1 as a
> /*
> 2 as b
> /*abc*/
> , 3 as c
> /**/
> ;
> {code}
> But Spark will output:
> a
> 1
> PostgreSQL, which also supports this feature, outputs:
> {code:java}
> SQL Error [42601]: Unterminated block comment started at position 47 in SQL 
> /*abc*/ -- block comment
> select 1 as a
> /*
> 2 as b
> /*abc*/
> , 3 as c
> /**/
> . Expected */ sequence
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37389) Check unclosed bracketed comments

2021-11-19 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-37389:
--

 Summary: Check unclosed bracketed comments
 Key: SPARK-37389
 URL: https://issues.apache.org/jira/browse/SPARK-37389
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng


The SQL below has an unclosed bracketed comment.
{code:java}
/*abc*/
select 1 as a
/*

2 as b
/*abc*/
, 3 as c

/**/
;
{code}
But Spark will output:
a
1
(the unterminated outer comment silently swallows the rest of the statement).
PostgreSQL, which also supports this feature, outputs:

{code:java}
SQL Error [42601]: Unterminated block comment started at position 47 in SQL 
/*abc*/ -- block comment
select 1 as a
/*

2 as b
/*abc*/
, 3 as c

/**/
. Expected */ sequence
{code}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37388) WidthBucket throws NullPointerException in WholeStageCodegenExec

2021-11-19 Thread Tom van Bussel (Jira)
Tom van Bussel created SPARK-37388:
--

 Summary: WidthBucket throws NullPointerException in 
WholeStageCodegenExec
 Key: SPARK-37388
 URL: https://issues.apache.org/jira/browse/SPARK-37388
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Tom van Bussel


Repro: Disable ConstantFolding and run
{code:java}
SELECT width_bucket(3.5, 3.0, 3.0, 888) {code}
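For reference, a minimal PySpark sketch of the repro (assuming the standard 
fully qualified name of the Catalyst rule):
{code:python}
# Exclude ConstantFolding so the expression reaches codegen unevaluated.
spark.conf.set(
    "spark.sql.optimizer.excludedRules",
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")
spark.sql("SELECT width_bucket(3.5, 3.0, 3.0, 888)").show()
{code}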



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37387) Allow nondeterministic expression in aggregate function

2021-11-19 Thread Leona Yoda (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446457#comment-17446457
 ] 

Leona Yoda commented on SPARK-37387:


[~zhenw] I found this behavior when I tried to find the filenames in an S3 
bucket that contained a specific value.

Both {{RANDOM()}} and {{INPUT_FILE_NAME()}} are nondeterministic functions. I 
referred to RANDOM() because it is widely implemented by RDBMSs and is easy to 
compare.
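For that use case, a workaround sketch that passes the current check by moving 
the nondeterministic call into a projection before aggregating (the table name 
{{logs}} and the filter are hypothetical):
{code:python}
# Sketch only: input_file_name() is evaluated in the inner SELECT, so the
# aggregate itself only sees a deterministic column.
spark.sql("""
    SELECT COUNT(DISTINCT fname)
    FROM (SELECT input_file_name() AS fname FROM logs WHERE col = 'value') t
""").show()
{code}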

> Allow nondeterministic expression in aggregate function
> ---
>
> Key: SPARK-37387
> URL: https://issues.apache.org/jira/browse/SPARK-37387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Priority: Minor
>
> A nondeterministic expression in an aggregate function is not allowed in 
> Spark, so we cannot execute a query like
> {code:java}
> SELECT COUNT(RANDOM());
> {code}
> which raises a {{nondeterministic expression ... should not appear in the 
> arguments of an aggregate function}} error message.
> [related code 
> section|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L298]
>  
> In other DBs like PostgreSQL, by contrast, we can run this SQL:
> {code:java}
> postgres=# SELECT COUNT(RANDOM());
>  count
> -------
>      1
> (1 row) {code}
>  
> When I tried removing the check, I found Spark could execute 
> the query:
> {code:java}
> scala> spark.sql("SELECT COUNT(RANDOM())").show()
> +-------------+
> |count(rand())|
> +-------------+
> |            1|
> +-------------+ {code}
>  
> It could be useful for Spark users to be able to execute those kinds of 
> queries, because they could simply call
> {code:java}
> spark.sql("SELECT COUNT(DISTINCT(INPUT_FILE_NAME())) FROM table WHERE ...") 
> {code}
> to find target files, for example.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36902) Migrate CreateTableAsSelectStatement to v2 command

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446398#comment-17446398
 ] 

Apache Spark commented on SPARK-36902:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34667

> Migrate CreateTableAsSelectStatement to v2 command
> --
>
> Key: SPARK-36902
> URL: https://issues.apache.org/jira/browse/SPARK-36902
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36902) Migrate CreateTableAsSelectStatement to v2 command

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446397#comment-17446397
 ] 

Apache Spark commented on SPARK-36902:
--

User 'dchvn' has created a pull request for this issue:
https://github.com/apache/spark/pull/34667

> Migrate CreateTableAsSelectStatement to v2 command
> --
>
> Key: SPARK-36902
> URL: https://issues.apache.org/jira/browse/SPARK-36902
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dgd_contributor
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37192) Migrate SHOW TBLPROPERTIES to use V2 command by default

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37192:


Assignee: Apache Spark

> Migrate SHOW TBLPROPERTIES to use V2 command by default
> ---
>
> Key: SPARK-37192
> URL: https://issues.apache.org/jira/browse/SPARK-37192
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW TBLPROPERTIES to use V2 command by default



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37192) Migrate SHOW TBLPROPERTIES to use V2 command by default

2021-11-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37192:


Assignee: (was: Apache Spark)

> Migrate SHOW TBLPROPERTIES to use V2 command by default
> ---
>
> Key: SPARK-37192
> URL: https://issues.apache.org/jira/browse/SPARK-37192
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW TBLPROPERTIES to use V2 command by default



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37192) Migrate SHOW TBLPROPERTIES to use V2 command by default

2021-11-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446371#comment-17446371
 ] 

Apache Spark commented on SPARK-37192:
--

User 'Peng-Lei' has created a pull request for this issue:
https://github.com/apache/spark/pull/34666

> Migrate SHOW TBLPROPERTIES to use V2 command by default
> ---
>
> Key: SPARK-37192
> URL: https://issues.apache.org/jira/browse/SPARK-37192
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: PengLei
>Priority: Major
> Fix For: 3.3.0
>
>
> Migrate SHOW TBLPROPERTIES to use V2 command by default



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37382) `with as` clause got inconsistent results

2021-11-19 Thread Zhen Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446367#comment-17446367
 ] 

Zhen Wang commented on SPARK-37382:
---

Is https://issues.apache.org/jira/browse/SPARK-36447 related?
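As a side note, a common workaround for the inconsistency described below is 
to materialize the nondeterministic intermediate result once before reusing it 
(a sketch; whether caching is appropriate depends on the data size):
{code:python}
# Sketch only: cache() lets both branches of the union read the same
# materialized rows instead of re-evaluating rand().
tab = spark.sql("SELECT 'Withas' AS name, rand() AS rand_number").cache()
tab.union(tab).show()
{code}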

> `with as` clause got inconsistent results
> -
>
> Key: SPARK-37382
> URL: https://issues.apache.org/jira/browse/SPARK-37382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: caican
>Priority: Major
>
> In Spark 3.1, when the `with as` clause in the same SQL is executed multiple 
> times, it returns different results:
> `
> with tab as (
>  select 'Withas' as name, rand() as rand_number
> )
> select name, rand_number
> from tab
> union all
> select name, rand_number
> from tab
> `
> !https://internal-api-lark-file.f.mioffice.cn/api/image/keys/img_bcf6f867-6aee-4afe-bc43-30bf4f2dbdel?message_id=7032102765711097965!
> But in Spark 2.3 it returned consistent results:
> `
> with tab as (
>  select 'Withas' as name, rand() as rand_number
> )
> select name, rand_number
> from tab
> union all
> select name, rand_number
> from tab
> `
> !https://internal-api-lark-file.f.mioffice.cn/api/image/keys/img_6dc6e44b-d4a5-4b0d-bd2c-00859ec80a1l?message_id=7032104202756751468!
> Why does Spark 3.1.2 return different results?
> Has anyone encountered this problem?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37387) Allow nondeterministic expression in aggregate function

2021-11-19 Thread Zhen Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446365#comment-17446365
 ] 

Zhen Wang commented on SPARK-37387:
---

[~yoda-mon] What is the use case behind this? RANDOM() doesn't seem to justify 
your usage.

> Allow nondeterministic expression in aggregate function
> ---
>
> Key: SPARK-37387
> URL: https://issues.apache.org/jira/browse/SPARK-37387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Leona Yoda
>Priority: Minor
>
> A nondeterministic expression in an aggregate function is not allowed in 
> Spark, so we cannot execute a query like
> {code:java}
> SELECT COUNT(RANDOM());
> {code}
> which raises a {{nondeterministic expression ... should not appear in the 
> arguments of an aggregate function}} error message.
> [related code 
> section|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L298]
>  
> In other DBs like PostgreSQL, by contrast, we can run this SQL:
> {code:java}
> postgres=# SELECT COUNT(RANDOM());
>  count
> -------
>      1
> (1 row) {code}
>  
> When I tried removing the check, I found Spark could execute 
> the query:
> {code:java}
> scala> spark.sql("SELECT COUNT(RANDOM())").show()
> +-------------+
> |count(rand())|
> +-------------+
> |            1|
> +-------------+ {code}
>  
> It could be useful for Spark users to be able to execute those kinds of 
> queries, because they could simply call
> {code:java}
> spark.sql("SELECT COUNT(DISTINCT(INPUT_FILE_NAME())) FROM table WHERE ...") 
> {code}
> to find target files, for example.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37382) `with as` clause got inconsistent results

2021-11-19 Thread jiasheng55 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446344#comment-17446344
 ] 

jiasheng55 commented on SPARK-37382:


The images are broken, please have a look:)

> `with as` clause got inconsistent results
> -
>
> Key: SPARK-37382
> URL: https://issues.apache.org/jira/browse/SPARK-37382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: caican
>Priority: Major
>
> In Spark 3.1, when the `with as` clause in the same SQL is executed multiple 
> times, it returns different results:
> `
> with tab as (
>  select 'Withas' as name, rand() as rand_number
> )
> select name, rand_number
> from tab
> union all
> select name, rand_number
> from tab
> `
> !https://internal-api-lark-file.f.mioffice.cn/api/image/keys/img_bcf6f867-6aee-4afe-bc43-30bf4f2dbdel?message_id=7032102765711097965!
> But in Spark 2.3 it returned consistent results:
> `
> with tab as (
>  select 'Withas' as name, rand() as rand_number
> )
> select name, rand_number
> from tab
> union all
> select name, rand_number
> from tab
> `
> !https://internal-api-lark-file.f.mioffice.cn/api/image/keys/img_6dc6e44b-d4a5-4b0d-bd2c-00859ec80a1l?message_id=7032104202756751468!
> Why does Spark 3.1.2 return different results?
> Has anyone encountered this problem?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37382) `with as` clause got inconsistent results

2021-11-19 Thread jiasheng55 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17446344#comment-17446344
 ] 

jiasheng55 edited comment on SPARK-37382 at 11/19/21, 7:59 AM:
---

The images are broken, please have a look:)


was (Author: victor-wong):
The imsages are broken, please have a look:)

> `with as` clause got inconsistent results
> -
>
> Key: SPARK-37382
> URL: https://issues.apache.org/jira/browse/SPARK-37382
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: caican
>Priority: Major
>
> In Spark 3.1, when the `with as` clause in the same SQL is executed multiple 
> times, it returns different results:
> `
> with tab as (
>  select 'Withas' as name, rand() as rand_number
> )
> select name, rand_number
> from tab
> union all
> select name, rand_number
> from tab
> `
> !https://internal-api-lark-file.f.mioffice.cn/api/image/keys/img_bcf6f867-6aee-4afe-bc43-30bf4f2dbdel?message_id=7032102765711097965!
> But in Spark 2.3 it returned consistent results:
> `
> with tab as (
>  select 'Withas' as name, rand() as rand_number
> )
> select name, rand_number
> from tab
> union all
> select name, rand_number
> from tab
> `
> !https://internal-api-lark-file.f.mioffice.cn/api/image/keys/img_6dc6e44b-d4a5-4b0d-bd2c-00859ec80a1l?message_id=7032104202756751468!
> Why does Spark 3.1.2 return different results?
> Has anyone encountered this problem?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org