[jira] [Resolved] (SPARK-33704) Support latest version of initialize() in HiveGenericUDTF

2020-12-31 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-33704.
-
Resolution: Duplicate

> Support latest version of initialize() in HiveGenericUDTF
> -
>
> Key: SPARK-33704
> URL: https://issues.apache.org/jira/browse/SPARK-33704
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.4, 2.4.3
>Reporter: chenliang
>Priority: Major
>
> For HiveGenericUDTF, there are two initialization methods:
> {code:java}
>   public StructObjectInspector initialize(StructObjectInspector argOIs)
>       throws UDFArgumentException {
>     List<? extends StructField> inputFields = argOIs.getAllStructFieldRefs();
>     ObjectInspector[] udtfInputOIs = new ObjectInspector[inputFields.size()];
>     for (int i = 0; i < inputFields.size(); i++) {
>       udtfInputOIs[i] = inputFields.get(i).getFieldObjectInspector();
>     }
>     return initialize(udtfInputOIs);
>   }
>
>   @Deprecated
>   public StructObjectInspector initialize(ObjectInspector[] argOIs)
>       throws UDFArgumentException {
>     throw new IllegalStateException("Should not be called directly");
>   }
> {code}
> As https://issues.apache.org/jira/browse/HIVE-5737 mentions, Hive provides a 
> StructObjectInspector-based initialize() for UDTFs rather than ObjectInspector[], 
> but Spark SQL still only calls the deprecated overload.
> Before the fix, an exception like the following is reported:
> Error in query: No handler for UDF/UDAF/UDTF 'FeatureParseUDTF1': 
> java.lang.IllegalStateException: Should not be called directly
> Please make sure your function overrides public StructObjectInspector 
> initialize(ObjectInspector[] args).; line 1 pos 7
>  
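> For illustration only (my addition, not part of the original report), here is a rough Scala 
> sketch of a UDTF that overrides only the non-deprecated initialize(StructObjectInspector); the 
> class name FeatureParseUDTF1 and the single-column output schema are assumptions. With such a 
> UDTF, Spark SQL as described above still routes through the deprecated ObjectInspector[] 
> overload and fails with the IllegalStateException quoted above:
> {code:scala}
> import java.util.Arrays
> import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
> import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory, StructObjectInspector}
> import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory
>
> // Hypothetical UDTF that only implements the newer initialize(StructObjectInspector).
> class FeatureParseUDTF1 extends GenericUDTF {
>   override def initialize(argOIs: StructObjectInspector): StructObjectInspector = {
>     // Declare a single string output column named "feature".
>     ObjectInspectorFactory.getStandardStructObjectInspector(
>       Arrays.asList("feature"),
>       Arrays.asList[ObjectInspector](PrimitiveObjectInspectorFactory.javaStringObjectInspector))
>   }
>
>   override def process(args: Array[AnyRef]): Unit = {
>     // Forward the first argument as a string; real parsing logic is omitted.
>     forward(Array[AnyRef](String.valueOf(args(0))))
>   }
>
>   override def close(): Unit = {}
> }
> {code}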



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33953) readImages test fails due to NoClassDefFoundError for ImageTypeSpecifier

2020-12-31 Thread Ted Yu (Jira)
Ted Yu created SPARK-33953:
--

 Summary: readImages test fails due to NoClassDefFoundError for 
ImageTypeSpecifier
 Key: SPARK-33953
 URL: https://issues.apache.org/jira/browse/SPARK-33953
 Project: Spark
  Issue Type: Test
  Components: ML
Affects Versions: 3.0.1
Reporter: Ted Yu


From https://github.com/apache/spark/pull/30984/checks?check_run_id=1630709203:
```
[info]   org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 21.0 failed 1 times, most recent failure: Lost task 1.0 in 
stage 21.0 (TID 20) (fv-az212-589.internal.cloudapp.net executor driver): 
java.lang.NoClassDefFoundError: Could not initialize class 
javax.imageio.ImageTypeSpecifier
[info]  at 
com.sun.imageio.plugins.png.PNGImageReader.getImageTypes(PNGImageReader.java:1531)
[info]  at 
com.sun.imageio.plugins.png.PNGImageReader.readImage(PNGImageReader.java:1318)
```
It seems some dependency is missing.
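For reference, a minimal sketch (my addition; the path and the `spark` session are assumed, as 
in spark-shell) of the kind of image read that exercises the PNG decoding path shown above:
{code:scala}
// Load PNG images through Spark's built-in "image" data source; decoding goes
// through javax.imageio, so it fails if javax.imageio.ImageTypeSpecifier
// cannot be initialized, as in the NoClassDefFoundError above.
val images = spark.read.format("image").load("/path/to/png/images")
images.select("image.origin", "image.width", "image.height").show(truncate = false)
{code}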



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33931) Recover GitHub Action

2020-12-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257109#comment-17257109
 ] 

Apache Spark commented on SPARK-33931:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30986

> Recover GitHub Action
> -
>
> Key: SPARK-33931
> URL: https://issues.apache.org/jira/browse/SPARK-33931
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.8, 3.0.1, 3.1.0, 3.2.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33906:
-

Assignee: Baohe Zhang

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Assignee: Baohe Zhang
>Priority: Blocker
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run:
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors page; it will get 
> stuck on this page: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates returns 
> an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The possible reason for returning the empty map is that the stage completion 
> time is shorter than the heartbeat interval, so the stage entry in stageTCMP 
> has already been removed before reportHeartbeat is called.
> How to fix it?
> Check if the peakMemoryMetrics is undefined in executorspage.js.
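> As an illustration (my addition, not part of the original report), any consumer of this 
> endpoint has to treat the field as optional; a rough Scala sketch using json4s (already on 
> Spark's classpath), with the application id left as a placeholder:
> {code:scala}
> import scala.io.Source
> import org.json4s._
> import org.json4s.jackson.JsonMethods.parse
>
> implicit val formats: Formats = DefaultFormats
> // URL pattern follows the endpoint quoted above; <app-id> must be filled in.
> val url = "http://localhost:4040/api/v1/applications/<app-id>/executors"
> val executors = parse(Source.fromURL(url).mkString).children
> executors.foreach { executor =>
>   val id = (executor \ "id").extract[String]
>   // peakMemoryMetrics may be absent (the bug above), so read it as an Option.
>   val hasPeak = (executor \ "peakMemoryMetrics").toOption.isDefined
>   println(s"executor $id has peakMemoryMetrics: $hasPeak")
> }
> {code}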



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33906) SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset

2020-12-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33906.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30920
[https://github.com/apache/spark/pull/30920]

> SPARK UI Executors page stuck when ExecutorSummary.peakMemoryMetrics is unset
> -
>
> Key: SPARK-33906
> URL: https://issues.apache.org/jira/browse/SPARK-33906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Baohe Zhang
>Assignee: Baohe Zhang
>Priority: Blocker
> Fix For: 3.1.0
>
> Attachments: executor-page.png
>
>
> How to reproduce it?
> In macOS standalone mode, open a spark-shell and run:
> $SPARK_HOME/bin/spark-shell --master spark://localhost:7077
> {code:scala}
> val x = sc.makeRDD(1 to 10, 5)
> x.count()
> {code}
> Then open the app UI in the browser and click the Executors page; it will get 
> stuck on this page: 
>  !executor-page.png! 
> Also, the JSON returned by the REST API endpoint 
> http://localhost:4040/api/v1/applications/app-20201224134418-0003/executors 
> is missing "peakMemoryMetrics" for the executors.
> {noformat}
> [ {
>   "id" : "driver",
>   "hostPort" : "192.168.1.241:50042",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 0,
>   "maxTasks" : 0,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 0,
>   "totalTasks" : 0,
>   "totalDuration" : 0,
>   "totalGCTime" : 0,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:18.033GMT",
>   "executorLogs" : { },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "peakMemoryMetrics" : {
> "JVMHeapMemory" : 135021152,
> "JVMOffHeapMemory" : 149558576,
> "OnHeapExecutionMemory" : 0,
> "OffHeapExecutionMemory" : 0,
> "OnHeapStorageMemory" : 3301,
> "OffHeapStorageMemory" : 0,
> "OnHeapUnifiedMemory" : 3301,
> "OffHeapUnifiedMemory" : 0,
> "DirectPoolMemory" : 67963178,
> "MappedPoolMemory" : 0,
> "ProcessTreeJVMVMemory" : 0,
> "ProcessTreeJVMRSSMemory" : 0,
> "ProcessTreePythonVMemory" : 0,
> "ProcessTreePythonRSSMemory" : 0,
> "ProcessTreeOtherVMemory" : 0,
> "ProcessTreeOtherRSSMemory" : 0,
> "MinorGCCount" : 15,
> "MinorGCTime" : 101,
> "MajorGCCount" : 0,
> "MajorGCTime" : 0
>   },
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> }, {
>   "id" : "0",
>   "hostPort" : "192.168.1.241:50054",
>   "isActive" : true,
>   "rddBlocks" : 0,
>   "memoryUsed" : 0,
>   "diskUsed" : 0,
>   "totalCores" : 12,
>   "maxTasks" : 12,
>   "activeTasks" : 0,
>   "failedTasks" : 0,
>   "completedTasks" : 5,
>   "totalTasks" : 5,
>   "totalDuration" : 2107,
>   "totalGCTime" : 25,
>   "totalInputBytes" : 0,
>   "totalShuffleRead" : 0,
>   "totalShuffleWrite" : 0,
>   "isBlacklisted" : false,
>   "maxMemory" : 455501414,
>   "addTime" : "2020-12-24T19:44:20.335GMT",
>   "executorLogs" : {
> "stdout" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stdout;,
> "stderr" : 
> "http://192.168.1.241:8081/logPage/?appId=app-20201224134418-0003=0=stderr;
>   },
>   "memoryMetrics" : {
> "usedOnHeapStorageMemory" : 0,
> "usedOffHeapStorageMemory" : 0,
> "totalOnHeapStorageMemory" : 455501414,
> "totalOffHeapStorageMemory" : 0
>   },
>   "blacklistedInStages" : [ ],
>   "attributes" : { },
>   "resources" : { },
>   "resourceProfileId" : 0,
>   "isExcluded" : false,
>   "excludedInStages" : [ ]
> } ]
> {noformat}
> I debugged it and observed that ExecutorMetricsPoller.getExecutorUpdates returns 
> an empty map, which causes peakExecutorMetrics to be set to None in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/status/LiveEntity.scala#L345.
> The possible reason for returning the empty map is that the stage completion 
> time is shorter than the heartbeat interval, so the stage entry in stageTCMP 
> has already been removed before reportHeartbeat is called.
> How to fix it?
> Check if the peakMemoryMetrics is undefined in executorspage.js.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31946) Failed to register SIGPWR handler on MacOS

2020-12-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31946.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30968
[https://github.com/apache/spark/pull/30968]

> Failed to register SIGPWR handler on MacOS
> --
>
> Key: SPARK-31946
> URL: https://issues.apache.org/jira/browse/SPARK-31946
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
> Environment: macOS 10.14.6
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.2.0
>
>
>  
> {code:java}
> 20/06/09 22:54:54 WARN SignalUtils: Failed to register SIGPWR handler - 
> disabling decommission feature.
> java.lang.IllegalArgumentException: Unknown signal: PWR
>   at sun.misc.Signal.<init>(Signal.java:143)
>   at 
> org.apache.spark.util.SignalUtils$.$anonfun$register$1(SignalUtils.scala:83)
>   at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
>   at org.apache.spark.util.SignalUtils$.register(SignalUtils.scala:81)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.onStart(CoarseGrainedExecutorBackend.scala:86)
>   at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>   at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>   at 
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> It seems like macOS is *POSIX* compliant, but SIGPWR is not specified in the 
> *POSIX* specification. See [https://en.wikipedia.org/wiki/Signal_(IPC)#SIGPWR]
>  
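> For illustration only (my addition, not from the original report), a small Scala sketch of 
> registering signals through sun.misc.Signal, showing why PWR fails on macOS:
> {code:scala}
> import sun.misc.{Signal, SignalHandler}
>
> // "TERM" exists on both Linux and macOS, so this registration succeeds.
> Signal.handle(new Signal("TERM"), new SignalHandler {
>   override def handle(sig: Signal): Unit = println(s"received SIG${sig.getName}")
> })
>
> // "PWR" is Linux-specific; on macOS the constructor throws
> // java.lang.IllegalArgumentException: Unknown signal: PWR, as in the stack trace above.
> try new Signal("PWR") catch {
>   case e: IllegalArgumentException => println(s"cannot register PWR: ${e.getMessage}")
> }
> {code}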



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31946) Failed to register SIGPWR handler on MacOS

2020-12-31 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31946:
-

Assignee: wuyi

> Failed to register SIGPWR handler on MacOS
> --
>
> Key: SPARK-31946
> URL: https://issues.apache.org/jira/browse/SPARK-31946
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
> Environment: macOS 10.14.6
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
>  
> {code:java}
> 20/06/09 22:54:54 WARN SignalUtils: Failed to register SIGPWR handler - 
> disabling decommission feature.
> java.lang.IllegalArgumentException: Unknown signal: PWR
>   at sun.misc.Signal.<init>(Signal.java:143)
>   at 
> org.apache.spark.util.SignalUtils$.$anonfun$register$1(SignalUtils.scala:83)
>   at scala.collection.mutable.HashMap.getOrElseUpdate(HashMap.scala:86)
>   at org.apache.spark.util.SignalUtils$.register(SignalUtils.scala:81)
>   at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.onStart(CoarseGrainedExecutorBackend.scala:86)
>   at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:120)
>   at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:203)
>   at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
>   at 
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
>   at 
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> It seems like macOS is *POSIX* compliant, but SIGPWR is not specified in the 
> *POSIX* specification. See [https://en.wikipedia.org/wiki/Signal_(IPC)#SIGPWR]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-19450) Replace askWithRetry with askSync.

2020-12-31 Thread Zhongwei Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257091#comment-17257091
 ] 

Zhongwei Zhu edited comment on SPARK-19450 at 12/31/20, 7:58 PM:
-

The old askWithRetry method could use the configs `spark.rpc.numRetries` 
and `spark.rpc.retry.wait`; the default value of `spark.rpc.numRetries` is 3, 
so I suppose it would retry 3 times if an RPC failed. Now askSync is used 
without honoring those two configs. Could we support such retries? If retry was 
disabled intentionally, maybe the docs need to be updated. [~jinxing6...@126.com] 
[~srowen]
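
To make the question concrete, here is a rough sketch (my own, not existing Spark code) of what 
a caller-side retry around askSync could look like, reusing the two configs above; note that 
RpcEndpointRef.askSync is internal API, so this would only apply inside Spark itself:
{code:scala}
import scala.reflect.ClassTag
import scala.util.control.NonFatal
import org.apache.spark.{SparkConf, SparkException}
import org.apache.spark.rpc.RpcEndpointRef

// Hypothetical helper: retry askSync using the same configs the removed
// askWithRetry honored (spark.rpc.numRetries, spark.rpc.retry.wait).
def askWithRetrySketch[T: ClassTag](ref: RpcEndpointRef, message: Any, conf: SparkConf): T = {
  val maxRetries = conf.getInt("spark.rpc.numRetries", 3)
  val retryWaitMs = conf.getTimeAsMs("spark.rpc.retry.wait", "3s")
  var lastError: Throwable = null
  for (attempt <- 1 to maxRetries) {
    try {
      return ref.askSync[T](message)
    } catch {
      case NonFatal(e) =>
        lastError = e
        if (attempt < maxRetries) Thread.sleep(retryWaitMs)
    }
  }
  throw new SparkException(s"RPC failed after $maxRetries attempts", lastError)
}
{code}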


was (Author: warrenzhu25):
For old askWithRetry method, it can use provided config `spark.rpc.numRetries` 
and `spark.rpc.retry.wait`. The default value for  `spark.rpc.numRetries` is 3. 
So I suppose it will retry 3 times if rpc failed, but now askSync is used 
without using above 2 configs. Does that mean no retry anymore? 
[~jinxing6...@126.com] [~srowen]

> Replace askWithRetry with askSync.
> --
>
> Key: SPARK-19450
> URL: https://issues.apache.org/jira/browse/SPARK-19450
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Jin Xing
>Assignee: Jin Xing
>Priority: Minor
> Fix For: 2.2.0
>
>
> *askSync* is already added in *RpcEndpointRef* (see SPARK-19347 and 
> https://github.com/apache/spark/pull/16690#issuecomment-276850068) and 
> *askWithRetry* is marked as deprecated. 
> As mentioned 
> SPARK-18113(https://github.com/apache/spark/pull/16503#event-927953218):
> ??askWithRetry is basically an unneeded API, and a leftover from the akka 
> days that doesn't make sense anymore. It's prone to cause deadlocks (exactly 
> because it's blocking), it imposes restrictions on the caller (e.g. 
> idempotency) and other things that people generally don't pay that much 
> attention to when using it.??
> Since *askWithRetry* is just used inside Spark and not in user logic, it 
> might make sense to replace all of them with *askSync*.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19450) Replace askWithRetry with askSync.

2020-12-31 Thread Zhongwei Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257091#comment-17257091
 ] 

Zhongwei Zhu commented on SPARK-19450:
--

The old askWithRetry method could use the configs `spark.rpc.numRetries` 
and `spark.rpc.retry.wait`; the default value of `spark.rpc.numRetries` is 3, 
so I suppose it would retry 3 times if an RPC failed. Now askSync is used 
without honoring those two configs. Does that mean there are no retries anymore? 
[~jinxing6...@126.com] [~srowen]

> Replace askWithRetry with askSync.
> --
>
> Key: SPARK-19450
> URL: https://issues.apache.org/jira/browse/SPARK-19450
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Jin Xing
>Assignee: Jin Xing
>Priority: Minor
> Fix For: 2.2.0
>
>
> *askSync* is already added in *RpcEndpointRef* (see SPARK-19347 and 
> https://github.com/apache/spark/pull/16690#issuecomment-276850068) and 
> *askWithRetry* is marked as deprecated. 
> As mentioned 
> SPARK-18113(https://github.com/apache/spark/pull/16503#event-927953218):
> ??askWithRetry is basically an unneeded API, and a leftover from the akka 
> days that doesn't make sense anymore. It's prone to cause deadlocks (exactly 
> because it's blocking), it imposes restrictions on the caller (e.g. 
> idempotency) and other things that people generally don't pay that much 
> attention to when using it.??
> Since *askWithRetry* is just used inside Spark and not in user logic, it 
> might make sense to replace all of them with *askSync*.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257078#comment-17257078
 ] 

Sean R. Owen commented on SPARK-33948:
--

That looks like a possibly transient problem in the build system. Are you sure 
it's a code problem?

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]
>  
> 

[jira] [Commented] (SPARK-33730) Standardize warning types

2020-12-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257032#comment-17257032
 ] 

Apache Spark commented on SPARK-33730:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/30985

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Bryński
>Priority: Major
>
> We should use warnings properly per 
> [https://docs.python.org/3/library/warnings.html#warning-categories]
> In particular,
>  - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the 
> places we should show the warnings to end-users by default.
>  - we should __maybe__ think about customizing stacklevel 
> ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
> does.
>  - ...
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code:java}
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark also prints warnings by using {{print}} in some places. We should 
> also see whether we should switch those to {{warnings.warn}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33730) Standardize warning types

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33730:


Assignee: Apache Spark  (was: Maciej Bryński)

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> We should use warnings properly per 
> [https://docs.python.org/3/library/warnings.html#warning-categories]
> In particular,
>  - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the 
> places we should show the warnings to end-users by default.
>  - we should __maybe__ think about customizing stacklevel 
> ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
> does.
>  - ...
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code:java}
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark also prints warnings by using {{print}} in some places. We should 
> also see whether we should switch those to {{warnings.warn}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33730) Standardize warning types

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33730:


Assignee: Maciej Bryński  (was: Apache Spark)

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Bryński
>Priority: Major
>
> We should use warnings properly per 
> [https://docs.python.org/3/library/warnings.html#warning-categories]
> In particular,
>  - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the 
> places we should show the warnings to end-users by default.
>  - we should __maybe__ think about customizing stacklevel 
> ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
> does.
>  - ...
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code:java}
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark also prints warnings by using {{print}} in some places. We should 
> also see whether we should switch those to {{warnings.warn}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33730) Standardize warning types

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33730:


Assignee: Maciej Bryński  (was: Apache Spark)

> Standardize warning types
> -
>
> Key: SPARK-33730
> URL: https://issues.apache.org/jira/browse/SPARK-33730
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Bryński
>Priority: Major
>
> We should use warnings properly per 
> [https://docs.python.org/3/library/warnings.html#warning-categories]
> In particular,
>  - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the 
> places we should show the warnings to end-users by default.
>  - we should __maybe__ think about customizing stacklevel 
> ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas 
> does.
>  - ...
> Current warnings are a bit messy and somewhat arbitrary.
> To be more explicit, we'll have to fix:
> {code:java}
> pyspark/context.py:warnings.warn(
> pyspark/context.py:warnings.warn(
> pyspark/ml/classification.py:warnings.warn("weightCol is 
> ignored, "
> pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will 
> be removed in future versions. Use "
> pyspark/mllib/classification.py:warnings.warn(
> pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd 
> are false. The model does nothing.")
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/mllib/regression.py:warnings.warn(
> pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; "
> pyspark/rdd.py:warnings.warn(
> pyspark/shell.py:warnings.warn("Failed to initialize Spark session.")
> pyspark/shuffle.py:warnings.warn("Please install psutil to have 
> better "
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/catalog.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/column.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/context.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn(
> pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict 
> and value is not None. value will be ignored.")
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians 
> instead.", DeprecationWarning)
> pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use 
> approx_count_distinct instead.", DeprecationWarning)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/conversion.py:warnings.warn(msg)
> pyspark/sql/pandas/functions.py:warnings.warn(
> pyspark/sql/pandas/group_ops.py:warnings.warn(
> pyspark/sql/session.py:warnings.warn("Fall back to non-hive 
> support because failing to access HiveConf, "
> {code}
> PySpark also prints warnings by using {{print}} in some places. We should 
> also see whether we should switch those to {{warnings.warn}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257027#comment-17257027
 ] 

Apache Spark commented on SPARK-33915:
--

User 'tedyu' has created a pull request for this issue:
https://github.com/apache/spark/pull/30984

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get 
> a chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that 
> would allow a json expression to be recognized as a pushable column.
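> For illustration only (my addition, not from the report): the kind of DataFrame query involved, 
> using the test.person table and Cassandra connector that appear in the plans quoted in the 
> comments; whether the filter reaches the connector is exactly what this issue is about.
> {code:scala}
> import org.apache.spark.sql.functions.get_json_object
>
> // Read the (hypothetical) test.person table through a DataSource V2 connector
> // that implements SupportsPushDownFilters.
> val person = spark.read
>   .format("org.apache.spark.sql.cassandra")
>   .options(Map("keyspace" -> "test", "table" -> "person"))
>   .load()
>
> // A json expression used as a filter; today PushableColumnBase does not treat
> // it as a pushable column, so it stays in Spark as a post-scan Filter.
> person.filter(get_json_object(person("phone"), "$.code") === "1200")
>   .select("id", "address", "phone")
>   .show()
> {code}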



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33915:


Assignee: Apache Spark

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get 
> a chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33915:


Assignee: (was: Apache Spark)

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get 
> a chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33915:


Assignee: Apache Spark

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Assignee: Apache Spark
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get 
> a chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17257025#comment-17257025
 ] 

Ted Yu commented on SPARK-33915:


Opened https://github.com/apache/spark/pull/30984

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get 
> a chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33915) Allow json expression to be pushable column

2020-12-31 Thread Ted Yu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17255389#comment-17255389
 ] 

Ted Yu edited comment on SPARK-33915 at 12/31/20, 3:16 PM:
---

Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.code) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'code' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}


was (Author: yuzhih...@gmail.com):
Here is the plan prior to predicate pushdown:
{code}
2020-12-26 03:28:59,926 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) AS 
phone#33]
   +- Filter (get_json_object(phone#37, $.phone) = 1200)
  +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: []
 - Requested Columns: [id,address,phone]
{code}
Here is the plan with pushdown:
{code}
2020-12-28 01:40:08,150 (Time-limited test) [DEBUG - 
org.apache.spark.internal.Logging.logDebug(Logging.scala:61)] Adaptive 
execution enabled for plan: Sort [id#34 ASC NULLS FIRST], true, 0
+- Project [id#34, address#35, phone#37, get_json_object(phone#37, $.code) 
AS phone#33]
   +- BatchScan[id#34, address#35, phone#37] Cassandra Scan: test.person
 - Cassandra Filters: [[phone->'phone' = ?, 1200]]
 - Requested Columns: [id,address,phone]

{code}

> Allow json expression to be pushable column
> ---
>
> Key: SPARK-33915
> URL: https://issues.apache.org/jira/browse/SPARK-33915
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Ted Yu
>Priority: Major
>
> Currently PushableColumnBase provides no support for json / jsonb expressions.
> Example of a json expression:
> {code}
> get_json_object(phone, '$.code') = '1200'
> {code}
> If a non-string literal is part of the expression, the presence of cast() would 
> complicate the situation.
> The implication is that an implementation of SupportsPushDownFilters doesn't get 
> a chance to perform pushdown even if the third-party DB engine supports json 
> expression pushdown.
> This issue is for discussion and implementation of the Spark core changes that 
> would allow a json expression to be recognized as a pushable column.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33943) Zookeeper LeaderElection Agent not being called by Spark Master

2020-12-31 Thread Saloni (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256965#comment-17256965
 ] 

Saloni commented on SPARK-33943:


If we increase the timeouts/number of retries, will that resolve the issue, i.e. 
will it ensure that the ZooKeeper LeaderElection Agent is called?

The crux of it boils down to understanding why, after successful establishment 
of the session, the LeaderElection Agent is not called.
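
For reference (a sketch, not from this report; hostnames and ports are placeholders), the 
standalone HA settings involved here are the ZooKeeper recovery-mode configs set on both 
masters, e.g. in spark-env.sh; the connection and session timeouts seen in the log come from 
Curator/ZooKeeper rather than from these settings:
{code}
# Sketch only: enable ZooKeeper-based recovery on both Spark masters.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
{code}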

> Zookeeper LeaderElection Agent not being called by Spark Master
> ---
>
> Key: SPARK-33943
> URL: https://issues.apache.org/jira/browse/SPARK-33943
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: 2 Spark Masters KVMs and 3 Zookeeper KVMs.
>  Operating System - RHEL 6.10
>Reporter: Saloni
>Priority: Major
>
> I have 2 spark masters and 3 zookeepers deployed on my system on separate 
> virtual machines. I am using spark in standalone mode.
> The services come up online in the below sequence:
>  # zookeeper-1
>  # sparkmaster-1
>  # sparkmaster-2
>  # zookeeper-2
>  # zookeeper-3
> The above sequence leads to both the spark masters running in STANDBY mode.
> From the logs, I can see that only after zookeeper-2 service comes up (i.e. 2 
> zookeeper services are up), spark master is successfully able to create a 
> zookeeper session. Until zookeeper-2 is up, it re-tries session creation. 
> However, after both zookeeper services are up and the Persistence Engine is able 
> to successfully connect and create a session, *the ZooKeeper LeaderElection 
> Agent is not called*.
> Logs (spark-master.log):
> {code:java}
> 10:03:47.241 INFO org.apache.spark.internal.Logging:57 - Persisting recovery 
> state to ZooKeeper Initiating client connection, 
> connectString=zookeeper-2:,zookeeper-3:,zookeeper-1: 
> sessionTimeout=6 watcher=org.apache.curator.ConnectionState
> # Only zookeeper-2 is online #
> 10:03:47.630 INFO org.apache.zookeeper.ClientCnxn$SendThread:1025 - Opening 
> socket connection to server zookeeper-1:. Will not attempt to 
> authenticate using SASL (unknown error)
> 10:03:50.635 INFO org.apache.zookeeper.ClientCnxn$SendThread:1162 - Socket 
> error occurred: zookeeper-1:: No route to host
> 10:03:50.738 INFO org.apache.zookeeper.ClientCnxn$SendThread:1025 - Opening 
> socket connection to server zookeeper-2:. Will not attempt to 
> authenticate using SASL (unknown error)
> 10:03:50.739 INFO org.apache.zookeeper.ClientCnxn$SendThread:879 - Socket 
> connection established to zookeeper-2:, initiating session
> 10:03:50.742 INFO org.apache.zookeeper.ClientCnxn$SendThread:1158 - Unable to 
> read additional data from server sessionid 0x0, likely server has closed 
> socket, closing socket connection and attempting reconnect
> 10:03:51.842 INFO org.apache.zookeeper.ClientCnxn$SendThread:1025 - Opening 
> socket connection to server zookeeper-3:. Will not attempt to 
> authenticate using SASL (unknown error)
> 10:03:51.843 INFO org.apache.zookeeper.ClientCnxn$SendThread:1162 - Socket 
> error occurred: zookeeper-3:: Connection refused 
> 10:04:02.685 ERROR org.apache.curator.ConnectionState:200 - Connection timed 
> out for connection string 
> (zookeeper-2:,zookeeper-3:,zookeeper-1:) and timeout (15000) / 
> elapsed (15274)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss 
>   at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) 
>   at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
> ...
> ...
> ...
> 10:04:22.691 ERROR org.apache.curator.ConnectionState:200 - Connection timed 
> out for connection string 
> (zookeeper-2:,zookeeper-3:,zookeeper-1:) and timeout (15000) / 
> elapsed (35297) org.apache.curator.CuratorConnectionLossException: 
> KeeperErrorCode = ConnectionLoss 
>   at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197)
> ...
> ...
> ...
> 10:04:42.696 ERROR org.apache.curator.ConnectionState:200 - Connection timed 
> out for connection string 
> (zookeeper-2:,zookeeper-3:,zookeeper-1:) and timeout (15000) / 
> elapsed (55301) org.apache.curator.CuratorConnectionLossException: 
> KeeperErrorCode = ConnectionLoss 
>   at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) 
>   at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
> ...
> ...
> ...
> 10:05:32.699 WARN org.apache.curator.ConnectionState:191 - Connection attempt 
> unsuccessful after 105305 (greater than max timeout of 6). Resetting 
> connection and trying again with a new connection. 
> 10:05:32.864 INFO org.apache.zookeeper.ZooKeeper:693 - Session: 0x0 closed 
> 

[jira] [Commented] (SPARK-33943) Zookeeper LeaderElection Agent not being called by Spark Master

2020-12-31 Thread Saloni (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256962#comment-17256962
 ] 

Saloni commented on SPARK-33943:


As per my understanding, the retries seen in the logs for establishing a 
successful ZooKeeper session are for 'Persisting recovery state to ZooKeeper'.
{code:java}
10:03:47.241 INFO org.apache.spark.internal.Logging:57 - Persisting recovery 
state to ZooKeeper Initiating client connection, 
connectString=zookeeper-2:,zookeeper-3:,zookeeper-1: 
sessionTimeout=6 watcher=org.apache.curator.ConnectionState
{code}
Once this connection is successfully established, the ZooKeeper LeaderElection 
Agent should then be invoked.

The last lines in the log show that a session was eventually created; it 
appears this session belongs to the Persistence Engine, since that is the 
component that initiated the connection.

 
{code:java}
10:05:57.566 INFO org.apache.zookeeper.ClientCnxn$SendThread:879 - Socket 
connection established to zookeeper-3:, initiating session 
10:05:57.574 INFO org.apache.zookeeper.ClientCnxn$SendThread:1299 - Session 
establishment complete on server zookeeper-3:, sessionid = , negotiated 
timeout = 4 
10:05:57.580 INFO org.apache.curator.framework.state.ConnectionStateManager:228 
- State change: CONNECTED
{code}
What I don't understand is why the ZooKeeper LeaderElection Agent was not 
called, given that the Spark master was eventually able to connect to the 
ZooKeeper ensemble.
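
For reference, the agent in question is built on Curator's LeaderLatch: it is the latch 
listener callbacks, fired only after a latch has been started and an election has completed, 
that move a master out of STANDBY. Below is a minimal sketch of that pattern; it is not the 
actual Spark source, and the connection string and latch path are illustrative (Spark takes 
them from spark.deploy.zookeeper.url and spark.deploy.zookeeper.dir).
{code:java}
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.framework.recipes.leader.{LeaderLatch, LeaderLatchListener}
import org.apache.curator.retry.ExponentialBackoffRetry

object LeaderElectionSketch {
  def main(args: Array[String]): Unit = {
    // Connect to the ensemble (illustrative connection string).
    val client = CuratorFrameworkFactory.newClient(
      "zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181",
      new ExponentialBackoffRetry(1000, 3))
    client.start()

    // Register a latch and react to election results (illustrative znode path).
    val latch = new LeaderLatch(client, "/spark/leader_election")
    latch.addListener(new LeaderLatchListener {
      override def isLeader(): Unit = println("elected leader: master would leave STANDBY")
      override def notLeader(): Unit = println("leadership revoked: master would return to STANDBY")
    })
    latch.start()
    latch.await() // block until this process is elected
  }
}
{code}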

 

> Zookeeper LeaderElection Agent not being called by Spark Master
> ---
>
> Key: SPARK-33943
> URL: https://issues.apache.org/jira/browse/SPARK-33943
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: 2 Spark Masters KVMs and 3 Zookeeper KVMs.
>  Operating System - RHEL 6.10
>Reporter: Saloni
>Priority: Major
>
> I have 2 spark masters and 3 zookeepers deployed on my system on separate 
> virtual machines. I am using spark in standalone mode.
> The services come up online in the below sequence:
>  # zookeeper-1
>  # sparkmaster-1
>  # sparkmaster-2
>  # zookeeper-2
>  # zookeeper-3
> The above sequence leads to both the spark masters running in STANDBY mode.
> From the logs, I can see that only after zookeeper-2 service comes up (i.e. 2 
> zookeeper services are up), spark master is successfully able to create a 
> zookeeper session. Until zookeeper-2 is up, it re-tries session creation. 
> However, after both zookeeper services are up and Persistence Engine is able 
> to successfully connect and create a session; *the ZooKeeper LeaderElection 
> Agent is not called*.
> Logs (spark-master.log):
> {code:java}
> 10:03:47.241 INFO org.apache.spark.internal.Logging:57 - Persisting recovery 
> state to ZooKeeper Initiating client connection, 
> connectString=zookeeper-2:,zookeeper-3:,zookeeper-1: 
> sessionTimeout=6 watcher=org.apache.curator.ConnectionState
> # Only zookeeper-2 is online #
> 10:03:47.630 INFO org.apache.zookeeper.ClientCnxn$SendThread:1025 - Opening 
> socket connection to server zookeeper-1:. Will not attempt to 
> authenticate using SASL (unknown error)
> 10:03:50.635 INFO org.apache.zookeeper.ClientCnxn$SendThread:1162 - Socket 
> error occurred: zookeeper-1:: No route to host
> 10:03:50.738 INFO org.apache.zookeeper.ClientCnxn$SendThread:1025 - Opening 
> socket connection to server zookeeper-2:. Will not attempt to 
> authenticate using SASL (unknown error)
> 10:03:50.739 INFO org.apache.zookeeper.ClientCnxn$SendThread:879 - Socket 
> connection established to zookeeper-2:, initiating session
> 10:03:50.742 INFO org.apache.zookeeper.ClientCnxn$SendThread:1158 - Unable to 
> read additional data from server sessionid 0x0, likely server has closed 
> socket, closing socket connection and attempting reconnect
> 10:03:51.842 INFO org.apache.zookeeper.ClientCnxn$SendThread:1025 - Opening 
> socket connection to server zookeeper-3:. Will not attempt to 
> authenticate using SASL (unknown error)
> 10:03:51.843 INFO org.apache.zookeeper.ClientCnxn$SendThread:1162 - Socket 
> error occurred: zookeeper-3:: Connection refused 
> 10:04:02.685 ERROR org.apache.curator.ConnectionState:200 - Connection timed 
> out for connection string 
> (zookeeper-2:,zookeeper-3:,zookeeper-1:) and timeout (15000) / 
> elapsed (15274)
> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = 
> ConnectionLoss 
>   at 
> org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) 
>   at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87)
> ...
> ...
> ...
> 10:04:22.691 ERROR org.apache.curator.ConnectionState:200 - Connection timed 
> out for connection string 
> (zookeeper-2:,zookeeper-3:,zookeeper-1:) and 

[jira] [Commented] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2020-12-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256946#comment-17256946
 ] 

Apache Spark commented on SPARK-33950:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30983

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33950:


Assignee: Apache Spark

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33950:


Assignee: (was: Apache Spark)

> ALTER TABLE .. DROP PARTITION doesn't refresh cache
> ---
>
> Key: SPARK-33950
> URL: https://issues.apache.org/jira/browse/SPARK-33950
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Here is the example to reproduce the issue:
> {code:sql}
> spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED 
> BY (part0);
> spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
> spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
> spark-sql> CACHE TABLE tbl1;
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
> spark-sql> SELECT * FROM tbl1;
> 0 0
> 1 1
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33951) Distinguish the error between filter and distinct

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33951:


Assignee: (was: Apache Spark)

> Distinguish the error between filter and distinct
> -
>
> Key: SPARK-33951
> URL: https://issues.apache.org/jira/browse/SPARK-33951
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The error messages for specifying filter and distinct for the aggregate 
> function are mixed together and should be separated. This can increase 
> readability and ease of use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33951) Distinguish the error between filter and distinct

2020-12-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256944#comment-17256944
 ] 

Apache Spark commented on SPARK-33951:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/30982

> Distinguish the error between filter and distinct
> -
>
> Key: SPARK-33951
> URL: https://issues.apache.org/jira/browse/SPARK-33951
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The error messages for specifying filter and distinct for the aggregate 
> function are mixed together and should be separated. This can increase 
> readability and ease of use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33951) Distinguish the error between filter and distinct

2020-12-31 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256943#comment-17256943
 ] 

Apache Spark commented on SPARK-33951:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/30982

> Distinguish the error between filter and distinct
> -
>
> Key: SPARK-33951
> URL: https://issues.apache.org/jira/browse/SPARK-33951
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Priority: Major
>
> The error messages for specifying filter and distinct for the aggregate 
> function are mixed together and should be separated. This can increase 
> readability and ease of use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33951) Distinguish the error between filter and distinct

2020-12-31 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33951:


Assignee: Apache Spark

> Distinguish the error between filter and distinct
> -
>
> Key: SPARK-33951
> URL: https://issues.apache.org/jira/browse/SPARK-33951
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> The error messages for specifying filter and distinct for the aggregate 
> function are mixed together and should be separated. This can increase 
> readability and ease of use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33952) Python-friendly dtypes for pyspark dataframes

2020-12-31 Thread Marc de Lignie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc de Lignie updated SPARK-33952:
---
Fix Version/s: 3.2.0

> Python-friendly dtypes for pyspark dataframes
> -
>
> Key: SPARK-33952
> URL: https://issues.apache.org/jira/browse/SPARK-33952
> Project: Spark
>  Issue Type: Task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Marc de Lignie
>Priority: Minor
> Fix For: 3.2.0
>
>
> The pyspark.sql.DataFrame.dtypes attribute contains string representations of 
> the column datatypes in terms of JVM datatypes. However, for a python user it 
> is a significant mental step to translate these to the corresponding python 
> types encountered in UDF's and collected dataframes. This holds in particular 
> for nested composite datatypes (array, map and struct). It is proposed to 
> provide python-friendly dtypes in pyspark (as an addition, not a replacement) 
> in which array<>, map<> and struct<> are translated to [], {} and Row().
> Sample code, including tests, is available as [gist on 
> github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More 
> explanation is provided at: 
> [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
> If this proposal finds sufficient support, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33952) Python-friendly dtypes for pyspark dataframes

2020-12-31 Thread Marc de Lignie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc de Lignie updated SPARK-33952:
---
Issue Type: Improvement  (was: Task)

> Python-friendly dtypes for pyspark dataframes
> -
>
> Key: SPARK-33952
> URL: https://issues.apache.org/jira/browse/SPARK-33952
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Marc de Lignie
>Priority: Minor
> Fix For: 3.2.0
>
>
> The pyspark.sql.DataFrame.dtypes attribute contains string representations of 
> the column datatypes in terms of JVM datatypes. However, for a python user it 
> is a significant mental step to translate these to the corresponding python 
> types encountered in UDF's and collected dataframes. This holds in particular 
> for nested composite datatypes (array, map and struct). It is proposed to 
> provide python-friendly dtypes in pyspark (as an addition, not a replacement) 
> in which array<>, map<> and struct<> are translated to [], {} and Row().
> Sample code, including tests, is available as [gist on 
> github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More 
> explanation is provided at: 
> [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
> If this proposal finds sufficient support, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33952) Python-friendly dtypes for pyspark dataframes

2020-12-31 Thread Marc de Lignie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marc de Lignie updated SPARK-33952:
---
Affects Version/s: (was: 3.0.1)
   3.2.0

> Python-friendly dtypes for pyspark dataframes
> -
>
> Key: SPARK-33952
> URL: https://issues.apache.org/jira/browse/SPARK-33952
> Project: Spark
>  Issue Type: Task
>  Components: PySpark
>Affects Versions: 3.2.0
>Reporter: Marc de Lignie
>Priority: Minor
>
> The pyspark.sql.DataFrame.dtypes attribute contains string representations of 
> the column datatypes in terms of JVM datatypes. However, for a python user it 
> is a significant mental step to translate these to the corresponding python 
> types encountered in UDF's and collected dataframes. This holds in particular 
> for nested composite datatypes (array, map and struct). It is proposed to 
> provide python-friendly dtypes in pyspark (as an addition, not a replacement) 
> in which array<>, map<> and struct<> are translated to [], {} and Row().
> Sample code, including tests, is available as [gist on 
> github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More 
> explanation is provided at: 
> [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
> If this proposal finds sufficient support, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33952) Python-friendly dtypes for pyspark dataframes

2020-12-31 Thread Marc de Lignie (Jira)
Marc de Lignie created SPARK-33952:
--

 Summary: Python-friendly dtypes for pyspark dataframes
 Key: SPARK-33952
 URL: https://issues.apache.org/jira/browse/SPARK-33952
 Project: Spark
  Issue Type: Task
  Components: PySpark
Affects Versions: 3.0.1
Reporter: Marc de Lignie


The pyspark.sql.DataFrame.dtypes attribute contains string representations of 
the column datatypes in terms of JVM datatypes. However, for a python user it 
is a significant mental step to translate these to the corresponding python 
types encountered in UDF's and collected dataframes. This holds in particular 
for nested composite datatypes (array, map and struct). It is proposed to 
provide python-friendly dtypes in pyspark (as an addition, not a replacement) 
in which array<>, map<> and struct<> are translated to [], {} and Row().

Sample code, including tests, is available as [gist on 
github|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More 
explanation is provided at: 
[https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]

If this proposal finds sufficient support, I can provide a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33951) Distinguish the error between filter and distinct

2020-12-31 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-33951:
--

 Summary: Distinguish the error between filter and distinct
 Key: SPARK-33951
 URL: https://issues.apache.org/jira/browse/SPARK-33951
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: jiaan.geng


The error messages for specifying filter and distinct for the aggregate 
function are mixed together and should be separated. This can increase 
readability and ease of use.
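
For context, a minimal spark-shell sketch of the kind of statement involved (my assumption of 
the trigger: attaching DISTINCT or FILTER to a function that is not an aggregate, which is 
believed to surface one combined message today regardless of which keyword was used):
{code:java}
import spark.implicits._  // in spark-shell, `spark` is the predefined SparkSession
Seq(("x", 1), ("y", 2)).toDF("a", "b").createOrReplaceTempView("t")

// Both statements should fail analysis; the ticket asks that each one get its own
// error message instead of a shared "DISTINCT or FILTER ..." style of text.
spark.sql("SELECT upper(DISTINCT a) FROM t").show()
spark.sql("SELECT upper(a) FILTER (WHERE b > 0) FROM t").show()
{code}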



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33950) ALTER TABLE .. DROP PARTITION doesn't refresh cache

2020-12-31 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33950:
--

 Summary: ALTER TABLE .. DROP PARTITION doesn't refresh cache
 Key: SPARK-33950
 URL: https://issues.apache.org/jira/browse/SPARK-33950
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: Maxim Gekk


Here is the example to reproduce the issue:
{code:sql}
spark-sql> CREATE TABLE tbl1 (col0 int, part0 int) USING parquet PARTITIONED BY 
(part0);
spark-sql> INSERT INTO tbl1 PARTITION (part0=0) SELECT 0;
spark-sql> INSERT INTO tbl1 PARTITION (part0=1) SELECT 1;
spark-sql> CACHE TABLE tbl1;
spark-sql> SELECT * FROM tbl1;
0   0
1   1
spark-sql> ALTER TABLE tbl1 DROP PARTITION (part0=0);
spark-sql> SELECT * FROM tbl1;
0   0
1   1
{code}
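
Until the DROP PARTITION command refreshes the cache itself, an explicit refresh after the 
command appears to bring the cached result back in sync. A minimal spark-shell sketch 
continuing the repro above (REFRESH TABLE invalidates the table's cached data and metadata, 
so the next scan re-caches lazily):
{code:java}
// Continuing the repro: part0=0 has already been dropped at this point.
spark.sql("REFRESH TABLE tbl1")          // invalidate cached data + metadata for tbl1
spark.sql("SELECT * FROM tbl1").show()   // expected: only the part0=1 row remains
{code}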
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2020-12-31 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-33949:

Description: 
This code will fail because the foldable value is not folded; we should keep: 
{code:java}
val excludedRules = Seq(ConstantFolding, 
ReorderAssociativeOperator).map(_.ruleName)
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
excludedRules.mkString(",")) {
  sql("select approx_count_distinct(1, 0.01 + 0.02)")
}{code}



> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep: 
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2020-12-31 Thread ulysses you (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ulysses you updated SPARK-33949:

Description: 
This code will fail because the foldable value is not folded; we should keep 
the result consistent whether the Optimize rule exists or not.
{code:java}
val excludedRules = Seq(ConstantFolding, 
ReorderAssociativeOperator).map(_.ruleName)
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
excludedRules.mkString(",")) {
  sql("select approx_count_distinct(1, 0.01 + 0.02)")
}{code}



  was:
This code will fail because the foldable value is not folded; we should keep: 
{code:java}
val excludedRules = Seq(ConstantFolding, 
ReorderAssociativeOperator).map(_.ruleName)
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
excludedRules.mkString(",")) {
  sql("select approx_count_distinct(1, 0.01 + 0.02)")
}{code}




> Make approx_count_distinct result consistent whether Optimize rule exists or 
> not
> 
>
> Key: SPARK-33949
> URL: https://issues.apache.org/jira/browse/SPARK-33949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: ulysses you
>Priority: Minor
>
> This code will fail because the foldable value is not folded; we should keep 
> the result consistent whether the Optimize rule exists or not.
> {code:java}
> val excludedRules = Seq(ConstantFolding, 
> ReorderAssociativeOperator).map(_.ruleName)
> withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key -> 
> excludedRules.mkString(",")) {
>   sql("select approx_count_distinct(1, 0.01 + 0.02)")
> }{code}
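
A note on the failure mode, as far as I understand it: approx_count_distinct expects its 
relative-error argument to have become a double literal by the time the aggregate is planned, 
so with ConstantFolding excluded the unevaluated 0.01 + 0.02 trips that check. A spark-shell 
sketch of the variants (the excludedRules config key and fully-qualified rule names below are 
my assumption of the spelling):
{code:java}
// Default optimizer: 0.01 + 0.02 is folded to a literal first, so this runs.
spark.sql("SELECT approx_count_distinct(1, 0.01 + 0.02)").show()

// Excluding the folding rules keeps the Add expression unfolded, which is
// assumed to reproduce the failure this ticket describes.
spark.conf.set("spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.ConstantFolding," +
  "org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator")
spark.sql("SELECT approx_count_distinct(1, 0.01 + 0.02)").show()

// A plain literal relative error does not depend on constant folding at all.
spark.sql("SELECT approx_count_distinct(1, 0.03)").show()
{code}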



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33949) Make approx_count_distinct result consistent whether Optimize rule exists or not

2020-12-31 Thread ulysses you (Jira)
ulysses you created SPARK-33949:
---

 Summary: Make approx_count_distinct result consistent whether 
Optimize rule exists or not
 Key: SPARK-33949
 URL: https://issues.apache.org/jira/browse/SPARK-33949
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: ulysses you






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256908#comment-17256908
 ] 

Yang Jie commented on SPARK-33948:
--

It seems the master branch cannot reproduce this problem.

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]
>  
> 

[jira] [Comment Edited] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256903#comment-17256903
 ] 

Yang Jie edited comment on SPARK-33948 at 12/31/20, 9:03 AM:
-

Very strange for ExpressionEncoderSuite:

run 
{code:java}
mvn clean test -pl sql/catalyst  -Pscala-2.13  
-DwildcardSuites=org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite 
-Dtest=none
{code}
All cases passed.

run
{code:java}
mvn clean test -pl sql/catalyst  -Pscala-2.13 {code}
 

 there are 16 failed tests in ExpressionEncoderSuite

 

cc [~dongjoon] [~srowen]


was (Author: luciferyang):
Very strange for ExpressionEncoderSuite:

run 
{code:java}
mvn clean test -pl sql/catalyst  -Pscala-2.13  
-DwildcardSuites=org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite 
-Dtest=none
{code}
All cases passed.

run
{code:java}
mvn clean test -pl sql/catalyst  -Pscala-2.13 {code}
 

 there are 16 failed tests in ExpressionEncoderSuite

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> 

[jira] [Commented] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17256903#comment-17256903
 ] 

Yang Jie commented on SPARK-33948:
--

Very strange for ExpressionEncoderSuite:

run 
{code:java}
mvn clean test -pl sql/catalyst  -Pscala-2.13  
-DwildcardSuites=org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite 
-Dtest=none
{code}
All cases passed.

run
{code:java}
mvn clean test -pl sql/catalyst  -Pscala-2.13 {code}
 

 there are 16 failed tests in ExpressionEncoderSuite

> branch-3.1 jenkins test failed in Scala 2.13 
> -
>
> Key: SPARK-33948
> URL: https://issues.apache.org/jira/browse/SPARK-33948
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
> Environment: * 
>  
>Reporter: Yang Jie
>Priority: Major
>
> [https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
>  * 
> [org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
>  * 
> 

[jira] [Updated] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33948:
-
Description: 
[https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]

 
[ExpressionEncoderSuite|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/101/testReport/junit/org.apache.spark.sql.catalyst.encoders/ExpressionEncoderSuite/encode_decode_for_Tuple2___ArrayBuffer__String__String___ArrayBuffer__a_b_codegen_path_/]
 * [org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite.encode/decode 
for Tuple2: (ArrayBuffer[(String, String)],ArrayBuffer((a,b))) (codegen 

[jira] [Updated] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33948:
-
Environment: 
* 

 

  was:
[https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]

 
 * [org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite.encode/decode 
for Tuple2: (ArrayBuffer[(String, String)],ArrayBuffer((a,b))) (codegen 
path)|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/101/testReport/junit/org.apache.spark.sql.catalyst.encoders/ExpressionEncoderSuite/encode_decode_for_Tuple2___ArrayBuffer__String__String___ArrayBuffer__a_b_codegen_path_/]
 * [org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite.encode/decode 
for Tuple2: (ArrayBuffer[(String, String)],ArrayBuffer((a,b))) (interpreted 

[jira] [Updated] (SPARK-33947) String functions: Trim/Ltrim/Rtrim support byte array

2020-12-31 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-33947:
---
Summary: String functions: Trim/Ltrim/Rtrim support byte array  (was: 
String functions: Trim/Ltrim/Rtrim support byte arrays)

> String functions: Trim/Ltrim/Rtrim support byte array
> -
>
> Key: SPARK-33947
> URL: https://issues.apache.org/jira/browse/SPARK-33947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> String functions: Trim/Ltrim/Rtrim support byte arrays
> The mainstream databases support this feature, as shown below:
> Teradata
> https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A
> Vertica
> https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5
> Redshift
> https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html
> Postgresql
> https://www.postgresql.org/docs/11/functions-binarystring.html
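
To make the proposal concrete, a sketch of what a binary-aware trim could look like in Spark 
SQL, modeled on PostgreSQL's btrim(bytea, bytea). This is the proposed behavior, not something 
current Spark supports, and the exact syntax is an assumption; only the X'...' binary-literal 
notation is existing Spark SQL.
{code:java}
// Hypothetical once this ticket lands: strip 0x00 bytes from both ends of a binary value.
spark.sql("SELECT trim(BOTH X'00' FROM X'000012340000')").show()
// proposed result: X'1234' (compare PostgreSQL: btrim('\x000012340000'::bytea, '\x00'::bytea))
{code}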



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33947) String functions: Trim/Ltrim/Rtrim support byte arrays

2020-12-31 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-33947:
---
Description: 
String functions: Trim/Ltrim/Rtrim support byte arrays

The mainstream databases support this feature, as shown below:

Teradata
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A

Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5

Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html

Postgresql
https://www.postgresql.org/docs/11/functions-binarystring.html





  was:
String functions: Trim/Ltrim/Rtrim support byte arrays

The mainstream databases support this feature, as shown below:

Teradata
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A

Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5

Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html






> String functions: Trim/Ltrim/Rtrim support byte arrays
> --
>
> Key: SPARK-33947
> URL: https://issues.apache.org/jira/browse/SPARK-33947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> String functions: Trim/Ltrim/Rtrim support byte arrays
> The mainstream databases support this feature, as shown below:
> Teradata
> https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A
> Vertica
> https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5
> Redshift
> https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html
> Postgresql
> https://www.postgresql.org/docs/11/functions-binarystring.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33947) String functions: Trim/Ltrim/Rtrim support byte arrays

2020-12-31 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-33947:
---
Description: 
String functions: Trim/Ltrim/Rtrim support byte arrays

The mainstream databases support this feature, as shown below:

Teradata
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A

Vertica
https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5

Redshift
https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html





  was:
String functions: Trim/Ltrim/Rtrim support byte arrays

The mainstream databases support this feature, as shown below:

Teradata
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A




> String functions: Trim/Ltrim/Rtrim support byte arrays
> --
>
> Key: SPARK-33947
> URL: https://issues.apache.org/jira/browse/SPARK-33947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> String functions: Trim/Ltrim/Rtrim support byte arrays
> The mainstream databases support this feature, as shown below:
> Teradata
> https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A
> Vertica
> https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/String/BTRIM.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CString%20Functions%7C_5
> Redshift
> https://docs.aws.amazon.com/redshift/latest/dg/r_BTRIM.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33947) String functions: Trim/Ltrim/Rtrim support byte arrays

2020-12-31 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-33947:
---
Description: 
String functions: Trim/Ltrim/Rtrim support byte arrays

The mainstream databases support this feature, as shown below:

Teradata
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A



  was:
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A

The mainstream databases support this feature, as shown below:

Teradata
https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A




> String functions: Trim/Ltrim/Rtrim support byte arrays
> --
>
> Key: SPARK-33947
> URL: https://issues.apache.org/jira/browse/SPARK-33947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> String functions: Trim/Ltrim/Rtrim support byte arrays
> The mainstream databases support this feature, as shown below:
> Teradata
> https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/vd81iWAoGj0cEeGBMAfF9A



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33948) branch-3.1 jenkins test failed in Scala 2.13

2020-12-31 Thread Yang Jie (Jira)
Yang Jie created SPARK-33948:


 Summary: branch-3.1 jenkins test failed in Scala 2.13 
 Key: SPARK-33948
 URL: https://issues.apache.org/jira/browse/SPARK-33948
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 3.1.0
 Environment: 
[https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/#showFailuresLink]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.reuseClientsUpToConfigVariableConcurrent|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/reuseClientsUpToConfigVariableConcurrent_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.fastFailConnectionInTimeWindow|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/fastFailConnectionInTimeWindow_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeFactoryBeforeCreateClient|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeFactoryBeforeCreateClient_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeBlockClientsWithFactory|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeBlockClientsWithFactory_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.neverReturnInactiveClients|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/neverReturnInactiveClients_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.closeIdleConnectionForRequestTimeOut|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/closeIdleConnectionForRequestTimeOut_2/]
 * 
[org.apache.spark.network.client.TransportClientFactorySuite.returnDifferentClientsForDifferentServers|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/104/testReport/junit/org.apache.spark.network.client/TransportClientFactorySuite/returnDifferentClientsForDifferentServers/]

 
 * [org.apache.spark.sql.catalyst.encoders.ExpressionEncoderSuite.encode/decode 
for Tuple2: (ArrayBuffer[(String, String)],ArrayBuffer((a,b))) (codegen 
path)|https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-3.1-test-maven-hadoop-3.2-scala-2.13/101/testReport/junit/org.apache.spark.sql.catalyst.encoders/ExpressionEncoderSuite/encode_decode_for_Tuple2___ArrayBuffer__String__String___ArrayBuffer__a_b_codegen_path_/]
 *