[jira] [Assigned] (SPARK-32691) Test org.apache.spark.DistributedSuite failed on arm64 jenkins

2020-10-16 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-32691:


Assignee: zhengruifeng

> Test org.apache.spark.DistributedSuite failed on arm64 jenkins
> --
>
> Key: SPARK-32691
> URL: https://issues.apache.org/jira/browse/SPARK-32691
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
> Environment: ARM64
>Reporter: huangtianhua
>Assignee: zhengruifeng
>Priority: Major
> Attachments: Screen Shot 2020-09-28 at 8.49.04 AM.png, failure.log, 
> success.log
>
>
> Tests of org.apache.spark.DistributedSuite fail on the arm64 Jenkins: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-arm/ 
> - caching in memory and disk, replicated (encryption = on) (with 
> replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> - caching in memory and disk, serialized, replicated (encryption = on) 
> (with replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> - caching in memory, serialized, replicated (encryption = on) (with 
> replication as stream) *** FAILED ***
>   3 did not equal 2; got 3 replicas instead of 2 (DistributedSuite.scala:191)
> ...
> 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl commented on SPARK-33158:


[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures.
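
For reference, a minimal Scala sketch of enabling the two settings above 
(assuming an application that builds its own SparkSession; the same values can 
also be passed via spark-submit --conf):
{code}
import org.apache.spark.sql.SparkSession

// Minimal sketch: enable host blacklisting on fetch failure so that rerun
// tasks avoid NodeManagers whose shuffle service has gone away.
val spark = SparkSession.builder()
  .appName("blacklist-on-fetch-failure")
  .config("spark.blacklist.enabled", "true")
  .config("spark.blacklist.application.fetchFailure.enabled", "true")
  .getOrCreate()
{code}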

> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once, at initialization, when it registers.
> On YARN, the NodeManager (and with it the shuffle service) may stop working 
> while the container/executor process is still running. ShuffleMapTask can 
> still be executed, and the returned MapStatus still points to the address of 
> the external shuffle service.
> When the next stage reads shuffle data, it cannot connect to the shuffle 
> service, and the job finally fails.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or periodically test whether the connection is healthy, similar to 
> the driver and executor heartbeat check threads.
>  
>  
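
For illustration, a minimal Scala sketch of the kind of pre-write check 
described in the issue above. The helper is hypothetical, not an existing 
Spark API; a plain TCP probe of the shuffle service address is only one 
possible way to test the connection:
{code}
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

// Hypothetical helper (not an existing Spark API): probe the external shuffle
// service with a plain TCP connect before writing shuffle data.
def shuffleServiceReachable(host: String, port: Int, timeoutMs: Int = 5000): Boolean = {
  val socket = new Socket()
  try {
    socket.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    case _: IOException => false
  } finally {
    socket.close()
  }
}

// Example: 7337 is the default value of spark.shuffle.service.port.
// println(shuffleServiceReachable("nm-host-1", 7337))
{code}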



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl edited comment on SPARK-33158 at 10/16/20, 7:52 AM:
---

[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures.


was (Author: dzcxzl):
[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures.

> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once, at initialization, when it registers.
> On YARN, the NodeManager (and with it the shuffle service) may stop working 
> while the container/executor process is still running. ShuffleMapTask can 
> still be executed, and the returned MapStatus still points to the address of 
> the external shuffle service.
> When the next stage reads shuffle data, it cannot connect to the shuffle 
> service, and the job finally fails.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or periodically test whether the connection is healthy, similar to 
> the driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33143) Make SocketAuthServer socket timeout configurable

2020-10-16 Thread Miklos Szurap (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215249#comment-17215249
 ] 

Miklos Szurap commented on SPARK-33143:
---

It has been observed with big RDDs.
{noformat}
20/10/07 18:27:20 INFO scheduler.DAGScheduler: Job 311 finished: toPandas at 
/data/1/app/bin/apps/report/doreport.py:91, took 0.619208 s
Exception in thread "serve-DataFrame" java.net.SocketTimeoutException: Accept 
timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at 
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at 
org.apache.spark.api.python.PythonServer$$anon$1.run(PythonRDD.scala:881)
Traceback (most recent call last):
  File "/data/1/app/bin/apps/report/doreport.py", line 91, in 
df=dq_final.toPandas()
  File 
"/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/dataframe.py",
 line 2142, in toPandas
  File 
"/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/sql/dataframe.py",
 line 534, in collect
  File 
"/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/rdd.py",
 line 144, in _load_from_socket
  File 
"/opt/cloudera/parcels/SPARK2-2.4.0.cloudera2-1.cdh5.13.3.p3544.1321029/lib/spark2/python/lib/pyspark.zip/pyspark/java_gateway.py",
 line 178, in local_connect_and_auth
Exception: could not open socket: ["tried to connect to ('127.0.0.1', 33127), 
but an error occured: "]
20/10/07 18:27:36 INFO spark.SparkContext: Invoking stop() from shutdown hook
{noformat}
After splitting the application into two parts, so that each run processed half 
the data, it finished successfully.
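
For context, a hedged Scala sketch of what a configurable accept timeout could 
look like; the config key and default below are illustrative assumptions, not 
an existing Spark setting:
{code}
import java.net.ServerSocket
import org.apache.spark.SparkConf

// Hypothetical sketch: read the accept timeout from a config instead of the
// hardcoded 15000 ms quoted in the description below.
val conf = new SparkConf()
val acceptTimeoutMs = conf.getTimeAsMs("spark.python.authSocket.timeout", "15s")

val serverSocket = new ServerSocket(0)            // 0 = any free port
serverSocket.setSoTimeout(acceptTimeoutMs.toInt)  // currently hardcoded to 15 seconds
{code}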

> Make SocketAuthServer socket timeout configurable
> -
>
> Key: SPARK-33143
> URL: https://issues.apache.org/jira/browse/SPARK-33143
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Miklos Szurap
>Priority: Major
>
> In SPARK-21551 the socket timeout for PySpark applications was 
> increased from 3 to 15 seconds. However, it is still hardcoded.
> In certain situations even 15 seconds is not enough, so it should be made 
> configurable. 
> This is requested after seeing it in real-life workload failures.
> It has also been suggested and requested in an earlier comment in 
> [SPARK-18649|https://issues.apache.org/jira/browse/SPARK-18649?focusedCommentId=16493498&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16493498]
> In Spark 2.4 it is under
> [PythonRDD.scala|https://github.com/apache/spark/blob/branch-2.4/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L899]
> in Spark 3.x the code has been moved to
> [SocketAuthServer.scala|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/security/SocketAuthServer.scala#L51]
> {code}
> serverSocket.setSoTimeout(15000)
> {code}
> Please include this in both 2.4 and 3.x branches.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl edited comment on SPARK-33158 at 10/16/20, 8:06 AM:
---

[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures. See 
https://issues.apache.org/jira/browse/YARN-72?focusedCommentId=13505398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13505398

Although spark.files.fetchFailure.unRegisterOutputOnHost can be turned on to 
remove all shuffle files on that host, the rerun stage may still be assigned to 
the same host. Since the executor does not know whether the shuffle service is 
available, it continues to write data to disk.



was (Author: dzcxzl):
[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures.

> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once, at initialization, when it registers.
> On YARN, the NodeManager (and with it the shuffle service) may stop working 
> while the container/executor process is still running. ShuffleMapTask can 
> still be executed, and the returned MapStatus still points to the address of 
> the external shuffle service.
> When the next stage reads shuffle data, it cannot connect to the shuffle 
> service, and the job finally fails.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or periodically test whether the connection is healthy, similar to 
> the driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33158) Check whether the executor and external service connection is available

2020-10-16 Thread dzcxzl (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215244#comment-17215244
 ] 

dzcxzl edited comment on SPARK-33158 at 10/16/20, 8:08 AM:
---

[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures. See 
https://issues.apache.org/jira/browse/YARN-72?focusedCommentId=13505398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13505398

Although spark.files.fetchFailure.unRegisterOutputOnHost can be turned on to 
remove all shuffle files on that host, the rerun stage may still be assigned to 
the same host. Since the executor does not know whether the shuffle service is 
available, it continues to write data to disk, and the next round of shuffle 
reads will fail again.



was (Author: dzcxzl):
[SPARK-13669|https://issues.apache.org/jira/browse/SPARK-13669] / 
[SPARK-20898|https://issues.apache.org/jira/browse/SPARK-20898] provide the 
ability to add a host to the blacklist when a fetch fails, and 
[SPARK-27272|https://issues.apache.org/jira/browse/SPARK-27272] tries to enable 
this feature by default.

If we want to avoid this problem, we can configure
spark.blacklist.enabled=true
spark.blacklist.application.fetchFailure.enabled=true

Sometimes we stop or decommission a NodeManager for a period of time. The 
NodeManager does not guarantee that all container processes are killed when it 
stops, so a container may keep executing while the NodeManager no longer 
provides the shuffle service, which causes fetch failures. See 
https://issues.apache.org/jira/browse/YARN-72?focusedCommentId=13505398&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13505398

Although spark.files.fetchFailure.unRegisterOutputOnHost can be turned on to 
remove all shuffle files on that host, the rerun stage may still be assigned to 
the same host. Since the executor does not know whether the shuffle service is 
available, it continues to write data to disk.


> Check whether the executor and external service connection is available
> ---
>
> Key: SPARK-33158
> URL: https://issues.apache.org/jira/browse/SPARK-33158
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: dzcxzl
>Priority: Trivial
>
> At present, the executor only establishes a connection with the external 
> shuffle service once, at initialization, when it registers.
> On YARN, the NodeManager (and with it the shuffle service) may stop working 
> while the container/executor process is still running. ShuffleMapTask can 
> still be executed, and the returned MapStatus still points to the address of 
> the external shuffle service.
> When the next stage reads shuffle data, it cannot connect to the shuffle 
> service, and the job finally fails.
> The approach I thought of:
> Before ShuffleMapTask starts to write data, check whether the connection is 
> available, or periodically test whether the connection is healthy, similar to 
> the driver and executor heartbeat check threads.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33168) spark REST API Unable to get JobDescription

2020-10-16 Thread zhaoyachao (Jira)
zhaoyachao created SPARK-33168:
--

 Summary: spark REST API Unable to get JobDescription
 Key: SPARK-33168
 URL: https://issues.apache.org/jira/browse/SPARK-33168
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: zhaoyachao


After setting a job description in Spark, the REST API 
(localhost:4040/api/v1/applications/xxx/jobs) does not return the job 
description, but it is displayed at localhost:4040/jobs.

spark.sparkContext.setJobDescription("test_count")
spark.range(100).count()
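
For completeness, a hedged Scala sketch of how the two views can be compared; 
the application id below is a placeholder:
{code}
import scala.io.Source

// After running the two lines above, the UI at localhost:4040/jobs shows
// "test_count", while the JSON returned here reportedly has no job description.
val json = Source.fromURL(
  "http://localhost:4040/api/v1/applications/app-placeholder/jobs").mkString
println(json)
{code}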



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33126) Simplify offset window function(Remove direction field)

2020-10-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33126:
---

Assignee: jiaan.geng

> Simplify offset window function(Remove direction field)
> ---
>
> Key: SPARK-33126
> URL: https://issues.apache.org/jira/browse/SPARK-33126
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
>
> The current Lead/Lag extends OffsetWindowFunction. OffsetWindowFunction 
> contains a direction field and uses it to calculate the boundary.
> We can use a single literal expression to unify the two properties.
> For example:
>  3 means the direction is Asc and the boundary is 3.
>  -3 means the direction is Desc and the boundary is -3.
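
For context, a small Scala example of the offset window functions this 
refactoring concerns (assuming a local SparkSession); the reading of a lag over 
3 rows as an internal offset of -3 follows the description above:
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lag, lead}

val spark = SparkSession.builder().master("local[*]").appName("offset-window").getOrCreate()
import spark.implicits._

val w = Window.orderBy("id")
spark.range(10).toDF("id")
  .withColumn("lead_3", lead($"id", 3).over(w)) // looks 3 rows ahead  (offset +3)
  .withColumn("lag_3",  lag($"id", 3).over(w))  // looks 3 rows behind (offset -3 internally)
  .show()
{code}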



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33126) Simplify offset window function(Remove direction field)

2020-10-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33126.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30023
[https://github.com/apache/spark/pull/30023]

> Simplify offset window function(Remove direction field)
> ---
>
> Key: SPARK-33126
> URL: https://issues.apache.org/jira/browse/SPARK-33126
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.1.0
>
>
> The current Lead/Lag extends OffsetWindowFunction. OffsetWindowFunction 
> contains a direction field and uses it to calculate the boundary.
> We can use a single literal expression to unify the two properties.
> For example:
>  3 means the direction is Asc and the boundary is 3.
>  -3 means the direction is Desc and the boundary is -3.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33131:
---

Assignee: ulysses you

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
>
> Grouping sets construct a new aggregate and lose the qualified name of the 
> grouping expression. Here is an example:
> {code:java}
> -- Works resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 
> 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) 
> having t1.c1 = 1
> -- Failed
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having 
> t1.c1 = 1{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33131.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30029
[https://github.com/apache/spark/pull/30029]

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> Grouping sets construct a new aggregate and lose the qualified name of the 
> grouping expression. Here is an example:
> {code:java}
> -- Works resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 
> 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) 
> having t1.c1 = 1
> -- Failed
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having 
> t1.c1 = 1{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32032) Eliminate deprecated poll(long) API calls to avoid infinite wait in driver

2020-10-16 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-32032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215352#comment-17215352
 ] 

Hélder Hugo Ferreira commented on SPARK-32032:
--

Hi, we are getting the following error when using the Kafka stream reader 
_option("startingOffsetsByTimestamp")_:  

 
{code:java}
WARN KafkaOffsetReader: Error in attempt 1 getting Kafka offsets: 
java.lang.AssertionError: assertion failed: No offset matched from request of 
topic-partition MyTopic-6 and timestamp 160267338. at 
scala.Predef$.assert(Predef.scala:223) at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$fetchSpecificTimestampBasedOffsets$6(KafkaOffsetReader.scala:238)
 at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) 
at scala.collection.Iterator.foreach(Iterator.scala:941) at 
scala.collection.Iterator.foreach$(Iterator.scala:941) at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1429) at 
scala.collection.IterableLike.foreach(IterableLike.scala:74) at 
scala.collection.IterableLike.foreach$(IterableLike.scala:73) at 
scala.collection.AbstractIterable.foreach(Iterable.scala:56) at 
scala.collection.TraversableLike.map(TraversableLike.scala:238) at 
scala.collection.TraversableLike.map$(TraversableLike.scala:231) at 
scala.collection.AbstractTraversable.map(Traversable.scala:108) at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$fetchSpecificTimestampBasedOffsets$4(KafkaOffsetReader.scala:236)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$fetchSpecificOffsets0$1(KafkaOffsetReader.scala:265)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$partitionsAssignedToConsumer$2(KafkaOffsetReader.scala:550)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$withRetriesWithoutInterrupt$1(KafkaOffsetReader.scala:600)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.withRetriesWithoutInterrupt(KafkaOffsetReader.scala:599)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.$anonfun$partitionsAssignedToConsumer$1(KafkaOffsetReader.scala:536)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.runUninterruptibly(KafkaOffsetReader.scala:567)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.partitionsAssignedToConsumer(KafkaOffsetReader.scala:536)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.fetchSpecificOffsets0(KafkaOffsetReader.scala:261)
 at 
org.apache.spark.sql.kafka010.KafkaOffsetReader.fetchSpecificTimestampBasedOffsets(KafkaOffsetReader.scala:254)
 at 
org.apache.spark.sql.kafka010.KafkaMicroBatchStream.$anonfun$getOrCreateInitialPartitionOffsets$1(KafkaMicroBatchStream.scala:157)
 at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.kafka010.KafkaMicroBatchStream.getOrCreateInitialPartitionOffsets(KafkaMicroBatchStream.scala:148)
 at 
org.apache.spark.sql.kafka010.KafkaMicroBatchStream.initialOffset(KafkaMicroBatchStream.scala:76)
 at 
...
{code}
We figured out that in this topic partition there was only one message older 
than the input timestamp set in _startingOffsetsByTimestamp_. We are using 
Kafka 2.5.1.
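
For context, a minimal Scala sketch of how this option is set; the broker 
address, topic/partition mapping and timestamp are placeholders:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-offsets-by-timestamp").getOrCreate()

// Start each listed partition from the first offset whose timestamp is >= the
// given epoch-millis value. Placeholder values below; in practice every
// subscribed partition needs an entry in the JSON.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "MyTopic")
  .option("startingOffsetsByTimestamp", """{"MyTopic": {"6": 1602673380000}}""")
  .load()
{code}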

Since we noticed that this PR changes the 
_fetchSpecificTimestampBasedOffsets_ function, is there any chance that it will 
also resolve our issue?

Thanks in advance.

Best Regards,

Hélder Hugo Ferreira
 

> Eliminate deprecated poll(long) API calls to avoid infinite wait in driver
> --
>
> Key: SPARK-32032
> URL: https://issues.apache.org/jira/browse/SPARK-32032
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33169) Check propagation of datasource options to underlying file system for built-in file-based datasources

2020-10-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33169:
--

 Summary: Check propagation of datasource options to underlying 
file system for built-in file-based datasources
 Key: SPARK-33169
 URL: https://issues.apache.org/jira/browse/SPARK-33169
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Add a common trait with a test to check that datasource options are propagated 
to underlying file systems. Individual tests were already added by SPARK-33094 
and SPARK-33089. The ticket aims to de-duplicate the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33169) Check propagation of datasource options to underlying file system for built-in file-based datasources

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33169:


Assignee: (was: Apache Spark)

> Check propagation of datasource options to underlying file system for 
> built-in file-based datasources
> -
>
> Key: SPARK-33169
> URL: https://issues.apache.org/jira/browse/SPARK-33169
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Add a common trait with a test to check that datasource options are 
> propagated to underlying file systems. Individual tests were already added by 
> SPARK-33094 and SPARK-33089. The ticket aims to de-duplicate the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33169) Check propagation of datasource options to underlying file system for built-in file-based datasources

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33169:


Assignee: Apache Spark

> Check propagation of datasource options to underlying file system for 
> built-in file-based datasources
> -
>
> Key: SPARK-33169
> URL: https://issues.apache.org/jira/browse/SPARK-33169
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Add a common trait with a test to check that datasource options are 
> propagated to underlying file systems. Individual tests were already added by 
> SPARK-33094 and SPARK-33089. The ticket aims to de-duplicate the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33169) Check propagation of datasource options to underlying file system for built-in file-based datasources

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215371#comment-17215371
 ] 

Apache Spark commented on SPARK-33169:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30067

> Check propagation of datasource options to underlying file system for 
> built-in file-based datasources
> -
>
> Key: SPARK-33169
> URL: https://issues.apache.org/jira/browse/SPARK-33169
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Add a common trait with a test to check that datasource options are 
> propagated to underlying file systems. Individual tests were already added by 
> SPARK-33094 and SPARK-33089. The ticket aims to de-duplicate the tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33088) Enhance ExecutorPlugin API to include methods for task start and end events

2020-10-16 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-33088:
-

Assignee: Samuel Souza

> Enhance ExecutorPlugin API to include methods for task start and end events
> ---
>
> Key: SPARK-33088
> URL: https://issues.apache.org/jira/browse/SPARK-33088
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Samuel Souza
>Assignee: Samuel Souza
>Priority: Major
> Fix For: 3.1.0
>
>
> On [SPARK-24918|https://issues.apache.org/jira/browse/SPARK-24918]'s 
> [SIPP|https://docs.google.com/document/d/1a20gHGMyRbCM8aicvq4LhWfQmoA5cbHBQtyqIA2hgtc/view#],
>  it was raised to potentially add methods to ExecutorPlugin interface on task 
> start and end:
> {quote}The basic interface can just be a marker trait, as that allows a 
> plugin to monitor general characteristics of the JVM (eg. monitor memory or 
> take thread dumps).   Optionally, we could include methods for task start and 
> end events.   This would allow more control on monitoring – eg., you could 
> start polling thread dumps only if there was a task from a particular stage 
> that had been taking too long. But anything task related is a bit trickier to 
> decide the right api. Should the task end event also get the failure reason? 
> Should those events get called in the same thread as the task runner, or in 
> another thread?
> {quote}
> The ask is to add exactly that. I've put up a draft PR [in our fork of 
> spark|https://github.com/palantir/spark/pull/713] and I'm happy to push it 
> upstream. Also happy to receive comments on what's the right interface to 
> expose - not opinionated on that front, tried to expose the simplest 
> interface for now.
> The main reason for this ask is to propagate tracing information from the 
> driver to the executors 
> ([SPARK-21962|https://issues.apache.org/jira/browse/SPARK-21962] has some 
> context). On 
> [HADOOP-15566|https://issues.apache.org/jira/browse/HADOOP-15566] I see we're 
> discussing how to add tracing to the Apache ecosystem, but my problem is 
> slightly different: I want to use this interface to propagate tracing 
> information to my framework of choice. If the Hadoop issue gets solved we'll 
> have a framework to communicate tracing information inside the Apache 
> ecosystem, but it's highly unlikely that all Spark users will use the same 
> common framework. Therefore we should still provide plugin interfaces where 
> the tracing information can be propagated appropriately.
> To give more color, in our case the tracing information is [stored in a 
> thread 
> local|https://github.com/palantir/tracing-java/blob/4.9.0/tracing/src/main/java/com/palantir/tracing/Tracer.java#L61],
>  therefore it needs to be set in the same thread which is executing the task. 
> [*]
> While our framework is specific, I imagine such an interface could be useful 
> in general. Happy to hear your thoughts about it.
> [*] Something I did not mention was how to propagate the tracing information 
> from the driver to the executors. For that I intend to use 1. the driver's 
> localProperties, which 2. will be eventually propagated to the executors' 
> TaskContext, which 3. I'll be able to access from the methods above.
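
A hedged Scala sketch of what a plugin using such hooks could look like; the 
hook names follow the proposal in this ticket, while the property key and the 
tracing placeholders are assumptions:
{code}
import java.util.{Map => JMap}
import org.apache.spark.TaskContext
import org.apache.spark.api.plugin.{ExecutorPlugin, PluginContext}

// Illustrative executor-side plugin: reads tracing info propagated through the
// driver's localProperties / TaskContext and hands it to a thread-local tracer.
class TracingExecutorPlugin extends ExecutorPlugin {
  override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = ()

  override def onTaskStart(): Unit = {
    // "trace.id" is a hypothetical local-property key chosen for this sketch.
    val traceId = Option(TaskContext.get())
      .flatMap(tc => Option(tc.getLocalProperty("trace.id")))
    traceId.foreach(id => println(s"task started with trace $id")) // e.g. Tracer.setTrace(id)
  }

  override def onTaskSucceeded(): Unit =
    println("task finished") // e.g. clear the thread-local trace
}
{code}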



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33145) In Execution web page, when `Succeeded Job` has many child url elements,they will extend over the edge of the page.

2020-10-16 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-33145:
--

Assignee: akiyamaneko

> In Execution web page, when `Succeeded Job` has many child url elements,they 
> will extend over the edge of the page. 
> 
>
> Key: SPARK-33145
> URL: https://issues.apache.org/jira/browse/SPARK-33145
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: akiyamaneko
>Assignee: akiyamaneko
>Priority: Major
> Fix For: 3.0.2
>
> Attachments: Screenshot.png, Screenshot1.png
>
>
> Spark Version: 3.0.1
> Problem: In the Execution web page, when *{color:#de350b}Succeeded Jobs (or 
> Failed Jobs){color}* has many child url elements, they will extend over the 
> edge of the page, as the attachments show.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33145) In Execution web page, when `Succeeded Job` has many child url elements,they will extend over the edge of the page.

2020-10-16 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-33145:
---
Fix Version/s: (was: 3.0.2)
   3.1.0

> In Execution web page, when `Succeeded Job` has many child url elements,they 
> will extend over the edge of the page. 
> 
>
> Key: SPARK-33145
> URL: https://issues.apache.org/jira/browse/SPARK-33145
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: akiyamaneko
>Assignee: akiyamaneko
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: Screenshot.png, Screenshot1.png
>
>
> Spark Version: 3.0.1
> Problem: In the Execution web page, when *{color:#de350b}Succeeded Jobs (or 
> Failed Jobs){color}* has many child url elements, they will extend over the 
> edge of the page, as the attachments show.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33145) In Execution web page, when `Succeeded Job` has many child url elements,they will extend over the edge of the page.

2020-10-16 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215446#comment-17215446
 ] 

Gengliang Wang commented on SPARK-33145:


The issue is resolved in https://github.com/apache/spark/pull/30035

> In Execution web page, when `Succeeded Job` has many child url elements,they 
> will extend over the edge of the page. 
> 
>
> Key: SPARK-33145
> URL: https://issues.apache.org/jira/browse/SPARK-33145
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: akiyamaneko
>Assignee: akiyamaneko
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: Screenshot.png, Screenshot1.png
>
>
> Spark Version: 3.0.1
> Problem: In the Execution web page, when *{color:#de350b}Succeeded Jobs (or 
> Failed Jobs){color}* has many child url elements, they will extend over the 
> edge of the page, as the attachments show.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33145) In Execution web page, when `Succeeded Job` has many child url elements,they will extend over the edge of the page.

2020-10-16 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-33145:
---
Priority: Minor  (was: Major)

> In Execution web page, when `Succeeded Job` has many child url elements,they 
> will extend over the edge of the page. 
> 
>
> Key: SPARK-33145
> URL: https://issues.apache.org/jira/browse/SPARK-33145
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: akiyamaneko
>Assignee: akiyamaneko
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: Screenshot.png, Screenshot1.png
>
>
> Spark Version: 3.0.1
> Problem: In the Execution web page, when *{color:#de350b}Succeeded Jobs (or 
> Failed Jobs){color}* has many child url elements, they will extend over the 
> edge of the page, as the attachments show.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33145) In Execution web page, when `Succeeded Job` has many child url elements,they will extend over the edge of the page.

2020-10-16 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215447#comment-17215447
 ] 

Gengliang Wang commented on SPARK-33145:


[~echohlne] Thanks for the fix. I think it is minor and we can have the fix in 
3.1 instead of 3.0.2. WDYT?

> In Execution web page, when `Succeeded Job` has many child url elements,they 
> will extend over the edge of the page. 
> 
>
> Key: SPARK-33145
> URL: https://issues.apache.org/jira/browse/SPARK-33145
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0, 3.0.1
>Reporter: akiyamaneko
>Assignee: akiyamaneko
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: Screenshot.png, Screenshot1.png
>
>
> Spark Version:3.0.1
> Problem: In Execution web page, when *{color:#de350b}Succeeded Jobs(or failed 
> Jobs){color}* has many child url elements,they will extend over the edge of 
> the page, as the attachment shows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33161) Spark 3: Partition count changing on dataframe select cols

2020-10-16 Thread Ankush Kankariya (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215451#comment-17215451
 ] 

Ankush Kankariya commented on SPARK-33161:
--

[~hyukjin.kwon] Thank you !! will avoid using blocker in the future. 

My assumption is that a recent optimization might be coalescing the partitions 
to a lower number, and I am looking to revert to the original (2.4.4) 
behaviour. I would also like to understand your point that the "no of 
partitions is fairly internal": Spark now does control this behaviour, for 
example by pruning partitions in joins, so is there a similar pruning that 
could have caused this, depending on some partitions being empty?

 

> Spark 3: Partition count changing on dataframe select cols
> --
>
> Key: SPARK-33161
> URL: https://issues.apache.org/jira/browse/SPARK-33161
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Ankush Kankariya
>Priority: Major
>  Labels: spark-core, spark-sql
>
> I am noticing a difference in behaviour on upgrading to Spark 3 where the 
> number of partitions changes on df.select, which causes my zip operations to 
> fail on a mismatch. With Spark 2.4.4 it works fine. This does not happen with 
> filter but only with select cols.
> {code:java}
> spark = SparkSession.builder.appName("pytest-pyspark-local-testing"). \ 
> master("local[5]"). \ config("spark.executor.memory", "2g"). \ 
> config("spark.driver.memory", "2g"). \ config("spark.ui.showConsoleProgress", 
> "false"). \ config("spark.sql.shuffle.partitions",10). \ 
> config("spark.sql.optimizer.dynamicPartitionPruning.enabled","false").getOrCreate()
> {code}
>  
> With Spark 2.4.4:
>   df = spark.table("tableA")
>  print(df.rdd.getNumPartitions()) #10
>  new_df = df.filter("id is not null")
>  print(new_df.rdd.getNumPartitions()) #10
>  new_2_df = df.select("id")
>  print(new_2_df.rdd.getNumPartitions()) #10
>   
> With Spark 3.0.0:
>  df = spark.table("tableA")
>  print(df.rdd.getNumPartitions()) #10
>  new_df = df.filter("id is not null")
>  print(new_df.rdd.getNumPartitions()) #10
> new_1_df = df.select("*")
>  print(new_1_df.rdd.getNumPartitions()) #10
>  new_2_df = df.select("id")
>  print(new_2_df.rdd.getNumPartitions()) #1
>  See the last line, where it changes from the initial 10 partitions to 1. Any 
> thoughts?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33077) Spark 3 / Cats 2.2.0 classpath issue

2020-10-16 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215461#comment-17215461
 ] 

Sean R. Owen commented on SPARK-33077:
--

Adding a little flavor - the issue ends up being the dependency chain breeze 
1.0 -> spire 1.17.0-M1 -> algebra 1.something -> cats 2.0.0-M4. Updating breeze 
to 1.1 would be cool enough but doesn't change spire's version. Manually 
managing it up to spire 1.17.0 does get cats to 2.1.1 which helps on this one. 
That last step is not hard in Spark, maybe mildly controversial just because we 
manually manage something that isn't used directly to resolve a downstream 
dependency. But not out of the question IMHO.

> Spark 3 / Cats 2.2.0 classpath issue
> 
>
> Key: SPARK-33077
> URL: https://issues.apache.org/jira/browse/SPARK-33077
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell
>Affects Versions: 3.0.1
>Reporter: Sami Dalouche
>Priority: Major
>
> A small project with minimal dependencies as well as instructions on how to 
> reproduce the issue is available at:
> [https://github.com/samidalouche/spark3-cats220]
> Executing this code works fine with cats 2.1.1 but fails with cats 2.2.0, 
> which is quite surprising since the spark and cats dependencies are pretty 
> much distinct from each other.
>  
> {code:java}
> java.lang.NoSuchMethodError: 'void 
> cats.kernel.CommutativeSemigroup.$init$(cats.kernel.CommutativeSemigroup)'
>  at cats.UnorderedFoldable$$anon$1.(UnorderedFoldable.scala:78)
>  at cats.UnorderedFoldable$.(UnorderedFoldable.scala:78)
>  at cats.UnorderedFoldable$.(UnorderedFoldable.scala)
>  at cats.data.NonEmptyListInstances$$anon$2.(NonEmptyList.scala:539)
>  at cats.data.NonEmptyListInstances.(NonEmptyList.scala:539)
>  at cats.data.NonEmptyList$.(NonEmptyList.scala:458)
>  at cats.data.NonEmptyList$.(NonEmptyList.scala)
>  at catsspark.Boom$.assumeValid_$bang(boom.scala:19)
>  at catsspark.Boom$.boom(boom.scala:14)
>  ... 47 elided{code}
> Thanks in advance for looking into this.
> I submitted the same issue to cat's bug tracker: 
> https://github.com/typelevel/cats/issues/3628



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33057) Cannot use filter with window operations

2020-10-16 Thread Li Jin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215468#comment-17215468
 ] 

Li Jin commented on SPARK-33057:


I agree this is an improvement rather than a bug.

However, I am not sure why this is a "Won't fix"/"Invalid".

Yes, the second snippet adds a project and the first one doesn't. However, I 
think logically they are doing the same thing: "filtering based on the output 
of a window operation". Whether the output of the window operation is assigned 
to a new field or not doesn't seem to change the logical meaning.

> Cannot use filter with window operations
> 
>
> Key: SPARK-33057
> URL: https://issues.apache.org/jira/browse/SPARK-33057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Li Jin
>Priority: Major
>
> Current, trying to use filter with a window operations will fail:
>  
> {code:java}
> df = spark.range(100)
> win = Window.partitionBy().orderBy('id')
> df.filter(F.rank().over(win) > 10).show()
> {code}
> Error:
> {code:java}
> Traceback (most recent call last):
>   File "", line 1, in 
>   File 
> "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/sql/dataframe.py",
>  line 1461, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File 
> "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py",
>  line 1304, in __call__
>   File 
> "/Users/icexelloss/opt/miniconda3/envs/ibis-dev-spark-3/lib/python3.8/site-packages/pyspark/sql/utils.py",
>  line 134, in deco
>     raise_from(converted)
>   File "", line 3, in raise_from
> pyspark.sql.utils.AnalysisException: It is not allowed to use window 
> functions inside WHERE clause;{code}
> Although the code is same as the code below, which works:
> {code:java}
> df = spark.range(100)
> win = Window.partitionBy().orderBy('id')
> df = df.withColumn('rank', F.rank().over(win))
> df = df[df['rank'] > 10]
> df = df.drop('rank'){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33000) cleanCheckpoints config does not clean all checkpointed RDDs on shutdown

2020-10-16 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215469#comment-17215469
 ] 

Nicholas Chammas commented on SPARK-33000:
--

I've tested this out a bit more, and I think the original issue I reported is 
valid. Either that, or I'm still missing something.

I built Spark at the latest commit from {{master}}:
{code:java}
commit 3ae1520185e2d96d1bdbd08c989f0d48ad3ba578 (HEAD -> master, origin/master, 
origin/HEAD, apache/master)
Author: ulysses 
Date:   Fri Oct 16 11:26:27 2020 + {code}
One thing that has changed is that Spark now prevents you from setting 
{{cleanCheckpoints}} after startup:
{code:java}
>>> spark.conf.set('spark.cleaner.referenceTracking.cleanCheckpoints', 'true')
Traceback (most recent call last):
  File "", line 1, in 
  File ".../spark/python/pyspark/sql/conf.py", line 36, in set
    self._jconf.set(key, value)
  File ".../spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 
1304, in __call__
  File ".../spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Cannot modify the value of a Spark config: 
spark.cleaner.referenceTracking.cleanCheckpoints; {code}
So that's good! This makes it clear to the user that this setting cannot be set 
at this time (though it could be more helpful if it explained why).

However, if I try to set the config as part of invoking PySpark, I still don't 
see any checkpointed data get cleaned up on shutdown:
{code:java}
$ rm -rf /tmp/spark/checkpoint/
$ ./bin/pyspark --conf spark.cleaner.referenceTracking.cleanCheckpoints=true

>>> spark.conf.get('spark.cleaner.referenceTracking.cleanCheckpoints')
'true'
>>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
>>> a = spark.range(10)
>>> a.checkpoint()
DataFrame[id: bigint]                                                           
>>> 
$ du -sh /tmp/spark/checkpoint/*
32K /tmp/spark/checkpoint/57b0a413-9d47-4bcd-99ef-265e9f5c0f3b{code}
 

> cleanCheckpoints config does not clean all checkpointed RDDs on shutdown
> 
>
> Key: SPARK-33000
> URL: https://issues.apache.org/jira/browse/SPARK-33000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.6
>Reporter: Nicholas Chammas
>Priority: Minor
>
> Maybe it's just that the documentation needs to be updated, but I found this 
> surprising:
> {code:python}
> $ pyspark
> ...
> >>> spark.conf.set('spark.cleaner.referenceTracking.cleanCheckpoints', 'true')
> >>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
> >>> a = spark.range(10)
> >>> a.checkpoint()
> DataFrame[id: bigint] 
>   
> >>> exit(){code}
> The checkpoint data is left behind in {{/tmp/spark/checkpoint/}}. I expected 
> Spark to clean it up on shutdown.
> The documentation for {{spark.cleaner.referenceTracking.cleanCheckpoints}} 
> says:
> {quote}Controls whether to clean checkpoint files if the reference is out of 
> scope.
> {quote}
> When Spark shuts down, everything goes out of scope, so I'd expect all 
> checkpointed RDDs to be cleaned up.
> For the record, I see the same behavior in both the Scala and Python REPLs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33159) Use hive-service-rpc dep instead of inline the generated code

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33159.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30055
[https://github.com/apache/spark/pull/30055]

> Use hive-service-rpc dep instead of inline the generated code
> -
>
> Key: SPARK-33159
> URL: https://issues.apache.org/jira/browse/SPARK-33159
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> Hive's hive-service-rpc module has existed since hive-2.1.0 and contains only 
> the Thrift IDL file and the code generated from it.
> Removing the inlined code will make it easier to maintain and upgrade the 
> built-in Hive versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33159) Use hive-service-rpc dep instead of inline the generated code

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33159:
-

Assignee: Kent Yao

> Use hive-service-rpc dep instead of inline the generated code
> -
>
> Key: SPARK-33159
> URL: https://issues.apache.org/jira/browse/SPARK-33159
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> Hive's hive-service-rpc module has existed since hive-2.1.0 and contains only 
> the Thrift IDL file and the code generated from it.
> Removing the inlined code will make it easier to maintain and upgrade the 
> built-in Hive versions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33167) Throw Spark specific exception for commit protocol errors

2020-10-16 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-33167.
-
Resolution: Not A Problem

> Throw Spark specific exception for commit protocol errors
> -
>
> Key: SPARK-33167
> URL: https://issues.apache.org/jira/browse/SPARK-33167
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> In SPARK-29649, we catch `FileAlreadyExistsException` in `FileFormatWriter` 
> and fail fast for the task set to prevent task retry.
> One concern with this approach is that we might break cases where users rely 
> on task retry and their tasks throw `FileAlreadyExistsException` from user 
> code, rather than from Spark internally.
> To take care of both cases, a better approach is to throw a Spark-specific 
> exception for commit protocol errors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33000) cleanCheckpoints config does not clean all checkpointed RDDs on shutdown

2020-10-16 Thread Haoyuan Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215515#comment-17215515
 ] 

Haoyuan Wang commented on SPARK-33000:
--

I'm not sure what ctrl-D does. It closed bash, but does it gracefully terminate 
SparkContext? Try calling spark.stop() and wait a minute or so.

> cleanCheckpoints config does not clean all checkpointed RDDs on shutdown
> 
>
> Key: SPARK-33000
> URL: https://issues.apache.org/jira/browse/SPARK-33000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.6
>Reporter: Nicholas Chammas
>Priority: Minor
>
> Maybe it's just that the documentation needs to be updated, but I found this 
> surprising:
> {code:python}
> $ pyspark
> ...
> >>> spark.conf.set('spark.cleaner.referenceTracking.cleanCheckpoints', 'true')
> >>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
> >>> a = spark.range(10)
> >>> a.checkpoint()
> DataFrame[id: bigint] 
>   
> >>> exit(){code}
> The checkpoint data is left behind in {{/tmp/spark/checkpoint/}}. I expected 
> Spark to clean it up on shutdown.
> The documentation for {{spark.cleaner.referenceTracking.cleanCheckpoints}} 
> says:
> {quote}Controls whether to clean checkpoint files if the reference is out of 
> scope.
> {quote}
> When Spark shuts down, everything goes out of scope, so I'd expect all 
> checkpointed RDDs to be cleaned up.
> For the record, I see the same behavior in both the Scala and Python REPLs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33170) Add SQL config to control fast-fail behavior in FileFormatWriter

2020-10-16 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-33170:
---

 Summary: Add SQL config to control fast-fail behavior in 
FileFormatWriter
 Key: SPARK-33170
 URL: https://issues.apache.org/jira/browse/SPARK-33170
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


In SPARK-29649, we catch `FileAlreadyExistsException` in `FileFormatWriter` and 
fail fast for the task set to prevent task retry.

After the latest discussion, it is important to be able to keep the original 
behavior, which is to retry tasks even when `FileAlreadyExistsException` is 
thrown, because `FileAlreadyExistsException` could be recoverable in some cases.

We are going to add a config so that we can control this behavior, and set it to 
false for fast-fail by default.
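
A minimal sketch of what such a flag could look like in SQLConf (the key name, doc text, and default below are illustrative assumptions, not the merged change):

{code:scala}
// Hypothetical SQLConf entry controlling the FileFormatWriter behavior described
// above. Name and default are placeholders: when the flag is off, tasks hitting
// FileAlreadyExistsException fail the task set fast; when on, they are retried.
val RETRY_TASK_ON_FILE_ALREADY_EXISTS =
  buildConf("spark.sql.execution.retryTaskOnFileAlreadyExistsError")
    .doc("When true, a task that hits FileAlreadyExistsException in " +
      "FileFormatWriter is retried instead of failing the whole task set fast.")
    .booleanConf
    .createWithDefault(false)
{code}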



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33171:
-

 Summary: Mark Parquet*FilterSuite as ExtendedSQLTest
 Key: SPARK-33171
 URL: https://issues.apache.org/jira/browse/SPARK-33171
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33171:


Assignee: (was: Apache Spark)

> Mark Parquet*FilterSuite as ExtendedSQLTest
> ---
>
> Key: SPARK-33171
> URL: https://issues.apache.org/jira/browse/SPARK-33171
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215534#comment-17215534
 ] 

Apache Spark commented on SPARK-33171:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30068

> Mark Parquet*FilterSuite as ExtendedSQLTest
> ---
>
> Key: SPARK-33171
> URL: https://issues.apache.org/jira/browse/SPARK-33171
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33171:


Assignee: Apache Spark

> Mark Parquet*FilterSuite as ExtendedSQLTest
> ---
>
> Key: SPARK-33171
> URL: https://issues.apache.org/jira/browse/SPARK-33171
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33000) cleanCheckpoints config does not clean all checkpointed RDDs on shutdown

2020-10-16 Thread Nicholas Chammas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215577#comment-17215577
 ] 

Nicholas Chammas commented on SPARK-33000:
--

Ctrl-D gracefully shuts down the Python REPL, so that should trigger the 
appropriate cleanup.

I repeated my test and did {{spark.stop()}} instead of Ctrl-D and waited 2 
minutes. Same result. No cleanup.
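
As a manual workaround sketch (spark-shell; the path is the one from the description, and deleting the directory wholesale assumes nothing else checkpoints there):

{code:scala}
// Delete the checkpoint directory explicitly before stopping, since the
// ContextCleaner only removes checkpoint files when the corresponding RDD
// reference is garbage-collected, which is not guaranteed to happen at shutdown.
import org.apache.hadoop.fs.Path

val checkpointDir = new Path("/tmp/spark/checkpoint/")
val fs = checkpointDir.getFileSystem(spark.sparkContext.hadoopConfiguration)
fs.delete(checkpointDir, true)   // recursive delete of all checkpointed RDDs
spark.stop()
{code}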

> cleanCheckpoints config does not clean all checkpointed RDDs on shutdown
> 
>
> Key: SPARK-33000
> URL: https://issues.apache.org/jira/browse/SPARK-33000
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.6
>Reporter: Nicholas Chammas
>Priority: Minor
>
> Maybe it's just that the documentation needs to be updated, but I found this 
> surprising:
> {code:python}
> $ pyspark
> ...
> >>> spark.conf.set('spark.cleaner.referenceTracking.cleanCheckpoints', 'true')
> >>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
> >>> a = spark.range(10)
> >>> a.checkpoint()
> DataFrame[id: bigint] 
>   
> >>> exit(){code}
> The checkpoint data is left behind in {{/tmp/spark/checkpoint/}}. I expected 
> Spark to clean it up on shutdown.
> The documentation for {{spark.cleaner.referenceTracking.cleanCheckpoints}} 
> says:
> {quote}Controls whether to clean checkpoint files if the reference is out of 
> scope.
> {quote}
> When Spark shuts down, everything goes out of scope, so I'd expect all 
> checkpointed RDDs to be cleaned up.
> For the record, I see the same behavior in both the Scala and Python REPLs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33171:
-

Assignee: Dongjoon Hyun

> Mark Parquet*FilterSuite as ExtendedSQLTest
> ---
>
> Key: SPARK-33171
> URL: https://issues.apache.org/jira/browse/SPARK-33171
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33171.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30068
[https://github.com/apache/spark/pull/30068]

> Mark Parquet*FilterSuite as ExtendedSQLTest
> ---
>
> Key: SPARK-33171
> URL: https://issues.apache.org/jira/browse/SPARK-33171
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33171) Mark Parquet*FilterSuite as ExtendedSQLTest

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33171:
--
Fix Version/s: 3.0.2

> Mark Parquet*FilterSuite as ExtendedSQLTest
> ---
>
> Key: SPARK-33171
> URL: https://issues.apache.org/jira/browse/SPARK-33171
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33152) Constraint Propagation code causes OOM issues or increasing compilation time to hours

2020-10-16 Thread Asif (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215645#comment-17215645
 ] 

Asif commented on SPARK-33152:
--

[~yumwang], Hi. I need some help.

I started the test run via

dev/run-tests.

The run exited after running for 6+ hours, with the following tail-end log output.

{code}
...table `default`.`javasavedtable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
[info] Test org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite.saveExternalTableWithSchemaAndQueryIt started
13:16:18.288 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.json. Persisting data source table `default`.`javasavedtable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
13:16:18.705 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.json. Persisting data source table `default`.`externaltable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
[info] Test run finished: 0 failed, 0 ignored, 3 total, 6.17s
[info] ScalaTest
[info] Run completed in 5 hours, 59 minutes, 29 seconds.
[info] Total number of tests run: 3250
[info] Suites: completed 97, aborted 1
[info] Tests: succeeded 3250, failed 0, canceled 0, ignored 596, pending 0
[info] *** 1 SUITE ABORTED ***
[error] Error: Total 3256, Failed 0, Errors 1, Passed 3255, Ignored 596
[error] Error during tests:
[error]     org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite
[error] (hive-thriftserver/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (sql-kafka-0-10/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (yarn/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (core/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (hive/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (streaming/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (sql/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 21853 s, completed Oct 16, 2020 1:17:17 PM
[error] running /Users/asif.shahid/workspace/code/stock-spark/spark/build/sbt -Phadoop-2.6 -Pkafka-0-8 -Phive -Pkinesis-asl -Phive-thriftserver -Pmesos -Pspark-ganglia-lgpl -Pyarn -Pkubernetes -Pflume test ; received return code 1
{code}

 

 

I am not sure whether the failure has anything to do with my changes.

Any pointers on how to proceed further, how to debug the issue, or which logs to 
check?

 

> Constraint Propagation code causes OOM issues or increasing compilation time 
> to hours
> -
>
> Key: SPARK-33152
> URL: https://issues.apache.org/jira/browse/SPARK-33152
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Asif
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> We encountered this issue at Workday. 
> The issue is that the current constraint propagation code pessimistically 
> generates all possible permutations of the base constraints for the aliases in 
> the project node.
> This blows up the number of generated constraints, causing OOM at SQL compile 
> time, or queries taking 18 minutes to 2 hours to compile.
> The problematic piece of code is in LogicalPlan.getAliasedConstraints
> projectList.foreach {
>  case a @ Alias(l: Literal, _) =>
>  allConstraints += EqualNullSafe(a.toAttribute, l)
>  case a @ Alias(e, _) =>
>  // For every alias in `projectList`,replace the reference in
>  // constraints by
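
For readers unfamiliar with the blow-up described above, a rough spark-shell illustration (not from the ticket; the query and alias counts are arbitrary) follows:

{code:scala}
// Rough illustration of the constraint blow-up: each aliased attribute multiplies
// the copies of any constraint that references it, so a single multi-attribute
// filter constraint fans out into roughly (aliases of a + 1) * (aliases of b + 1)
// propagated constraints on the project node.
val base = spark.range(100)
  .selectExpr("id AS a", "id + 1 AS b")
  .filter("a + b > 5")
val exploded = base.selectExpr(
  (Seq("a", "b") ++
    (1 to 20).map(i => s"a AS a_$i") ++
    (1 to 20).map(i => s"b AS b_$i")): _*)

// Forcing constraint propagation shows hundreds of constraints for one filter
// over 40 aliases; the count grows combinatorially as more aliased attributes
// appear in the same constraint.
println(exploded.queryExecution.optimizedPlan.constraints.size)
{code}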

[jira] [Commented] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215651#comment-17215651
 ] 

Apache Spark commented on SPARK-33109:
--

User 'gemelen' has created a pull request for this issue:
https://github.com/apache/spark/pull/30070

> Upgrade to SBT 1.4 and support `dependencyTree` back
> 
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33109:


Assignee: (was: Apache Spark)

> Upgrade to SBT 1.4 and support `dependencyTree` back
> 
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33109:


Assignee: Apache Spark

> Upgrade to SBT 1.4 and support `dependencyTree` back
> 
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32436) Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32436:
--
Fix Version/s: 3.0.2

> Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal
> --
>
> Key: SPARK-32436
> URL: https://issues.apache.org/jira/browse/SPARK-32436
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> In Scala 2.13, this causes several UT failures because 
> `HighlyCompressedMapStatus.readExternal` doesn't initialize this field. The 
> following two are the examples.
> - org.apache.spark.rdd.RDDSuite
> {code}
>   Cause: java.lang.NoSuchFieldError: numNonEmptyBlocks
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.(MapStatus.scala:181)
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$.apply(MapStatus.scala:280)
>   at org.apache.spark.scheduler.MapStatus$.apply(MapStatus.scala:73)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:231)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464)
> {code}
> - org.apache.spark.scheduler.MapStatusSuite
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchFieldError: numNonEmptyBlocks
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.(MapStatus.scala:181)
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$.apply(MapStatus.scala:280)
> {code}
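
The shape of the fix the title implies is simply to give the field a value during deserialization; a rough sketch (illustrative, not the exact merged patch; the surrounding reads are elided):

{code:scala}
// Sketch: give numNonEmptyBlocks an explicit value in readExternal(), since the
// serialized form does not carry it and, under Scala 2.13, leaving it
// uninitialized surfaces as the NoSuchFieldError shown above.
override def readExternal(in: ObjectInput): Unit = Utils.tryOrIOException {
  numNonEmptyBlocks = -1  // sentinel; not part of the serialized form
  // ... the existing reads of the serialized fields stay unchanged ...
}
{code}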



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32436) Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal

2020-10-16 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215655#comment-17215655
 ] 

Dongjoon Hyun commented on SPARK-32436:
---

This lands at `branch-3.0` based on the community request. 
- https://github.com/apache/spark/pull/29231#issuecomment-710061815

> Initialize numNonEmptyBlocks in HighlyCompressedMapStatus.readExternal
> --
>
> Key: SPARK-32436
> URL: https://issues.apache.org/jira/browse/SPARK-32436
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> In Scala 2.13, this causes several UT failures because 
> `HighlyCompressedMapStatus.readExternal` doesn't initialize this field. The 
> following two are the examples.
> - org.apache.spark.rdd.RDDSuite
> {code}
>   Cause: java.lang.NoSuchFieldError: numNonEmptyBlocks
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.(MapStatus.scala:181)
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$.apply(MapStatus.scala:280)
>   at org.apache.spark.scheduler.MapStatus$.apply(MapStatus.scala:73)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:231)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:179)
>   at 
> org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>   at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464)
> {code}
> - org.apache.spark.scheduler.MapStatusSuite
> {code}
> *** RUN ABORTED ***
>   java.lang.NoSuchFieldError: numNonEmptyBlocks
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus.(MapStatus.scala:181)
>   at 
> org.apache.spark.scheduler.HighlyCompressedMapStatus$.apply(MapStatus.scala:280)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33154) Handle shuffle blocks being removed during decommissioning

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33154.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30046
[https://github.com/apache/spark/pull/30046]

> Handle shuffle blocks being removed during decommissioning
> --
>
> Key: SPARK-33154
> URL: https://issues.apache.org/jira/browse/SPARK-33154
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32376) Make unionByName null-filling behavior work with struct columns

2020-10-16 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-32376.
-
Resolution: Fixed

> Make unionByName null-filling behavior work with struct columns
> ---
>
> Key: SPARK-32376
> URL: https://issues.apache.org/jira/browse/SPARK-32376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Mukul Murthy
>Assignee: L. C. Hsieh
>Priority: Major
> Attachments: tests.scala
>
>
> https://issues.apache.org/jira/browse/SPARK-29358 added support for 
> unionByName to work when the two datasets didn't necessarily have the same 
> schema, but it does not work with nested columns like structs.
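
A quick way to see the behavior this enables (sketch; assumes a build that already contains the fix, i.e. 3.1.0+, where unionByName takes allowMissingColumns):

{code:scala}
// df1 and df2 share the top-level column `s`, but their structs have different fields.
val df1 = spark.range(1).selectExpr("named_struct('a', 1, 'b', 2) AS s")
val df2 = spark.range(1).selectExpr("named_struct('a', 3, 'c', 4) AS s")

// With null-filling extended to structs, the missing fields (`b` on df2's side,
// `c` on df1's side) are filled with nulls instead of the union failing on
// incompatible struct types.
val unioned = df1.unionByName(df2, allowMissingColumns = true)
unioned.printSchema()
unioned.show(truncate = false)
{code}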



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32376) Make unionByName null-filling behavior work with struct columns

2020-10-16 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215671#comment-17215671
 ] 

L. C. Hsieh commented on SPARK-32376:
-

Resolved by https://github.com/apache/spark/pull/29587.

> Make unionByName null-filling behavior work with struct columns
> ---
>
> Key: SPARK-32376
> URL: https://issues.apache.org/jira/browse/SPARK-32376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Mukul Murthy
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: tests.scala
>
>
> https://issues.apache.org/jira/browse/SPARK-29358 added support for 
> unionByName to work when the two datasets didn't necessarily have the same 
> schema, but it does not work with nested columns like structs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32376) Make unionByName null-filling behavior work with struct columns

2020-10-16 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh updated SPARK-32376:

Fix Version/s: 3.1.0

> Make unionByName null-filling behavior work with struct columns
> ---
>
> Key: SPARK-32376
> URL: https://issues.apache.org/jira/browse/SPARK-32376
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Mukul Murthy
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: tests.scala
>
>
> https://issues.apache.org/jira/browse/SPARK-29358 added support for 
> unionByName to work when the two datasets didn't necessarily have the same 
> schema, but it does not work with nested columns like structs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type

2020-10-16 Thread David Rabinowitz (Jira)
David Rabinowitz created SPARK-33172:


 Summary: Spark SQL CodeGenerator does not check for UserDefined 
type
 Key: SPARK-33172
 URL: https://issues.apache.org/jira/browse/SPARK-33172
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.0.1, 2.4.7
Reporter: David Rabinowitz


The CodeGenerator takes the DataType given to   `getValueFromVector()` as is, 
and generates code based on its type. The generated code is not aware of the 
actual type, and therefore cannot be compiled. For example, using a DataFrame 
with a Spark ML Vector (VectorUDT) the generated code is:

```
InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
(datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
4));
```
Which leads to a runtime error of 

```
20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)
```
which then sends Spark into an infinite loop of this error.

The solution is quite simple: `getValueFromVector()` should match and handle 
UserDefinedType the same way `CodeGenerator.javaType()` does.
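
A sketch of what that handling could look like inside `CodeGenerator` (illustrative only, not the actual patch; `getValue` is the existing scalar path in that object):

{code:scala}
// Illustrative sketch: unwrap a UserDefinedType to its underlying sqlType before
// dispatching, mirroring what CodeGenerator.javaType() already does. A VectorUDT
// column then hits the ColumnVector-style struct branch (one-argument getStruct)
// instead of falling through to the two-argument InternalRow-style call seen in
// the error above.
def getValueFromVector(vector: String, dataType: DataType, rowId: String): String =
  dataType match {
    case udt: UserDefinedType[_] =>
      getValueFromVector(vector, udt.sqlType, rowId)
    case _: StructType =>
      s"$vector.getStruct($rowId)"   // ColumnVector.getStruct takes only the ordinal
    case _ =>
      getValue(vector, dataType, rowId)
  }
{code}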



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type

2020-10-16 Thread David Rabinowitz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rabinowitz updated SPARK-33172:
-
Description: 
The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as is, 
and generates code based on its type. The generated code is not aware of the 
actual type, and therefore cannot be compiled. For example, using a DataFrame 
with a Spark ML Vector (VectorUDT) the generated code is:

{{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
(datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
4));
}}
Which leads to a runtime error of 

{{
20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)
}}
which then sends Spark into an infinite loop of this error.

The solution is quite simple: {{getValueFromVector()}} should match and handle 
UserDefinedType the same way {{CodeGenerator.javaType()}} does.

  was:
The CodeGenerator takes the DataType given to   `getValueFromVector()` as is, 
and generates code based on its type. The generated code is not aware of the 
actual type, and therefore cannot be compiled. For example, using a DataFrame 
with a Spark ML Vector (VectorUDT) the generated code is:

```
InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
(datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
4));
```
Which leads to a runtime error of 

```
20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)
```
which then sends Spark into an infinite loop of this error.

The solution is quite simple: `getValueFromVector()` should match and handle 
UserDefinedType the same way `CodeGenerator.javaType()` does.


> Spark SQL CodeGenerator does not check for UserDefined type
> ---
>
> Key: SPARK-33172
> URL: https://issues.apache.org/jira/browse/SPARK-33172
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: David Rabinowitz
>Priority: Minor
>
> The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as 
> is, and generates code based on its type. The generated code is not aware of 
> the actual type, and therefore cannot be compiled. For example, using a 
> DataFrame with a Spark ML Vector (VectorUDT) the generated code is:
> {{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
> (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
> 4));
> }}
> Which leads to a runtime error of 
> {{
> 20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 153, Column 126: No applicable constructor/method found for actual parameters 
> "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 153, Column 126: No applicable constructor/method found for actual parameters 
> "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
>   at 
> org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)
> }}
> which then sends Spark into an infinite loop of this error.
> The solution is quite simple: {{getValueFromVector()}} should match and 
> handle UserDe

[jira] [Updated] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type

2020-10-16 Thread David Rabinowitz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Rabinowitz updated SPARK-33172:
-
Description: 
The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as is, 
and generates code based on its type. The generated code is not aware of the 
actual type, and therefore cannot be compiled. For example, using a DataFrame 
with a Spark ML Vector (VectorUDT) the generated code is:

{{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
(datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
4));}}

{{ Which leads to a runtime error of}}

{{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
{{ org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
{{ at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}}
{{...}}


{{ which then sends Spark into an infinite loop of this error.}}

The solution is quite simple: {{getValueFromVector()}} should match and handle 
UserDefinedType the same way {{CodeGenerator.javaType()}} does.

  was:
The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as is, 
and generates code based on its type. The generated code is not aware of the 
actual type, and therefore cannot be compiled. For example, using a DataFrame 
with a Spark ML Vector (VectorUDT) the generated code is:

{{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
(datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
4));
}}
Which leads to a runtime error of 

{{
20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
153, Column 126: No applicable constructor/method found for actual parameters 
"int, int"; candidates are: "public org.apache.spark.sql.vectorized.ColumnarRow 
org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"
at 
org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)
}}
which then sends Spark into an infinite loop of this error.

The solution is quite simple: {{getValueFromVector()}} should match and handle 
UserDefinedType the same way {{CodeGenerator.javaType()}} does.


> Spark SQL CodeGenerator does not check for UserDefined type
> ---
>
> Key: SPARK-33172
> URL: https://issues.apache.org/jira/browse/SPARK-33172
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: David Rabinowitz
>Priority: Minor
>
> The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as 
> is, and generates code based on its type. The generated code is not aware of 
> the actual type, and therefore cannot be compiled. For example, using a 
> DataFrame with a Spark ML Vector (VectorUDT) the generated code is:
> {{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
> (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
> 4));}}
> {{ Which leads to a runtime error of}}
> {{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 153, Column 126: No applicable constructor/method found for actual parameters 
> "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ org.codehaus.commons.compiler.CompileException: File 'generated.java', 
> Line 153, Column 126: No applicable constructor/method found for actual 
> parameters "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}}
> {{...}}
> {{ which then sends Spark into an infinite loop of this error.}}
> The solution is quite simple: {{getValueFromVector()}} sho

[jira] [Resolved] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33109.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30070
[https://github.com/apache/spark/pull/30070]

> Upgrade to SBT 1.4 and support `dependencyTree` back
> 
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Denis Pyshev
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33109:
-

Assignee: Denis Pyshev

> Upgrade to SBT 1.4 and support `dependencyTree` back
> 
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Denis Pyshev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33109) Upgrade to SBT 1.4

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33109:
--
Summary: Upgrade to SBT 1.4  (was: Upgrade to SBT 1.4 and support 
`dependencyTree` back)

> Upgrade to SBT 1.4
> --
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Denis Pyshev
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33109) Upgrade to SBT 1.4 and support `dependencyTree` back

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33109:
--
Summary: Upgrade to SBT 1.4 and support `dependencyTree` back  (was: 
Upgrade to SBT 1.4)

> Upgrade to SBT 1.4 and support `dependencyTree` back
> 
>
> Key: SPARK-33109
> URL: https://issues.apache.org/jira/browse/SPARK-33109
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Denis Pyshev
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215698#comment-17215698
 ] 

Apache Spark commented on SPARK-33172:
--

User 'davidrabinowitz' has created a pull request for this issue:
https://github.com/apache/spark/pull/30071

> Spark SQL CodeGenerator does not check for UserDefined type
> ---
>
> Key: SPARK-33172
> URL: https://issues.apache.org/jira/browse/SPARK-33172
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: David Rabinowitz
>Priority: Minor
>
> The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as 
> is, and generates code based on its type. The generated code is not aware of 
> the actual type, and therefore cannot be compiled. For example, using a 
> DataFrame with a Spark ML Vector (VectorUDT) the generated code is:
> {{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
> (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
> 4));}}
> {{ Which leads to a runtime error of}}
> {{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 153, Column 126: No applicable constructor/method found for actual parameters 
> "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ org.codehaus.commons.compiler.CompileException: File 'generated.java', 
> Line 153, Column 126: No applicable constructor/method found for actual 
> parameters "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}}
> {{...}}
> {{ which then sends Spark into an infinite loop of this error.}}
> The solution is quite simple: {{getValueFromVector()}} should match and 
> handle UserDefinedType the same way {{CodeGenerator.javaType()}} does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33172:


Assignee: (was: Apache Spark)

> Spark SQL CodeGenerator does not check for UserDefined type
> ---
>
> Key: SPARK-33172
> URL: https://issues.apache.org/jira/browse/SPARK-33172
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: David Rabinowitz
>Priority: Minor
>
> The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as 
> is, and generates code based on its type. The generated code is not aware of 
> the actual type, and therefore cannot be compiled. For example, using a 
> DataFrame with a Spark ML Vector (VectorUDT) the generated code is:
> {{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
> (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
> 4));}}
> {{ Which leads to a runtime error of}}
> {{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 153, Column 126: No applicable constructor/method found for actual parameters 
> "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ org.codehaus.commons.compiler.CompileException: File 'generated.java', 
> Line 153, Column 126: No applicable constructor/method found for actual 
> parameters "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}}
> {{...}}
> {{ which then sends Spark into an infinite loop of this error.}}
> The solution is quite simple: {{getValueFromVector()}} should match and 
> handle UserDefinedType the same way {{CodeGenerator.javaType()}} does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33172) Spark SQL CodeGenerator does not check for UserDefined type

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33172:


Assignee: Apache Spark

> Spark SQL CodeGenerator does not check for UserDefined type
> ---
>
> Key: SPARK-33172
> URL: https://issues.apache.org/jira/browse/SPARK-33172
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: David Rabinowitz
>Assignee: Apache Spark
>Priority: Minor
>
> The CodeGenerator takes the DataType given to   {{getValueFromVector()}} as 
> is, and generates code based on its type. The generated code is not aware of 
> the actual type, and therefore cannot be compiled. For example, using a 
> DataFrame with a Spark ML Vector (VectorUDT) the generated code is:
> {{InternalRow datasourcev2scan_value_2 = datasourcev2scan_isNull_2 ? null : 
> (datasourcev2scan_mutableStateArray_2[2].getStruct(datasourcev2scan_rowIdx_0, 
> 4));}}
> {{ Which leads to a runtime error of}}
> {{20/10/14 13:20:51 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 153, Column 126: No applicable constructor/method found for actual parameters 
> "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ org.codehaus.commons.compiler.CompileException: File 'generated.java', 
> Line 153, Column 126: No applicable constructor/method found for actual 
> parameters "int, int"; candidates are: "public 
> org.apache.spark.sql.vectorized.ColumnarRow 
> org.apache.spark.sql.vectorized.ColumnVector.getStruct(int)"}}
> {{ at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:12124)}}
> {{...}}
> {{ which then sends Spark into an infinite loop of this error.}}
> The solution is quite simple: {{getValueFromVector()}} should match and 
> handle UserDefinedType the same way {{CodeGenerator.javaType()}} does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-33173:
-

 Summary: Fix Flaky Test "SPARK-33088: executor failed tasks 
trigger plugin calls"
 Key: SPARK-33173
 URL: https://issues.apache.org/jira/browse/SPARK-33173
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun


{code}
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did not 
equal 2
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
at 
org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33173:


Assignee: Apache Spark

> Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"
> 
>
> Key: SPARK-33173
> URL: https://issues.apache.org/jira/browse/SPARK-33173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did 
> not equal 2
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   at 
> org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215699#comment-17215699
 ] 

Apache Spark commented on SPARK-33173:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/30072

> Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"
> 
>
> Key: SPARK-33173
> URL: https://issues.apache.org/jira/browse/SPARK-33173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did 
> not equal 2
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   at 
> org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33173:


Assignee: (was: Apache Spark)

> Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"
> 
>
> Key: SPARK-33173
> URL: https://issues.apache.org/jira/browse/SPARK-33173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did 
> not equal 2
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   at 
> org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33173:
--
Description: 
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.internal.plugin/PluginContainerSuite/SPARK_33088__executor_failed_tasks_trigger_plugin_calls/

{code}
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did not 
equal 2
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
at 
org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
{code}

  was:
{code}
sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did not 
equal 2
at 
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
at 
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
at 
org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
at 
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
at 
org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
{code}


> Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"
> 
>
> Key: SPARK-33173
> URL: https://issues.apache.org/jira/browse/SPARK-33173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.internal.plugin/PluginContainerSuite/SPARK_33088__executor_failed_tasks_trigger_plugin_calls/
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did 
> not equal 2
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   at 
> org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33170) Add SQL config to control fast-fail behavior in FileFormatWriter

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33170:


Assignee: Apache Spark  (was: L. C. Hsieh)

> Add SQL config to control fast-fail behavior in FileFormatWriter
> 
>
> Key: SPARK-33170
> URL: https://issues.apache.org/jira/browse/SPARK-33170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> In SPARK-29649, we catch `FileAlreadyExistsException` in `FileFormatWriter`
> and fail the task set fast to prevent task retries.
> After the latest discussion, it is important to be able to keep the original
> behavior, which is to retry tasks even when `FileAlreadyExistsException` is
> thrown, because `FileAlreadyExistsException` can be recoverable in some cases.
> We are going to add a config to control this behavior and set it to false
> for fast-fail by default.
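As a rough illustration of the behavior being made configurable, the sketch below branches on a boolean flag when a `FileAlreadyExistsException` is thrown during a file-format write. The config key used here and the wrapping exception are assumptions for illustration only; the issue does not fix the final key name, and this is not the actual `FileFormatWriter` code. The JDK's `java.nio.file.FileAlreadyExistsException` stands in for Hadoop's exception so the sketch runs without Spark on the classpath.

{code:scala}
// Minimal sketch, not Spark's FileFormatWriter: a boolean config decides
// whether FileAlreadyExistsException fails the task set fast or is rethrown
// so the scheduler can retry the task (the original behavior).
import java.nio.file.FileAlreadyExistsException // stand-in for Hadoop's exception

object FastFailSketch {
  // Hypothetical key; the final config name is decided in the pull request.
  val FastFailKey = "spark.sql.execution.fastFailOnFileFormatOutput"

  def runWriteTask(conf: Map[String, String])(doWrite: () => Unit): Unit = {
    val fastFail = conf.getOrElse(FastFailKey, "false").toBoolean
    try doWrite()
    catch {
      case e: FileAlreadyExistsException if fastFail =>
        // Surface a non-retriable error so the whole task set fails fast.
        throw new IllegalStateException("Output file already exists; failing task set", e)
      case e: FileAlreadyExistsException =>
        // Rethrow unchanged; the scheduler may retry the task.
        throw e
    }
  }

  def main(args: Array[String]): Unit = {
    // Fast-fail disabled: the original exception propagates and the task can be retried.
    try runWriteTask(Map(FastFailKey -> "false")) { () =>
      throw new FileAlreadyExistsException("/tmp/output/part-00000")
    } catch { case e: FileAlreadyExistsException => println(s"retryable: $e") }

    // Fast-fail enabled: the wrapped error fails the task set immediately.
    try runWriteTask(Map(FastFailKey -> "true")) { () =>
      throw new FileAlreadyExistsException("/tmp/output/part-00000")
    } catch { case e: IllegalStateException => println(s"fast-fail: $e") }
  }
}
{code}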



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33170) Add SQL config to control fast-fail behavior in FileFormatWriter

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215705#comment-17215705
 ] 

Apache Spark commented on SPARK-33170:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30073

> Add SQL config to control fast-fail behavior in FileFormatWriter
> 
>
> Key: SPARK-33170
> URL: https://issues.apache.org/jira/browse/SPARK-33170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> In SPARK-29649, we catch `FileAlreadyExistsException` in `FileFormatWriter`
> and fail the task set fast to prevent task retries.
> After the latest discussion, it is important to be able to keep the original
> behavior, which is to retry tasks even when `FileAlreadyExistsException` is
> thrown, because `FileAlreadyExistsException` can be recoverable in some cases.
> We are going to add a config to control this behavior and set it to false
> for fast-fail by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33170) Add SQL config to control fast-fail behavior in FileFormatWriter

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33170:


Assignee: L. C. Hsieh  (was: Apache Spark)

> Add SQL config to control fast-fail behavior in FileFormatWriter
> 
>
> Key: SPARK-33170
> URL: https://issues.apache.org/jira/browse/SPARK-33170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> In SPARK-29649, we catch `FileAlreadyExistsException` in `FileFormatWriter`
> and fail the task set fast to prevent task retries.
> After the latest discussion, it is important to be able to keep the original
> behavior, which is to retry tasks even when `FileAlreadyExistsException` is
> thrown, because `FileAlreadyExistsException` can be recoverable in some cases.
> We are going to add a config to control this behavior and set it to false
> for fast-fail by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33170) Add SQL config to control fast-fail behavior in FileFormatWriter

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215706#comment-17215706
 ] 

Apache Spark commented on SPARK-33170:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/30073

> Add SQL config to control fast-fail behavior in FileFormatWriter
> 
>
> Key: SPARK-33170
> URL: https://issues.apache.org/jira/browse/SPARK-33170
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> In SPARK-29649, we catch `FileAlreadyExistsException` in `FileFormatWriter`
> and fail the task set fast to prevent task retries.
> After the latest discussion, it is important to be able to keep the original
> behavior, which is to retry tasks even when `FileAlreadyExistsException` is
> thrown, because `FileAlreadyExistsException` can be recoverable in some cases.
> We are going to add a config to control this behavior and set it to false
> for fast-fail by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215709#comment-17215709
 ] 

Apache Spark commented on SPARK-33131:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30075

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> Grouping sets construct a new aggregate and lose the qualified name of the
> grouping expression. Here is an example:
> {code:sql}
> -- Works: resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
> -- Fails
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1{code}
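For reference, the three statements above can be run end to end with a plain SparkSession; on affected versions only the last one fails, because the qualified name t1.c1 is lost when the grouping-sets aggregate is constructed. This is only a reproduction sketch, and the local-mode session settings are incidental.

{code:scala}
// Reproduction sketch for the quoted queries, assuming a local build of an
// affected Spark version; only the last statement fails to resolve t1.c1.
import org.apache.spark.sql.SparkSession

object GroupingSetsHavingRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-33131-repro")
      .getOrCreate()

    // Works: the unqualified column c1 is resolved by ResolveReferences.
    spark.sql("select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1").show()

    // Works because of the extra expression c1 (aliased to c2).
    spark.sql("select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1").show()

    // Fails on affected versions: the HAVING clause cannot resolve t1.c1.
    spark.sql("select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1").show()

    spark.stop()
  }
}
{code}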



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32862) Left semi stream-stream join

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32862:


Assignee: (was: Apache Spark)

> Left semi stream-stream join
> 
>
> Key: SPARK-32862
> URL: https://issues.apache.org/jira/browse/SPARK-32862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Minor
>
> The current stream-stream join supports inner, left outer, and right outer
> joins
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]).
> Internally we see a lot of users running left semi stream-stream joins
> (though not in Spark Structured Streaming), e.g. "I want the ad impressions
> (left side) that have a click (right side), but I don't care how many clicks
> each ad got" (left semi semantics).
>  
> A left semi stream-stream join would work as follows:
> (1) For a left-side input row, check whether there is a match in the
> right-side state store.
>   (1.1) If there is a match, output the left-side row.
>   (1.2) If there is no match, put the row in the left-side state store with
> its "matched" field set to false.
> (2) For a right-side input row, check whether there is a match in the
> left-side state store. If there is a match, set the "matched" field of the
> matching left-side row state to true. Put the right-side row in the
> right-side state store.
> (3) When a left-side row is evicted from the state store, output the row if
> its "matched" field is true.
> (4) When a right-side row is evicted from the state store, do nothing.
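The numbered steps map naturally onto two keyed state stores plus a per-row "matched" flag. The sketch below is a simplified in-memory model of steps (1)-(4) only, with plain mutable maps in place of Spark's state stores and eviction driven explicitly by the caller; it is not Spark's StreamingSymmetricHashJoinExec, and every name in it is illustrative.

{code:scala}
// Simplified in-memory model of steps (1)-(4) above; not Spark code.
import scala.collection.mutable

class LeftSemiStreamJoinModel[K, L, R] {
  // Left-side state: key -> (row, matched flag), filled by step (1.2).
  private val leftState = mutable.Map.empty[K, (L, Boolean)]
  // Right-side state: key -> buffered rows, filled by step (2).
  private val rightState = mutable.Map.empty[K, mutable.Buffer[R]]

  // Step (1): a left-side input row.
  def onLeft(key: K, row: L): Option[L] =
    if (rightState.get(key).exists(_.nonEmpty)) {
      Some(row)                      // (1.1) match found: output the row now
    } else {
      leftState(key) = (row, false)  // (1.2) buffer it with matched = false
      None
    }

  // Step (2): a right-side input row.
  def onRight(key: K, row: R): Unit = {
    leftState.get(key).foreach { case (l, _) => leftState(key) = (l, true) }
    rightState.getOrElseUpdate(key, mutable.Buffer.empty[R]) += row
  }

  // Step (3): evict a left-side key; emit the row only if it was matched.
  def evictLeft(key: K): Option[L] =
    leftState.remove(key).collect { case (row, true) => row }

  // Step (4): evict a right-side key; nothing is emitted.
  def evictRight(key: K): Unit = rightState.remove(key)
}
{code}

For example, an impression that arrives before its click is buffered unmatched by (1.2), flipped to matched by (2) when the click arrives, and emitted at eviction time by (3).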



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32862) Left semi stream-stream join

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215712#comment-17215712
 ] 

Apache Spark commented on SPARK-32862:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/30076

> Left semi stream-stream join
> 
>
> Key: SPARK-32862
> URL: https://issues.apache.org/jira/browse/SPARK-32862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Minor
>
> The current stream-stream join supports inner, left outer, and right outer
> joins
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]).
> Internally we see a lot of users running left semi stream-stream joins
> (though not in Spark Structured Streaming), e.g. "I want the ad impressions
> (left side) that have a click (right side), but I don't care how many clicks
> each ad got" (left semi semantics).
>  
> A left semi stream-stream join would work as follows:
> (1) For a left-side input row, check whether there is a match in the
> right-side state store.
>   (1.1) If there is a match, output the left-side row.
>   (1.2) If there is no match, put the row in the left-side state store with
> its "matched" field set to false.
> (2) For a right-side input row, check whether there is a match in the
> left-side state store. If there is a match, set the "matched" field of the
> matching left-side row state to true. Put the right-side row in the
> right-side state store.
> (3) When a left-side row is evicted from the state store, output the row if
> its "matched" field is true.
> (4) When a right-side row is evicted from the state store, do nothing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32862) Left semi stream-stream join

2020-10-16 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32862:


Assignee: Apache Spark

> Left semi stream-stream join
> 
>
> Key: SPARK-32862
> URL: https://issues.apache.org/jira/browse/SPARK-32862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Assignee: Apache Spark
>Priority: Minor
>
> The current stream-stream join supports inner, left outer, and right outer
> joins
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]).
> Internally we see a lot of users running left semi stream-stream joins
> (though not in Spark Structured Streaming), e.g. "I want the ad impressions
> (left side) that have a click (right side), but I don't care how many clicks
> each ad got" (left semi semantics).
>  
> A left semi stream-stream join would work as follows:
> (1) For a left-side input row, check whether there is a match in the
> right-side state store.
>   (1.1) If there is a match, output the left-side row.
>   (1.2) If there is no match, put the row in the left-side state store with
> its "matched" field set to false.
> (2) For a right-side input row, check whether there is a match in the
> left-side state store. If there is a match, set the "matched" field of the
> matching left-side row state to true. Put the right-side row in the
> right-side state store.
> (3) When a left-side row is evicted from the state store, output the row if
> its "matched" field is true.
> (4) When a right-side row is evicted from the state store, do nothing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32862) Left semi stream-stream join

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215713#comment-17215713
 ] 

Apache Spark commented on SPARK-32862:
--

User 'c21' has created a pull request for this issue:
https://github.com/apache/spark/pull/30076

> Left semi stream-stream join
> 
>
> Key: SPARK-32862
> URL: https://issues.apache.org/jira/browse/SPARK-32862
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Minor
>
> The current stream-stream join supports inner, left outer, and right outer
> joins
> ([https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingSymmetricHashJoinExec.scala#L166]).
> Internally we see a lot of users running left semi stream-stream joins
> (though not in Spark Structured Streaming), e.g. "I want the ad impressions
> (left side) that have a click (right side), but I don't care how many clicks
> each ad got" (left semi semantics).
>  
> A left semi stream-stream join would work as follows:
> (1) For a left-side input row, check whether there is a match in the
> right-side state store.
>   (1.1) If there is a match, output the left-side row.
>   (1.2) If there is no match, put the row in the left-side state store with
> its "matched" field set to false.
> (2) For a right-side input row, check whether there is a match in the
> left-side state store. If there is a match, set the "matched" field of the
> matching left-side row state to true. Put the right-side row in the
> right-side state store.
> (3) When a left-side row is evicted from the state store, output the row if
> its "matched" field is true.
> (4) When a right-side row is evicted from the state store, do nothing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17215714#comment-17215714
 ] 

Apache Spark commented on SPARK-33131:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30077

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> Grouping sets construct a new aggregate and lose the qualified name of the
> grouping expression. Here is an example:
> {code:sql}
> -- Works: resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
> -- Fails
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33173.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30072
[https://github.com/apache/spark/pull/30072]

> Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"
> 
>
> Key: SPARK-33173
> URL: https://issues.apache.org/jira/browse/SPARK-33173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.internal.plugin/PluginContainerSuite/SPARK_33088__executor_failed_tasks_trigger_plugin_calls/
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did 
> not equal 2
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   at 
> org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33173) Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33173:
-

Assignee: Dongjoon Hyun

> Fix Flaky Test "SPARK-33088: executor failed tasks trigger plugin calls"
> 
>
> Key: SPARK-33173
> URL: https://issues.apache.org/jira/browse/SPARK-33173
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-hive-2.3-jdk-11/lastCompletedBuild/testReport/org.apache.spark.internal.plugin/PluginContainerSuite/SPARK_33088__executor_failed_tasks_trigger_plugin_calls/
> {code}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 1 did 
> not equal 2
>   at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
>   at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
>   at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
>   at 
> org.apache.spark.internal.plugin.PluginContainerSuite.$anonfun$new$8(PluginContainerSuite.scala:161)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33131:
--
Affects Version/s: 2.4.7

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> Grouping sets construct a new aggregate and lose the qualified name of the
> grouping expression. Here is an example:
> {code:sql}
> -- Works: resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
> -- Fails
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33131:
--
Affects Version/s: 2.4.0
   2.4.1
   2.4.2
   2.4.3
   2.4.4
   2.4.5
   2.4.6

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 
> 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> Grouping sets construct a new aggregate and lose the qualified name of the
> grouping expression. Here is an example:
> {code:sql}
> -- Works: resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
> -- Fails
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33131) Fix grouping sets with having clause can not resolve qualified col name

2020-10-16 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33131:
--
Fix Version/s: 3.0.2

> Fix grouping sets with having clause can not resolve qualified col name
> ---
>
> Key: SPARK-33131
> URL: https://issues.apache.org/jira/browse/SPARK-33131
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 
> 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
>
> Grouping sets construct a new aggregate and lose the qualified name of the
> grouping expression. Here is an example:
> {code:sql}
> -- Works: resolved by ResolveReferences
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having c1 = 1
> -- Works because of the extra expression c1
> select c1 as c2 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1
> -- Fails
> select c1 from values (1) as t1(c1) group by grouping sets(t1.c1) having t1.c1 = 1{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org