[jira] [Resolved] (SPARK-31337) Support MS Sql Kerberos login in JDBC connector

2020-06-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31337.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support MS Sql Kerberos login in JDBC connector
> ---
>
> Key: SPARK-31337
> URL: https://issues.apache.org/jira/browse/SPARK-31337
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>







[jira] [Resolved] (SPARK-31559) AM starts with initial fetched tokens in any attempt

2020-05-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31559.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> AM starts with initial fetched tokens in any attempt
> 
>
> Key: SPARK-31559
> URL: https://issues.apache.org/jira/browse/SPARK-31559
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> The issue only occurs in yarn-cluster mode.
> The submitter obtains delegation tokens for yarn-cluster mode and adds these 
> credentials to the launch context. The AM is launched with these credentials, 
> and both the AM and the driver are able to leverage the tokens.
> In yarn-cluster mode, the driver is launched in the AM, which in turn 
> initializes the token manager (while initializing SparkContext) and obtains 
> delegation tokens (and schedules their renewal) if both principal and keytab 
> are available.
> That said, even if we provide a principal and keytab to run an application in 
> yarn-cluster mode, the AM always starts with the initial tokens from the 
> launch context until the token manager runs and obtains fresh delegation tokens.
> So there is a "gap": if user code (the driver) accesses an external system 
> that requires delegation tokens (e.g. HDFS) before initializing SparkContext, 
> it cannot leverage the tokens the token manager will obtain. This makes the 
> application fail if the AM is killed and relaunched "after" the initial 
> tokens have expired.
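
To make the "gap" concrete, here is a minimal sketch (not taken from the issue) of the problematic pattern: driver code touching HDFS before SparkContext is initialized, so it only sees the launch-context tokens. Class and path names are placeholders:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object GapExample {                                        // hypothetical driver
  def main(args: Array[String]): Unit = {
    // This HDFS access runs before SparkContext exists, so in yarn-cluster mode
    // it can only use the initial tokens from the launch context.
    val fs = FileSystem.get(new Configuration())
    val ready = fs.exists(new Path("/data/_READY"))        // placeholder path

    // Only here does the token manager start and obtain fresh delegation tokens.
    val sc = new SparkContext(new SparkConf().setAppName("gap-example"))
    if (ready) sc.textFile("/data/input").count()          // placeholder path
    sc.stop()
  }
}
{code}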






[jira] [Resolved] (SPARK-31272) Support DB2 Kerberos login in JDBC connector

2020-04-22 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31272.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support DB2 Kerberos login in JDBC connector
> 
>
> Key: SPARK-31272
> URL: https://issues.apache.org/jira/browse/SPARK-31272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>







[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2020-04-21 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089190#comment-17089190
 ] 

Marcelo Masiero Vanzin commented on SPARK-1537:
---

Well, the only thing to start with is the existing SHS code. 
EventLoggingListener + FsHistoryProvider.

> Add integration with Yarn's Application Timeline Server
> ---
>
> Key: SPARK-1537
> URL: https://issues.apache.org/jira/browse/SPARK-1537
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>Reporter: Marcelo Masiero Vanzin
>Priority: Major
> Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline 
> Server (see YARN-321, YARN-1530). This would allow users running Spark on 
> Yarn to have a single place to go for all their history needs, and avoid 
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
> although there is still some ongoing work. But the basics are there, and I 
> wouldn't expect them to change (much) at this point.






[jira] [Commented] (SPARK-1537) Add integration with Yarn's Application Timeline Server

2020-04-21 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17089137#comment-17089137
 ] 

Marcelo Masiero Vanzin commented on SPARK-1537:
---

[~templedf] sorry, I forgot to reply.

ATSv1 wasn't a good match for this, and by the time ATSv2 was developed, 
interest in this feature had long since faded in the Spark community. So this 
was closed.

Also, you can probably do this without requiring the code to live in Spark.

But if you actually want to contribute the integration, there's nothing 
preventing you from opening a new bug and posting a PR.

> Add integration with Yarn's Application Timeline Server
> ---
>
> Key: SPARK-1537
> URL: https://issues.apache.org/jira/browse/SPARK-1537
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>Reporter: Marcelo Masiero Vanzin
>Priority: Major
> Attachments: SPARK-1537.txt, spark-1573.patch
>
>
> It would be nice to have Spark integrate with Yarn's Application Timeline 
> Server (see YARN-321, YARN-1530). This would allow users running Spark on 
> Yarn to have a single place to go for all their history needs, and avoid 
> having to manage a separate service (Spark's built-in server).
> At the moment, there's a working version of the ATS in the Hadoop 2.4 branch, 
> although there is still some ongoing work. But the basics are there, and I 
> wouldn't expect them to change (much) at this point.






[jira] [Resolved] (SPARK-31021) Support MariaDB Kerberos login in JDBC connector

2020-04-09 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-31021.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support MariaDB Kerberos login in JDBC connector
> 
>
> Key: SPARK-31021
> URL: https://issues.apache.org/jira/browse/SPARK-31021
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>







[jira] [Resolved] (SPARK-30874) Support Postgres Kerberos login in JDBC connector

2020-03-12 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30874.

Fix Version/s: 3.1.0
 Assignee: Gabor Somogyi
   Resolution: Fixed

> Support Postgres Kerberos login in JDBC connector
> -
>
> Key: SPARK-30874
> URL: https://issues.apache.org/jira/browse/SPARK-30874
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.1.0
>
>







[jira] [Resolved] (SPARK-30481) Integrate event log compactor into Spark History Server

2020-01-28 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30481.

Fix Version/s: 3.0.0
 Assignee: Jungtaek Lim
   Resolution: Fixed

> Integrate event log compactor into Spark History Server
> ---
>
> Key: SPARK-30481
> URL: https://issues.apache.org/jira/browse/SPARK-30481
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort on compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> This issue depends on SPARK-29779 and SPARK-30479, and focuses on integrating 
> the event log compactor into the Spark History Server and enabling its configuration.






[jira] [Commented] (SPARK-30557) Add public documentation for SPARK_SUBMIT_OPTS

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018256#comment-17018256
 ] 

Marcelo Masiero Vanzin commented on SPARK-30557:


I don't exactly remember what that does, but a quick look seems to indicate 
it's basically another way of setting JVM options, used in some internal code. 
We already have {{--driver-java-options}} for users.
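
As a hedged illustration of that user-facing alternative (not part of this issue), the same JVM options can also be supplied through spark.driver.extraJavaOptions, here via the SparkLauncher API; the jar, class, and flags are placeholders:

{code}
import org.apache.spark.launcher.SparkLauncher

// Sketch only: artifact, class, and JVM flags are illustrative placeholders.
val app = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")        // placeholder jar
  .setMainClass("com.example.MyApp")            // placeholder class
  .setConf("spark.driver.extraJavaOptions", "-Dlog4j.debug=true -XX:+UseG1GC")
  .launch()
app.waitFor()
{code}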

> Add public documentation for SPARK_SUBMIT_OPTS
> --
>
> Key: SPARK-30557
> URL: https://issues.apache.org/jira/browse/SPARK-30557
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Documentation
>Affects Versions: 2.4.4
>Reporter: Nicholas Chammas
>Priority: Minor
>
> Is `SPARK_SUBMIT_OPTS` part of Spark's public interface? If so, it needs some 
> documentation. I cannot see it documented 
> [anywhere|https://github.com/apache/spark/search?q=SPARK_SUBMIT_OPTS_q=SPARK_SUBMIT_OPTS]
>  in the docs.
> How do you use it? What is it useful for? What's an example usage? etc.






[jira] [Resolved] (SPARK-29876) Delete/archive file source completed files in separate thread

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29876.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26502
[https://github.com/apache/spark/pull/26502]

> Delete/archive file source completed files in separate thread
> -
>
> Key: SPARK-29876
> URL: https://issues.apache.org/jira/browse/SPARK-29876
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-20568 added the possibility to clean up completed files in a streaming 
> query. Deleting/archiving uses the main thread, which can slow down 
> processing. It would be good to do this on separate thread(s), as sketched below.
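
A minimal sketch of the general technique (cleanup off the main thread); this is illustrative only and not the actual implementation in the linked pull request:

{code}
import java.nio.file.{Files, Paths}
import java.util.concurrent.Executors

// Sketch: a single background thread handles delete/archive so the main
// processing thread is not blocked on filesystem calls.
val cleanupExecutor = Executors.newSingleThreadExecutor()

def cleanUpCompletedFile(path: String): Unit = {
  cleanupExecutor.submit(new Runnable {
    override def run(): Unit = Files.deleteIfExists(Paths.get(path))
  })
}

cleanUpCompletedFile("/tmp/source/completed-0001.json")  // placeholder path
cleanupExecutor.shutdown()
{code}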






[jira] [Assigned] (SPARK-29876) Delete/archive file source completed files in separate thread

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29876:
--

Assignee: Gabor Somogyi

> Delete/archive file source completed files in separate thread
> -
>
> Key: SPARK-29876
> URL: https://issues.apache.org/jira/browse/SPARK-29876
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>
> SPARK-20568 added the possibility to clean up completed files in a streaming 
> query. Deleting/archiving uses the main thread, which can slow down 
> processing. It would be good to do this on separate thread(s).






[jira] [Commented] (SPARK-27868) Better document shuffle / RPC listen backlog

2020-01-17 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018219#comment-17018219
 ] 

Marcelo Masiero Vanzin commented on SPARK-27868:


It's ok for now, it's done. Hopefully 3.0 will come out soon and the "main" 
documentation on the site will have the info.

> Better document shuffle / RPC listen backlog
> 
>
> Key: SPARK-27868
> URL: https://issues.apache.org/jira/browse/SPARK-27868
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Spark Core
>Affects Versions: 2.4.3
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> The option to control the listen socket backlog for RPC and shuffle servers 
> is not documented in our public docs.
> The only piece of documentation is in a Java class, and even that 
> documentation is incorrect:
> {code}
>   /** Requested maximum length of the queue of incoming connections. Default -1 for no backlog. */
>   public int backLog() { return conf.getInt(SPARK_NETWORK_IO_BACKLOG_KEY, -1); }
> {code}
> The default value actually causes the default value from the JRE to be used, 
> which is 50 according to the docs.
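
For illustration, a hedged sketch of setting the backlog explicitly instead of relying on the JRE default of 50; the key follows the spark.{module}.io.backLog pattern, and the value is an arbitrary example:

{code}
import org.apache.spark.SparkConf

// Sketch only: pick a backlog sized for expected connection bursts; -1 (the
// default) falls back to the JRE's own default of 50.
val conf = new SparkConf()
  .set("spark.shuffle.io.backLog", "256")
{code}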






[jira] [Assigned] (SPARK-29950) Deleted excess executors can connect back to driver in K8S with dyn alloc on

2020-01-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29950:
--

Assignee: Marcelo Masiero Vanzin

> Deleted excess executors can connect back to driver in K8S with dyn alloc on
> 
>
> Key: SPARK-29950
> URL: https://issues.apache.org/jira/browse/SPARK-29950
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>
> {{ExecutorPodsAllocator}} currently has code to delete excess pods that the 
> K8S server hasn't started yet and that aren't needed anymore due to downscaling.
> The problem is that there is a race between K8S starting the pod and the 
> Spark code deleting it. This may cause the pod to connect back to Spark and 
> do a lot of initialization, sometimes even being considered for task 
> allocation, just to be killed almost immediately.
> This doesn't cause any problems that I could detect in my tests, but it wastes 
> resources and causes logs to contain misleading messages about the executor 
> being killed. It would be nice to avoid that.






[jira] [Resolved] (SPARK-29950) Deleted excess executors can connect back to driver in K8S with dyn alloc on

2020-01-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29950.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26586
[https://github.com/apache/spark/pull/26586]

> Deleted excess executors can connect back to driver in K8S with dyn alloc on
> 
>
> Key: SPARK-29950
> URL: https://issues.apache.org/jira/browse/SPARK-29950
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Minor
> Fix For: 3.0.0
>
>
> {{ExecutorPodsAllocator}} currently has code to delete excess pods that the 
> K8S server hasn't started yet and that aren't needed anymore due to downscaling.
> The problem is that there is a race between K8S starting the pod and the 
> Spark code deleting it. This may cause the pod to connect back to Spark and 
> do a lot of initialization, sometimes even being considered for task 
> allocation, just to be killed almost immediately.
> This doesn't cause any problems that I could detect in my tests, but it wastes 
> resources and causes logs to contain misleading messages about the executor 
> being killed. It would be nice to avoid that.






[jira] [Commented] (SPARK-27868) Better document shuffle / RPC listen backlog

2020-01-16 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17017281#comment-17017281
 ] 

Marcelo Masiero Vanzin commented on SPARK-27868:


You shouldn't have reverted the whole change. The documentation and extra 
logging are still really useful.

> Better document shuffle / RPC listen backlog
> 
>
> Key: SPARK-27868
> URL: https://issues.apache.org/jira/browse/SPARK-27868
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, Spark Core
>Affects Versions: 2.4.3
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Minor
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> The option to control the listen socket backlog for RPC and shuffle servers 
> is not documented in our public docs.
> The only piece of documentation is in a Java class, and even that 
> documentation is incorrect:
> {code}
>   /** Requested maximum length of the queue of incoming connections. Default -1 for no backlog. */
>   public int backLog() { return conf.getInt(SPARK_NETWORK_IO_BACKLOG_KEY, -1); }
> {code}
> The default value actually causes the default value from the JRE to be used, 
> which is 50 according to the docs.






[jira] [Updated] (SPARK-30246) Spark on Yarn External Shuffle Service Memory Leak

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin updated SPARK-30246:
---
Fix Version/s: (was: 2.4.5)
   2.4.6

> Spark on Yarn External Shuffle Service Memory Leak
> --
>
> Key: SPARK-30246
> URL: https://issues.apache.org/jira/browse/SPARK-30246
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.3
> Environment: hadoop 2.7.3
> spark 2.4.3
> jdk 1.8.0_60
>Reporter: huangweiyi
>Assignee: Henrique dos Santos Goulart
>Priority: Major
> Fix For: 3.0.0, 2.4.6
>
>
> In our large, busy YARN cluster, which deploys the Spark external shuffle 
> service as part of the YARN NM aux service, we encountered OOMs in some NMs.
> After dumping the heap memory, I found that some StreamState objects were 
> still on the heap even though the app the StreamState belongs to had already 
> finished.
> Here are some related figures:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_oom.png|width=100%!
> The heap dump below shows that the memory consumption mainly consists of two 
> parts:
> *(1) OneForOneStreamManager (4,429,796,424 (77.11%) bytes)*
> *(2) PoolChunk (1,059,201,712 (18.44%) bytes)*
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_heap_overview.png|width=100%!
> Digging into the OneForOneStreamManager, there are some StreamStates still 
> remaining:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/streamState.png|width=100%!
> Incoming references to StreamState::associatedChannel:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/associatedChannel_incomming_reference.png|width=100%!






[jira] [Assigned] (SPARK-30246) Spark on Yarn External Shuffle Service Memory Leak

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30246:
--

Assignee: Henrique dos Santos Goulart

> Spark on Yarn External Shuffle Service Memory Leak
> --
>
> Key: SPARK-30246
> URL: https://issues.apache.org/jira/browse/SPARK-30246
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.3
> Environment: hadoop 2.7.3
> spark 2.4.3
> jdk 1.8.0_60
>Reporter: huangweiyi
>Assignee: Henrique dos Santos Goulart
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> In our large, busy YARN cluster, which deploys the Spark external shuffle 
> service as part of the YARN NM aux service, we encountered OOMs in some NMs.
> After dumping the heap memory, I found that some StreamState objects were 
> still on the heap even though the app the StreamState belongs to had already 
> finished.
> Here are some related figures:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_oom.png|width=100%!
> The heap dump below shows that the memory consumption mainly consists of two 
> parts:
> *(1) OneForOneStreamManager (4,429,796,424 (77.11%) bytes)*
> *(2) PoolChunk (1,059,201,712 (18.44%) bytes)*
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_heap_overview.png|width=100%!
> Digging into the OneForOneStreamManager, there are some StreamStates still 
> remaining:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/streamState.png|width=100%!
> Incoming references to StreamState::associatedChannel:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/associatedChannel_incomming_reference.png|width=100%!






[jira] [Resolved] (SPARK-30246) Spark on Yarn External Shuffle Service Memory Leak

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30246.

Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 27064
[https://github.com/apache/spark/pull/27064]

> Spark on Yarn External Shuffle Service Memory Leak
> --
>
> Key: SPARK-30246
> URL: https://issues.apache.org/jira/browse/SPARK-30246
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 2.4.3
> Environment: hadoop 2.7.3
> spark 2.4.3
> jdk 1.8.0_60
>Reporter: huangweiyi
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> In our large, busy YARN cluster, which deploys the Spark external shuffle 
> service as part of the YARN NM aux service, we encountered OOMs in some NMs.
> After dumping the heap memory, I found that some StreamState objects were 
> still on the heap even though the app the StreamState belongs to had already 
> finished.
> Here are some related figures:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_oom.png|width=100%!
> The heap dump below shows that the memory consumption mainly consists of two 
> parts:
> *(1) OneForOneStreamManager (4,429,796,424 (77.11%) bytes)*
> *(2) PoolChunk (1,059,201,712 (18.44%) bytes)*
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/nm_heap_overview.png|width=100%!
> Digging into the OneForOneStreamManager, there are some StreamStates still 
> remaining:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/streamState.png|width=100%!
> Incoming references to StreamState::associatedChannel:
> !https://raw.githubusercontent.com/012huang/public_source/master/SparkPRFigures/associatedChannel_incomming_reference.png|width=100%!






[jira] [Resolved] (SPARK-30495) How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30495.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27191
[https://github.com/apache/spark/pull/27191]

> How to disable 'spark.security.credentials.${service}.enabled' in Structured 
> streaming while connecting to a kafka cluster
> --
>
> Key: SPARK-30495
> URL: https://issues.apache.org/jira/browse/SPARK-30495
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: act_coder
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Trying to read data from a secured Kafka cluster using Spark Structured
> Streaming, using the library below to read the data -
> +*"spark-sql-kafka-0-10_2.12":"3.0.0-preview"*+ - since it has the feature to
> specify our own consumer group id (instead of Spark setting its own group
> id).
> +*Dependency used in code:*+
>         org.apache.spark
>          spark-sql-kafka-0-10_2.12
>          3.0.0-preview
>  
> +*Logs:*+
> Getting the error below even after specifying the required JAAS
> configuration in the Spark options.
> Caused by: java.lang.IllegalArgumentException: requirement failed:
> *Delegation token must exist for this connector*.
>  at scala.Predef$.require(Predef.scala:281)
>  at org.apache.spark.kafka010.KafkaTokenUtil$.isConnectorUsingCurrentToken(KafkaTokenUtil.scala:299)
>  at org.apache.spark.sql.kafka010.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:533)
>  at org.apache.spark.sql.kafka010.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:275)
>  
> +*Spark configuration used to read from Kafka:*+
> val kafkaDF = sparkSession.readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", bootStrapServer)
>   .option("subscribe", kafkaTopic)
>   // Setting JAAS configuration
>   .option("kafka.sasl.jaas.config", KAFKA_JAAS_SASL)
>   .option("kafka.sasl.mechanism", "PLAIN")
>   .option("kafka.security.protocol", "SASL_SSL")
>   // Setting custom consumer group id
>   .option("kafka.group.id", "test_cg")
>   .load()
>  
> The following document specifies that we can disable the feature of obtaining
> a delegation token:
> [https://spark.apache.org/docs/3.0.0-preview/structured-streaming-kafka-integration.html]
> Tried setting the property *spark.security.credentials.kafka.enabled* to
> *false* in the Spark config, but it is still failing with the same error.
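
For reference, a minimal sketch of the combination described above once the fix is in place: the Kafka delegation token provider is disabled via spark.security.credentials.kafka.enabled and JAAS is supplied directly. Broker, topic, and JAAS values are placeholders, not taken from the report:

{code}
import org.apache.spark.sql.SparkSession

// Sketch only: all literal values below are placeholders.
val jaas = """org.apache.kafka.common.security.plain.PlainLoginModule required username="u" password="p";"""

val spark = SparkSession.builder()
  .appName("kafka-sasl-without-delegation-token")
  .config("spark.security.credentials.kafka.enabled", "false") // skip the token provider
  .getOrCreate()

val kafkaDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9093")
  .option("subscribe", "events")
  .option("kafka.sasl.jaas.config", jaas)
  .option("kafka.sasl.mechanism", "PLAIN")
  .option("kafka.security.protocol", "SASL_SSL")
  .option("kafka.group.id", "test_cg")
  .load()
{code}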






[jira] [Assigned] (SPARK-30495) How to disable 'spark.security.credentials.${service}.enabled' in Structured streaming while connecting to a kafka cluster

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30495:
--

Assignee: Gabor Somogyi

> How to disable 'spark.security.credentials.${service}.enabled' in Structured 
> streaming while connecting to a kafka cluster
> --
>
> Key: SPARK-30495
> URL: https://issues.apache.org/jira/browse/SPARK-30495
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: act_coder
>Assignee: Gabor Somogyi
>Priority: Major
>
> Trying to read data from a secured Kafka cluster using Spark Structured
> Streaming, using the library below to read the data -
> +*"spark-sql-kafka-0-10_2.12":"3.0.0-preview"*+ - since it has the feature to
> specify our own consumer group id (instead of Spark setting its own group
> id).
> +*Dependency used in code:*+
>         org.apache.spark
>          spark-sql-kafka-0-10_2.12
>          3.0.0-preview
>  
> +*Logs:*+
> Getting the error below even after specifying the required JAAS
> configuration in the Spark options.
> Caused by: java.lang.IllegalArgumentException: requirement failed:
> *Delegation token must exist for this connector*.
>  at scala.Predef$.require(Predef.scala:281)
>  at org.apache.spark.kafka010.KafkaTokenUtil$.isConnectorUsingCurrentToken(KafkaTokenUtil.scala:299)
>  at org.apache.spark.sql.kafka010.KafkaDataConsumer.getOrRetrieveConsumer(KafkaDataConsumer.scala:533)
>  at org.apache.spark.sql.kafka010.KafkaDataConsumer.$anonfun$get$1(KafkaDataConsumer.scala:275)
>  
> +*Spark configuration used to read from Kafka:*+
> val kafkaDF = sparkSession.readStream
>   .format("kafka")
>   .option("kafka.bootstrap.servers", bootStrapServer)
>   .option("subscribe", kafkaTopic)
>   // Setting JAAS configuration
>   .option("kafka.sasl.jaas.config", KAFKA_JAAS_SASL)
>   .option("kafka.sasl.mechanism", "PLAIN")
>   .option("kafka.security.protocol", "SASL_SSL")
>   // Setting custom consumer group id
>   .option("kafka.group.id", "test_cg")
>   .load()
>  
> The following document specifies that we can disable the feature of obtaining
> a delegation token:
> [https://spark.apache.org/docs/3.0.0-preview/structured-streaming-kafka-integration.html]
> Tried setting the property *spark.security.credentials.kafka.enabled* to
> *false* in the Spark config, but it is still failing with the same error.






[jira] [Resolved] (SPARK-30479) Apply compaction of event log to SQL events

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30479.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27164
[https://github.com/apache/spark/pull/27164]

> Apply compaction of event log to SQL events
> ---
>
> Key: SPARK-30479
> URL: https://issues.apache.org/jira/browse/SPARK-30479
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort on compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> This issue depends on SPARK-29779 and focuses on dealing with SQL events.






[jira] [Assigned] (SPARK-30479) Apply compaction of event log to SQL events

2020-01-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30479:
--

Assignee: Jungtaek Lim

> Apply compaction of event log to SQL events
> ---
>
> Key: SPARK-30479
> URL: https://issues.apache.org/jira/browse/SPARK-30479
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> This issue is to track the effort on compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> This issue depends on SPARK-29779 and focuses on dealing with SQL events.






[jira] [Assigned] (SPARK-27142) Provide REST API for SQL level information

2020-01-14 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-27142:
--

Assignee: Ajith S

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, when monitoring a Spark application, SQL information is not 
> available from REST but only via the UI. REST provides only applications, 
> jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be found.
>  
> Details: 
> https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728
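
A hedged sketch of consuming such an endpoint; the host, port, application id, and the exact /sql path are assumptions taken from the linked discussion rather than confirmed here:

{code}
import scala.io.Source

// Sketch only: localhost:4040 is the default live-UI address; the path is assumed.
val appId = "app-20200114120000-0001"
val sqlJson = Source.fromURL(s"http://localhost:4040/api/v1/applications/$appId/sql").mkString
println(sqlJson)
{code}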






[jira] [Resolved] (SPARK-27142) Provide REST API for SQL level information

2020-01-14 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-27142.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 24076
[https://github.com/apache/spark/pull/24076]

> Provide REST API for SQL level information
> --
>
> Key: SPARK-27142
> URL: https://issues.apache.org/jira/browse/SPARK-27142
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: image-2019-03-13-19-29-26-896.png
>
>
> Currently, when monitoring a Spark application, SQL information is not 
> available from REST but only via the UI. REST provides only applications, 
> jobs, stages, and environment. This Jira is targeted at providing a REST 
> API so that SQL-level information can be found.
>  
> Details: 
> https://issues.apache.org/jira/browse/SPARK-27142?focusedCommentId=16791728=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16791728






[jira] [Resolved] (SPARK-29779) Compact old event log files and clean up

2020-01-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29779.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27085
[https://github.com/apache/spark/pull/27085]

> Compact old event log files and clean up
> 
>
> Key: SPARK-29779
> URL: https://issues.apache.org/jira/browse/SPARK-29779
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort on compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> Please note that this issue leaves the functionalities below for a future JIRA 
> issue, as the patch for SPARK-29779 was too huge and we decided to break it down.
>  * apply filter in SQL events
>  * integrate compaction into FsHistoryProvider
>  * documentation about new configuration






[jira] [Assigned] (SPARK-29779) Compact old event log files and clean up

2020-01-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29779:
--

Assignee: Jungtaek Lim

> Compact old event log files and clean up
> 
>
> Key: SPARK-29779
> URL: https://issues.apache.org/jira/browse/SPARK-29779
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> This issue is to track the effort on compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> Please note that this issue leaves the functionalities below for a future JIRA 
> issue, as the patch for SPARK-29779 was too huge and we decided to break it down.
>  * apply filter in SQL events
>  * integrate compaction into FsHistoryProvider
>  * documentation about new configuration






[jira] [Assigned] (SPARK-30281) 'archive' option in FileStreamSource misses to consider partitioned and recursive option

2020-01-08 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30281:
--

Assignee: Jungtaek Lim

> 'archive' option in FileStreamSource misses to consider partitioned and 
> recursive option
> 
>
> Key: SPARK-30281
> URL: https://issues.apache.org/jira/browse/SPARK-30281
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify the condition for verifying the archive path, it relied on the 
> fact that FileStreamSource reads files that meet one of two conditions: 1) the 
> parent directory matches the source pattern, or 2) the file itself matches the 
> source pattern.
> During post-hoc review we found other cases which invalidate that fact: the 
> partitioned and recursive options. With these options, FileStreamSource can 
> read arbitrary files in subdirectories which match the source pattern, so 
> simply checking the depth of the archive path doesn't work.
> We need to restore the path check logic, though it will not be easy to 
> explain to end users.
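
For context, a minimal sketch of the cleanup options from SPARK-20568 that this issue refines; the paths and glob are placeholders, and a multi-level pattern like this is exactly the kind of case discussed above:

{code}
import org.apache.spark.sql.SparkSession

// Sketch only: paths and pattern are placeholders.
val spark = SparkSession.builder().appName("file-source-archive-sketch").getOrCreate()

val lines = spark.readStream
  .format("text")
  .option("cleanSource", "archive")            // archive completed source files
  .option("sourceArchiveDir", "/data/archive") // must not overlap the source pattern
  .load("/data/incoming/*/*.txt")              // glob that can match subdirectories
{code}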






[jira] [Resolved] (SPARK-30281) 'archive' option in FileStreamSource misses to consider partitioned and recursive option

2020-01-08 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30281.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26920
[https://github.com/apache/spark/pull/26920]

> 'archive' option in FileStreamSource misses to consider partitioned and 
> recursive option
> 
>
> Key: SPARK-30281
> URL: https://issues.apache.org/jira/browse/SPARK-30281
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify the condition for verifying the archive path, it relied on the 
> fact that FileStreamSource reads files that meet one of two conditions: 1) the 
> parent directory matches the source pattern, or 2) the file itself matches the 
> source pattern.
> During post-hoc review we found other cases which invalidate that fact: the 
> partitioned and recursive options. With these options, FileStreamSource can 
> read arbitrary files in subdirectories which match the source pattern, so 
> simply checking the depth of the archive path doesn't work.
> We need to restore the path check logic, though it will not be easy to 
> explain to end users.






[jira] [Assigned] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2020-01-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30313:
--

Assignee: Jungtaek Lim

> Flaky test: MasterSuite.master/worker web ui available with reverseProxy
> 
>
> Key: SPARK-30313
> URL: https://issues.apache.org/jira/browse/SPARK-30313
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
>
> Saw this test fail a few times on PRs. e.g.:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115583/testReport/org.apache.spark.deploy.master/MasterSuite/master_worker_web_ui_available_with_reverseProxy/]
>  
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
> Stacktrace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
>   at 
> org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at 
> org.apache.spark.deploy.master.MasterSuite.$anonfun$new$14(MasterSuite.scala:318)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   ---
> Caused by: sbt.ForkMain$ForkError: java.io.IOException: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>   at java.net.URL.openStream(URL.java:1045)
>   at scala.io.Source$.fromURL(Source.scala:144)
>   at scala.io.Source$.fromURL(Source.scala:134)
> {noformat}






[jira] [Resolved] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2020-01-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30313.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27010
[https://github.com/apache/spark/pull/27010]

> Flaky test: MasterSuite.master/worker web ui available with reverseProxy
> 
>
> Key: SPARK-30313
> URL: https://issues.apache.org/jira/browse/SPARK-30313
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Saw this test fail a few times on PRs. e.g.:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115583/testReport/org.apache.spark.deploy.master/MasterSuite/master_worker_web_ui_available_with_reverseProxy/]
>  
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
> Stacktrace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
>   at 
> org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at 
> org.apache.spark.deploy.master.MasterSuite.$anonfun$new$14(MasterSuite.scala:318)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   ---
> Caused by: sbt.ForkMain$ForkError: java.io.IOException: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
>   at java.net.URL.openStream(URL.java:1045)
>   at scala.io.Source$.fromURL(Source.scala:144)
>   at scala.io.Source$.fromURL(Source.scala:134)
> {noformat}






[jira] [Assigned] (SPARK-30285) Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2020-01-02 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30285:
--

Assignee: Wang Shuo

> Fix deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError
> ---
>
> Key: SPARK-30285
> URL: https://issues.apache.org/jira/browse/SPARK-30285
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Wang Shuo
>Assignee: Wang Shuo
>Priority: Major
>
> There is a deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError.
> We can reproduce it as follows:
>  # Post some events to LiveListenerBus.
>  # Call LiveListenerBus#stop, which holds the synchronized lock of the bus, 
> waits until all the events are processed by listeners, and then removes all 
> the queues.
>  # The event queue drains events by posting them to its listeners. If a 
> listener is interrupted, it calls AsyncEventQueue#removeListenerOnError, which 
> in turn calls bus.removeListener, trying to acquire the synchronized lock of 
> the bus and resulting in a deadlock (see the sketch below).
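
The cycle reduces to a classic lock inversion; the following is an illustrative sketch only, not the actual LiveListenerBus/AsyncEventQueue code:

{code}
// Thread A: stop() takes the bus lock, then waits for the queue to drain.
// Thread B: a failing listener calls removeListenerOnError() while holding the
// queue lock, then asks for the bus lock -> each thread waits on the other.
object Bus {
  def stop(): Unit = synchronized { Queue.waitForDrain() }
  def removeListener(): Unit = synchronized { /* unregister listener */ }
}

object Queue {
  def waitForDrain(): Unit = synchronized { /* blocks while a listener is dispatching */ }
  def removeListenerOnError(): Unit = synchronized { Bus.removeListener() }
}
{code}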






[jira] [Resolved] (SPARK-30285) Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError

2020-01-02 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30285.

Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26924
[https://github.com/apache/spark/pull/26924]

> Fix deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError
> ---
>
> Key: SPARK-30285
> URL: https://issues.apache.org/jira/browse/SPARK-30285
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Wang Shuo
>Assignee: Wang Shuo
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> There is a deadlock between LiveListenerBus#stop and 
> AsyncEventQueue#removeListenerOnError.
> We can reproduce it as follows:
>  # Post some events to LiveListenerBus.
>  # Call LiveListenerBus#stop, which holds the synchronized lock of the bus, 
> waits until all the events are processed by listeners, and then removes all 
> the queues.
>  # The event queue drains events by posting them to its listeners. If a 
> listener is interrupted, it calls AsyncEventQueue#removeListenerOnError, which 
> in turn calls bus.removeListener, trying to acquire the synchronized lock of 
> the bus and resulting in a deadlock.






[jira] [Commented] (SPARK-30225) "Stream is corrupted at" exception on reading disk-spilled data of a shuffle operation

2019-12-30 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005838#comment-17005838
 ] 

Marcelo Masiero Vanzin commented on SPARK-30225:


The changes in SPARK-23366 may not necessarily have caused this. But that 
change actually flipped the configuration's default value from false to true; 
so 2.3 has the feature disabled by default, and 2.4 has it enabled by default. 
So the bug may have existed in the 2.3 version of the code, making it a bit 
harder to track. Still taking a look at the code but nothing popped up yet...

> "Stream is corrupted at" exception on reading disk-spilled data of a shuffle 
> operation
> --
>
> Key: SPARK-30225
> URL: https://issues.apache.org/jira/browse/SPARK-30225
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>
> There are issues with spark.unsafe.sorter.spill.read.ahead.enabled in Spark 
> 2.4.0, which were introduced by 
> https://issues.apache.org/jira/browse/SPARK-23366
>  
> The workaround for this problem is to disable readahead of unsafe spill with 
> the following:
>  --conf spark.unsafe.sorter.spill.read.ahead.enabled=false
>  
> This issue can be reproduced on Spark 2.4.0 by following the steps in this 
> comment of Jira SPARK-18105.
> https://issues.apache.org/jira/browse/SPARK-18105?focusedCommentId=16981461=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16981461
>  
> Exception looks like below: 
> {code:java}
> 19/12/10 01:51:31 INFO sort.ShuffleExternalSorter: Thread 142 spilling sort 
> data of 5.1 GB to disk (1  time so far)19/12/10 01:51:31 INFO 
> sort.ShuffleExternalSorter: Thread 142 spilling sort data of 5.1 GB to disk 
> (1  time so far)19/12/10 01:52:48 INFO sort.ShuffleExternalSorter: Thread 142 
> spilling sort data of 5.1 GB to disk (2  times so far)19/12/10 01:53:53 ERROR 
> executor.Executor: Exception in task 6.0 in stage 0.0 (TID 
> 6)java.io.IOException: Stream is corrupted at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202) at 
> net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228) at 
> net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157) at 
> org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168) 
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)19/12/10 01:53:53 INFO 
> executor.CoarseGrainedExecutorBackend: Got assigned task 3319/12/10 01:53:53 
> INFO executor.Executor: Running task 8.1 in stage 0.0 (TID 33)19/12/10 
> 01:54:00 INFO sort.UnsafeExternalSorter: Thread 142 spilling sort data of 3.3 
> GB to disk (0  time so far)19/12/10 01:54:30 INFO executor.Executor: Executor 
> is trying to kill task 8.1 in stage 0.0 (TID 33), reason: Stage 
> cancelled19/12/10 01:54:30 INFO executor.Executor: Executor killed task 8.1 
> in stage 0.0 (TID 33), reason: Stage cancelled19/12/10 01:54:52 INFO 
> executor.CoarseGrainedExecutorBackend: Driver commanded a shutdown{code}
>  
>  
>  
>  
>  






[jira] [Assigned] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data

2019-12-30 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-18105:
--

Assignee: Marcelo Masiero Vanzin  (was: Davies Liu)

> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Davies Liu
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
>
> When lz4 is used to compress the shuffle files, decompression may fail with 
> "stream is corrupt":
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89
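
For a quick sanity check (an assumption about a debugging approach, not something from
this ticket): the shuffle compression codec is configurable, so the failing job can be
re-run under a different codec to see whether the corruption follows lz4 specifically.
A minimal sketch, with hypothetical object and method names:

{code:scala}
import org.apache.spark.SparkConf

// Minimal sketch, not from this ticket. spark.io.compression.codec selects the codec
// used for shuffle/spill streams; re-running the failing stage with another value is
// one way to check whether the "Stream is corrupted" error is lz4-specific.
object CodecConf {
  def confWithCodec(codec: String): SparkConf =
    new SparkConf()
      .setAppName("shuffle-codec-check")
      .set("spark.io.compression.codec", codec) // e.g. "lz4", "snappy", "zstd"
}
{code}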



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18105) LZ4 failed to decompress a stream of shuffled data

2019-12-30 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-18105:
--

Assignee: (was: Marcelo Masiero Vanzin)

> LZ4 failed to decompress a stream of shuffled data
> --
>
> Key: SPARK-18105
> URL: https://issues.apache.org/jira/browse/SPARK-18105
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Davies Liu
>Priority: Major
>
> When lz4 is used to compress the shuffle files, decompression may fail with 
> "stream is corrupt":
> {code}
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 92 in stage 5.0 failed 4 times, most recent failure: Lost task 92.3 in 
> stage 5.0 (TID 16616, 10.0.27.18): java.io.IOException: Stream is corrupted
>   at 
> org.apache.spark.io.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:220)
>   at 
> org.apache.spark.io.LZ4BlockInputStream.available(LZ4BlockInputStream.java:109)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:353)
>   at java.io.DataInputStream.read(DataInputStream.java:149)
>   at com.google.common.io.ByteStreams.read(ByteStreams.java:828)
>   at com.google.common.io.ByteStreams.readFully(ByteStreams.java:695)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:127)
>   at 
> org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$3$$anon$1.next(UnsafeRowSerializer.scala:110)
>   at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>   at 
> org.apache.spark.sql.execution.datasources.DynamicPartitionWriterContainer.writeRows(WriterContainer.scala:397)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> https://github.com/jpountz/lz4-java/issues/89



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.

2019-12-23 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-21869.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26845
[https://github.com/apache/spark/pull/26845]

> A cached Kafka producer should not be closed if any task is using it.
> -
>
> Key: SPARK-21869
> URL: https://issues.apache.org/jira/browse/SPARK-21869
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Right now a cached Kafka producer may be closed if a large task uses it for 
> more than 10 minutes.
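
For context on the fix direction, reference counting is one way a producer cache can
avoid closing an entry that a task is still using: expiration only marks the entry, and
the real close happens when the last user releases it. A minimal sketch with
hypothetical names (this is not Spark's actual Kafka producer cache):

{code:scala}
// Minimal sketch, hypothetical names; not Spark's actual Kafka producer cache.
final class RefCountedEntry[P](producer: P, closeFn: P => Unit) {
  private var refCount = 0
  private var expired = false

  // A task calls acquire() before using the producer...
  def acquire(): P = synchronized {
    refCount += 1
    producer
  }

  // ...and release() when it is done; the deferred close runs here if needed.
  def release(): Unit = synchronized {
    refCount -= 1
    if (expired && refCount == 0) closeFn(producer)
  }

  // Called by the cache when the entry times out; the close is deferred while in use.
  def expire(): Unit = synchronized {
    expired = true
    if (refCount == 0) closeFn(producer)
  }
}
{code}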



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.

2019-12-23 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-21869:
--

Assignee: Jungtaek Lim  (was: Gabor Somogyi)

> A cached Kafka producer should not be closed if any task is using it.
> -
>
> Key: SPARK-21869
> URL: https://issues.apache.org/jira/browse/SPARK-21869
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Right now a cached Kafka producer may be closed if a large task uses it for 
> more than 10 minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26418) Only OpenBlocks without any ChunkFetch for one stream will cause memory leak in ExternalShuffleService

2019-12-20 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-26418.

Resolution: Duplicate

> Only OpenBlocks without any ChunkFetch for one stream will cause memory leak 
> in ExternalShuffleService
> --
>
> Key: SPARK-26418
> URL: https://issues.apache.org/jira/browse/SPARK-26418
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Wang Shuo
>Priority: Major
>
> In the current code path, OneForOneStreamManager holds StreamState objects in a 
> Map named streams.
> A StreamState is initialized and put into streams when an OpenBlocks request is 
> received.
> A specific StreamState is removed from streams in the two scenarios below:
>  # The last chunk of a stream is fetched
>  # The connection used for ChunkFetch is closed
> A StreamState is never cleaned up if an OpenBlocks request is received without 
> any following ChunkFetch request. This causes a memory leak on the server side, 
> which is harmful for a long-running service such as the ExternalShuffleService.
>  
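
To make the leak scenario above concrete, here is a minimal sketch (hypothetical names,
not the actual OneForOneStreamManager code) of a stream registry that also tracks a
last-access time, so entries that saw an OpenBlocks but never a ChunkFetch can be
evicted instead of living forever:

{code:scala}
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// Minimal sketch, hypothetical names; not the actual OneForOneStreamManager.
// register() models handling an OpenBlocks request, touch() models a ChunkFetch,
// and evictIdle() removes states that never saw a ChunkFetch within the timeout.
class ExpiringStreamRegistry(idleTimeoutMs: Long) {
  final class StreamState(val appId: String, val blockIds: Seq[String]) {
    @volatile var lastAccessMs: Long = System.currentTimeMillis()
  }

  private val nextStreamId = new AtomicLong(0L)
  private val streams = new ConcurrentHashMap[Long, StreamState]()

  def register(appId: String, blockIds: Seq[String]): Long = {
    val id = nextStreamId.incrementAndGet()
    streams.put(id, new StreamState(appId, blockIds))
    id
  }

  def touch(streamId: Long): Option[StreamState] =
    Option(streams.get(streamId)).map { s =>
      s.lastAccessMs = System.currentTimeMillis()
      s
    }

  def evictIdle(): Int = {
    val now = System.currentTimeMillis()
    var evicted = 0
    val it = streams.entrySet().iterator()
    while (it.hasNext) {
      if (now - it.next().getValue.lastAccessMs > idleTimeoutMs) {
        it.remove()
        evicted += 1
      }
    }
    evicted
  }
}
{code}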



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-17398) Failed to query on external JSon Partitioned table

2019-12-20 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-17398.

Fix Version/s: (was: 2.0.1)
   3.0.0
   2.4.5
 Assignee: Wing Yew Poon
   Resolution: Fixed

> Failed to query on external JSon Partitioned table
> --
>
> Key: SPARK-17398
> URL: https://issues.apache.org/jira/browse/SPARK-17398
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: pin_zhang
>Assignee: Wing Yew Poon
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
> Attachments: screenshot-1.png
>
>
> 1. Create an external JSON partitioned table 
> with the SerDe in hive-hcatalog-core-1.2.1.jar, downloaded from
> https://mvnrepository.com/artifact/org.apache.hive.hcatalog/hive-hcatalog-core/1.2.1
> 2. Querying the table meets an exception, although it works in Spark 1.5.2
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: 
> Lost task
>  0.0 in stage 1.0 (TID 1, localhost): java.lang.ClassCastException: 
> java.util.ArrayList cannot be cast to org.apache.hive.hcatalog.data.HCatRecord
> at 
> org.apache.hive.hcatalog.data.HCatRecordObjectInspector.getStructFieldData(HCatRecordObjectInspector.java:45)
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:430)
> at 
> org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:426)
>  
> 3. Test Code
> import org.apache.spark.SparkConf
> import org.apache.spark.SparkContext
> import org.apache.spark.sql.hive.HiveContext
> object JsonBugs {
>   def main(args: Array[String]): Unit = {
>     val table = "test_json"
>     val location = "file:///g:/home/test/json"
>     val create = s"""CREATE EXTERNAL TABLE ${table}
>       (id string, seq string)
>       PARTITIONED BY (index int)
>       ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
>       LOCATION "${location}"
>       """
>     val add_part = s"""
>       ALTER TABLE ${table} ADD
>       PARTITION (index=1) LOCATION '${location}/index=1'
>       """
>     val conf = new SparkConf().setAppName("scala").setMaster("local[2]")
>     conf.set("spark.sql.warehouse.dir", "file:///g:/home/warehouse")
>     val ctx = new SparkContext(conf)
>     val hctx = new HiveContext(ctx)
>     val exist = hctx.tableNames().map { x => x.toLowerCase() }.contains(table)
>     if (!exist) {
>       hctx.sql(create)
>       hctx.sql(add_part)
>     } else {
>       hctx.sql("show partitions " + table).show()
>     }
>     hctx.sql("select * from test_json").show()
>   }
> }



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2019-12-19 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000493#comment-17000493
 ] 

Marcelo Masiero Vanzin commented on SPARK-30313:


From the logs (in case jenkins cleans them up):

{noformat}
19/12/19 13:48:39.160 dispatcher-event-loop-4 INFO Worker: WorkerWebUI is 
available at http://localhost:8080/proxy/worker-20191219
134839-localhost-36054
19/12/19 13:48:39.296 WorkerUI-52072 WARN JettyUtils: GET /json/ failed: 
java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.spark.deploy.worker.ui.WorkerPage.renderJson(WorkerPage.scala:39)
at org.apache.spark.ui.WebUI.$anonfun$attachPage$2(WebUI.scala:91)
at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
{noformat}


> Flaky test: MasterSuite.master/worker web ui available with reverseProxy
> 
>
> Key: SPARK-30313
> URL: https://issues.apache.org/jira/browse/SPARK-30313
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Priority: Major
>
> Saw this test fail a few times on PRs. e.g.:
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115583/testReport/org.apache.spark.deploy.master/MasterSuite/master_worker_web_ui_available_with_reverseProxy/]
>  
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
> Stacktrace
> sbt.ForkMain$ForkError: 
> org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
> eventually never returned normally. Attempted 43 times over 
> 5.064226577995 seconds. Last failure message: Server returned HTTP 
> response code: 500 for URL: 
> http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
>   at 
> org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
>   at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
>   at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
>   at 
> org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
>   at 

[jira] [Comment Edited] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2019-12-19 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000493#comment-17000493
 ] 

Marcelo Masiero Vanzin edited comment on SPARK-30313 at 12/20/19 12:02 AM:
---

From the logs (in case jenkins cleans them up):

{noformat}
19/12/19 13:48:39.160 dispatcher-event-loop-4 INFO Worker: WorkerWebUI is 
available at http://localhost:8080/proxy/worker-20191219
134839-localhost-36054
19/12/19 13:48:39.296 WorkerUI-52072 WARN JettyUtils: GET /json/ failed: 
java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.spark.deploy.worker.ui.WorkerPage.renderJson(WorkerPage.scala:39)
at org.apache.spark.ui.WebUI.$anonfun$attachPage$2(WebUI.scala:91)
at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.eclipse.jetty.server.Server.handle(Server.java:505)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at 
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
{noformat}



was (Author: vanzin):
From the logs (in case jenkins cleans the up):

{noformat}
19/12/19 13:48:39.160 dispatcher-event-loop-4 INFO Worker: WorkerWebUI is 
available at http://localhost:8080/proxy/worker-20191219
134839-localhost-36054
19/12/19 13:48:39.296 WorkerUI-52072 WARN JettyUtils: GET /json/ failed: 
java.lang.NullPointerException
java.lang.NullPointerException
at 
org.apache.spark.deploy.worker.ui.WorkerPage.renderJson(WorkerPage.scala:39)
at org.apache.spark.ui.WebUI.$anonfun$attachPage$2(WebUI.scala:91)
at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:80)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at 
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:873)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at 

[jira] [Created] (SPARK-30313) Flaky test: MasterSuite.master/worker web ui available with reverseProxy

2019-12-19 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-30313:
--

 Summary: Flaky test: MasterSuite.master/worker web ui available 
with reverseProxy
 Key: SPARK-30313
 URL: https://issues.apache.org/jira/browse/SPARK-30313
 Project: Spark
  Issue Type: Bug
  Components: Tests
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


Saw this test fail a few times on PRs. e.g.:

[https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115583/testReport/org.apache.spark.deploy.master/MasterSuite/master_worker_web_ui_available_with_reverseProxy/]

 
{noformat}

Error Message

org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
eventually never returned normally. Attempted 43 times over 5.064226577995 
seconds. Last failure message: Server returned HTTP response code: 500 for URL: 
http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.

Stacktrace

sbt.ForkMain$ForkError: 
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to 
eventually never returned normally. Attempted 43 times over 5.064226577995 
seconds. Last failure message: Server returned HTTP response code: 500 for URL: 
http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/.
at 
org.scalatest.concurrent.Eventually.tryTryAgain$1(Eventually.scala:432)
at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:439)
at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:391)
at 
org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
at org.scalatest.concurrent.Eventually.eventually(Eventually.scala:308)
at org.scalatest.concurrent.Eventually.eventually$(Eventually.scala:307)
at 
org.apache.spark.deploy.master.MasterSuite.eventually(MasterSuite.scala:111)
at 
org.apache.spark.deploy.master.MasterSuite.$anonfun$new$14(MasterSuite.scala:318)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
---
Caused by: sbt.ForkMain$ForkError: java.io.IOException: Server returned HTTP 
response code: 500 for URL: 
http://localhost:45395/proxy/worker-20191219134839-localhost-36054/json/
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
at java.net.URL.openStream(URL.java:1045)
at scala.io.Source$.fromURL(Source.scala:144)
at scala.io.Source$.fromURL(Source.scala:134)
{noformat}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30235) Keeping compatibility with 2.4 external shuffle service regarding host local shuffle blocks reading

2019-12-17 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-30235:
--

Assignee: Attila Zsolt Piros

> Keeping compatibility with 2.4 external shuffle service regarding host local 
> shuffle blocks reading
> ---
>
> Key: SPARK-30235
> URL: https://issues.apache.org/jira/browse/SPARK-30235
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Minor
>
> When `spark.shuffle.readHostLocalDisk.enabled` is true, a new message is 
> used which is not supported by Spark 2.4.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30235) Keeping compatibility with 2.4 external shuffle service regarding host local shuffle blocks reading

2019-12-17 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30235.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26869
[https://github.com/apache/spark/pull/26869]

> Keeping compatibility with 2.4 external shuffle service regarding host local 
> shuffle blocks reading
> ---
>
> Key: SPARK-30235
> URL: https://issues.apache.org/jira/browse/SPARK-30235
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Minor
> Fix For: 3.0.0
>
>
> When `spark.shuffle.readHostLocalDisk.enabled` is true, a new message is 
> used which is not supported by Spark 2.4.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30277) NoSuchMethodError in Spark 3.0.0-preview with Delta Lake

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30277.

Resolution: Not A Problem

That's an internal Spark class, which means that if there is a problem, it's not 
in Spark.

> NoSuchMethodError in Spark 3.0.0-preview with Delta Lake
> 
>
> Key: SPARK-30277
> URL: https://issues.apache.org/jira/browse/SPARK-30277
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: Spark: 3.0.0-preview-bin-hadoop2.7
> Delta Lake: 0.5.0_2.12
> Java: 1.8.0_171
>Reporter: Victor Zhang
>Priority: Major
>
> Open spark shell with delta lake packages:
> {code:java}
> bin/spark-shell --master local --packages io.delta:delta-core_2.12:0.5.0{code}
> Create a delta table:
> {code:java}
> spark.range(5).write.format("delta").save("/tmp/delta-table1")
> {code}
> Throws a NoSuchMethodError.
> {code:java}
> com.google.common.util.concurrent.ExecutionError: 
> java.lang.NoSuchMethodError: 
> org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at 
> com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
>   at org.apache.spark.sql.delta.DeltaLog$.apply(DeltaLog.scala:740)
>   at org.apache.spark.sql.delta.DeltaLog$.forTable(DeltaLog.scala:702)
>   at 
> org.apache.spark.sql.delta.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:126)
>   at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:71)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:69)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:87)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:189)
>   at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:227)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:224)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:185)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:110)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:109)
>   at 
> org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:829)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$4(SQLExecution.scala:100)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:87)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:829)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:309)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:236)
>   ... 47 elided
> Caused by: java.lang.NoSuchMethodError: 
> org.apache.spark.util.Utils$.classForName(Ljava/lang/String;)Ljava/lang/Class;
>   at 
> org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:122)
>   at 
> org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:120)
>   at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
>   at 
> org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore(LogStore.scala:117)
>   at 
> org.apache.spark.sql.delta.storage.LogStoreProvider.createLogStore$(LogStore.scala:115)
>   at org.apache.spark.sql.delta.DeltaLog.createLogStore(DeltaLog.scala:58)
>   at org.apache.spark.sql.delta.DeltaLog.(DeltaLog.scala:79)
>   at 
> org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$2(DeltaLog.scala:744)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)
>   at 
> org.apache.spark.sql.delta.DeltaLog$$anon$3.$anonfun$call$1(DeltaLog.scala:744)
>   at 
> com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
>   at 
> com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
>   at org.apache.spark.sql.delta.DeltaLog$.recordOperation(DeltaLog.scala:671)
>   at 
> 

[jira] [Commented] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16997698#comment-16997698
 ] 

Marcelo Masiero Vanzin commented on SPARK-25392:


The fix basically hides pool details from the history server; actually showing 
pool info is a more involved change, and if that's wanted a new bug should be 
filed. (I know there was a PR for it, but, well, that requires more committer 
time for reviewing too...)

> [Spark Job History]Inconsistent behaviour for pool details in spark web UI 
> and history server page 
> ---
>
> Key: SPARK-25392
> URL: https://issues.apache.org/jira/browse/SPARK-25392
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: OS: SUSE 11
> Spark Version: 2.3
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: shahid
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Steps:
> 1.Enable spark.scheduler.mode = FAIR
> 2.Submitted beeline jobs
> create database JH;
> use JH;
> create table one12( id int );
> insert into one12 values(12);
> insert into one12 values(13);
> Select * from one12;
> 3. Click on the JDBC incomplete application ID in the Job History page
> 4. Go to the Jobs tab in the staged Web UI page
> 5. Click on "run at AccessController.java:0" under the Description column
> 6. Click "default" under the Pool Name column of the Completed Stages table
> URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default
> 7. It throws the below error
> HTTP ERROR 400
> Problem accessing /history/application_1536399199015_0006/stages/pool/. 
> Reason:
> Unknown pool: default
> Powered by Jetty:// x.y.z
> But under the 
> Yarn resource page it displays the summary under Fair Scheduler Pool: default 
> URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default
> Summary
> Pool Name Minimum Share   Pool Weight Active Stages   Running Tasks   
> SchedulingMode
> default   0   1   0   0   FIFO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-25392:
--

Assignee: shahid

> [Spark Job History]Inconsistent behaviour for pool details in spark web UI 
> and history server page 
> ---
>
> Key: SPARK-25392
> URL: https://issues.apache.org/jira/browse/SPARK-25392
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: OS: SUSE 11
> Spark Version: 2.3
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: shahid
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Steps:
> 1.Enable spark.scheduler.mode = FAIR
> 2.Submitted beeline jobs
> create database JH;
> use JH;
> create table one12( id int );
> insert into one12 values(12);
> insert into one12 values(13);
> Select * from one12;
> 3. Click on the JDBC incomplete application ID in the Job History page
> 4. Go to the Jobs tab in the staged Web UI page
> 5. Click on "run at AccessController.java:0" under the Description column
> 6. Click "default" under the Pool Name column of the Completed Stages table
> URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default
> 7. It throws the below error
> HTTP ERROR 400
> Problem accessing /history/application_1536399199015_0006/stages/pool/. 
> Reason:
> Unknown pool: default
> Powered by Jetty:// x.y.z
> But under the 
> Yarn resource page it displays the summary under Fair Scheduler Pool: default 
> URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default
> Summary
> Pool Name Minimum Share   Pool Weight Active Stages   Running Tasks   
> SchedulingMode
> default   0   1   0   0   FIFO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-25392.

Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26616
[https://github.com/apache/spark/pull/26616]

> [Spark Job History]Inconsistent behaviour for pool details in spark web UI 
> and history server page 
> ---
>
> Key: SPARK-25392
> URL: https://issues.apache.org/jira/browse/SPARK-25392
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: OS: SUSE 11
> Spark Version: 2.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Steps:
> 1.Enable spark.scheduler.mode = FAIR
> 2.Submitted beeline jobs
> create database JH;
> use JH;
> create table one12( id int );
> insert into one12 values(12);
> insert into one12 values(13);
> Select * from one12;
> 3. Click on the JDBC incomplete application ID in the Job History page
> 4. Go to the Jobs tab in the staged Web UI page
> 5. Click on "run at AccessController.java:0" under the Description column
> 6. Click "default" under the Pool Name column of the Completed Stages table
> URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default
> 7. It throws the below error
> HTTP ERROR 400
> Problem accessing /history/application_1536399199015_0006/stages/pool/. 
> Reason:
> Unknown pool: default
> Powered by Jetty:// x.y.z
> But under the 
> Yarn resource page it displays the summary under Fair Scheduler Pool: default 
> URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default
> Summary
> Pool Name Minimum Share   Pool Weight Active Stages   Running Tasks   
> SchedulingMode
> default   0   1   0   0   FIFO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29043:
--

Assignee: feiwang

> [History Server]Only one replay thread of FsHistoryProvider work because of 
> straggler
> -
>
> Key: SPARK-29043
> URL: https://issues.apache.org/jira/browse/SPARK-29043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: feiwang
>Assignee: feiwang
>Priority: Major
> Attachments: image-2019-09-11-15-09-22-912.png, 
> image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for 
> the Spark history server.
> However, only one replay thread works because of a straggler.
> Let's check the code.
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.
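
To illustrate the difference (hypothetical names, not the FsHistoryProvider code): if
every replay has to pass through one coarse synchronized section, a single slow event
log serialises the whole pool; submitting each replay independently and keeping only
the shared listing update synchronized lets the other threads keep working. A minimal
sketch:

{code:scala}
import java.util.concurrent.{Executors, TimeUnit}

// Minimal sketch, hypothetical names; not the FsHistoryProvider code.
// Each event log is parsed on its own pool thread; only the short update of the
// shared listing is synchronized, so one straggler does not block the other threads.
object ReplayPoolSketch {
  def replayAll(logPaths: Seq[String],
                numReplayThreads: Int,
                parse: String => String): Seq[String] = {
    val pool = Executors.newFixedThreadPool(numReplayThreads)
    val listing = scala.collection.mutable.Buffer.empty[String]
    try {
      logPaths.foreach { path =>
        pool.submit(new Runnable {
          override def run(): Unit = {
            val summary = parse(path)                   // slow work, no shared lock held
            listing.synchronized { listing += summary } // short critical section
          }
        })
      }
    } finally {
      pool.shutdown()
      pool.awaitTermination(10, TimeUnit.MINUTES)
    }
    listing.synchronized(listing.toList)
  }
}
{code}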



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29043) [History Server]Only one replay thread of FsHistoryProvider work because of straggler

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29043.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25797
[https://github.com/apache/spark/pull/25797]

> [History Server]Only one replay thread of FsHistoryProvider work because of 
> straggler
> -
>
> Key: SPARK-29043
> URL: https://issues.apache.org/jira/browse/SPARK-29043
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: feiwang
>Assignee: feiwang
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: image-2019-09-11-15-09-22-912.png, 
> image-2019-09-11-15-10-25-326.png, screenshot-1.png
>
>
> As shown in the attachment, we set spark.history.fs.numReplayThreads=30 for 
> the Spark history server.
> However, only one replay thread works because of a straggler.
> Let's check the code.
> https://github.com/apache/spark/blob/7f36cd2aa5e066a807d498b8c51645b136f08a75/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L509-L547
> There is a synchronous operation for all replay tasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29574) spark with user provided hadoop doesn't work on kubernetes

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29574:
--

Assignee: Shahin Shakeri

> spark with user provided hadoop doesn't work on kubernetes
> --
>
> Key: SPARK-29574
> URL: https://issues.apache.org/jira/browse/SPARK-29574
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Michał Wesołowski
>Assignee: Shahin Shakeri
>Priority: Major
> Fix For: 3.0.0
>
>
> When spark-submit is run with an image built from "hadoop free" Spark and 
> user-provided Hadoop, it fails on Kubernetes (the Hadoop libraries are not on 
> Spark's classpath). 
> I downloaded spark [Pre-built with user-provided Apache 
> Hadoop|https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz].
>  
> I created a docker image using 
> [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh].
>  
>  
> Based on this image (2.4.4-without-hadoop)
> I created another one with Dockerfile
> {code:java}
> FROM spark-py:2.4.4-without-hadoop
> ENV SPARK_HOME=/opt/spark/
> # This is needed for newer kubernetes versions
> ADD 
> https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.6.1/kubernetes-client-4.6.1.jar
>  $SPARK_HOME/jars
> COPY spark-env.sh /opt/spark/conf/spark-env.sh
> RUN chmod +x /opt/spark/conf/spark-env.sh
> RUN wget -qO- 
> https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz 
> | tar xz  -C /opt/
> ENV HADOOP_HOME=/opt/hadoop-3.2.1
> ENV PATH=${HADOOP_HOME}/bin:${PATH}
> {code}
> Contents of spark-env.sh:
> {code:java}
> #!/usr/bin/env bash
> export SPARK_DIST_CLASSPATH=$(hadoop 
> classpath):$HADOOP_HOME/share/hadoop/tools/lib/*
> export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
> {code}
> A spark-submit run with an image created this way fails since spark-env.sh is 
> overwritten by the [volume created when the pod 
> starts|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L108]
> As a quick workaround I tried to modify the [entrypoint 
> script|https://github.com/apache/spark/blob/ea8b5df47476fe66b63bd7f7bcd15acfb80bde78/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh]
>  to run spark-env.sh during startup and to move spark-env.sh to a different 
> directory. 
>  The driver starts without issues in this setup; however, even though 
> SPARK_DIST_CLASSPATH is set, the executor is constantly failing:
> {code:java}
> PS 
> C:\Sandbox\projekty\roboticdrive-analytics\components\docker-images\spark-rda>
>  kubectl logs rda-script-1571835692837-exec-12
> ++ id -u
> + myuid=0
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 0
> + uidentry=root:x:0:0:root:/root:/bin/ash
> + set -e
> + '[' -z root:x:0:0:root:/root:/bin/ash ']'
> + source /opt/spark-env.sh
> +++ hadoop classpath
> ++ export 
> 'SPARK_DIST_CLASSPATH=/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoo++
>  
> SPARK_DIST_CLASSPATH='/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*:/opt/hadoop-3.2.1/share/hadoop/tools/lib/*'
> ++ export LD_LIBRARY_PATH=/opt/hadoop-3.2.1/lib/native
> ++ LD_LIBRARY_PATH=/opt/hadoop-3.2.1/lib/native
> ++ echo 
> 'SPARK_DIST_CLASSPATH=/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*:/opt/hadoop-3.2.1/share/hadoop/tools/lib/*'
> 

[jira] [Resolved] (SPARK-29574) spark with user provided hadoop doesn't work on kubernetes

2019-12-16 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29574.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26493
[https://github.com/apache/spark/pull/26493]

> spark with user provided hadoop doesn't work on kubernetes
> --
>
> Key: SPARK-29574
> URL: https://issues.apache.org/jira/browse/SPARK-29574
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
>Reporter: Michał Wesołowski
>Priority: Major
> Fix For: 3.0.0
>
>
> When spark-submit is run with an image built from "hadoop free" Spark and 
> user-provided Hadoop, it fails on Kubernetes (the Hadoop libraries are not on 
> Spark's classpath). 
> I downloaded spark [Pre-built with user-provided Apache 
> Hadoop|https://www-us.apache.org/dist/spark/spark-2.4.4/spark-2.4.4-bin-without-hadoop.tgz].
>  
> I created a docker image using 
> [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh].
>  
>  
> Based on this image (2.4.4-without-hadoop)
> I created another one with Dockerfile
> {code:java}
> FROM spark-py:2.4.4-without-hadoop
> ENV SPARK_HOME=/opt/spark/
> # This is needed for newer kubernetes versions
> ADD 
> https://repo1.maven.org/maven2/io/fabric8/kubernetes-client/4.6.1/kubernetes-client-4.6.1.jar
>  $SPARK_HOME/jars
> COPY spark-env.sh /opt/spark/conf/spark-env.sh
> RUN chmod +x /opt/spark/conf/spark-env.sh
> RUN wget -qO- 
> https://www-eu.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz 
> | tar xz  -C /opt/
> ENV HADOOP_HOME=/opt/hadoop-3.2.1
> ENV PATH=${HADOOP_HOME}/bin:${PATH}
> {code}
> Contents of spark-env.sh:
> {code:java}
> #!/usr/bin/env bash
> export SPARK_DIST_CLASSPATH=$(hadoop 
> classpath):$HADOOP_HOME/share/hadoop/tools/lib/*
> export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
> {code}
> A spark-submit run with an image created this way fails since spark-env.sh is 
> overwritten by the [volume created when the pod 
> starts|https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L108]
> As a quick workaround I tried to modify the [entrypoint 
> script|https://github.com/apache/spark/blob/ea8b5df47476fe66b63bd7f7bcd15acfb80bde78/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh]
>  to run spark-env.sh during startup and to move spark-env.sh to a different 
> directory. 
>  The driver starts without issues in this setup; however, even though 
> SPARK_DIST_CLASSPATH is set, the executor is constantly failing:
> {code:java}
> PS 
> C:\Sandbox\projekty\roboticdrive-analytics\components\docker-images\spark-rda>
>  kubectl logs rda-script-1571835692837-exec-12
> ++ id -u
> + myuid=0
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 0
> + uidentry=root:x:0:0:root:/root:/bin/ash
> + set -e
> + '[' -z root:x:0:0:root:/root:/bin/ash ']'
> + source /opt/spark-env.sh
> +++ hadoop classpath
> ++ export 
> 'SPARK_DIST_CLASSPATH=/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoo++
>  
> SPARK_DIST_CLASSPATH='/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*:/opt/hadoop-3.2.1/share/hadoop/tools/lib/*'
> ++ export LD_LIBRARY_PATH=/opt/hadoop-3.2.1/lib/native
> ++ LD_LIBRARY_PATH=/opt/hadoop-3.2.1/lib/native
> ++ echo 
> 'SPARK_DIST_CLASSPATH=/opt/hadoop-3.2.1/etc/hadoop:/opt/hadoop-3.2.1/share/hadoop/common/lib/*:/opt/hadoop-3.2.1/share/hadoop/common/*:/opt/hadoop-3.2.1/share/hadoop/hdfs:/opt/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/opt/hadoop-3.2.1/share/hadoop/hdfs/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-3.2.1/share/hadoop/mapreduce/*:/opt/hadoop-3.2.1/share/hadoop/yarn:/opt/hadoop-3.2.1/share/hadoop/yarn/lib/*:/opt/hadoop-3.2.1/share/hadoop/yarn/*:/opt/hadoop-3.2.1/share/hadoop/tools/lib/*'
> 

[jira] [Resolved] (SPARK-30167) Log4j configuration for REPL can't override the root logger properly.

2019-12-13 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-30167.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26798
[https://github.com/apache/spark/pull/26798]

> Log4j configuration for REPL can't override the root logger properly.
> -
>
> Key: SPARK-30167
> URL: https://issues.apache.org/jira/browse/SPARK-30167
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
> Fix For: 3.0.0
>
>
> SPARK-11929 enabled REPL's log4j configuration to override root logger but 
> SPARK-26753 seems to have broken the feature.
> You can see one example when you modify the default log4j configuration 
> as follows.
> {code:java}
> # Change the log level for rootCategory to DEBUG
> log4j.rootCategory=DEBUG, console
> ...
> # The log level for repl.Main remains WARN
> log4j.logger.org.apache.spark.repl.Main=WARN{code}
> If you launch REPL with the configuration, INFO level logs appear even though 
> the log level for REPL is WARN.
> {code:java}
> ・・・
> 19/12/08 23:31:38 INFO Utils: Successfully started service 'sparkDriver' on 
> port 33083.
> 19/12/08 23:31:38 INFO SparkEnv: Registering MapOutputTracker
> 19/12/08 23:31:38 INFO SparkEnv: Registering BlockManagerMaster
> 19/12/08 23:31:38 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
> 19/12/08 23:31:38 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint 
> up
> 19/12/08 23:31:38 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
> ・・・{code}
>  
> Before SPARK-26753 was applied, those INFO level logs were not shown with the 
> same log4j.properties.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30240) Spark UI redirects do not always work behind (dumb) proxies

2019-12-12 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-30240:
--

 Summary: Spark UI redirects do not always work behind (dumb) 
proxies
 Key: SPARK-30240
 URL: https://issues.apache.org/jira/browse/SPARK-30240
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


Spark's support for proxy servers allows the code to prepend a prefix to URIs 
generated by Spark pages. But if Spark sends a redirect to the client, then 
Spark's own full URL is exposed. If the client cannot access that URL, or it's 
incorrect for whatever reason, then things do not work.

For example, if you set up an stunnel HTTPS proxy on port 4443, and get the 
root of the Spark UI, you get this back (with all the TLS stuff stripped):
{noformat}
$ curl -v -k https://vanzin-t460p:4443/
*   Trying 127.0.1.1...
* Connected to vanzin-t460p (127.0.1.1) port 4443 (#0)
> GET / HTTP/1.1
> Host: vanzin-t460p:4443
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 302 Found
< Date: Thu, 12 Dec 2019 22:09:52 GMT
< Cache-Control: no-cache, no-store, must-revalidate
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Location: http://vanzin-t460p:4443/jobs/
< Content-Length: 0
< Server: Jetty(9.4.18.v20190429)
{noformat}
So you can see that Jetty respects the "Host" header, but that has no 
information about the protocol, and Spark has no idea that the proxy is using 
HTTPS. So the returned URL does not work.

 

Some proxies are smart enough to rewrite responses, but it would be nice (and 
pretty easy) for Spark to support this simple use case.
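
For illustration only (hypothetical helper, not the actual Spark change): two
dumb-proxy-friendly options when building a redirect are to honour an
X-Forwarded-Proto header if the proxy sets one, or to emit a relative Location
(allowed by RFC 7231) so that no host or scheme gets baked in at all:

{code:scala}
import javax.servlet.http.HttpServletRequest

// Minimal sketch, hypothetical helper; not the actual Spark/Jetty change.
object ProxyFriendlyRedirects {
  def location(req: HttpServletRequest, path: String, useAbsolute: Boolean): String = {
    if (!useAbsolute) {
      // Relative target, e.g. "/jobs/": the client resolves it against the URL it used.
      path
    } else {
      // Trust the scheme advertised by the proxy, if any, instead of req.getScheme only.
      val scheme = Option(req.getHeader("X-Forwarded-Proto")).getOrElse(req.getScheme)
      val port = req.getServerPort
      val defaultPort = if (scheme == "https") 443 else 80
      val portPart = if (port == defaultPort) "" else s":$port"
      s"$scheme://${req.getServerName}$portPart$path"
    }
  }
}
{code}

The relative form in particular would let the stunnel example above work, since the
client resolves the path against the URL it actually used.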



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29152) Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled

2019-12-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29152:
--

Assignee: Rakesh Raushan

> Spark Executor Plugin API shutdown is not proper when dynamic allocation 
> enabled
> 
>
> Key: SPARK-29152
> URL: https://issues.apache.org/jira/browse/SPARK-29152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>Reporter: jobit mathew
>Assignee: Rakesh Raushan
>Priority: Major
>
> *Issue Description*
> The Spark Executor Plugin API's *shutdown handling is not proper* when dynamic 
> allocation is enabled. The plugin's shutdown method is not processed when dynamic 
> allocation is enabled and *executors become dead* after the inactive time.
> *Test Precondition*
> 1. Create a plugin and make a jar named SparkExecutorplugin.jar
> import org.apache.spark.ExecutorPlugin;
> public class ExecutoTest1 implements ExecutorPlugin {
>   @Override
>   public void init() {
>     System.out.println("Executor Plugin Initialised.");
>   }
>   @Override
>   public void shutdown() {
>     System.out.println("Executor plugin closed successfully.");
>   }
> }
> 2. Create the jar with the same code and put it in the folder /spark/examples/jars
> *Test Steps*
> 1. launch bin/spark-sql with dynamic allocation enabled
> ./spark-sql --master yarn --conf spark.executor.plugins=ExecutoTest1  --jars 
> /opt/HA/C10/install/spark/spark/examples/jars/SparkExecutorPlugin.jar --conf 
> spark.dynamicAllocation.enabled=true --conf 
> spark.dynamicAllocation.initialExecutors=2 --conf 
> spark.dynamicAllocation.minExecutors=1
> 2. Create a table, insert the data, and run select * from tablename
> 3. Check the Spark UI Jobs tab/SQL tab
> 4. Check every executor's application log file (the Executors tab gives all 
> executor details) for the executor plugin initialization and shutdown messages 
> or operations.
> Example 
> /yarn/logdir/application_1567156749079_0025/container_e02_1567156749079_0025_01_05/
>  stdout
> 5. Wait for the executor to be dead after the inactive time and check the 
> same container log 
> 6. Kill the spark-sql session and check the container log for the executor 
> plugin shutdown.
> *Expected Output*
> 1. The job should succeed. The create table, insert, and select queries should 
> succeed.
> 2. While running the query, all executor logs should contain the executor plugin 
> init messages or operations:
> "Executor Plugin Initialised."
> 3. Once the executors are dead, the shutdown message should be there in the log 
> file:
> "Executor plugin closed successfully."
> 4. Once the SQL application is closed, the shutdown message should be there in 
> the log:
> "Executor plugin closed successfully."
> *Actual Output*
> Shutdown message is not called when executor is dead after inactive time.
> *Observation*
> Without dynamic allocation Executor plugin is working fine. But after 
> enabling dynamic allocation,Executor shutdown is not processed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29152) Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled

2019-12-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29152.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26810
[https://github.com/apache/spark/pull/26810]

> Spark Executor Plugin API shutdown is not proper when dynamic allocation 
> enabled
> 
>
> Key: SPARK-29152
> URL: https://issues.apache.org/jira/browse/SPARK-29152
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
>Reporter: jobit mathew
>Assignee: Rakesh Raushan
>Priority: Major
> Fix For: 3.0.0
>
>
> *Issue Description*
> Spark Executor Plugin API *shutdown handling is not proper* when dynamic 
> allocation is enabled. The plugin's shutdown method is not invoked when dynamic 
> allocation is enabled and *executors become dead* after the inactive time.
> *Test Precondition*
> 1. Create a plugin and build a jar named SparkExecutorPlugin.jar:
> import org.apache.spark.ExecutorPlugin;
> public class ExecutoTest1 implements ExecutorPlugin {
>   public void init() {
>     System.out.println("Executor Plugin Initialised.");
>   }
>   public void shutdown() {
>     System.out.println("Executor plugin closed successfully.");
>   }
> }
> 2. Put the jar with this class in the folder /spark/examples/jars.
> *Test Steps*
> 1. Launch bin/spark-sql with dynamic allocation enabled:
> ./spark-sql --master yarn --conf spark.executor.plugins=ExecutoTest1  --jars 
> /opt/HA/C10/install/spark/spark/examples/jars/SparkExecutorPlugin.jar --conf 
> spark.dynamicAllocation.enabled=true --conf 
> spark.dynamicAllocation.initialExecutors=2 --conf 
> spark.dynamicAllocation.minExecutors=1
> 2. Create a table, insert data, and run select * from tablename.
> 3. Check the Spark UI Jobs tab / SQL tab.
> 4. Check each executor's application log file (the Executors tab lists all 
> executors) for the executor plugin initialization and shutdown messages or 
> operations.
> Example: 
> /yarn/logdir/application_1567156749079_0025/container_e02_1567156749079_0025_01_05/
>  stdout
> 5. Wait for an executor to become dead after the inactive time and check the 
> same container log.
> 6. Kill the spark-sql session and check the container log for the executor 
> plugin shutdown message.
> *Expected Output*
> 1. The job should succeed: the create table, insert, and select queries should 
> all succeed.
> 2. While the query runs, every executor's log should contain the plugin init 
> message:
> "Executor Plugin Initialised."
> 3. Once an executor is dead, the shutdown message should appear in its log:
> "Executor plugin closed successfully."
> 4. Once the SQL application is closed, the shutdown message should appear in 
> the log:
> "Executor plugin closed successfully."
> *Actual Output*
> The shutdown message is not logged when an executor becomes dead after the 
> inactive time.
> *Observation*
> Without dynamic allocation the executor plugin works fine, but after enabling 
> dynamic allocation the executor shutdown is not processed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth

2019-12-04 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-30129:
--

 Summary: New auth engine does not keep client ID in 
TransportClient after auth
 Key: SPARK-30129
 URL: https://issues.apache.org/jira/browse/SPARK-30129
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4, 3.0.0
Reporter: Marcelo Masiero Vanzin


Found a little bug when working on a feature; when auth is on, it's expected 
that the {{TransportClient}} provides the authenticated ID of the client 
(generally the app ID), but the new auth engine is not setting that information.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27651) Avoid the network when block manager fetches shuffle blocks from the same host

2019-11-26 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-27651.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25299
[https://github.com/apache/spark/pull/25299]

> Avoid the network when block manager fetches shuffle blocks from the same host
> --
>
> Key: SPARK-27651
> URL: https://issues.apache.org/jira/browse/SPARK-27651
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.0.0
>
>
> When a shuffle block (its content) is fetched, the network is always used, even 
> when the block is fetched from an executor (or the external shuffle service) 
> running on the same host.
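
A minimal, self-contained sketch of the idea, with purely illustrative names 
(this is not Spark's actual block manager code): when the block's location is the 
local host, read the shuffle file directly from disk instead of going through the 
network stack.

{code:scala}
import java.nio.file.{Files, Paths}

// Illustrative type only; not Spark's actual internals.
case class BlockLocation(host: String, shuffleFilePath: String)

def readShuffleBlock(location: BlockLocation, localHost: String): Array[Byte] = {
  if (location.host == localHost) {
    // Host-local: read the shuffle file straight from disk, no network round trip.
    Files.readAllBytes(Paths.get(location.shuffleFilePath))
  } else {
    // Remote: fall back to the normal network fetch path (stubbed out here).
    fetchOverNetwork(location)
  }
}

def fetchOverNetwork(location: BlockLocation): Array[Byte] =
  sys.error(s"network fetch from ${location.host} is not shown in this sketch")
{code}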



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27651) Avoid the network when block manager fetches shuffle blocks from the same host

2019-11-26 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-27651:
--

Assignee: Attila Zsolt Piros

> Avoid the network when block manager fetches shuffle blocks from the same host
> --
>
> Key: SPARK-27651
> URL: https://issues.apache.org/jira/browse/SPARK-27651
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
>
> When a shuffle block (its content) is fetched, the network is always used, even 
> when the block is fetched from an executor (or the external shuffle service) 
> running on the same host.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30033) Use Spark plugin support to manage shuffle plugin lifecycle

2019-11-25 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16981997#comment-16981997
 ] 

Marcelo Masiero Vanzin commented on SPARK-30033:


[~mcheah] [~yifeih]

I had plans to do this at some point, but after taking a look at your S3 
prototype it seems you guys already have a use case for the extra functionality 
Spark already has. I'll send a PR soon with my current implementation.

> Use Spark plugin support to manage shuffle plugin lifecycle
> ---
>
> Key: SPARK-30033
> URL: https://issues.apache.org/jira/browse/SPARK-30033
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Priority: Major
>
> SPARK-29396 added support for Spark plugins that have driver and executor 
> components. It provides lifecycle APIs (initialization / shutdown) and also 
> some extra features like metric registration and easy access to Spark's RPC 
> system.
> We could use those APIs as the base for the shuffle-related plugins, so that 
> they have easy access to the extra functionality if they need to.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30033) Use Spark plugin support to manage shuffle plugin lifecycle

2019-11-25 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-30033:
--

 Summary: Use Spark plugin support to manage shuffle plugin 
lifecycle
 Key: SPARK-30033
 URL: https://issues.apache.org/jira/browse/SPARK-30033
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


SPARK-29396 added support for Spark plugins that have driver and executor 
components. It provides lifecycle APIs (initialization / shutdown) and also 
some extra features like metric registration and easy access to Spark's RPC 
system.

We could use those APIs as the base for the shuffle-related plugins, so that 
they have easy access to the extra functionality if they need to.
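
As a usage-level illustration of where this would surface (the class name below is 
hypothetical; the plugin interface details are omitted), such a plugin would be 
enabled through the spark.plugins configuration:

{code:scala}
import org.apache.spark.SparkConf

// com.example.MyShufflePlugin is a hypothetical user class implementing the new
// plugin interface; the key point is that plugins are enabled via spark.plugins.
val conf = new SparkConf()
  .setAppName("plugin-example")
  .set("spark.plugins", "com.example.MyShufflePlugin")
{code}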



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29971) Multiple possible buffer leaks in TransportFrameDecoder and TransportCipher

2019-11-25 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin updated SPARK-29971:
---
Fix Version/s: 2.4.5

> Multiple possible buffer leaks in TransportFrameDecoder and TransportCipher
> ---
>
> Key: SPARK-29971
> URL: https://issues.apache.org/jira/browse/SPARK-29971
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Norman Maurer
>Assignee: Norman Maurer
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> TransportFrameDecoder and TransportCipher currently do not carefully manage the 
> life-cycle of ByteBuf instances and so can leak memory in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26260) Task Summary Metrics for Stage Page: Efficient implementation for SHS when using disk store.

2019-11-25 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-26260:
--

Assignee: shahid

> Task Summary Metrics for Stage Page: Efficient implementation for SHS when 
> using disk store.
> 
>
> Key: SPARK-26260
> URL: https://issues.apache.org/jira/browse/SPARK-26260
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Major
>
> Currently, task summary metrics are calculated based on all tasks instead of 
> only successful tasks.
> After SPARK-26119 (https://issues.apache.org/jira/browse/SPARK-26119), the 
> in-memory store computes the task summary metrics from successful tasks only, 
> but we still need an efficient implementation for the disk store case in the 
> SHS. The main bottleneck for the disk store is the deserialization time 
> overhead.
> Hints: rework the way indexing works so that we can index successful and 
> failed tasks by specific metrics differently (would be tricky). This would 
> also require changing the disk store version (to invalidate old stores).
> Or any other efficient solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26260) Task Summary Metrics for Stage Page: Efficient implementation for SHS when using disk store.

2019-11-25 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-26260.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26508
[https://github.com/apache/spark/pull/26508]

> Task Summary Metrics for Stage Page: Efficient implementation for SHS when 
> using disk store.
> 
>
> Key: SPARK-26260
> URL: https://issues.apache.org/jira/browse/SPARK-26260
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, task summary metrics are calculated based on all tasks instead of 
> only successful tasks.
> After SPARK-26119 (https://issues.apache.org/jira/browse/SPARK-26119), the 
> in-memory store computes the task summary metrics from successful tasks only, 
> but we still need an efficient implementation for the disk store case in the 
> SHS. The main bottleneck for the disk store is the deserialization time 
> overhead.
> Hints: rework the way indexing works so that we can index successful and 
> failed tasks by specific metrics differently (would be tricky). This would 
> also require changing the disk store version (to invalidate old stores).
> Or any other efficient solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29971) Multiple possible buffer leaks in TransportFrameDecoder and TransportCipher

2019-11-22 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29971.

Fix Version/s: 3.0.0
 Assignee: Norman Maurer
   Resolution: Fixed

> Multiple possible buffer leaks in TransportFrameDecoder and TransportCipher
> ---
>
> Key: SPARK-29971
> URL: https://issues.apache.org/jira/browse/SPARK-29971
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Norman Maurer
>Assignee: Norman Maurer
>Priority: Major
> Fix For: 3.0.0
>
>
> TransportFrameDecoder and TransportCipher currently do not carefully manage the 
> life-cycle of ByteBuf instances and so can leak memory in some cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29965) Race in executor shutdown handling can lead to executor never fully unregistering

2019-11-19 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-29965:
--

 Summary: Race in executor shutdown handling can lead to executor 
never fully unregistering
 Key: SPARK-29965
 URL: https://issues.apache.org/jira/browse/SPARK-29965
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


I ran into a situation that I had never noticed before, but that I seem to be able 
to hit within just a few retries when using K8S with dynamic allocation.

Basically, there's a race when killing an executor, where it may send a 
heartbeat to the driver right at the wrong time during shutdown, e.g.:

{noformat}
19/11/19 21:14:05 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
19/11/19 21:14:05 INFO Executor: Told to re-register on heartbeat
19/11/19 21:14:05 INFO BlockManager: BlockManager BlockManagerId(10, 
192.168.3.99, 39923, None) re-registering with master
19/11/19 21:14:05 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(10, 192.168.3.99, 39923, None)
19/11/19 21:14:05 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(10, 192.168.3.99, 39923, None)
19/11/19 21:14:06 INFO BlockManager: Reporting 0 blocks to the master.
{noformat}

On the driver side it will happily re-register the executor (time diff is just 
because of time zone in log4j config):

{noformat}
19/11/19 13:14:05 INFO BlockManagerMasterEndpoint: Trying to remove executor 10 
from BlockManagerMaster.
19/11/19 13:14:05 INFO BlockManagerMasterEndpoint: Removing block manager 
BlockManagerId(10, 192.168.3.99, 39923, None)
19/11/19 13:14:05 INFO BlockManagerMaster: Removed 10 successfully in 
removeExecutor
19/11/19 13:14:05 INFO DAGScheduler: Shuffle files lost for executor: 10 (epoch 
18)
{noformat}

And a little later:

{noformat}
19/11/19 13:14:05 DEBUG HeartbeatReceiver: Received heartbeat from unknown 
executor 10
19/11/19 13:14:05 INFO BlockManagerMasterEndpoint: Registering block manager 
192.168.3.99:39923 with 413.9 MiB RAM, BlockManagerId(10, 192.168.3.99, 39923, 
None)
{noformat}

This becomes a problem later, when you start to see periodic exceptions in the 
driver's logs:

{noformat}
19/11/19 13:14:39 WARN BlockManagerMasterEndpoint: Error trying to remove 
broadcast 4 from block manager BlockManagerId(10, 192.168.3.99, 39923, None)
java.io.IOException: Failed to send RPC RPC 4999007301825869809 to 
/10.65.55.240:14233: java.nio.channels.ClosedChannelException
at 
org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at 
org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
{noformat}

That happens every time some code calls into the block manager to request something 
from all executors, meaning that the dead executor re-registered and was then 
never removed from the block manager.

I found a few races in the code that can lead to this situation. I'll post a PR 
once I test it more.
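
One way to picture the kind of guard involved, as a self-contained sketch with 
illustrative names (not the actual HeartbeatReceiver / BlockManagerMasterEndpoint 
change): once an executor has been told to shut down, its late heartbeats should 
not trigger a re-registration.

{code:scala}
import scala.collection.mutable

// Illustrative only: remember executors that were already removed so a late
// heartbeat from them is ignored instead of causing a re-registration.
class HeartbeatTracker {
  private val removedExecutors = mutable.Set[String]()

  def markRemoved(execId: String): Unit = synchronized {
    removedExecutors += execId
  }

  // Returns true if the heartbeat should be processed normally.
  def onHeartbeat(execId: String): Boolean = synchronized {
    !removedExecutors.contains(execId)
  }
}
{code}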




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29950) Deleted excess executors can connect back to driver in K8S with dyn alloc on

2019-11-18 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-29950:
--

 Summary: Deleted excess executors can connect back to driver in 
K8S with dyn alloc on
 Key: SPARK-29950
 URL: https://issues.apache.org/jira/browse/SPARK-29950
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


{{ExecutorPodsAllocator}} currently has code to delete excess pods that the K8S 
server hasn't started yet, and aren't needed anymore due to downscaling.

The problem is that there is a race between K8S starting the pod and the Spark 
code deleting it. This may cause the pod to connect back to Spark and do a lot 
of initialization, sometimes even being considered for task allocation, just to 
be killed almost immediately.

This doesn't cause any problems that I could detect in my tests, but it wastes 
resources and causes the logs to contain misleading messages about the executor 
being killed. It would be nice to avoid that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29833) Add FileNotFoundException check for spark.yarn.jars

2019-11-15 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29833.

Fix Version/s: 3.0.0
 Assignee: ulysses you
   Resolution: Fixed

> Add FileNotFoundException check  for spark.yarn.jars
> 
>
> Key: SPARK-29833
> URL: https://issues.apache.org/jira/browse/SPARK-29833
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.4.4
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.0.0
>
>
> When `spark.yarn.jars=/xxx/xxx` is set to a path without a scheme, Spark 
> throws a NullPointerException.
> The reason is that HDFS returns null from pathFs.globStatus(path) when the 
> path does not exist, and Spark just uses 
> `pathFs.globStatus(path).filter(_.isFile())` without checking for null.
> Related Globber code is here
> {noformat}
> /*
>  * When the input pattern "looks" like just a simple filename, and we
>  * can't find it, we return null rather than an empty array.
>  * This is a special case which the shell relies on.
>  *
>  * To be more precise: if there were no results, AND there were no
>  * groupings (aka brackets), and no wildcards in the input (aka stars),
>  * we return null.
>  */
> if ((!sawWildcard) && results.isEmpty() &&
> (flattenedPatterns.size() <= 1)) {
>   return null;
> }
> {noformat}
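
A minimal sketch of the defensive check being asked for (illustrative, not 
necessarily the exact patch): guard the possibly-null result of globStatus() 
before filtering.

{code:scala}
import java.io.FileNotFoundException
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

def listJarStatuses(fs: FileSystem, path: Path): Seq[FileStatus] = {
  // globStatus() returns null for a wildcard-free path that does not exist.
  Option(fs.globStatus(path)) match {
    case Some(statuses) => statuses.filter(_.isFile).toSeq
    case None => throw new FileNotFoundException(s"Path $path does not exist")
  }
}
{code}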



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29905) ExecutorPodsLifecycleManager has sub-optimal behavior with dynamic allocation

2019-11-14 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-29905:
--

 Summary: ExecutorPodsLifecycleManager has sub-optimal behavior 
with dynamic allocation
 Key: SPARK-29905
 URL: https://issues.apache.org/jira/browse/SPARK-29905
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


I've been playing with dynamic allocation on k8s and noticed some weird 
behavior from ExecutorPodsLifecycleManager when it's on.

The cause of this behavior is mostly because of the higher rate of pod updates 
when you have dynamic allocation. Pods being created and going away all the 
time generate lots of events, that are then translated into "snapshots" 
internally in Spark, and fed to subscribers such as 
ExecutorPodsLifecycleManager.

The first effect of that is that you get a lot of spurious logging. Since 
snapshots are incremental, you can get lots of snapshots with the same 
"PodDeleted" information, for example, and ExecutorPodsLifecycleManager will 
log for all of them. Yes, log messages are at debug level, but if you're 
debugging that stuff, it's really noisy and distracting.

The second effect is that the same way you get multiple log messages, you end 
up calling into the Spark scheduler, and worse, into the K8S API server, 
multiple times for the same pod update. We can optimize that and reduce the 
chattiness with the API server.
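
A self-contained sketch of the optimization idea, with illustrative names (not the 
actual ExecutorPodsLifecycleManager change): remember the last pod state already 
handled and skip snapshots that repeat it, so duplicated "PodDeleted" events 
result in a single scheduler / API-server call.

{code:scala}
import scala.collection.mutable

class SnapshotDeduplicator {
  private val lastHandledState = mutable.Map[String, String]()

  // Returns true only the first time a given (pod, state) pair is seen,
  // so repeated snapshots carrying the same state are ignored.
  def shouldHandle(podName: String, state: String): Boolean = synchronized {
    if (lastHandledState.get(podName).contains(state)) {
      false
    } else {
      lastHandledState(podName) = state
      true
    }
  }
}
{code}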



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29882) SPARK on Kubernetes is Broken for SPARK with no Hadoop release

2019-11-13 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29882.

   Fix Version/s: (was: 2.4.4)
Target Version/s:   (was: 2.4.4)
  Resolution: Duplicate

> SPARK on Kubernetes is Broken for SPARK with no Hadoop release
> --
>
> Key: SPARK-29882
> URL: https://issues.apache.org/jira/browse/SPARK-29882
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.4
> Environment: h3. How was this patch tested?
> Kubernetes 1.14, Spark 2.4.4, Hadoop 3.2.1. Adding $SPARK_DIST_CLASSPATH to 
> {{-cp }} param of entrypoint.sh enables launching the executors correctly.
>Reporter: Shahin Shakeri
>Priority: Major
>
> h3. What changes were proposed in this pull request?
> Include {{$SPARK_DIST_CLASSPATH}} in class path when launching 
> {{CoarseGrainedExecutorBackend}} on Kubernetes executors using the provided 
> {{entrypoint.sh}}
> h3. Why are the changes needed?
> For user provided Hadoop (3.2.1 in this example) {{$SPARK_DIST_CLASSPATH}} 
> contains the required jars.
> h3. Does this PR introduce any user-facing change?
> no
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29865) k8s executor pods all have different prefixes in client mode

2019-11-12 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-29865:
--

 Summary: k8s executor pods all have different prefixes in client 
mode
 Key: SPARK-29865
 URL: https://issues.apache.org/jira/browse/SPARK-29865
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Marcelo Masiero Vanzin


This works in cluster mode since the features set things up so that all 
executor pods have the same name prefix.

But in client mode features are not used; so each executor ends up with a 
different name prefix, which makes debugging a little bit annoying.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29487) Ability to run Spark Kubernetes other than from /opt/spark

2019-11-12 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29487.

Resolution: Duplicate

This is basically allowing people to customize their docker images, which is 
SPARK-24655.

> Ability to run Spark Kubernetes other than from /opt/spark
> --
>
> Key: SPARK-29487
> URL: https://issues.apache.org/jira/browse/SPARK-29487
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Submit
>Affects Versions: 2.4.4
>Reporter: Benjamin Miao CAI
>Priority: Minor
>
> In the Spark Kubernetes Dockerfile, the Spark binaries are copied to 
> */opt/spark*.
> If we try to create our own Dockerfile without using */opt/spark* then the 
> image will not run.
> After looking at the source code, it seems that in various places the path is 
> hard-coded to */opt/spark*.
> *Example:*
> Constants.scala:
> // Spark app configs for containers
> val SPARK_CONF_VOLUME = "spark-conf-volume"
> val SPARK_CONF_DIR_INTERNAL = "/opt/spark/conf"
> Is it possible to make this configurable so we can put Spark somewhere other 
> than /opt/?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29755) ClassCastException occurs when reading events from SHS

2019-11-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29755:
--

Assignee: Jungtaek Lim

> ClassCastException occurs when reading events from SHS
> --
>
> Key: SPARK-29755
> URL: https://issues.apache.org/jira/browse/SPARK-29755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> Looks like SPARK-28869 triggered a technical issue on jackson-scala: 
> https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges
> {noformat}
> 19/11/05 17:59:23 INFO FsHistoryProvider: Leasing disk manager space for app 
> app-20191105152223- / None...
> 19/11/05 17:59:23 INFO FsHistoryProvider: Parsing 
> /apps/spark/eventlogs/app-20191105152223- to re-build UI...
> 19/11/05 17:59:24 ERROR FsHistoryProvider: Exception in checking for event 
> log updates
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
>   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.shouldReloadLog(FsHistoryProvider.scala:585)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$6(FsHistoryProvider.scala:458)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$6$adapted(FsHistoryProvider.scala:444)
>   at 
> scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:256)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at 
> scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255)
>   at 
> scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
>   at 
> scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
>   at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
>   at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
>   at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs(FsHistoryProvider.scala:444)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$startPolling$3(FsHistoryProvider.scala:267)
>   at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1302)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$getRunner$1(FsHistoryProvider.scala:190)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
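
A minimal reproduction of the underlying jackson-module-scala pitfall referenced 
above (illustrative; the Record class below is not Spark code): without an 
explicit type hint, the value inside an Option[Long] can be materialized as a 
boxed Integer, and unboxing it later fails exactly like the stack trace above.

{code:scala}
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

case class Record(value: Option[Long])

val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)

val parsed = mapper.readValue("""{"value": 1}""", classOf[Record])
// Due to type erasure the Option may hold a java.lang.Integer here, so this
// unboxing can throw java.lang.ClassCastException at runtime.
val v: Long = parsed.value.get
{code}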



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29755) ClassCastException occurs when reading events from SHS

2019-11-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29755.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26397
[https://github.com/apache/spark/pull/26397]

> ClassCastException occurs when reading events from SHS
> --
>
> Key: SPARK-29755
> URL: https://issues.apache.org/jira/browse/SPARK-29755
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Looks like SPARK-28869 triggered a technical issue on jackson-scala: 
> https://github.com/FasterXML/jackson-module-scala/wiki/FAQ#deserializing-optionint-and-other-primitive-challenges
> {noformat}
> 19/11/05 17:59:23 INFO FsHistoryProvider: Leasing disk manager space for app 
> app-20191105152223- / None...
> 19/11/05 17:59:23 INFO FsHistoryProvider: Parsing 
> /apps/spark/eventlogs/app-20191105152223- to re-build UI...
> 19/11/05 17:59:24 ERROR FsHistoryProvider: Exception in checking for event 
> log updates
> java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.Long
>   at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.shouldReloadLog(FsHistoryProvider.scala:585)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$6(FsHistoryProvider.scala:458)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$checkForLogs$6$adapted(FsHistoryProvider.scala:444)
>   at 
> scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:256)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at 
> scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255)
>   at 
> scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
>   at 
> scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
>   at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
>   at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
>   at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.checkForLogs(FsHistoryProvider.scala:444)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$startPolling$3(FsHistoryProvider.scala:267)
>   at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1302)
>   at 
> org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$getRunner$1(FsHistoryProvider.scala:190)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26154) Stream-stream joins - left outer join gives inconsistent output

2019-11-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-26154.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26108
[https://github.com/apache/spark/pull/26108]

> Stream-stream joins - left outer join gives inconsistent output
> ---
>
> Key: SPARK-26154
> URL: https://issues.apache.org/jira/browse/SPARK-26154
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.2, 3.0.0
> Environment: Spark version - Spark 2.3.2
> OS- Suse 11
>Reporter: Haripriya
>Assignee: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
> Fix For: 3.0.0
>
>
> Stream-stream joins using a left outer join give inconsistent output.
> Data that has already been processed is processed again and produces a null 
> value: in batch 2 the input data "3" is processed, but in batch 6 a null 
> value is emitted again for the same data.
> Steps
> In spark-shell
> {code:java}
> scala> import org.apache.spark.sql.functions.{col, expr}
> import org.apache.spark.sql.functions.{col, expr}
> scala> import org.apache.spark.sql.streaming.Trigger
> import org.apache.spark.sql.streaming.Trigger
> scala> val lines_stream1 = spark.readStream.
>  |   format("kafka").
>  |   option("kafka.bootstrap.servers", "ip:9092").
>  |   option("subscribe", "topic1").
>  |   option("includeTimestamp", true).
>  |   load().
>  |   selectExpr("CAST (value AS String)","CAST(timestamp AS 
> TIMESTAMP)").as[(String,Timestamp)].
>  |   select(col("value") as("data"),col("timestamp") 
> as("recordTime")).
>  |   select("data","recordTime").
>  |   withWatermark("recordTime", "5 seconds ")
> lines_stream1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = 
> [data: string, recordTime: timestamp]
> scala> val lines_stream2 = spark.readStream.
>  |   format("kafka").
>  |   option("kafka.bootstrap.servers", "ip:9092").
>  |   option("subscribe", "topic2").
>  |   option("includeTimestamp", value = true).
>  |   load().
>  |   selectExpr("CAST (value AS String)","CAST(timestamp AS 
> TIMESTAMP)").as[(String,Timestamp)].
>  |   select(col("value") as("data1"),col("timestamp") 
> as("recordTime1")).
>  |   select("data1","recordTime1").
>  |   withWatermark("recordTime1", "10 seconds ")
> lines_stream2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = 
> [data1: string, recordTime1: timestamp]
> scala> val query = lines_stream1.join(lines_stream2, expr (
>  |   """
>  | | data == data1 and
>  | | recordTime1 >= recordTime and
>  | | recordTime1 <= recordTime + interval 5 seconds
>  |   """.stripMargin),"left").
>  |   writeStream.
>  |   option("truncate","false").
>  |   outputMode("append").
>  |   format("console").option("checkpointLocation", 
> "/tmp/leftouter/").
>  |   trigger(Trigger.ProcessingTime ("5 seconds")).
>  |   start()
> query: org.apache.spark.sql.streaming.StreamingQuery = 
> org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@1a48f55b
> {code}
> Step2 : Start producing data
> kafka-console-producer.sh --broker-list ip:9092 --topic topic1
>  >1
>  >2
>  >3
>  >4
>  >5
>  >aa
>  >bb
>  >cc
> kafka-console-producer.sh --broker-list ip:9092 --topic topic2
>  >2
>  >2
>  >3
>  >4
>  >5
>  >aa
>  >cc
>  >ee
>  >ee
>  
> Output obtained:
> {code:java}
> Batch: 0
> ---
> ++--+-+---+
> |data|recordTime|data1|recordTime1|
> ++--+-+---+
> ++--+-+---+
> ---
> Batch: 1
> ---
> ++--+-+---+
> |data|recordTime|data1|recordTime1|
> ++--+-+---+
> ++--+-+---+
> ---
> Batch: 2
> ---
> ++---+-+---+
> |data|recordTime |data1|recordTime1|
> ++---+-+---+
> |3   |2018-11-22 20:09:35.053|3|2018-11-22 20:09:36.506|
> |2   |2018-11-22 20:09:31.613|2|2018-11-22 20:09:33.116|
> ++---+-+---+
> ---
> Batch: 3
> ---
> ++---+-+---+
> |data|recordTime 

[jira] [Assigned] (SPARK-26154) Stream-stream joins - left outer join gives inconsistent output

2019-11-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-26154:
--

Assignee: Jungtaek Lim

> Stream-stream joins - left outer join gives inconsistent output
> ---
>
> Key: SPARK-26154
> URL: https://issues.apache.org/jira/browse/SPARK-26154
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.2, 3.0.0
> Environment: Spark version - Spark 2.3.2
> OS- Suse 11
>Reporter: Haripriya
>Assignee: Jungtaek Lim
>Priority: Blocker
>  Labels: correctness
>
> Stream-stream joins using a left outer join give inconsistent output.
> Data that has already been processed is processed again and produces a null 
> value: in batch 2 the input data "3" is processed, but in batch 6 a null 
> value is emitted again for the same data.
> Steps
> In spark-shell
> {code:java}
> scala> import org.apache.spark.sql.functions.{col, expr}
> import org.apache.spark.sql.functions.{col, expr}
> scala> import org.apache.spark.sql.streaming.Trigger
> import org.apache.spark.sql.streaming.Trigger
> scala> val lines_stream1 = spark.readStream.
>  |   format("kafka").
>  |   option("kafka.bootstrap.servers", "ip:9092").
>  |   option("subscribe", "topic1").
>  |   option("includeTimestamp", true).
>  |   load().
>  |   selectExpr("CAST (value AS String)","CAST(timestamp AS 
> TIMESTAMP)").as[(String,Timestamp)].
>  |   select(col("value") as("data"),col("timestamp") 
> as("recordTime")).
>  |   select("data","recordTime").
>  |   withWatermark("recordTime", "5 seconds ")
> lines_stream1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = 
> [data: string, recordTime: timestamp]
> scala> val lines_stream2 = spark.readStream.
>  |   format("kafka").
>  |   option("kafka.bootstrap.servers", "ip:9092").
>  |   option("subscribe", "topic2").
>  |   option("includeTimestamp", value = true).
>  |   load().
>  |   selectExpr("CAST (value AS String)","CAST(timestamp AS 
> TIMESTAMP)").as[(String,Timestamp)].
>  |   select(col("value") as("data1"),col("timestamp") 
> as("recordTime1")).
>  |   select("data1","recordTime1").
>  |   withWatermark("recordTime1", "10 seconds ")
> lines_stream2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = 
> [data1: string, recordTime1: timestamp]
> scala> val query = lines_stream1.join(lines_stream2, expr (
>  |   """
>  | | data == data1 and
>  | | recordTime1 >= recordTime and
>  | | recordTime1 <= recordTime + interval 5 seconds
>  |   """.stripMargin),"left").
>  |   writeStream.
>  |   option("truncate","false").
>  |   outputMode("append").
>  |   format("console").option("checkpointLocation", 
> "/tmp/leftouter/").
>  |   trigger(Trigger.ProcessingTime ("5 seconds")).
>  |   start()
> query: org.apache.spark.sql.streaming.StreamingQuery = 
> org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@1a48f55b
> {code}
> Step2 : Start producing data
> kafka-console-producer.sh --broker-list ip:9092 --topic topic1
>  >1
>  >2
>  >3
>  >4
>  >5
>  >aa
>  >bb
>  >cc
> kafka-console-producer.sh --broker-list ip:9092 --topic topic2
>  >2
>  >2
>  >3
>  >4
>  >5
>  >aa
>  >cc
>  >ee
>  >ee
>  
> Output obtained:
> {code:java}
> Batch: 0
> ---
> ++--+-+---+
> |data|recordTime|data1|recordTime1|
> ++--+-+---+
> ++--+-+---+
> ---
> Batch: 1
> ---
> ++--+-+---+
> |data|recordTime|data1|recordTime1|
> ++--+-+---+
> ++--+-+---+
> ---
> Batch: 2
> ---
> ++---+-+---+
> |data|recordTime |data1|recordTime1|
> ++---+-+---+
> |3   |2018-11-22 20:09:35.053|3|2018-11-22 20:09:36.506|
> |2   |2018-11-22 20:09:31.613|2|2018-11-22 20:09:33.116|
> ++---+-+---+
> ---
> Batch: 3
> ---
> ++---+-+---+
> |data|recordTime |data1|recordTime1|
> ++---+-+---+
> |4   |2018-11-22 20:09:38.654|4

[jira] [Resolved] (SPARK-29770) Allow setting spark.app.id when spark-submit for Spark on Kubernetes

2019-11-11 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29770.

Resolution: Won't Fix

See comments in PR.

> Allow setting spark.app.id when spark-submit for Spark on Kubernetes
> 
>
> Key: SPARK-29770
> URL: https://issues.apache.org/jira/browse/SPARK-29770
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Liu Runzhong
>Priority: Minor
>  Labels: easyfix
>
> When the user provides `spark.app.id` via `spark-submit`, it actually does 
> nothing, because `spark.app.id` is always overwritten with `kubernetesAppId`, 
> which confuses users.
> [https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala#L196]
> I know that `spark.app.id` is used as a label on the driver/executor pods and 
> other resources, and that label values have strict limitations, but I think it 
> would be more flexible to let users decide how to generate `spark.app.id` 
> themselves.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29790) Add notes about port being required for Kubernetes API URL when set as master

2019-11-08 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29790.

Fix Version/s: 3.0.0
   2.4.5
 Assignee: Emil Sandstø
   Resolution: Fixed

> Add notes about port being required for Kubernetes API URL when set as master
> -
>
> Key: SPARK-29790
> URL: https://issues.apache.org/jira/browse/SPARK-29790
> Project: Spark
>  Issue Type: Documentation
>  Components: Kubernetes
>Affects Versions: 2.4.3, 2.4.4
>Reporter: Emil Sandstø
>Assignee: Emil Sandstø
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Apparently, when configuring the master endpoint, the Kubernetes API URL needs 
> to include the port, even if it is 443. This should be noted in the 
> documentation of the "Running Spark on Kubernetes" guide.
> Reported in the wild 
> [https://medium.com/@kidane.weldemariam_75349/thanks-james-on-issuing-spark-submit-i-run-into-this-error-cc507d4f8f0d]
> I had the same issue myself.
> We might want to create an issue for fixing the implementation, if it's 
> considered a bug.
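
For reference, a small example of the point being documented (the cluster address 
is hypothetical): the port must be part of the master URL even when it is the 
default 443.

{code:scala}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("k8s://https://kubernetes.example.com:443")
  .setAppName("k8s-example")
{code}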



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.

2019-11-07 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-21869:
--

Assignee: Gabor Somogyi

> A cached Kafka producer should not be closed if any task is using it.
> -
>
> Key: SPARK-21869
> URL: https://issues.apache.org/jira/browse/SPARK-21869
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Gabor Somogyi
>Priority: Major
>
> Right now a cached Kafka producer may be closed if a large task uses it for 
> more than 10 minutes.
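
A self-contained sketch of the reference-counting idea behind such a fix 
(illustrative names, not the actual Kafka producer cache): an expired cached 
producer is only really closed once no task is still using it.

{code:scala}
// doClose stands in for closing the underlying KafkaProducer.
class CachedProducerHandle(doClose: () => Unit) {
  private var inUse = 0
  private var expired = false
  private var closed = false

  def acquire(): Unit = synchronized { inUse += 1 }

  def release(): Unit = synchronized {
    inUse -= 1
    maybeClose()
  }

  def markExpired(): Unit = synchronized {
    expired = true
    maybeClose()
  }

  private def maybeClose(): Unit = {
    if (expired && inUse == 0 && !closed) {
      closed = true
      doClose()
    }
  }
}
{code}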



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21869) A cached Kafka producer should not be closed if any task is using it.

2019-11-07 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-21869.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25853
[https://github.com/apache/spark/pull/25853]

> A cached Kafka producer should not be closed if any task is using it.
> -
>
> Key: SPARK-21869
> URL: https://issues.apache.org/jira/browse/SPARK-21869
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.0
>
>
> Right now a cached Kafka producer may be closed if a large task uses it for 
> more than 10 minutes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29635) Deduplicate test suites between Kafka micro-batch sink and Kafka continuous sink

2019-11-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29635.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26292
[https://github.com/apache/spark/pull/26292]

> Deduplicate test suites between Kafka micro-batch sink and Kafka continuous 
> sink
> 
>
> Key: SPARK-29635
> URL: https://issues.apache.org/jira/browse/SPARK-29635
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> There's a comment in KafkaContinuousSinkSuite that most likely explains a 
> TODO:
> https://github.com/apache/spark/blob/37690dea107623ebca1e47c64db59196ee388f2f/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSinkSuite.scala#L35-L39
> {noformat}
> /**
>  * This is a temporary port of KafkaSinkSuite, since we do not yet have a V2 
> memory stream.
>  * Once we have one, this will be changed to a specialization of 
> KafkaSinkSuite and we won't have
>  * to duplicate all the code.
>  */
> {noformat}
> Given that the latest master branch now has a V2 memory stream, it is a good 
> time to deduplicate the two suites into one by introducing a base class and 
> letting each suite override what differs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29635) Deduplicate test suites between Kafka micro-batch sink and Kafka continuous sink

2019-11-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29635:
--

Assignee: Jungtaek Lim

> Deduplicate test suites between Kafka micro-batch sink and Kafka continuous 
> sink
> 
>
> Key: SPARK-29635
> URL: https://issues.apache.org/jira/browse/SPARK-29635
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
>
> There's a comment in KafkaContinuousSinkSuite that most likely explains a 
> TODO:
> https://github.com/apache/spark/blob/37690dea107623ebca1e47c64db59196ee388f2f/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSinkSuite.scala#L35-L39
> {noformat}
> /**
>  * This is a temporary port of KafkaSinkSuite, since we do not yet have a V2 
> memory stream.
>  * Once we have one, this will be changed to a specialization of 
> KafkaSinkSuite and we won't have
>  * to duplicate all the code.
>  */
> {noformat}
> Given that the latest master branch now has a V2 memory stream, it is a good 
> time to deduplicate the two suites into one by introducing a base class and 
> letting each suite override what differs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29642) ContinuousMemoryStream throws error on String type

2019-11-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29642:
--

Assignee: Jungtaek Lim

> ContinuousMemoryStream throws error on String type
> --
>
> Key: SPARK-29642
> URL: https://issues.apache.org/jira/browse/SPARK-29642
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> While we can set String as the generic type of ContinuousMemoryStream, it 
> doesn't really work, because the input String is not converted to UTF8String, 
> so accessing it through the Row interface throws an error.
> We should encode the input and convert it to a Row properly.
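
A small illustration of the conversion the description refers to (assuming Spark's 
internal UTF8String type, which is how catalyst rows store string values):

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.unsafe.types.UTF8String

// A plain java.lang.String must be converted before it is stored in an
// internal row; storing the raw String would fail when the row is read back.
val row = InternalRow(UTF8String.fromString("hello"))
val s: UTF8String = row.getUTF8String(0)
{code}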



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29642) ContinuousMemoryStream throws error on String type

2019-11-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29642.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26300
[https://github.com/apache/spark/pull/26300]

> ContinuousMemoryStream throws error on String type
> --
>
> Key: SPARK-29642
> URL: https://issues.apache.org/jira/browse/SPARK-29642
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> While we can set String as the generic type of ContinuousMemoryStream, it 
> doesn't really work, because the input String is not converted to UTF8String, 
> so accessing it through the Row interface throws an error.
> We should encode the input and convert it to a Row properly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29603) Support application priority for spark on yarn

2019-11-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29603:
--

Assignee: Kent Yao

> Support application priority for spark on yarn
> --
>
> Key: SPARK-29603
> URL: https://issues.apache.org/jira/browse/SPARK-29603
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> We can set a priority on an application so that YARN can order pending 
> applications; those with higher priority have a better chance of being 
> activated. This applies to the YARN CapacityScheduler only.
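
A usage-level sketch (the configuration key name spark.yarn.priority is my 
assumption about what this change introduces, not a confirmed detail):

{code:scala}
import org.apache.spark.SparkConf

// Assumed key: higher values mean higher priority; honored by the YARN
// CapacityScheduler only.
val conf = new SparkConf()
  .set("spark.yarn.priority", "10")
{code}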



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29603) Support application priority for spark on yarn

2019-11-06 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29603.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26255
[https://github.com/apache/spark/pull/26255]

> Support application priority for spark on yarn
> --
>
> Key: SPARK-29603
> URL: https://issues.apache.org/jira/browse/SPARK-29603
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> We can set a priority on an application so that YARN can order pending 
> applications; those with higher priority have a better chance of being 
> activated. This applies to the YARN CapacityScheduler only.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29766) Aggregate metrics asynchronously in SQL listener

2019-11-05 Thread Marcelo Masiero Vanzin (Jira)
Marcelo Masiero Vanzin created SPARK-29766:
--

 Summary: Aggregate metrics asynchronously in SQL listener
 Key: SPARK-29766
 URL: https://issues.apache.org/jira/browse/SPARK-29766
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Marcelo Masiero Vanzin


This is a follow up to SPARK-29562.

That change made metrics collection faster, and also sped up metrics 
aggregation. But it is still too slow to execute in an event handler, so we 
should do it asynchronously to minimize events being dropped by the listener 
bus.
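
A minimal sketch of the asynchronous approach (illustrative, not the actual SQL 
listener change): push the expensive aggregation onto a dedicated single-thread 
executor so the listener-bus thread returns immediately.

{code:scala}
import java.util.concurrent.Executors

val aggregationExecutor = Executors.newSingleThreadExecutor()

// Called from the listener; the heavy aggregation runs off the bus thread.
def onMetricsUpdate(aggregate: () => Unit): Unit = {
  aggregationExecutor.submit(new Runnable {
    override def run(): Unit = aggregate()
  })
}
{code}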



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29763) Stage UI Page not showing all accumulators in Task Table

2019-11-05 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29763.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26402
[https://github.com/apache/spark/pull/26402]

> Stage UI Page not showing all accumulators in Task Table
> 
>
> Key: SPARK-29763
> URL: https://issues.apache.org/jira/browse/SPARK-29763
> Project: Spark
>  Issue Type: Story
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.0.0
>
>
> In the stage-specific UI page, the task table doesn't properly show all 
> accumulators; it only shows the last one.
> We need to fix the JavaScript to show all of them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20568) Delete files after processing in structured streaming

2019-11-04 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-20568.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 22952
[https://github.com/apache/spark/pull/22952]

> Delete files after processing in structured streaming
> -
>
> Key: SPARK-20568
> URL: https://issues.apache.org/jira/browse/SPARK-20568
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 2.1.0, 2.2.1
>Reporter: Saul Shanabrook
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> It would be great to be able to delete files after processing them with 
> structured streaming.
> For example, I am reading in a bunch of JSON files and converting them into 
> Parquet. If the JSON files are not deleted after they are processed, they 
> quickly fill up my hard drive. I originally [posted this on Stack 
> Overflow|http://stackoverflow.com/q/43671757/907060] and was advised to 
> file a feature request for it.
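
A minimal usage sketch, assuming the feature landed as a {{cleanSource}} option 
on the file stream source (the option name, its values, and all paths below are 
assumptions for illustration):

{code}
// Sketch: delete each input file once the batch that read it completes.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("clean-source-demo").getOrCreate()

val json = spark.readStream
  .format("json")
  .schema("id LONG, payload STRING")        // hypothetical input schema
  .option("cleanSource", "delete")          // assumed option: "delete", "archive", or "off"
  .load("/data/incoming")                   // hypothetical input directory

val query = json.writeStream
  .format("parquet")
  .option("checkpointLocation", "/data/checkpoints/json2parquet")
  .start("/data/parquet")                   // hypothetical output directory

query.awaitTermination()
{code}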



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20568) Delete files after processing in structured streaming

2019-11-04 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-20568:
--

Assignee: Jungtaek Lim

> Delete files after processing in structured streaming
> -
>
> Key: SPARK-20568
> URL: https://issues.apache.org/jira/browse/SPARK-20568
> Project: Spark
>  Issue Type: New Feature
>  Components: Structured Streaming
>Affects Versions: 2.1.0, 2.2.1
>Reporter: Saul Shanabrook
>Assignee: Jungtaek Lim
>Priority: Major
>
> It would be great to be able to delete files after processing them with 
> structured streaming.
> For example, I am reading in a bunch of JSON files and converting them into 
> Parquet. If the JSON files are not deleted after they are processed, they 
> quickly fill up my hard drive. I originally [posted this on Stack 
> Overflow|http://stackoverflow.com/q/43671757/907060] and was advised to 
> file a feature request for it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29397) Create new plugin interface for driver and executor plugins

2019-11-04 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29397.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26170
[https://github.com/apache/spark/pull/26170]

> Create new plugin interface for driver and executor plugins
> ---
>
> Key: SPARK-29397
> URL: https://issues.apache.org/jira/browse/SPARK-29397
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> This task covers the work of adding a new interface for Spark plugins, 
> covering both driver- and executor-side components.
> See parent bug for details.
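
A minimal sketch of what a plugin against such an interface might look like, 
assuming it landed as {{org.apache.spark.api.plugin.SparkPlugin}} with separate 
driver- and executor-side components enabled via {{spark.plugins}} (package, 
method signatures, and configuration key are assumptions here):

{code}
// Sketch only; the interface and package names are as assumed above.
import java.util.{Map => JMap}
import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

class MyPlugin extends SparkPlugin {   // hypothetical plugin class
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      // Runs once in the driver; the returned map is handed to executor plugins.
      java.util.Collections.emptyMap[String, String]()
    }
  }

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      // Runs once per executor at startup.
    }
  }
}
// Enabled with: spark.plugins=com.example.MyPlugin   (hypothetical class name)
{code}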



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29397) Create new plugin interface for driver and executor plugins

2019-11-04 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29397:
--

Assignee: Marcelo Masiero Vanzin

> Create new plugin interface for driver and executor plugins
> ---
>
> Key: SPARK-29397
> URL: https://issues.apache.org/jira/browse/SPARK-29397
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
>
> This task covers the work of adding a new interface for Spark plugins, 
> covering both driver- and executor-side components.
> See parent bug for details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29637) SHS Endpoint /applications//jobs/ doesn't include description

2019-10-29 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29637.

Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26295
[https://github.com/apache/spark/pull/26295]

> SHS Endpoint /applications//jobs/ doesn't include description
> -
>
> Key: SPARK-29637
> URL: https://issues.apache.org/jira/browse/SPARK-29637
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.4, 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
> Fix For: 2.4.5, 3.0.0
>
>
> Starting from Spark 2.3, the SHS REST API endpoint 
> /applications//jobs/ does not include the description in the JobData it 
> returns. This was not the case up to Spark 2.2.
> Steps to reproduce:
>  * Open spark-shell
> {code:java}
> scala> sc.setJobGroup("test", "job", false); 
> scala> val foo = sc.textFile("/user/foo.txt");
> foo: org.apache.spark.rdd.RDD[String] = /user/foo.txt MapPartitionsRDD[1] at 
> textFile at :24
> scala> foo.foreach(println);
> {code}
>  * Access the REST API 
> [http://SHS-host:port/api/v1/applications/|http://shs-host:port/api/v1/applications/]/jobs/
>  * The REST API of Spark 2.3 and above will not return the description



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29637) SHS Endpoint /applications//jobs/ doesn't include description

2019-10-29 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29637:
--

Assignee: Gabor Somogyi

> SHS Endpoint /applications//jobs/ doesn't include description
> -
>
> Key: SPARK-29637
> URL: https://issues.apache.org/jira/browse/SPARK-29637
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4, 2.4.4, 3.0.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Minor
>
> Starting from Spark 2.3, the SHS REST API endpoint 
> /applications//jobs/ does not include the description in the JobData it 
> returns. This was not the case up to Spark 2.2.
> Steps to reproduce:
>  * Open spark-shell
> {code:java}
> scala> sc.setJobGroup("test", "job", false); 
> scala> val foo = sc.textFile("/user/foo.txt");
> foo: org.apache.spark.rdd.RDD[String] = /user/foo.txt MapPartitionsRDD[1] at 
> textFile at :24
> scala> foo.foreach(println);
> {code}
>  * Access the REST API 
> [http://SHS-host:port/api/v1/applications/|http://shs-host:port/api/v1/applications/]/jobs/
>  * The REST API of Spark 2.3 and above will not return the description



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29509) Deduplicate code blocks in Kafka data source

2019-10-28 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29509.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26158
[https://github.com/apache/spark/pull/26158]

> Deduplicate code blocks in Kafka data source
> 
>
> Key: SPARK-29509
> URL: https://issues.apache.org/jira/browse/SPARK-29509
> Project: Spark
>  Issue Type: Task
>  Components: SQL, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> There are a bunch of methods in the Kafka data source that repeat lines within 
> a method - in particular, the repetition is tied to the number of fields in the 
> writer schema, so adding a new field adds more redundant lines. This issue 
> tracks the effort to deduplicate them.
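
As a generic illustration of the kind of deduplication being tracked (not the 
actual Kafka source code; the field names below are made up), the per-field 
lines can be replaced by a single loop over the schema:

{code}
// Before: one hand-written projection line per field, so adding a field means
// another copy-pasted line. After: one loop derived from the schema.
case class Field(name: String, ordinal: Int)

val writerSchema = Seq(Field("key", 0), Field("value", 1), Field("topic", 2))

def project(row: Seq[Any]): Map[String, Any] =
  writerSchema.map(f => f.name -> row(f.ordinal)).toMap

println(project(Seq("k1", "v1", "events")))   // Map(key -> k1, value -> v1, topic -> events)
{code}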



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29509) Deduplicate code blocks in Kafka data source

2019-10-28 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29509:
--

Assignee: Jungtaek Lim

> Deduplicate code blocks in Kafka data source
> 
>
> Key: SPARK-29509
> URL: https://issues.apache.org/jira/browse/SPARK-29509
> Project: Spark
>  Issue Type: Task
>  Components: SQL, Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
>
> There are a bunch of methods in the Kafka data source that repeat lines within 
> a method - in particular, the repetition is tied to the number of fields in the 
> writer schema, so adding a new field adds more redundant lines. This issue 
> tracks the effort to deduplicate them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29472) Mechanism for Excluding Jars at Launch for YARN

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29472.

Resolution: Won't Fix

> Mechanism for Excluding Jars at Launch for YARN
> ---
>
> Key: SPARK-29472
> URL: https://issues.apache.org/jira/browse/SPARK-29472
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>Affects Versions: 2.4.4
>Reporter: Abhishek Modi
>Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from Spark’s 
> classpath at launch time. This would complement the way in which jars can be 
> added to the classpath using {{extraClassPath}}.
>  
> *Context*
> The Spark build contains its dependency jars in the {{/jars}} directory. 
> These jars become part of the executor’s classpath. By default on YARN, these 
> jars are packaged and distributed to containers at launch ({{spark-submit}}) 
> time.
>  
> While developing Spark applications, customers sometimes need to debug using 
> different versions of dependencies. This can become difficult if the 
> dependency (e.g. Parquet 1.11.0) is one that Spark already has in {{/jars}} 
> (e.g. Parquet 1.10.1 in Spark 2.4), as the dependency included with Spark is 
> preferentially loaded. 
>  
> Configurations such as {{userClassPathFirst}} are available; however, these 
> often come with side effects. For example, if the customer's build 
> includes Avro they will likely see {{Caused by: java.lang.LinkageError: 
> loader constraint violation: when resolving method 
> "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
>  the class loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) of the current class, 
> com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance 
> of sun/misc/Launcher$AppClassLoader) for the method's defining class, 
> org/apache/spark/SparkConf, have different Class objects for the type 
> scala/collection/Seq used in the signature}}. Resolving such issues often 
> takes many hours.
>  
> To deal with these sorts of issues, customers often download the Spark build, 
> remove the target jars and then do spark-submit. Other times, customers may 
> not be able to do spark-submit as it is gated behind some Spark Job Server. 
> In this case, customers may try downloading the build, removing the jars, and 
> then using configurations such as {{spark.yarn.dist.jars}} or 
> {{spark.yarn.dist.archives}}. Both of these options are undesirable as they 
> are operationally heavy, error-prone, and often result in the customer's 
> Spark builds going out of sync with the authoritative build.
>  
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} 
> configuration. Customers could provide a regex such as {{.\*parquet.\*}} and 
> jar files matching this regex would not be included in the driver and 
> executor classpath.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29472) Mechanism for Excluding Jars at Launch for YARN

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959179#comment-16959179
 ] 

Marcelo Masiero Vanzin commented on SPARK-29472:


bq. customers sometimes need to debug using different versions of dependencies

That's trivial to do with Spark-on-YARN.

{code}
spark-submit --deploy-mode cluster \
  --files /path/to/my-custom-parquet.jar \
  --conf spark.driver.extraClassPath=my-custom-parquet.jar \
  --conf spark.executor.extraClassPath=my-custom-parquet.jar
{code}

Or in client mode:

{code}
spark-submit --deploy-mode client \
  --files /path/to/my-custom-parquet.jar \
  --conf spark.driver.extraClassPath=/path/to/my-custom-parquet.jar \
  --conf spark.executor.extraClassPath=my-custom-parquet.jar
{code}

Done. No need for a new option, no need to change Spark's install directory, no 
need for {{userClassPathFirst}} or anything. I don't see the point of adding 
the new option - it's confusing, easy to break things, and doesn't completely 
solve the problem by itself, since you still have to upload the new jar and add 
it to the class path with other existing options.

> Mechanism for Excluding Jars at Launch for YARN
> ---
>
> Key: SPARK-29472
> URL: https://issues.apache.org/jira/browse/SPARK-29472
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>Affects Versions: 2.4.4
>Reporter: Abhishek Modi
>Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from Spark’s 
> classpath at launch time. This would complement the way in which jars can be 
> added to the classpath using {{extraClassPath}}.
>  
> *Context*
> The Spark build contains its dependency jars in the {{/jars}} directory. 
> These jars become part of the executor’s classpath. By default on YARN, these 
> jars are packaged and distributed to containers at launch ({{spark-submit}}) 
> time.
>  
> While developing Spark applications, customers sometimes need to debug using 
> different versions of dependencies. This can become difficult if the 
> dependency (e.g. Parquet 1.11.0) is one that Spark already has in {{/jars}} 
> (e.g. Parquet 1.10.1 in Spark 2.4), as the dependency included with Spark is 
> preferentially loaded. 
>  
> Configurations such as {{userClassPathFirst}} are available; however, these 
> often come with side effects. For example, if the customer's build 
> includes Avro they will likely see {{Caused by: java.lang.LinkageError: 
> loader constraint violation: when resolving method 
> "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
>  the class loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) of the current class, 
> com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance 
> of sun/misc/Launcher$AppClassLoader) for the method's defining class, 
> org/apache/spark/SparkConf, have different Class objects for the type 
> scala/collection/Seq used in the signature}}. Resolving such issues often 
> takes many hours.
>  
> To deal with these sorts of issues, customers often download the Spark build, 
> remove the target jars and then do spark-submit. Other times, customers may 
> not be able to do spark-submit as it is gated behind some Spark Job Server. 
> In this case, customers may try downloading the build, removing the jars, and 
> then using configurations such as {{spark.yarn.dist.jars}} or 
> {{spark.yarn.dist.archives}}. Both of these options are undesirable as they 
> are operationally heavy, error-prone, and often result in the customer's 
> Spark builds going out of sync with the authoritative build.
>  
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} 
> configuration. Customers could provide a regex such as {{.\*parquet.\*}} and 
> jar files matching this regex would not be included in the driver and 
> executor classpath.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959078#comment-16959078
 ] 

Marcelo Masiero Vanzin commented on SPARK-29593:


bq. Are there any plans to make it more public?

Not really. The hope is that someone who really needs that will do it.

> Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today cluster managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build its cluster manager and then brought it 
> into Spark, and lots of work is still going on with it. That work could ship 
> more often if Spark had a pluggable way to bring in cluster managers. This 
> would also benefit enterprise companies that have their own cluster managers 
> that aren't open source and so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
> val cons = clazz.getConstructor(classOf[SparkContext])
> cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959063#comment-16959063
 ] 

Marcelo Masiero Vanzin commented on SPARK-29593:


This already exists: {{org.apache.spark.scheduler.ExternalClusterManager}}.

It's just not a proper public API; you can still use it by having your 
implementation live in the {{org.apache.spark}} namespace (other classes can be 
elsewhere).
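
A rough sketch of such an implementation (the trait's method signatures and the 
ServiceLoader-style registration below are assumptions, since this is not a 
documented public API):

{code}
// Sketch: a custom cluster manager placed in the org.apache.spark namespace
// so it can see the private[spark] trait.
package org.apache.spark.scheduler

import org.apache.spark.SparkContext

class MyClusterManager extends ExternalClusterManager {   // hypothetical manager
  // Claim master URLs of the form "mycluster://..."
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("mycluster://")

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  override def createSchedulerBackend(sc: SparkContext, masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend = {
    // Return a SchedulerBackend that talks to the external resource manager.
    ???
  }

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
// Registered via META-INF/services/org.apache.spark.scheduler.ExternalClusterManager
{code}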

> Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today cluster managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build its cluster manager and then brought it 
> into Spark, and lots of work is still going on with it. That work could ship 
> more often if Spark had a pluggable way to bring in cluster managers. This 
> would also benefit enterprise companies that have their own cluster managers 
> that aren't open source and so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
> val cons = clazz.getConstructor(classOf[SparkContext])
> cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> Proposal discussed at Spark + AI Summit Europe 2019: 
> [https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


