[jira] [Resolved] (SPARK-12081) Make unified memory management work with small heaps

2015-12-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-12081.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

> Make unified memory management work with small heaps
> 
>
> Key: SPARK-12081
> URL: https://issues.apache.org/jira/browse/SPARK-12081
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.6.0
>
>
> By default, Spark drivers and executors get 1GB heaps. With the recent unified 
> memory mode, only 250MB is set aside for non-storage, non-execution purposes 
> (spark.memory.fraction is 75%). However, especially in local mode, the driver 
> needs at least ~300MB. Some local jobs started to OOM because of this.
> Two mutually exclusive proposals:
> (1) First, cut out 300 MB, then take 75% of what remains
> (2) Use min(75% of JVM heap size, JVM heap size - 300MB)






[jira] [Commented] (SPARK-12081) Make unified memory management work with small heaps

2015-12-01 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15035219#comment-15035219
 ] 

Andrew Or commented on SPARK-12081:
---

The patch took approach (1)

> Make unified memory management work with small heaps
> 
>
> Key: SPARK-12081
> URL: https://issues.apache.org/jira/browse/SPARK-12081
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.6.0
>
>
> By default, Spark drivers and executors get 1GB heaps. With the recent unified 
> memory mode, only 250MB is set aside for non-storage, non-execution purposes 
> (spark.memory.fraction is 75%). However, especially in local mode, the driver 
> needs at least ~300MB. Some local jobs started to OOM because of this.
> Two mutually exclusive proposals:
> (1) First, cut out 300 MB, then take 75% of what remains
> (2) Use min(75% of JVM heap size, JVM heap size - 300MB)
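
For illustration, a minimal sketch of proposal (1) using the 300 MB and 75% figures from the description above; the object and method names are illustrative, not the actual UnifiedMemoryManager code:

{code}
object SmallHeapSketch {
  // Proposal (1): reserve a fixed 300 MB for non-storage, non-execution use,
  // then apply the 75% fraction to whatever remains.
  val reservedBytes: Long = 300L * 1024 * 1024
  val memoryFraction: Double = 0.75

  def unifiedMemory(heapBytes: Long): Long =
    ((heapBytes - reservedBytes) * memoryFraction).toLong

  def main(args: Array[String]): Unit = {
    // With a default 1 GB heap this yields (1024 - 300) * 0.75 ≈ 543 MB for
    // storage + execution, leaving ~481 MB (instead of only ~250 MB) for other use.
    println(unifiedMemory(1024L * 1024 * 1024) / (1024 * 1024) + " MB")
  }
}
{code}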






[jira] [Resolved] (SPARK-5106) Make Web UI automatically refresh/update displayed data

2015-12-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-5106.
--
Resolution: Won't Fix

> Make Web UI automatically refresh/update displayed data
> ---
>
> Key: SPARK-5106
> URL: https://issues.apache.org/jira/browse/SPARK-5106
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>
> My (and presumably others') experience monitoring Spark jobs currently 
> consists of repeatedly ⌘R'ing various pages of the web UI to view 
> ever-fresher data about how many tasks have succeeded / failed, how much 
> spillage is happening, etc., which is tedious.
> Particularly unfortunate is the "one refresh over the line" problem where, 
> just as things are getting interesting, the job itself fails or finishes, and 
> after refreshing the page all data disappears.
> It would be good if the web UI updated the data it was displaying 
> automatically.
> One hacky way to achieve this would be to have it automatically refresh the 
> page, though this still risks losing everything when the job finishes.
> A better long-term solution would be to have the UI poll for (or have pushed 
> to it) updates to the data it is displaying.
> Either way, some way to toggle this functionality on or off is probably 
> warranted as well.






[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous

2015-12-01 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034405#comment-15034405
 ] 

Andrew Or commented on SPARK-12062:
---

Great, I've assigned it to you

> Master rebuilding historical SparkUI should be asynchronous
> ---
>
> Key: SPARK-12062
> URL: https://issues.apache.org/jira/browse/SPARK-12062
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Bryan Cutler
>
> When a long-running application finishes, it takes a while (sometimes 
> minutes) to rebuild the SparkUI. However, in Master.scala this is currently 
> done within the RPC event loop, which runs in only one thread. In the 
> meantime, no other applications can register with this master.






[jira] [Updated] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous

2015-12-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12062:
--
Assignee: Bryan Cutler

> Master rebuilding historical SparkUI should be asynchronous
> ---
>
> Key: SPARK-12062
> URL: https://issues.apache.org/jira/browse/SPARK-12062
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Bryan Cutler
>
> When a long-running application finishes, it takes a while (sometimes 
> minutes) to rebuild the SparkUI. However, in Master.scala this is currently 
> done within the RPC event loop, which runs in only one thread. In the 
> meantime, no other applications can register with this master.






[jira] [Commented] (SPARK-8414) Ensure ContextCleaner actually triggers clean ups

2015-12-01 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034447#comment-15034447
 ] 

Andrew Or commented on SPARK-8414:
--

I'll submit a patch today.

> Ensure ContextCleaner actually triggers clean ups
> -
>
> Key: SPARK-8414
> URL: https://issues.apache.org/jira/browse/SPARK-8414
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
>
> Right now it cleans up old references only through natural GCs, which may not 
> occur if the driver heap is large enough to never fill up. We should trigger a 
> periodic GC to make sure we actually clean things up. Something like once every 
> 30 minutes seems relatively inexpensive.
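
A minimal sketch of the proposed fix, using a plain scheduled executor rather than Spark's actual ContextCleaner internals (names are illustrative):

{code}
import java.util.concurrent.{Executors, TimeUnit}

object PeriodicGcSketch {
  // Trigger a GC on a fixed schedule so that weakly-referenced cleanup tasks
  // get processed even when the driver heap never fills up on its own.
  private val periodicGcIntervalMinutes = 30L
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def start(): Unit = {
    scheduler.scheduleAtFixedRate(
      new Runnable { override def run(): Unit = System.gc() },
      periodicGcIntervalMinutes, periodicGcIntervalMinutes, TimeUnit.MINUTES)
  }
}
{code}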






[jira] [Updated] (SPARK-8414) Ensure ContextCleaner actually triggers clean ups

2015-12-01 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8414:
-
Target Version/s: 1.6.0

> Ensure ContextCleaner actually triggers clean ups
> -
>
> Key: SPARK-8414
> URL: https://issues.apache.org/jira/browse/SPARK-8414
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
>
> Right now it cleans up old references only through natural GCs, which may not 
> occur if the driver heap is large enough to never fill up. We should trigger a 
> periodic GC to make sure we actually clean things up. Something like once every 
> 30 minutes seems relatively inexpensive.






[jira] [Updated] (SPARK-12059) Standalone Master assertion error

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12059:
--
Description: 
{code}
15/11/30 09:55:04 ERROR Inbox: Ignoring error
java.lang.AssertionError: assertion failed: executor 4 state transfer from 
RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at 
org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at 
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
{code}

  was:
{code}
java.lang.AssertionError: assertion failed: executor 4 state transfer from 
RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at 
org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at 
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
{code}


> Standalone Master assertion error
> -
>
> Key: SPARK-12059
> URL: https://issues.apache.org/jira/browse/SPARK-12059
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Saisai Shao
>Priority: Critical
>
> {code}
> 15/11/30 09:55:04 ERROR Inbox: Ignoring error
> java.lang.AssertionError: assertion failed: executor 4 state transfer from 
> RUNNING to RUNNING is illegal
> at scala.Predef$.assert(Predef.scala:179)
> at 
> org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
> at 
> org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
> at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
> at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> at 
> org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> {code}






[jira] [Resolved] (SPARK-12037) Executors use heartbeatReceiverRef to report heartbeats and task metrics that might not be initialized and leads to NullPointerException

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-12037.
---
  Resolution: Fixed
Assignee: Nan Zhu
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Executors use heartbeatReceiverRef to report heartbeats and task metrics that 
> might not be initialized and leads to NullPointerException
> 
>
> Key: SPARK-12037
> URL: https://issues.apache.org/jira/browse/SPARK-12037
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
> Environment: The latest sources at revision {{c793d2d}}
>Reporter: Jacek Laskowski
>Assignee: Nan Zhu
> Fix For: 1.6.0
>
>
> When an {{Executor}} starts, it launches the driver heartbeater (via 
> {{startDriverHeartbeater()}}), which uses {{heartbeatReceiverRef}}. That 
> reference is initialized later, so a NullPointerException is possible (after 
> {{spark.executor.heartbeatInterval}}, which defaults to {{10s}}).
> {code}
> WARN Executor: Issue communicating with driver in heartbeater
> java.lang.NullPointerException
>   at 
> org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:447)
>   at 
> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:467)
>   at 
> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:467)
>   at 
> org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:467)
>   at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1717)
>   at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:467)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
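
A minimal sketch of the race and one way to guard against it; the field and method names are illustrative, not Executor's actual members:

{code}
object HeartbeatGuardSketch {
  // The heartbeat task can fire before the receiver reference is set. Holding
  // the reference in an Option lets a tick be skipped safely instead of
  // throwing a NullPointerException.
  @volatile private var heartbeatReceiverRef: Option[String => Unit] = None

  def setReceiver(send: String => Unit): Unit = heartbeatReceiverRef = Some(send)

  def reportHeartBeat(): Unit = heartbeatReceiverRef match {
    case Some(send) => send("heartbeat")   // reference ready: report as usual
    case None       => ()                  // not initialized yet: skip this tick
  }
}
{code}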






[jira] [Resolved] (SPARK-12035) Add more debug information in include_example tag of Jekyll

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-12035.
---
  Resolution: Fixed
Assignee: Xusen Yin
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Add more debug information in include_example tag of Jekyll
> ---
>
> Key: SPARK-12035
> URL: https://issues.apache.org/jira/browse/SPARK-12035
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Documentation
>Reporter: Xusen Yin
>Assignee: Xusen Yin
>Priority: Minor
>  Labels: documentation
> Fix For: 1.6.0
>
>
> Add more debug information in the include_example tag of Jekyll, so that we 
> have more context when `jekyll build` fails with errors.






[jira] [Resolved] (SPARK-12007) Network library's RPC layer requires a lot of copying

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-12007.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

> Network library's RPC layer requires a lot of copying
> -
>
> Key: SPARK-12007
> URL: https://issues.apache.org/jira/browse/SPARK-12007
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 1.6.0
>
>
> The network library's RPC layer has an external API based on byte arrays, 
> instead of ByteBuffer; that requires a lot of copying since the internals of 
> the library use ByteBuffers (or rather Netty's ByteBuf), and lots of external 
> clients also use ByteBuffer.
> The extra copies could be avoided if the API used ByteBuffer instead.
> To show an extreme case, look at an RPC send via NettyRpcEnv:
> - message is encoded using JavaSerializer, resulting in a ByteBuffer
> - the ByteBuffer is copied into a byte array of the right size, since its 
> internal array may be larger than the actual data it holds
> - the network library's encoder copies the byte array into a ByteBuf
> - finally the data is written to the socket
> The two intermediate copies could be avoided if the API allowed the original 
> ByteBuffer to be sent instead.
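
A minimal sketch of the two paths described above, assuming Netty's io.netty.buffer API on the classpath; this is not the actual network library code:

{code}
import java.nio.ByteBuffer
import io.netty.buffer.{ByteBuf, Unpooled}

object RpcCopySketch {
  // byte[]-based API: trim the serializer's ByteBuffer into an exact-size
  // array (copy 1), then copy that array into a Netty buffer (copy 2).
  def sendViaByteArray(serialized: ByteBuffer): ByteBuf = {
    val exact = new Array[Byte](serialized.remaining())
    serialized.get(exact)
    Unpooled.copiedBuffer(exact)
  }

  // ByteBuffer-based API: wrap the original buffer directly, no intermediate copies.
  def sendViaByteBuffer(serialized: ByteBuffer): ByteBuf =
    Unpooled.wrappedBuffer(serialized)
}
{code}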






[jira] [Updated] (SPARK-12060) Avoid memory copy in JavaSerializerInstance.serialize

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12060:
--
Assignee: Shixiong Zhu

> Avoid memory copy in JavaSerializerInstance.serialize
> -
>
> Key: SPARK-12060
> URL: https://issues.apache.org/jira/browse/SPARK-12060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> JavaSerializerInstance.serialize uses ByteArrayOutputStream.toByteArray to 
> get the serialized data. ByteArrayOutputStream.toByteArray needs to copy the 
> content of the internal array into a new array. However, since the array is 
> immediately converted to a ByteBuffer anyway, we can avoid the memory copy.
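
A minimal sketch of the idea: subclass ByteArrayOutputStream and wrap its internal buffer instead of copying it (the class name here is illustrative):

{code}
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer

// ByteArrayOutputStream exposes its internal `buf` and `count` to subclasses,
// so the written bytes can be wrapped in a ByteBuffer without toByteArray's copy.
class ByteBufferOutputStreamSketch extends ByteArrayOutputStream {
  def toByteBuffer: ByteBuffer = ByteBuffer.wrap(buf, 0, count)
}
{code}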






[jira] [Updated] (SPARK-12007) Network library's RPC layer requires a lot of copying

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12007:
--
Affects Version/s: 1.6.0
 Target Version/s: 1.6.0

> Network library's RPC layer requires a lot of copying
> -
>
> Key: SPARK-12007
> URL: https://issues.apache.org/jira/browse/SPARK-12007
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>
> The network library's RPC layer has an external API based on byte arrays, 
> instead of ByteBuffer; that requires a lot of copying since the internals of 
> the library use ByteBuffers (or rather Netty's ByteBuf), and lots of external 
> clients also use ByteBuffer.
> The extra copies could be avoided if the API used ByteBuffer instead.
> To show an extreme case, look at an RPC send via NettyRpcEnv:
> - message is encoded using JavaSerializer, resulting in a ByteBuffer
> - the ByteBuffer is copied into a byte array of the right size, since its 
> internal array may be larger than the actual data it holds
> - the network library's encoder copies the byte array into a ByteBuf
> - finally the data is written to the socket
> The two intermediate copies could be avoided if the API allowed the original 
> ByteBuffer to be sent instead.






[jira] [Updated] (SPARK-12007) Network library's RPC layer requires a lot of copying

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12007:
--
Assignee: Marcelo Vanzin

> Network library's RPC layer requires a lot of copying
> -
>
> Key: SPARK-12007
> URL: https://issues.apache.org/jira/browse/SPARK-12007
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>
> The network library's RPC layer has an external API based on byte arrays, 
> instead of ByteBuffer; that requires a lot of copying since the internals of 
> the library use ByteBuffers (or rather Netty's ByteBuf), and lots of external 
> clients also use ByteBuffer.
> The extra copies could be avoided if the API used ByteBuffer instead.
> To show an extreme case, look at an RPC send via NettyRpcEnv:
> - message is encoded using JavaSerializer, resulting in a ByteBuffer
> - the ByteBuffer is copied into a byte array of the right size, since its 
> internal array may be larger than the actual data it holds
> - the network library's encoder copies the byte array into a ByteBuf
> - finally the data is written to the socket
> The two intermediate copies could be avoided if the API allowed the original 
> ByteBuffer to be sent instead.






[jira] [Created] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous

2015-11-30 Thread Andrew Or (JIRA)
Andrew Or created SPARK-12062:
-

 Summary: Master rebuilding historical SparkUI should be 
asynchronous
 Key: SPARK-12062
 URL: https://issues.apache.org/jira/browse/SPARK-12062
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.0.0
Reporter: Andrew Or


When a long-running application finishes, it takes a while (sometimes minutes) 
to rebuild the SparkUI. However, in Master.scala this is currently done within 
the RPC event loop, which runs in only one thread. In the meantime, no 
other applications can register with this master.
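
A minimal sketch of the proposed change, assuming a hypothetical rebuildUI() handle rather than Master's actual code:

{code}
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object AsyncRebuildSketch {
  // A dedicated single-thread pool keeps the slow rebuild off the RPC event
  // loop, so the message loop stays free to register new applications.
  private val rebuildContext =
    ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

  def onApplicationFinished(rebuildUI: () => Unit): Unit = {
    Future { rebuildUI() }(rebuildContext)   // fire and forget; returns immediately
  }
}
{code}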






[jira] [Updated] (SPARK-12060) Avoid memory copy in JavaSerializerInstance.serialize

2015-11-30 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-12060:
--
Target Version/s: 1.6.0
 Component/s: Spark Core

> Avoid memory copy in JavaSerializerInstance.serialize
> -
>
> Key: SPARK-12060
> URL: https://issues.apache.org/jira/browse/SPARK-12060
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Shixiong Zhu
>
> JavaSerializerInstance.serialize uses ByteArrayOutputStream.toByteArray to 
> get the serialized data. ByteArrayOutputStream.toByteArray needs to copy the 
> content in the internal array to a new array. However, since the array will 
> be converted to ByteBuffer at once, we can avoid the memory copy.






[jira] [Created] (SPARK-12059) Standalone Master assertion error

2015-11-30 Thread Andrew Or (JIRA)
Andrew Or created SPARK-12059:
-

 Summary: Standalone Master assertion error
 Key: SPARK-12059
 URL: https://issues.apache.org/jira/browse/SPARK-12059
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 1.6.0
Reporter: Andrew Or
Assignee: Saisai Shao
Priority: Critical


{code}
java.lang.AssertionError: assertion failed: executor 4 state transfer from 
RUNNING to RUNNING is illegal
at scala.Predef$.assert(Predef.scala:179)
at 
org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
at 
org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
at 
org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
{code}






[jira] [Updated] (SPARK-11999) ThreadUtils.newDaemonCachedThreadPool(prefix, maxThreadNumber) has unexpected behavior

2015-11-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11999:
--
Assignee: Shixiong Zhu

> ThreadUtils.newDaemonCachedThreadPool(prefix, maxThreadNumber)  has 
> unexpected behavior
> ---
>
> Key: SPARK-11999
> URL: https://issues.apache.org/jira/browse/SPARK-11999
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.1, 1.5.2, 1.6.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>
> Currently, ThreadUtils.newDaemonCachedThreadPool(prefix, maxThreadNumber) 
> throws RejectedExecutionException if there are already `maxThreadNumber` 
> busy threads when a new task is submitted. This is because a `SynchronousQueue` 
> cannot queue any tasks.
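
A minimal sketch of a pool that queues work instead of rejecting it once maxThreadNumber threads are busy; illustrative, not the actual ThreadUtils fix:

{code}
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

object BoundedCachedPoolSketch {
  // maxThreadNumber core threads backed by an unbounded queue. Unlike a
  // SynchronousQueue (which has no capacity), the queue holds extra tasks
  // instead of forcing the pool to reject them.
  def newBoundedCachedPool(maxThreadNumber: Int): ThreadPoolExecutor = {
    val pool = new ThreadPoolExecutor(
      maxThreadNumber, maxThreadNumber,
      60L, TimeUnit.SECONDS,
      new LinkedBlockingQueue[Runnable]())
    pool.allowCoreThreadTimeOut(true)   // let idle threads die, like a cached pool
    pool
  }
}
{code}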






[jira] [Resolved] (SPARK-10864) SparkUI: app name is hidden if window is resized

2015-11-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-10864.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> SparkUI: app name is hidden if window is resized
> 
>
> Key: SPARK-10864
> URL: https://issues.apache.org/jira/browse/SPARK-10864
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Andrew Or
>Priority: Minor
> Fix For: 1.6.0
>
> Attachments: Screen Shot 2015-09-28 at 5.44.06 PM.png
>
>
> See screenshot






[jira] [Resolved] (SPARK-10558) Wrong executor state in standalone master because of wrong state transition

2015-11-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-10558.
---
  Resolution: Fixed
Assignee: Saisai Shao
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Wrong executor state in standalone master because of wrong state transition
> ---
>
> Key: SPARK-10558
> URL: https://issues.apache.org/jira/browse/SPARK-10558
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
> Fix For: 1.6.0
>
>
> Because of a concurrency issue in executor state transitions, the executor state 
> saved in the standalone Master may end up as {{LOADING}} rather than {{RUNNING}}. 
> This happens because the {{RUNNING}} state is delivered earlier than {{LOADING}}.
> We have to guarantee the correct state ordering: LAUNCHING -> LOADING 
> -> RUNNING.
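
A minimal sketch of enforcing the forward-only ordering mentioned above (an illustrative enum, not Spark's actual ExecutorState):

{code}
object ExecutorStateSketch extends Enumeration {
  val LAUNCHING, LOADING, RUNNING = Value

  // Only strictly forward transitions within the launch sequence are legal,
  // so a late-arriving LOADING can never overwrite RUNNING.
  def isValidTransition(from: Value, to: Value): Boolean = to.id > from.id
}
{code}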






[jira] [Updated] (SPARK-11880) On Windows spark-env.cmd is not loaded.

2015-11-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11880:
--
Assignee: tawan

> On Windows spark-env.cmd is not loaded.
> ---
>
> Key: SPARK-11880
> URL: https://issues.apache.org/jira/browse/SPARK-11880
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
> Environment: Windows
>Reporter: Gaurav Sehgal
>Assignee: tawan
>Priority: Trivial
> Fix For: 1.6.0
>
>
> On Windows, bin/load-spark-env.cmd tries to load the file from 
> %~dp0..\..\conf, but %~dp0 points to bin and conf is only one level up.






[jira] [Resolved] (SPARK-11880) On Windows spark-env.cmd is not loaded.

2015-11-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11880.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> On Windows spark-env.cmd is not loaded.
> ---
>
> Key: SPARK-11880
> URL: https://issues.apache.org/jira/browse/SPARK-11880
> Project: Spark
>  Issue Type: Bug
>  Components: Windows
> Environment: Windows
>Reporter: Gaurav Sehgal
>Priority: Trivial
> Fix For: 1.6.0
>
>
> On Windows, bin/load-spark-env.cmd tries to load the file from 
> %~dp0..\..\conf, but %~dp0 points to bin and conf is only one level up.






[jira] [Updated] (SPARK-10864) SparkUI: app name is hidden if window is resized

2015-11-25 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10864:
--
Assignee: Alexander Bozarth

> SparkUI: app name is hidden if window is resized
> 
>
> Key: SPARK-10864
> URL: https://issues.apache.org/jira/browse/SPARK-10864
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Andrew Or
>Assignee: Alexander Bozarth
>Priority: Minor
> Fix For: 1.6.0
>
> Attachments: Screen Shot 2015-09-28 at 5.44.06 PM.png
>
>
> See screenshot






[jira] [Updated] (SPARK-11866) RpcEnv RPC timeouts can lead to errors, leak in transport library.

2015-11-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11866:
--
Priority: Major  (was: Minor)

> RpcEnv RPC timeouts can lead to errors, leak in transport library.
> --
>
> Key: SPARK-11866
> URL: https://issues.apache.org/jira/browse/SPARK-11866
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>
> The {{RpcEnv}} code in spark-core has its own timeout handling capabilities, 
> which can clash with the transport library's timeout handling in two ways 
> when replies to an RPC message are never sent.
> - if the channel has been idle for a while, the transport library will close 
> the channel because it may think it's hung; this could cause other errors 
> since the {{RpcEnv}}-based code might not expect those channels to be closed.
> - if the reply never arrives and the channel is not idle, there's state kept 
> in the network library that will never be cleaned up. The {{RpcEnv}}-level 
> timeout code should clean up that state since it's not interested in that RPC 
> anymore.
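
A minimal sketch of the second point: drop the pending-RPC bookkeeping when the higher-level timeout fires (names are illustrative, not the transport library's API):

{code}
import java.util.concurrent.ConcurrentHashMap

object RpcTimeoutCleanupSketch {
  // Pending callbacks keyed by request id, standing in for the state the
  // network library keeps while it waits for a reply.
  private val outstandingRpcs = new ConcurrentHashMap[Long, String => Unit]()

  def register(requestId: Long, onReply: String => Unit): Unit =
    outstandingRpcs.put(requestId, onReply)

  // When the RpcEnv-level timeout gives up on a request, remove its state so
  // it cannot leak on a channel that never goes idle; a late reply is ignored.
  def onRpcTimeout(requestId: Long): Unit = outstandingRpcs.remove(requestId)
}
{code}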






[jira] [Updated] (SPARK-11866) RpcEnv RPC timeouts can lead to errors, leak in transport library.

2015-11-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11866:
--
Target Version/s: 1.6.0

> RpcEnv RPC timeouts can lead to errors, leak in transport library.
> --
>
> Key: SPARK-11866
> URL: https://issues.apache.org/jira/browse/SPARK-11866
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>
> The {{RpcEnv}} code in spark-core has its own timeout handling capabilities, 
> which can clash with the transport library's timeout handling in two ways 
> when replies to an RPC message are never sent.
> - if the channel has been idle for a while, the transport library will close 
> the channel because it may think it's hung; this could cause other errors 
> since the {{RpcEnv}}-based code might not expect those channels to be closed.
> - if the reply never arrives and the channel is not idle, there's state kept 
> in the network library that will never be cleaned up. The {{RpcEnv}}-level 
> timeout code should clean up that state since it's not interested in that RPC 
> anymore.






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Target Version/s: 1.5.3, 1.6.0

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.
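
A minimal sketch of the usual remedy: let the OS assign an ephemeral port instead of hard-coding one like 12362:

{code}
import java.net.ServerSocket

object EphemeralPortSketch {
  // Binding to port 0 asks the kernel for any free port, so parallel test
  // runs on the same Jenkins host cannot contend for a fixed port.
  def freePort(): Int = {
    val socket = new ServerSocket(0)
    try socket.getLocalPort finally socket.close()
  }
}
{code}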






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Labels: flaky-test  (was: )

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>  Labels: flaky-test
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Assignee: Shixiong Zhu

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Component/s: Tests

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.






[jira] [Resolved] (SPARK-11845) Add unit tests to verify correct checkpointing of TrackStateRDD

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11845.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

> Add unit tests to verify correct checkpointing of TrackStateRDD
> ---
>
> Key: SPARK-11845
> URL: https://issues.apache.org/jira/browse/SPARK-11845
> Project: Spark
>  Issue Type: Test
>  Components: Streaming
>Reporter: Tathagata Das
>Assignee: Tathagata Das
> Fix For: 1.6.0
>
>







[jira] [Commented] (SPARK-11843) Isolate staging directory across applications on same YARN cluster

2015-11-19 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014994#comment-15014994
 ] 

Andrew Or commented on SPARK-11843:
---

Oops, that seems to be the case. I'm closing this.

> Isolate staging directory across applications on same YARN cluster
> --
>
> Key: SPARK-11843
> URL: https://issues.apache.org/jira/browse/SPARK-11843
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Andrew Or
>Priority: Minor
>
> If multiple clients share the same YARN cluster and file system, they may end 
> up using the same `.sparkStaging` directory. This may be a problem if their 
> jars have similar names, for instance. Isolating the staging directories 
> would improve both security and user experience. We can either:
> (1) allow users to configure the directory name
> (2) add an identifier to the directory name, which I prefer
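
A minimal sketch of option (2), suffixing the staging directory with a per-application identifier (the names here are illustrative):

{code}
import java.util.UUID

object StagingDirSketch {
  // Each submission gets its own .sparkStaging-<id> directory, so clients
  // sharing a cluster and file system never collide on the same path.
  def stagingDir(homeDir: String, appId: String): String =
    s"$homeDir/.sparkStaging-$appId-${UUID.randomUUID()}"
}
{code}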






[jira] [Resolved] (SPARK-11843) Isolate staging directory across applications on same YARN cluster

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11843.
---
Resolution: Won't Fix

> Isolate staging directory across applications on same YARN cluster
> --
>
> Key: SPARK-11843
> URL: https://issues.apache.org/jira/browse/SPARK-11843
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Reporter: Andrew Or
>Priority: Minor
>
> If multiple clients share the same YARN cluster and file system, they may end 
> up using the same `.sparkStaging` directory. This may be a problem if their 
> jars have similar names, for instance. Isolating the staging directories 
> would improve both security and user experience. We can either:
> (1) allow users to configure the directory name
> (2) add an identifier to the directory name, which I prefer






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Fix Version/s: 1.5.3

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.5.3, 1.6.0
>
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Target Version/s: 1.5.3, 1.6.0  (was: 1.6.0)

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.5.3, 1.6.0
>
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.






[jira] [Commented] (SPARK-11278) PageRank fails with unified memory manager

2015-11-19 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014892#comment-15014892
 ] 

Andrew Or commented on SPARK-11278:
---

Thanks [~nravi], that's very helpful.

> PageRank fails with unified memory manager
> --
>
> Key: SPARK-11278
> URL: https://issues.apache.org/jira/browse/SPARK-11278
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX, Spark Core
>Affects Versions: 1.5.1
>Reporter: Nishkam Ravi
>Assignee: Andrew Or
>Priority: Critical
> Attachments: executor_log_legacyModeTrue.html, 
> executor_logs_legacyModeFalse.html
>
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with 
> ExecutorLostFailure. Traced it back to the 'unified memory manager' commit 
> from Oct 13th. Took a quick look at the code and couldn't see the problem 
> (changes look pretty good). cc'ing [~andrewor14][~vanzin] who may be able to 
> spot the problem quickly. Can be reproduced by running PageRank on a large 
> enough input dataset if needed. Sorry for not being of much help here.






[jira] [Updated] (SPARK-11746) Use cache-aware method 'dependencies' instead of 'getDependencies'

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11746:
--
Assignee: SuYan

> Use cache-aware method 'dependencies' instead of 'getDependencies'
> --
>
> Key: SPARK-11746
> URL: https://issues.apache.org/jira/browse/SPARK-11746
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: SuYan
>Assignee: SuYan
>Priority: Minor
> Fix For: 1.6.0
>
>
> Use cache-aware method 'dependencies' instead of 'getDependencies'
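
For context, a minimal sketch of why the cached accessor is preferred: `dependencies` memoizes the result of `getDependencies`, so repeated lookups do not recompute it (simplified, not the actual RDD code):

{code}
abstract class RddLikeSketch {
  // Potentially expensive: rebuilds the dependency list on every call.
  protected def getDependencies: Seq[String]

  @transient private var deps: Seq[String] = _

  // Cache-aware accessor: computes once and reuses the result afterwards.
  final def dependencies: Seq[String] = {
    if (deps == null) deps = getDependencies
    deps
  }
}
{code}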






[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Fix Version/s: 1.6.0

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.6.0
>
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.
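
For illustration, one way to remove the fixed-port dependency is to let the OS pick an ephemeral port. A minimal sketch using plain JDK sockets (not the actual suite code):

{code}
import java.net.ServerSocket

object EphemeralPorts {
  // Ask the OS for a free port instead of hard-coding one such as 12362,
  // which invites contention on shared CI machines.
  def withEphemeralPort[T](body: Int => T): T = {
    val socket = new ServerSocket(0) // port 0 means "any free port"
    val port =
      try socket.getLocalPort
      finally socket.close() // release it so the test can bind to it next
    body(port)
  }
}

// Usage: EphemeralPorts.withEphemeralPort { port => /* start the RPC env on `port` */ }
// Binding the service itself to port 0 is safer still, since it avoids the small
// window between closing the probe socket and re-binding.
{code}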



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11828) DAGScheduler source registered too early with MetricsSystem

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11828.
---
  Resolution: Fixed
Assignee: Marcelo Vanzin
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> DAGScheduler source registered too early with MetricsSystem
> ---
>
> Key: SPARK-11828
> URL: https://issues.apache.org/jira/browse/SPARK-11828
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>Priority: Minor
> Fix For: 1.6.0
>
>
> I see this log message when starting apps on YARN:
> {quote}
> 15/11/18 13:12:56 WARN MetricsSystem: Using default name DAGScheduler for 
> source because spark.app.id is not set.
> {quote}
> That's because DAGScheduler registers itself with the metrics system in its 
> constructor, and the DAGScheduler is instantiated before "spark.app.id" is 
> set in the context's SparkConf.
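
For illustration, a minimal sketch of the ordering problem with hypothetical stand-in classes (this is not Spark's real MetricsSystem API): registration in the constructor runs before the app id exists, while deferred registration picks up the real id.

{code}
class ToyMetricsSystem(var appId: Option[String] = None) {
  // Stand-in for a metrics registry: falls back to a default name when the
  // application id is not yet known.
  def registerSource(sourceName: String): String = {
    val prefix = appId.getOrElse {
      println(s"WARN: Using default name $sourceName for source because spark.app.id is not set.")
      "default"
    }
    s"$prefix.$sourceName"
  }
}

object RegistrationOrderDemo extends App {
  val metrics = new ToyMetricsSystem()
  metrics.registerSource("DAGScheduler")        // too early: warns, uses the default name
  metrics.appId = Some("application_1447_0042") // spark.app.id becomes available later
  metrics.registerSource("DAGScheduler")        // deferred registration uses the real app id
}
{code}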



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11746) Use checkpoint-aware method 'dependencies' instead of 'getDependencies'

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11746:
--
Summary: Use checkpoint-aware method 'dependencies' instead of 
'getDependencies'  (was: Use cache-aware method 'dependencies' instead of 
'getDependencies')

> Use checkpoint-aware method 'dependencies' instead of 'getDependencies'
> ---
>
> Key: SPARK-11746
> URL: https://issues.apache.org/jira/browse/SPARK-11746
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: SuYan
>Assignee: SuYan
>Priority: Minor
> Fix For: 1.6.0
>
>
> Use cache-aware method 'dependencies' instead of 'getDependencies'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014648#comment-15014648
 ] 

Andrew Or commented on SPARK-11831:
---

Do we need to backport this into 1.5?

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.6.0
>
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11831.
---
Resolution: Fixed

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.6.0
>
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11831) AkkaRpcEnvSuite is prone to port-contention-related flakiness

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11831:
--
Target Version/s: 1.6.0  (was: 1.5.3, 1.6.0)

> AkkaRpcEnvSuite is prone to port-contention-related flakiness
> -
>
> Key: SPARK-11831
> URL: https://issues.apache.org/jira/browse/SPARK-11831
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.6.0
>
>
> The AkkaRpcEnvSuite tests appear to be prone to port-contention-related 
> flakiness in Jenkins:
> {code}
> Error Message
> Failed to bind to: localhost/127.0.0.1:12362: Service 'test' failed after 16 
> retries!
> Stacktrace
>   java.net.BindException: Failed to bind to: localhost/127.0.0.1:12362: 
> Service 'test' failed after 16 retries!
>   at 
> org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
>   at 
> akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
>   at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
>   at scala.util.Try$.apply(Try.scala:161)
>   at scala.util.Success.map(Try.scala:206)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
>   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
>   at 
> scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
>   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
>   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
>   at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> {code}
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-pre-YARN/4819/HADOOP_VERSION=1.2.1,label=spark-test/testReport/junit/org.apache.spark.rpc.akka/AkkaRpcEnvSuite/uriOf__ssl/
> We should probably refactor these tests to not depend on a fixed port.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11799) Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11799:
--
Assignee: Srinivasa Reddy Vundela

> Make it explicit in executor logs that uncaught exceptions are thrown during 
> executor shutdown
> --
>
> Key: SPARK-11799
> URL: https://issues.apache.org/jira/browse/SPARK-11799
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Srinivasa Reddy Vundela
>Assignee: Srinivasa Reddy Vundela
>Priority: Minor
>
> Here is some background for the issue.
> The customer got an OOM exception in one of the tasks, and the executor was 
> killed with kill %p. A few shutdown hooks are registered with 
> ShutDownHookManager to clean up the Hadoop temp directories. During this 
> shutdown phase, other tasks throw uncaught exceptions and the executor logs 
> fill up with them.
> Since the driver logs and Spark UI do not make it clear why the container was 
> lost, the customer goes through the executor logs and sees lots of uncaught 
> exceptions.
> It would be clearer if we prepended the uncaught exceptions with a message like 
> [Container is in shutdown mode] so that they can be skipped.
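
For illustration, a minimal sketch of such tagging, assuming a simple in-process shutdown flag (this is not Spark's actual handler or its ShutdownHookManager API):

{code}
import java.util.concurrent.atomic.AtomicBoolean

// Tag uncaught exceptions that surface while the JVM is shutting down so that
// readers of the executor log can skip them quickly.
object ShutdownAwareHandler extends Thread.UncaughtExceptionHandler {
  val inShutdown = new AtomicBoolean(false)

  override def uncaughtException(t: Thread, e: Throwable): Unit = {
    val prefix = if (inShutdown.get()) "[Container is in shutdown mode] " else ""
    System.err.println(s"${prefix}Uncaught exception in thread ${t.getName}: $e")
  }
}

object InstallHandler extends App {
  // Flip the flag from a shutdown hook so later exceptions get the prefix.
  sys.addShutdownHook(ShutdownAwareHandler.inShutdown.set(true))
  Thread.setDefaultUncaughtExceptionHandler(ShutdownAwareHandler)
}
{code}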



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11799) Make it explicit in executor logs that uncaught exceptions are thrown during executor shutdown

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11799.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Make it explicit in executor logs that uncaught exceptions are thrown during 
> executor shutdown
> --
>
> Key: SPARK-11799
> URL: https://issues.apache.org/jira/browse/SPARK-11799
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Srinivasa Reddy Vundela
>Assignee: Srinivasa Reddy Vundela
>Priority: Minor
> Fix For: 1.6.0
>
>
> Here is some background for the issue.
> The customer got an OOM exception in one of the tasks, and the executor was 
> killed with kill %p. A few shutdown hooks are registered with 
> ShutDownHookManager to clean up the Hadoop temp directories. During this 
> shutdown phase, other tasks throw uncaught exceptions and the executor logs 
> fill up with them.
> Since the driver logs and Spark UI do not make it clear why the container was 
> lost, the customer goes through the executor logs and sees lots of uncaught 
> exceptions.
> It would be clearer if we prepended the uncaught exceptions with a message like 
> [Container is in shutdown mode] so that they can be skipped.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11746) Use cache-aware method 'dependencies' instead of 'getDependencies'

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11746.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Use cache-aware method 'dependencies' instead of 'getDependencies'
> --
>
> Key: SPARK-11746
> URL: https://issues.apache.org/jira/browse/SPARK-11746
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: SuYan
>Assignee: SuYan
>Priority: Minor
> Fix For: 1.6.0
>
>
> Use cache-aware method 'dependencies' instead of 'getDependencies'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4134) Dynamic allocation: tone down scary executor lost messages when killing on purpose

2015-11-19 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-4134:
-
Assignee: Marcelo Vanzin  (was: Andrew Or)

> Dynamic allocation: tone down scary executor lost messages when killing on 
> purpose
> --
>
> Key: SPARK-4134
> URL: https://issues.apache.org/jira/browse/SPARK-4134
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>Assignee: Marcelo Vanzin
>
> After SPARK-3822 goes in, we are able to dynamically kill executors after an 
> application has started. However, when we do that we get a ton of scary error 
> messages telling us that something has gone wrong. It would be good to detect 
> when the kill was intentional and prevent these messages from surfacing.
> This may be difficult, however, because the connection manager tends to be 
> quite verbose in unconditionally logging disconnection messages. This is a 
> very nice-to-have for 1.2 but certainly not a blocker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11809) Switch the default Mesos mode to coarse-grained mode

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11809:
--
Component/s: (was: SQL)
 Mesos

> Switch the default Mesos mode to coarse-grained mode
> 
>
> Key: SPARK-11809
> URL: https://issues.apache.org/jira/browse/SPARK-11809
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>  Labels: releasenotes
>
> Based on my conversations with people, I believe the consensus is that the 
> coarse-grained mode is more stable and easier to reason about. It is best to 
> use that as the default rather than the flakier fine-grained mode.
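
For reference, the mode can already be pinned explicitly regardless of the default; a small hedged example (the master URL below is illustrative):

{code}
import org.apache.spark.SparkConf

object CoarseGrainedMesosConf {
  // spark.mesos.coarse is the flag this issue proposes to default to true.
  val conf: SparkConf = new SparkConf()
    .setMaster("mesos://zk://zk1.example.com:2181/mesos") // illustrative master URL
    .set("spark.mesos.coarse", "true")
}
{code}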



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7471) DAG visualization: show call site information

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7471.

  Resolution: Duplicate
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> DAG visualization: show call site information
> -
>
> Key: SPARK-7471
> URL: https://issues.apache.org/jira/browse/SPARK-7471
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.6.0
>
>
> It would be useful to find the line that created the RDD / scope.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11700) Memory leak at SparkContext jobProgressListener stageIdToData map

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11700:
--
Assignee: Shixiong Zhu

> Memory leak at SparkContext jobProgressListener stageIdToData map
> -
>
> Key: SPARK-11700
> URL: https://issues.apache.org/jira/browse/SPARK-11700
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Environment: Ubuntu 14.04 LTS, Oracle JDK 1.8.51 Apache tomcat 
> 8.0.28. Spring 4
>Reporter: Kostas papageorgopoulos
>Assignee: Shixiong Zhu
>Priority: Critical
>  Labels: leak, memory-leak
> Attachments: AbstractSparkJobRunner.java, 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png, SparkHeapSpaceProgress.png, 
> SparkMemoryAfterLotsOfConsecutiveRuns.png, 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png
>
>
> It seems that there is a SparkContext jobProgressListener memory leak. Below I 
> describe the steps to reproduce it.
> I have created a Java webapp that runs Spark SQL jobs which read data from HDFS 
> (joining them) and write the results to ElasticSearch using the ES Hadoop 
> connector. After a lot of consecutive runs I noticed that my heap space was 
> full, so I got an out-of-heap-space error.
> In the attached AbstractSparkJobRunner.java, the method public final void run(T 
> jobConfiguration, ExecutionLog executionLog) throws Exception runs each time a 
> Spark SQL job is triggered, so I tried to reuse the same SparkContext across a 
> number of consecutive runs. When certain rules apply, I try to clean up the 
> SparkContext by first calling killSparkAndSqlContext, which eventually runs:
> {code}
> synchronized (sparkContextThreadLock) {
>     if (javaSparkContext != null) {
>         LOGGER.info("!!! CLEARING SPARK CONTEXT!!!");
>         javaSparkContext.stop();
>         javaSparkContext = null;
>         sqlContext = null;
>         System.gc();
>     }
>     numberOfRunningJobsForSparkContext.getAndSet(0);
> }
> {code}
> So at some point, if no other Spark SQL job should run, I kill the SparkContext 
> (AbstractSparkJobRunner.killSparkAndSqlContext runs) and expect it to be 
> garbage collected. However, this is not the case: even though my debugger shows 
> that the JavaSparkContext object is null (see the attached 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png), jvisualvm shows the heap space 
> growing even when the garbage collector is called (see the attached 
> SparkHeapSpaceProgress.png).
> The Memory Analyzer Tool shows that a big part of the retained heap is assigned 
> to _jobProgressListener (see the attached 
> SparkMemoryAfterLotsOfConsecutiveRuns.png and 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png), although at the same 
> time the JavaSparkContext in the singleton service is null.
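
A toy sketch of the leak mechanism the heap dumps point at, with hypothetical classes rather than Spark's real listener: per-stage entries accumulate in a map that is never trimmed, and a long-lived registry keeps the listener, and therefore its data, reachable even after the context is stopped and nulled out.

{code}
import scala.collection.mutable

// Hypothetical classes for illustration only; not Spark's JobProgressListener.
final case class StageData(stageId: Int, payload: Array[Byte])

final class ToyJobProgressListener {
  // Grows by one entry per completed stage and is never trimmed.
  val stageIdToData = mutable.HashMap.empty[Int, StageData]
  def onStageCompleted(stageId: Int): Unit =
    stageIdToData(stageId) = StageData(stageId, new Array[Byte](1 << 20))
}

final class ToyListenerRegistry {
  // If a long-lived registry (e.g. a static bus) retains every listener, the
  // accumulated stage data stays reachable even after the context is "stopped".
  val listeners = mutable.ListBuffer.empty[ToyJobProgressListener]
}
{code}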



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11843) Isolate staging directory across applications on same YARN cluster

2015-11-18 Thread Andrew Or (JIRA)
Andrew Or created SPARK-11843:
-

 Summary: Isolate staging directory across applications on same 
YARN cluster
 Key: SPARK-11843
 URL: https://issues.apache.org/jira/browse/SPARK-11843
 Project: Spark
  Issue Type: Bug
  Components: YARN
Reporter: Andrew Or
Priority: Minor


If multiple clients share the same YARN cluster and file system, they may end up 
using the same `.sparkStaging` directory. This can be a problem if, for instance, 
their jars have similar names. Isolating the staging directories would improve 
both security and user experience. We can either:

(1) allow users to configure the directory name
(2) add an identifier to the directory name, which I prefer (see the sketch below)
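
A minimal sketch of proposal (2), using illustrative names rather than the actual YARN client code:

{code}
import java.util.UUID
import org.apache.hadoop.fs.Path

object StagingDirs {
  // Append a per-application identifier so clients sharing a file system never
  // collide, e.g. /user/alice/.sparkStaging/application_1447871234567_0042
  def stagingDir(homeDir: Path, appId: Option[String]): Path = {
    val suffix = appId.getOrElse(UUID.randomUUID().toString)
    new Path(homeDir, s".sparkStaging/$suffix")
  }
}
{code}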



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11278) PageRank fails with unified memory manager

2015-11-18 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012638#comment-15012638
 ] 

Andrew Or commented on SPARK-11278:
---

Also, when you say 6 nodes, what kind of nodes are they? How much memory and how 
many cores per node?

> PageRank fails with unified memory manager
> --
>
> Key: SPARK-11278
> URL: https://issues.apache.org/jira/browse/SPARK-11278
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX, Spark Core
>Affects Versions: 1.5.1
>Reporter: Nishkam Ravi
>Priority: Critical
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with 
> ExecutorLostFailure. Traced it back to the 'unified memory manager' commit 
> from Oct 13th. Took a quick look at the code and couldn't see the problem 
> (changes look pretty good). cc'ing [~andrewor14][~vanzin] who may be able to 
> spot the problem quickly. Can be reproduced by running PageRank on a large 
> enough input dataset if needed. Sorry for not being of much help here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11649) "SparkListenerSuite.onTaskGettingResult() called when result fetched remotely" test is very slow

2015-11-18 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012601#comment-15012601
 ] 

Andrew Or commented on SPARK-11649:
---

I backported it into 1.5.

> "SparkListenerSuite.onTaskGettingResult() called when result fetched 
> remotely" test is very slow
> 
>
> Key: SPARK-11649
> URL: https://issues.apache.org/jira/browse/SPARK-11649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.5.3, 1.6.0
>
>
> The SparkListenerSuite "onTaskGettingResult() called when result fetched 
> remotely" test seems to take between 1 to 4 minutes to run in Jenkins, which 
> seems excessively slow; we should see if there's an easy way to speed this up:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.5-Maven-pre-YARN/938/HADOOP_VERSION=1.2.1,label=spark-test/testReport/org.apache.spark.scheduler/SparkListenerSuite/onTaskGettingResult___called_when_result_fetched_remotely/history/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11649) "SparkListenerSuite.onTaskGettingResult() called when result fetched remotely" test is very slow

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11649:
--
Target Version/s: 1.5.3, 1.6.0  (was: 1.6.0)

> "SparkListenerSuite.onTaskGettingResult() called when result fetched 
> remotely" test is very slow
> 
>
> Key: SPARK-11649
> URL: https://issues.apache.org/jira/browse/SPARK-11649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.5.3, 1.6.0
>
>
> The SparkListenerSuite "onTaskGettingResult() called when result fetched 
> remotely" test seems to take between 1 to 4 minutes to run in Jenkins, which 
> seems excessively slow; we should see if there's an easy way to speed this up:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.5-Maven-pre-YARN/938/HADOOP_VERSION=1.2.1,label=spark-test/testReport/org.apache.spark.scheduler/SparkListenerSuite/onTaskGettingResult___called_when_result_fetched_remotely/history/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11649) "SparkListenerSuite.onTaskGettingResult() called when result fetched remotely" test is very slow

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11649:
--
Fix Version/s: 1.5.3

> "SparkListenerSuite.onTaskGettingResult() called when result fetched 
> remotely" test is very slow
> 
>
> Key: SPARK-11649
> URL: https://issues.apache.org/jira/browse/SPARK-11649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.5.3, 1.6.0
>
>
> The SparkListenerSuite "onTaskGettingResult() called when result fetched 
> remotely" test seems to take between 1 to 4 minutes to run in Jenkins, which 
> seems excessively slow; we should see if there's an easy way to speed this up:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.5-Maven-pre-YARN/938/HADOOP_VERSION=1.2.1,label=spark-test/testReport/org.apache.spark.scheduler/SparkListenerSuite/onTaskGettingResult___called_when_result_fetched_remotely/history/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11649) "SparkListenerSuite.onTaskGettingResult() called when result fetched remotely" test is very slow

2015-11-18 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012598#comment-15012598
 ] 

Andrew Or commented on SPARK-11649:
---

Oh, I didn't realize. Does the new RPC system have the same problem in master, 
though?

> "SparkListenerSuite.onTaskGettingResult() called when result fetched 
> remotely" test is very slow
> 
>
> Key: SPARK-11649
> URL: https://issues.apache.org/jira/browse/SPARK-11649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.5.3, 1.6.0
>
>
> The SparkListenerSuite "onTaskGettingResult() called when result fetched 
> remotely" test seems to take between 1 to 4 minutes to run in Jenkins, which 
> seems excessively slow; we should see if there's an easy way to speed this up:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.5-Maven-pre-YARN/938/HADOOP_VERSION=1.2.1,label=spark-test/testReport/org.apache.spark.scheduler/SparkListenerSuite/onTaskGettingResult___called_when_result_fetched_remotely/history/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11278) PageRank fails with unified memory manager

2015-11-18 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012632#comment-15012632
 ] 

Andrew Or commented on SPARK-11278:
---

[~nravi], can you try again with the latest 1.6 branch to see if this is still 
an issue? I wonder whether 
https://github.com/apache/spark/commit/56419cf11f769c80f391b45dc41b3c7101cc5ff4 
makes a difference here.

> PageRank fails with unified memory manager
> --
>
> Key: SPARK-11278
> URL: https://issues.apache.org/jira/browse/SPARK-11278
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX, Spark Core
>Affects Versions: 1.5.1
>Reporter: Nishkam Ravi
>Priority: Critical
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with 
> ExecutorLostFailure. Traced it back to the 'unified memory manager' commit 
> from Oct 13th. Took a quick look at the code and couldn't see the problem 
> (changes look pretty good). cc'ing [~andrewor14][~vanzin] who may be able to 
> spot the problem quickly. Can be reproduced by running PageRank on a large 
> enough input dataset if needed. Sorry for not being of much help here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11278) PageRank fails with unified memory manager

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or reassigned SPARK-11278:
-

Assignee: Andrew Or

> PageRank fails with unified memory manager
> --
>
> Key: SPARK-11278
> URL: https://issues.apache.org/jira/browse/SPARK-11278
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX, Spark Core
>Affects Versions: 1.5.1
>Reporter: Nishkam Ravi
>Assignee: Andrew Or
>Priority: Critical
>
> PageRank (6 nodes, 32GB input) runs very slowly and eventually fails with 
> ExecutorLostFailure. Traced it back to the 'unified memory manager' commit 
> from Oct 13th. Took a quick look at the code and couldn't see the problem 
> (changes look pretty good). cc'ing [~andrewor14][~vanzin] who may be able to 
> spot the problem quickly. Can be reproduced by running PageRank on a large 
> enough input dataset if needed. Sorry for not being of much help here.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10985) Avoid passing evicted blocks throughout BlockManager / CacheManager

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10985:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: SPARK-1)

> Avoid passing evicted blocks throughout BlockManager / CacheManager
> ---
>
> Key: SPARK-10985
> URL: https://issues.apache.org/jira/browse/SPARK-10985
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Spark Core
>Reporter: Andrew Or
>Priority: Minor
>
> This is a minor refactoring task.
> Currently when we attempt to put a block in, we get back an array buffer of 
> blocks that are dropped in the process. We do this to propagate these blocks 
> back to our TaskContext, which will add them to its TaskMetrics so we can see 
> them in the SparkUI storage tab properly.
> Now that we have TaskContext.get, we can just use that to propagate this 
> information. This simplifies a lot of the signatures and gets rid of weird 
> return types like the following everywhere:
> {code}
> ArrayBuffer[(BlockId, BlockStatus)]
> {code}
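
A hedged sketch of the shape of that refactoring, using stand-in types rather than Spark's real TaskContext/TaskMetrics API:

{code}
import scala.collection.mutable.ArrayBuffer

// Stand-in types for illustration; not Spark's real BlockId/BlockStatus/TaskContext.
final case class BlockId(name: String)
final case class BlockStatus(memSize: Long, diskSize: Long)

object AmbientTaskContext {
  // Thread-local stand-in for TaskContext.get(): the running task's metrics sink.
  private val dropped = new ThreadLocal[ArrayBuffer[(BlockId, BlockStatus)]] {
    override def initialValue(): ArrayBuffer[(BlockId, BlockStatus)] = ArrayBuffer.empty
  }
  def recordDroppedBlock(id: BlockId, status: BlockStatus): Unit =
    dropped.get() += ((id, status))
  def droppedBlocks: Seq[(BlockId, BlockStatus)] = dropped.get().toSeq
}

object BlockStoreLike {
  // Before: every put returned ArrayBuffer[(BlockId, BlockStatus)] up the call chain.
  // After: callers report drops to the ambient task context and simply return Unit.
  def dropFromMemory(id: BlockId): Unit =
    AmbientTaskContext.recordDroppedBlock(id, BlockStatus(memSize = 0L, diskSize = 0L))
}
{code}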



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11309) Clean up hacky use of MemoryManager inside of HashedRelation

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11309:
--
Issue Type: Bug  (was: Sub-task)
Parent: (was: SPARK-1)

> Clean up hacky use of MemoryManager inside of HashedRelation
> 
>
> Key: SPARK-11309
> URL: https://issues.apache.org/jira/browse/SPARK-11309
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Josh Rosen
>
> In HashedRelation, there's a hacky creation of a new MemoryManager in order 
> to handle broadcasting of BytesToBytesMap: 
> https://github.com/apache/spark/blob/85e654c5ec87e666a8845bfd77185c1ea57b268a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L323
> Something similar to this has existed for a while, but the code recently 
> became much messier as an indirect consequence of my memory manager 
> consolidation patch. We should see about cleaning this up and removing the 
> hack.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10985) Avoid passing evicted blocks throughout BlockManager / CacheManager

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10985:
--
Issue Type: Improvement  (was: Bug)

> Avoid passing evicted blocks throughout BlockManager / CacheManager
> ---
>
> Key: SPARK-10985
> URL: https://issues.apache.org/jira/browse/SPARK-10985
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Spark Core
>Reporter: Andrew Or
>Priority: Minor
>
> This is a minor refactoring task.
> Currently when we attempt to put a block in, we get back an array buffer of 
> blocks that are dropped in the process. We do this to propagate these blocks 
> back to our TaskContext, which will add them to its TaskMetrics so we can see 
> them in the SparkUI storage tab properly.
> Now that we have TaskContext.get, we can just use that to propagate this 
> information. This simplifies a lot of the signatures and gets rid of weird 
> return types like the following everywhere:
> {code}
> ArrayBuffer[(BlockId, BlockStatus)]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10985) Avoid passing evicted blocks throughout BlockManager / CacheManager

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10985:
--
Target Version/s:   (was: 1.6.0)

> Avoid passing evicted blocks throughout BlockManager / CacheManager
> ---
>
> Key: SPARK-10985
> URL: https://issues.apache.org/jira/browse/SPARK-10985
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager, Spark Core
>Reporter: Andrew Or
>Priority: Minor
>
> This is a minor refactoring task.
> Currently when we attempt to put a block in, we get back an array buffer of 
> blocks that are dropped in the process. We do this to propagate these blocks 
> back to our TaskContext, which will add them to its TaskMetrics so we can see 
> them in the SparkUI storage tab properly.
> Now that we have TaskContext.get, we can just use that to propagate this 
> information. This simplifies a lot of the signatures and gets rid of weird 
> return types like the following everywhere:
> {code}
> ArrayBuffer[(BlockId, BlockStatus)]
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11700) Memory leak at SparkContext jobProgressListener stageIdToData map

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11700:
--
Priority: Critical  (was: Minor)

> Memory leak at SparkContext jobProgressListener stageIdToData map
> -
>
> Key: SPARK-11700
> URL: https://issues.apache.org/jira/browse/SPARK-11700
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Environment: Ubuntu 14.04 LTS, Oracle JDK 1.8.51 Apache tomcat 
> 8.0.28. Spring 4
>Reporter: Kostas papageorgopoulos
>Priority: Critical
>  Labels: leak, memory-leak
> Attachments: AbstractSparkJobRunner.java, 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png, SparkHeapSpaceProgress.png, 
> SparkMemoryAfterLotsOfConsecutiveRuns.png, 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png
>
>
> It seems that there is a SparkContext jobProgressListener memory leak. Below I 
> describe the steps to reproduce it.
> I have created a Java webapp that runs Spark SQL jobs which read data from HDFS 
> (joining them) and write the results to ElasticSearch using the ES Hadoop 
> connector. After a lot of consecutive runs I noticed that my heap space was 
> full, so I got an out-of-heap-space error.
> In the attached AbstractSparkJobRunner.java, the method public final void run(T 
> jobConfiguration, ExecutionLog executionLog) throws Exception runs each time a 
> Spark SQL job is triggered, so I tried to reuse the same SparkContext across a 
> number of consecutive runs. When certain rules apply, I try to clean up the 
> SparkContext by first calling killSparkAndSqlContext, which eventually runs:
> {code}
> synchronized (sparkContextThreadLock) {
>     if (javaSparkContext != null) {
>         LOGGER.info("!!! CLEARING SPARK CONTEXT!!!");
>         javaSparkContext.stop();
>         javaSparkContext = null;
>         sqlContext = null;
>         System.gc();
>     }
>     numberOfRunningJobsForSparkContext.getAndSet(0);
> }
> {code}
> So at some point, if no other Spark SQL job should run, I kill the SparkContext 
> (AbstractSparkJobRunner.killSparkAndSqlContext runs) and expect it to be 
> garbage collected. However, this is not the case: even though my debugger shows 
> that the JavaSparkContext object is null (see the attached 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png), jvisualvm shows the heap space 
> growing even when the garbage collector is called (see the attached 
> SparkHeapSpaceProgress.png).
> The Memory Analyzer Tool shows that a big part of the retained heap is assigned 
> to _jobProgressListener (see the attached 
> SparkMemoryAfterLotsOfConsecutiveRuns.png and 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png), although at the same 
> time the JavaSparkContext in the singleton service is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11700) Memory leak at SparkContext jobProgressListener stageIdToData map

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11700:
--
Target Version/s: 1.6.0

> Memory leak at SparkContext jobProgressListener stageIdToData map
> -
>
> Key: SPARK-11700
> URL: https://issues.apache.org/jira/browse/SPARK-11700
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.5.0, 1.5.1, 1.5.2
> Environment: Ubuntu 14.04 LTS, Oracle JDK 1.8.51 Apache tomcat 
> 8.0.28. Spring 4
>Reporter: Kostas papageorgopoulos
>Priority: Critical
>  Labels: leak, memory-leak
> Attachments: AbstractSparkJobRunner.java, 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png, SparkHeapSpaceProgress.png, 
> SparkMemoryAfterLotsOfConsecutiveRuns.png, 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png
>
>
> It seems that there is a SparkContext jobProgressListener memory leak. Below I 
> describe the steps to reproduce it.
> I have created a Java webapp that runs Spark SQL jobs which read data from HDFS 
> (joining them) and write the results to ElasticSearch using the ES Hadoop 
> connector. After a lot of consecutive runs I noticed that my heap space was 
> full, so I got an out-of-heap-space error.
> In the attached AbstractSparkJobRunner.java, the method public final void run(T 
> jobConfiguration, ExecutionLog executionLog) throws Exception runs each time a 
> Spark SQL job is triggered, so I tried to reuse the same SparkContext across a 
> number of consecutive runs. When certain rules apply, I try to clean up the 
> SparkContext by first calling killSparkAndSqlContext, which eventually runs:
> {code}
> synchronized (sparkContextThreadLock) {
>     if (javaSparkContext != null) {
>         LOGGER.info("!!! CLEARING SPARK CONTEXT!!!");
>         javaSparkContext.stop();
>         javaSparkContext = null;
>         sqlContext = null;
>         System.gc();
>     }
>     numberOfRunningJobsForSparkContext.getAndSet(0);
> }
> {code}
> So at some point, if no other Spark SQL job should run, I kill the SparkContext 
> (AbstractSparkJobRunner.killSparkAndSqlContext runs) and expect it to be 
> garbage collected. However, this is not the case: even though my debugger shows 
> that the JavaSparkContext object is null (see the attached 
> SparkContextPossibleMemoryLeakIDEA_DEBUG.png), jvisualvm shows the heap space 
> growing even when the garbage collector is called (see the attached 
> SparkHeapSpaceProgress.png).
> The Memory Analyzer Tool shows that a big part of the retained heap is assigned 
> to _jobProgressListener (see the attached 
> SparkMemoryAfterLotsOfConsecutiveRuns.png and 
> SparkMemoryLeakAfterLotsOfRunsWithinTheSameContext.png), although at the same 
> time the JavaSparkContext in the singleton service is null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11309) Clean up hacky use of MemoryManager inside of HashedRelation

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11309:
--
Issue Type: Improvement  (was: Bug)

> Clean up hacky use of MemoryManager inside of HashedRelation
> 
>
> Key: SPARK-11309
> URL: https://issues.apache.org/jira/browse/SPARK-11309
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Josh Rosen
>
> In HashedRelation, there's a hacky creation of a new MemoryManager in order 
> to handle broadcasting of BytesToBytesMap: 
> https://github.com/apache/spark/blob/85e654c5ec87e666a8845bfd77185c1ea57b268a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L323
> Something similar to this has existed for a while, but the code recently 
> became much messier as an indirect consequence of my memory manager 
> consolidation patch. We should see about cleaning this up and removing the 
> hack.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10930) History "Stages" page "duration" can be confusing

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-10930.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> History "Stages" page "duration" can be confusing
> -
>
> Key: SPARK-10930
> URL: https://issues.apache.org/jira/browse/SPARK-10930
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Derek Dagit
> Fix For: 1.6.0
>
>
> The Spark history server's "Stages" page shows each stage's submitted time and 
> duration. The duration can be confusing, since a stage may actually start its 
> tasks much later than it was submitted if it is waiting on previous stages. 
> This makes it hard to figure out which stages were really slow without 
> clicking into each stage.
> It would be nice to have a first-task-launched time, or the processing time 
> spent in each stage, to make it easy to find the slow stages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7628) DAG visualization: position graphs with semantic awareness

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7628.

Resolution: Won't Fix

> DAG visualization: position graphs with semantic awareness
> --
>
> Key: SPARK-7628
> URL: https://issues.apache.org/jira/browse/SPARK-7628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Many streaming operations aggregate over many batches. The current layout 
> puts the aggregation stage at the end, resulting in many overlapping edges 
> that together form a piece of beautiful artwork but nevertheless clutter the 
> intended visualization.
> One thing we could do is to put any stage that has N incoming edges on the 
> next line rather than piling it up vertically on the right.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7348) DAG visualization: add links to RDD page

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7348:
-
Issue Type: Improvement  (was: Sub-task)
Parent: (was: SPARK-7463)

> DAG visualization: add links to RDD page
> 
>
> Key: SPARK-7348
> URL: https://issues.apache.org/jira/browse/SPARK-7348
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> It currently has links from the job page to the stage page. It would be nice 
> if it had links to the corresponding RDD page as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7463) DAG visualization improvements

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-7463.
--
   Resolution: Fixed
Fix Version/s: 1.6.0

> DAG visualization improvements
> --
>
> Key: SPARK-7463
> URL: https://issues.apache.org/jira/browse/SPARK-7463
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.6.0
>
>
> This is the umbrella JIRA for improvements or bug fixes to the DAG 
> visualization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11649) "SparkListenerSuite.onTaskGettingResult() called when result fetched remotely" test is very slow

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11649.
---
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> "SparkListenerSuite.onTaskGettingResult() called when result fetched 
> remotely" test is very slow
> 
>
> Key: SPARK-11649
> URL: https://issues.apache.org/jira/browse/SPARK-11649
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Josh Rosen
>Assignee: Josh Rosen
> Fix For: 1.6.0
>
>
> The SparkListenerSuite "onTaskGettingResult() called when result fetched 
> remotely" test seems to take between 1 to 4 minutes to run in Jenkins, which 
> seems excessively slow; we should see if there's an easy way to speed this up:
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/Spark-1.5-Maven-pre-YARN/938/HADOOP_VERSION=1.2.1,label=spark-test/testReport/org.apache.spark.scheduler/SparkListenerSuite/onTaskGettingResult___called_when_result_fetched_remotely/history/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7349) DAG visualization: add legend to explain the content

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7349.

Resolution: Won't Fix

> DAG visualization: add legend to explain the content
> 
>
> Key: SPARK-7349
> URL: https://issues.apache.org/jira/browse/SPARK-7349
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Minor
>
> Right now we have red dots and black dots here and there. It's not clear what 
> they mean.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-7465) DAG visualization: RDD dependencies not always shown

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-7465.

Resolution: Won't Fix

> DAG visualization: RDD dependencies not always shown
> 
>
> Key: SPARK-7465
> URL: https://issues.apache.org/jira/browse/SPARK-7465
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> Currently if the same RDD appears in multiple stages, the arrow will be drawn 
> only for the first occurrence. It may be too much to show the dependency on 
> every single occurrence of the same RDD (common in MLlib and GraphX), but we 
> should at least show them on hover so the user knows where the RDDs are 
> coming from.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8716) Write tests for executor shared cache feature

2015-11-18 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8716:
-
Target Version/s:   (was: 1.6.0)

> Write tests for executor shared cache feature
> -
>
> Key: SPARK-8716
> URL: https://issues.apache.org/jira/browse/SPARK-8716
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Tests
>Affects Versions: 1.2.0
>Reporter: Andrew Or
>
> More specifically, this is the feature that is currently flagged by 
> `spark.files.useFetchCache`.
> This is a complicated feature that has no tests. I cannot say with confidence 
> that it actually works on all cluster managers. In particular, I believe it 
> doesn't work on Mesos because whatever goes into this else case creates its 
> own temp directory per executor: 
> https://github.com/apache/spark/blob/881662e9c93893430756320f51cef0fc6643f681/core/src/main/scala/org/apache/spark/util/Utils.scala#L739.
> It's also not immediately clear that it works in standalone mode due to the 
> lack of comments. It actually does work there because the Worker happens to 
> set a `SPARK_EXECUTOR_DIRS` variable. The linkage could be more explicitly 
> documented in the code.
> This is difficult to write tests for, but it's still important to do so. 
> Otherwise, semi-related changes in the future may easily break it without 
> anyone noticing.
> Related issues: SPARK-8130, SPARK-6313, SPARK-2713
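> To make the flag concrete, here is a small sketch of the addFile/SparkFiles 
> round trip it guards; a real test would need several executors on the same 
> host (e.g. a standalone cluster with two workers), which this local-mode 
> snippet does not provide, and the file path below is illustrative only:
> {code}
> import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
>
> val conf = new SparkConf()
>   .setMaster("local[2]")                     // a real test needs real executors
>   .setAppName("fetch-cache-sketch")
>   .set("spark.files.useFetchCache", "true")  // the flag guarding the shared cache
> val sc = new SparkContext(conf)
>
> sc.addFile("/tmp/some-input.txt")            // illustrative path; must exist
> // A proper test would assert that executors on the same host resolve this to a
> // single shared cached copy rather than fetching one copy per executor.
> val paths = sc.parallelize(1 to 4, 4).map(_ => SparkFiles.get("some-input.txt")).collect()
> println(paths.distinct.mkString(", "))
> sc.stop()
> {code}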



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-9552) Dynamic allocation kills busy executors on race condition

2015-11-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-9552.
--
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Dynamic allocation kills busy executors on race condition
> -
>
> Key: SPARK-9552
> URL: https://issues.apache.org/jira/browse/SPARK-9552
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.4.0, 1.4.1
>Reporter: Jie Huang
>Assignee: Jie Huang
> Fix For: 1.6.0
>
>
> With dynamic allocation, busy executors are sometimes killed by mistake: 
> executors that still have task assignments are killed for having been idle 
> long enough (say 60 seconds). The root cause is that the task-launch listener 
> event is asynchronous.
> For example, tasks may have been assigned to an executor while the listener 
> notification has not been sent out yet. Meanwhile, dynamic allocation's 
> executor idle timeout (e.g., 60 seconds) expires and triggers a killExecutor 
> event, because the timer expired before the listener event arrived.
> The task then tries to run on top of that killed (or dying) executor, which 
> ultimately leads to task failure.
> Here is the proposal to fix it. We can add a force flag to killExecutor. If 
> force is not set (i.e., false), we first check whether the executor being 
> killed is idle or busy. If it has any assignment, we do not kill it and return 
> false to indicate that the kill failed. Dynamic allocation should turn force 
> killing off (force = false), so an attempt to kill a busy executor fails and 
> the executor's idle timer stays valid. Later, when the task assignment event 
> arrives, the idle timer can be removed accordingly. This way we avoid falsely 
> killing busy executors under dynamic allocation.
> For other usages, end users can decide for themselves whether to use force 
> killing. With that option turned on, killExecutor performs the kill without 
> any status check.
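> A minimal sketch of the proposed check; `busyExecutors` and `doKillExecutor` 
> are hypothetical stand-ins for the scheduler backend's real bookkeeping, not 
> actual Spark APIs:
> {code}
> import scala.collection.mutable
>
> // Sketch only: which executors currently have task assignments.
> val busyExecutors = mutable.Set[String]()
>
> def doKillExecutor(executorId: String): Unit = println(s"killing $executorId")
>
> def killExecutor(executorId: String, force: Boolean = false): Boolean = {
>   if (!force && busyExecutors.contains(executorId)) {
>     // Dynamic allocation calls with force = false, so a busy executor survives;
>     // returning false tells the caller the kill failed and its timer stays valid.
>     false
>   } else {
>     doKillExecutor(executorId)
>     true
>   }
> }
> {code}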



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11790) Flaky test: KafkaStreamTests.test_kafka_direct_stream_foreach_get_offsetRanges

2015-11-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11790.
---
  Resolution: Fixed
Assignee: Shixiong Zhu
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Flaky test:  
> KafkaStreamTests.test_kafka_direct_stream_foreach_get_offsetRanges
> ---
>
> Key: SPARK-11790
> URL: https://issues.apache.org/jira/browse/SPARK-11790
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Streaming, Tests
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>  Labels: flaky-test
> Fix For: 1.6.0
>
>
> Jenkins link: 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46041/consoleFull
> {code}
> ==
> ERROR: test_kafka_direct_stream_foreach_get_offsetRanges 
> (__main__.KafkaStreamTests)
> Test the Python direct Kafka stream foreachRDD get offsetRanges.
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/streaming/tests.py",
>  line 876, in setUp
> self._kafkaTestUtils.setup()
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py",
>  line 813, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File 
> "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.9-src.zip/py4j/protocol.py",
>  line 308, in get_return_value
> format(target_id, ".", name), value)
> Py4JJavaError: An error occurred while calling o11914.setup.
> : org.I0Itec.zkclient.exception.ZkTimeoutException: Unable to connect to 
> zookeeper server within timeout: 6000
>   at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:880)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
>   at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:84)
>   at 
> org.apache.spark.streaming.kafka.KafkaTestUtils.setupEmbeddedZookeeper(KafkaTestUtils.scala:99)
>   at 
> org.apache.spark.streaming.kafka.KafkaTestUtils.setup(KafkaTestUtils.scala:122)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:209)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11726) Legacy Netty-RPC based submission in standalone mode does not work

2015-11-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11726.
---
  Resolution: Fixed
Assignee: Jacek Laskowski
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Legacy Netty-RPC based submission in standalone mode does not work
> --
>
> Key: SPARK-11726
> URL: https://issues.apache.org/jira/browse/SPARK-11726
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Submit
>Reporter: Jacek Lewandowski
>Assignee: Jacek Laskowski
> Fix For: 1.6.0
>
>
> When an application is submitted in cluster mode with the standalone Spark 
> scheduler, either the legacy RPC-based protocol or the REST-based protocol can 
> be used. Spark submit first tries REST and falls back to RPC if that fails.
> When Akka-based RPC is used, the REST connection fails immediately because 
> Akka rejects non-Akka connections. With Netty-based RPC, however, the REST 
> client waits for the response indefinitely, making it impossible to fail over 
> to RPC.
> The fix is quite simple - set a timeout on reading the response from the server.
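> For illustration, a sketch of such a read timeout using plain java.net APIs; 
> the endpoint URL and timeout values are made up, and this is not the actual 
> patch:
> {code}
> import java.net.{HttpURLConnection, URL}
>
> val conn = new URL("http://spark-master:6066/v1/submissions/create")
>   .openConnection().asInstanceOf[HttpURLConnection]
> conn.setConnectTimeout(10000)  // give up quickly if the endpoint is unreachable
> conn.setReadTimeout(10000)     // do not wait forever for a server that never replies
> // If the read times out, the submission client can fall back to the RPC protocol.
> {code}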



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11771) Maximum memory is determined by two params but error message only lists one.

2015-11-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11771.
---
  Resolution: Fixed
Assignee: holdenk
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Maximum memory is determined by two params but error message only lists one.
> 
>
> Key: SPARK-11771
> URL: https://issues.apache.org/jira/browse/SPARK-11771
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
> Fix For: 1.6.0
>
>
> When we exceed the max memory, tell users to increase both params instead of 
> just one.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11726) Legacy Netty-RPC based submission in standalone mode does not work

2015-11-17 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010288#comment-15010288
 ] 

Andrew Or commented on SPARK-11726:
---

oops, fixed

> Legacy Netty-RPC based submission in standalone mode does not work
> --
>
> Key: SPARK-11726
> URL: https://issues.apache.org/jira/browse/SPARK-11726
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Submit
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
> Fix For: 1.6.0
>
>
> When an application is submitted in cluster mode with the standalone Spark 
> scheduler, either the legacy RPC-based protocol or the REST-based protocol can 
> be used. Spark submit first tries REST and falls back to RPC if that fails.
> When Akka-based RPC is used, the REST connection fails immediately because 
> Akka rejects non-Akka connections. With Netty-based RPC, however, the REST 
> client waits for the response indefinitely, making it impossible to fail over 
> to RPC.
> The fix is quite simple - set a timeout on reading the response from the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11726) Legacy Netty-RPC based submission in standalone mode does not work

2015-11-17 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11726:
--
Assignee: Jacek Lewandowski  (was: Jacek Laskowski)

> Legacy Netty-RPC based submission in standalone mode does not work
> --
>
> Key: SPARK-11726
> URL: https://issues.apache.org/jira/browse/SPARK-11726
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Submit
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
> Fix For: 1.6.0
>
>
> When an application is submitted in cluster mode with the standalone Spark 
> scheduler, either the legacy RPC-based protocol or the REST-based protocol can 
> be used. Spark submit first tries REST and falls back to RPC if that fails.
> When Akka-based RPC is used, the REST connection fails immediately because 
> Akka rejects non-Akka connections. With Netty-based RPC, however, the REST 
> client waits for the response indefinitely, making it impossible to fail over 
> to RPC.
> The fix is quite simple - set a timeout on reading the response from the server.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11732) MiMa excludes miss private classes

2015-11-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11732:
--
Target Version/s: 1.6.0

> MiMa excludes miss private classes
> --
>
> Key: SPARK-11732
> URL: https://issues.apache.org/jira/browse/SPARK-11732
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.1
>Reporter: Tim Hunter
>Assignee: Tim Hunter
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The checks in GenerateMIMAIgnore only check for package private classes, not 
> private classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11732) MiMa excludes miss private classes

2015-11-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11732:
--
Assignee: Tim Hunter

> MiMa excludes miss private classes
> --
>
> Key: SPARK-11732
> URL: https://issues.apache.org/jira/browse/SPARK-11732
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.5.1
>Reporter: Tim Hunter
>Assignee: Tim Hunter
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The checks in GenerateMIMAIgnore only check for package private classes, not 
> private classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11480) Wrong callsite is displayed when using AsyncRDDActions#takeAsync

2015-11-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11480.
---
   Resolution: Fixed
 Assignee: Kousuke Saruta
Fix Version/s: 1.6.0

> Wrong callsite is displayed when using AsyncRDDActions#takeAsync
> 
>
> Key: SPARK-11480
> URL: https://issues.apache.org/jira/browse/SPARK-11480
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 1.6.0
>
>
> When we call AsyncRDDActions#takeAsync, another DAGScheduler#runJob is actually 
> invoked from a separate thread, so we cannot get proper call site information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11710) Document new memory management model

2015-11-16 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11710.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

> Document new memory management model
> 
>
> Key: SPARK-11710
> URL: https://issues.apache.org/jira/browse/SPARK-11710
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, Spark Core
>Affects Versions: 1.6.0
>Reporter: Andrew Or
>Assignee: Andrew Or
> Fix For: 1.6.0
>
>
> e.g. tuning guide still references old deprecated configs
> https://spark.apache.org/docs/1.5.0/tuning.html#garbage-collection-tuning



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8029) ShuffleMapTasks must be robust to concurrent attempts on the same executor

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8029:
-
Target Version/s: 1.5.3, 1.6.0  (was: 1.5.2, 1.6.0)

> ShuffleMapTasks must be robust to concurrent attempts on the same executor
> --
>
> Key: SPARK-8029
> URL: https://issues.apache.org/jira/browse/SPARK-8029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Imran Rashid
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.5.3, 1.6.0
>
> Attachments: 
> AlternativesforMakingShuffleMapTasksRobusttoMultipleAttempts.pdf
>
>
> When stages get retried, a task may have more than one attempt running at the 
> same time, on the same executor.  Currently this causes problems for 
> ShuffleMapTasks, since all attempts try to write to the same output files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7308) Should there be multiple concurrent attempts for one stage?

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-7308:
-
Assignee: Davies Liu

> Should there be multiple concurrent attempts for one stage?
> ---
>
> Key: SPARK-7308
> URL: https://issues.apache.org/jira/browse/SPARK-7308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1
>Reporter: Imran Rashid
>Assignee: Davies Liu
> Fix For: 1.5.3, 1.6.0
>
> Attachments: SPARK-7308_discussion.pdf
>
>
> Currently, when there is a fetch failure, you can end up with multiple 
> concurrent attempts for the same stage.  Is this intended?  At best, it leads 
> to some very confusing behavior, and it makes it hard for the user to make 
> sense of what is going on.  At worst, I think this is the cause of some very 
> strange errors we've seen from users, where stages start executing before all 
> the dependent stages have completed.
> This can happen in the following scenario:  there is a fetch failure in 
> attempt 0, so the stage is retried.  attempt 1 starts.  But, tasks from 
> attempt 0 are still running -- some of them can also hit fetch failures after 
> attempt 1 starts.  That will cause additional stage attempts to get fired up.
> There is an attempt to handle this already 
> https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105
> but that only checks whether the **stage** is running.  It really should 
> check whether that **attempt** is still running, but there isn't enough info 
> to do that.  
> I'll also post some info on how to reproduce this.
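> For illustration only, a sketch of the attempt-level bookkeeping this would 
> need; the names here are hypothetical, not the DAGScheduler's actual fields:
> {code}
> import scala.collection.mutable
>
> // stageId -> the attempt id we currently consider "running"
> val runningStageAttempts = mutable.Map[Int, Int]()
>
> def shouldResubmit(stageId: Int, failedAttemptId: Int): Boolean = {
>   // Only react to a fetch failure that came from the attempt we still track;
>   // failures reported by stale attempts are ignored.
>   runningStageAttempts.get(stageId).exists(_ == failedAttemptId)
> }
> {code}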



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8029) ShuffleMapTasks must be robust to concurrent attempts on the same executor

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8029:
-
Description: 
When stages get retried, a task may have more than one attempt running at the 
same time, on the same executor.  Currently this causes problems for 
ShuffleMapTasks, since all attempts try to write to the same output files.

This is finally resolved through https://github.com/apache/spark/pull/9610, 
which uses the first writer wins approach.

  was:
When stages get retried, a task may have more than one attempt running at the 
same time, on the same executor.  Currently this causes problems for 
ShuffleMapTasks, since all attempts try to write to the same output files.

This is resolved through 


> ShuffleMapTasks must be robust to concurrent attempts on the same executor
> --
>
> Key: SPARK-8029
> URL: https://issues.apache.org/jira/browse/SPARK-8029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Imran Rashid
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.5.3, 1.6.0
>
> Attachments: 
> AlternativesforMakingShuffleMapTasksRobusttoMultipleAttempts.pdf
>
>
> When stages get retried, a task may have more than one attempt running at the 
> same time, on the same executor.  Currently this causes problems for 
> ShuffleMapTasks, since all attempts try to write to the same output files.
> This is finally resolved through https://github.com/apache/spark/pull/9610, 
> which uses the first writer wins approach.
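> For context, a rough sketch of the first-writer-wins idea (not the actual code 
> in that PR): every attempt writes to its own temporary file, and a shared lock 
> decides which attempt publishes its output under the final name.
> {code}
> import java.io.File
> import java.nio.file.{Files, StandardCopyOption}
>
> object FirstWriterWinsSketch {
>   private val commitLock = new Object
>
>   def commit(tmp: File, dest: File): Unit = commitLock.synchronized {
>     if (dest.exists()) {
>       tmp.delete()  // another attempt already committed; discard our copy
>     } else {
>       Files.move(tmp.toPath, dest.toPath, StandardCopyOption.ATOMIC_MOVE)
>     }
>   }
> }
> {code}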



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7829) SortShuffleWriter writes inconsistent data & index files on stage retry

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-7829.
--
  Resolution: Fixed
Assignee: Davies Liu  (was: Imran Rashid)
   Fix Version/s: 1.6.0
  1.5.3
Target Version/s: 1.5.3, 1.6.0

> SortShuffleWriter writes inconsistent data & index files on stage retry
> ---
>
> Key: SPARK-7829
> URL: https://issues.apache.org/jira/browse/SPARK-7829
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.3.1
>Reporter: Imran Rashid
>Assignee: Davies Liu
> Fix For: 1.5.3, 1.6.0
>
>
> When a stage is retried, a shuffle map task may get retried even if it was 
> successful.  If it happens to get scheduled on the same executor, the old data 
> file is *appended to*, while the index file still assumes the data starts at 
> position 0.  This leads to apparently corrupt shuffle map output, since when 
> the data file is read, the index file points to the wrong location.
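> To make the mismatch concrete, an illustration with made-up partition sizes 
> (the numbers are not taken from the issue):
> {code}
> // First attempt writes partitions of 100 and 150 bytes, so the index holds the
> // cumulative offsets [0, 100, 250] and the data file is 250 bytes long.
> val partitionSizes = Array(100L, 150L)
> val index = partitionSizes.scanLeft(0L)(_ + _)   // Array(0, 100, 250)
>
> // A retried task on the same executor appends its output instead of overwriting,
> // so the new blocks actually start at offset 250 in a 500-byte data file while
> // the index still assumes they start at offset 0 -- readers land in the wrong place.
> val dataFileLength = partitionSizes.sum * 2      // 500
> assert(index.last != dataFileLength)
> {code}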



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8029) ShuffleMapTasks must be robust to concurrent attempts on the same executor

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8029:
-
Description: 
When stages get retried, a task may have more than one attempt running at the 
same time, on the same executor.  Currently this causes problems for 
ShuffleMapTasks, since all attempts try to write to the same output files.

This is resolved through 

  was:When stages get retried, a task may have more than one attempt running at 
the same time, on the same executor.  Currently this causes problems for 
ShuffleMapTasks, since all attempts try to write to the same output files.


> ShuffleMapTasks must be robust to concurrent attempts on the same executor
> --
>
> Key: SPARK-8029
> URL: https://issues.apache.org/jira/browse/SPARK-8029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Imran Rashid
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.5.3, 1.6.0
>
> Attachments: 
> AlternativesforMakingShuffleMapTasksRobusttoMultipleAttempts.pdf
>
>
> When stages get retried, a task may have more than one attempt running at the 
> same time, on the same executor.  Currently this causes problems for 
> ShuffleMapTasks, since all attempts try to write to the same output files.
> This is resolved through 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7829) SortShuffleWriter writes inconsistent data & index files on stage retry

2015-11-13 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004780#comment-15004780
 ] 

Andrew Or commented on SPARK-7829:
--

I believe this is now fixed by https://github.com/apache/spark/pull/9610. 
Let me know if this is not the case.

> SortShuffleWriter writes inconsistent data & index files on stage retry
> ---
>
> Key: SPARK-7829
> URL: https://issues.apache.org/jira/browse/SPARK-7829
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.3.1
>Reporter: Imran Rashid
>Assignee: Imran Rashid
> Fix For: 1.5.3, 1.6.0
>
>
> When a stage is retried, a shuffle map task may get retried even if it was 
> successful.  If it happens to get scheduled on the same executor, the old data 
> file is *appended to*, while the index file still assumes the data starts at 
> position 0.  This leads to apparently corrupt shuffle map output, since when 
> the data file is read, the index file points to the wrong location.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8029) ShuffleMapTasks must be robust to concurrent attempts on the same executor

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8029:
-
Fix Version/s: (was: 1.5.2)
   1.5.3

> ShuffleMapTasks must be robust to concurrent attempts on the same executor
> --
>
> Key: SPARK-8029
> URL: https://issues.apache.org/jira/browse/SPARK-8029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Imran Rashid
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.5.3, 1.6.0
>
> Attachments: 
> AlternativesforMakingShuffleMapTasksRobusttoMultipleAttempts.pdf
>
>
> When stages get retried, a task may have more than one attempt running at the 
> same time, on the same executor.  Currently this causes problems for 
> ShuffleMapTasks, since all attempts try to write to the same output files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8029) ShuffleMapTasks must be robust to concurrent attempts on the same executor

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8029:
-
Fix Version/s: 1.5.2

> ShuffleMapTasks must be robust to concurrent attempts on the same executor
> --
>
> Key: SPARK-8029
> URL: https://issues.apache.org/jira/browse/SPARK-8029
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Imran Rashid
>Assignee: Davies Liu
>Priority: Critical
> Fix For: 1.5.2, 1.6.0
>
> Attachments: 
> AlternativesforMakingShuffleMapTasksRobusttoMultipleAttempts.pdf
>
>
> When stages get retried, a task may have more than one attempt running at the 
> same time, on the same executor.  Currently this causes problems for 
> ShuffleMapTasks, since all attempts try to write to the same output files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2015-11-13 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004874#comment-15004874
 ] 

Andrew Or commented on SPARK-8582:
--

Hi everyone, I have bumped this to 1.7.0 because of the potential performance 
regressions a fix could introduce. If you are affected by this and would like 
to solve it earlier, you can work around it by calling `persist` before you 
call `checkpoint`. This ensures that the second computation of the RDD reads 
from the cache instead, which is much faster for many workloads.
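
For reference, a minimal sketch of that workaround; the checkpoint directory, 
data, and storage level are illustrative:
{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("checkpoint-workaround"))
sc.setCheckpointDir("/tmp/checkpoints")

val rdd = sc.parallelize(1 to 1000000).map(_ * 2)  // stand-in for an expensive computation
rdd.persist(StorageLevel.MEMORY_AND_DISK)          // cache before checkpointing
rdd.checkpoint()
rdd.count()  // triggers the job; the checkpoint re-computation reads from the cache
sc.stop()
{code}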

> Optimize checkpointing to avoid computing an RDD twice
> --
>
> Key: SPARK-8582
> URL: https://issues.apache.org/jira/browse/SPARK-8582
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Shixiong Zhu
>
> In Spark, checkpointing allows the user to truncate the lineage of their RDD 
> and save the intermediate contents to HDFS for fault tolerance. However, this 
> is not currently implemented very efficiently:
> Every time we checkpoint an RDD, we actually compute it twice: once during 
> the action that triggered the checkpointing in the first place, and once 
> while we checkpoint (we iterate through an RDD's partitions and write them to 
> disk). See this line for more detail: 
> https://github.com/apache/spark/blob/0401cbaa8ee51c71f43604f338b65022a479da0a/core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala#L102.
> Instead, we should have a `CheckpointingIterator` that writes checkpoint 
> data to HDFS while we run the action. This will speed up many usages of 
> `RDD#checkpoint` by 2X.
> (Alternatively, the user can just cache the RDD before checkpointing it, but 
> this is not always viable for very large input data. It's also not a great 
> API to use in general.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8582) Optimize checkpointing to avoid computing an RDD twice

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-8582:
-
Target Version/s: 1.7.0  (was: 1.6.0)

> Optimize checkpointing to avoid computing an RDD twice
> --
>
> Key: SPARK-8582
> URL: https://issues.apache.org/jira/browse/SPARK-8582
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>Assignee: Shixiong Zhu
>
> In Spark, checkpointing allows the user to truncate the lineage of their RDD 
> and save the intermediate contents to HDFS for fault tolerance. However, this 
> is not currently implemented very efficiently:
> Every time we checkpoint an RDD, we actually compute it twice: once during 
> the action that triggered the checkpointing in the first place, and once 
> while we checkpoint (we iterate through an RDD's partitions and write them to 
> disk). See this line for more detail: 
> https://github.com/apache/spark/blob/0401cbaa8ee51c71f43604f338b65022a479da0a/core/src/main/scala/org/apache/spark/rdd/RDDCheckpointData.scala#L102.
> Instead, we should have a `CheckpointingIterator` that writes checkpoint 
> data to HDFS while we run the action. This will speed up many usages of 
> `RDD#checkpoint` by 2X.
> (Alternatively, the user can just cache the RDD before checkpointing it, but 
> this is not always viable for very large input data. It's also not a great 
> API to use in general.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7308) Should there be multiple concurrent attempts for one stage?

2015-11-13 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004783#comment-15004783
 ] 

Andrew Or commented on SPARK-7308:
--

Should this still be open given that all associated JIRAs are closed? I think 
we've already established that there's no bullet-proof way to do this on the 
scheduler side so we need to make the write side robust.

> Should there be multiple concurrent attempts for one stage?
> ---
>
> Key: SPARK-7308
> URL: https://issues.apache.org/jira/browse/SPARK-7308
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1
>Reporter: Imran Rashid
>Assignee: Imran Rashid
> Attachments: SPARK-7308_discussion.pdf
>
>
> Currently, when there is a fetch failure, you can end up with multiple 
> concurrent attempts for the same stage.  Is this intended?  At best, it leads 
> to some very confusing behavior, and it makes it hard for the user to make 
> sense of what is going on.  At worst, I think this is cause of some very 
> strange errors we've seen errors we've seen from users, where stages start 
> executing before all the dependent stages have completed.
> This can happen in the following scenario:  there is a fetch failure in 
> attempt 0, so the stage is retried.  attempt 1 starts.  But, tasks from 
> attempt 0 are still running -- some of them can also hit fetch failures after 
> attempt 1 starts.  That will cause additional stage attempts to get fired up.
> There is an attempt to handle this already 
> https://github.com/apache/spark/blob/16860327286bc08b4e2283d51b4c8fe024ba5006/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1105
> but that only checks whether the **stage** is running.  It really should 
> check whether that **attempt** is still running, but there isn't enough info 
> to do that.  
> I'll also post some info on how to reproduce this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7970) Optimize code for SQL queries fired on Union of RDDs (closure cleaner)

2015-11-13 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-7970.
--
  Resolution: Fixed
   Fix Version/s: 1.6.0
Target Version/s: 1.6.0

> Optimize code for SQL queries fired on Union of RDDs (closure cleaner)
> --
>
> Key: SPARK-7970
> URL: https://issues.apache.org/jira/browse/SPARK-7970
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Nitin Goyal
>Assignee: Nitin Goyal
> Fix For: 1.6.0
>
> Attachments: Screen Shot 2015-05-27 at 11.01.03 pm.png, Screen Shot 
> 2015-05-27 at 11.07.02 pm.png
>
>
> The closure cleaner slows down the execution of Spark SQL queries fired on a 
> union of RDDs. The time spent on the driver side increases linearly with the 
> number of RDDs unioned. Refer to the following thread for more context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/ClosureCleaner-slowing-down-Spark-SQL-queries-tt12466.html
> As can be seen in the attached JProfiler screenshots, a lot of time is spent in 
> ClosureCleaner's "getClassReader" method and the rest in "ensureSerializable" 
> (at least in my case).
> This can be fixed in two ways (as per my current understanding):
> 1. Fix at the Spark SQL level - as pointed out by yhuai, we can create 
> MapPartitionsRDD directly instead of calling rdd.mapPartitions, which invokes 
> ClosureCleaner's clean method (see PR 
> https://github.com/apache/spark/pull/6256).
> 2. Fix at the Spark core level -
>   (i) Make "checkSerializable" property-driven in SparkContext's clean method
>   (ii) Somehow cache the ClassReader for the last 'n' classes
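> A small sketch of the problematic pattern: one rdd.mapPartitions call per child 
> RDD, each of which runs the closure cleaner on the driver; the count of 200 
> RDDs is an arbitrary illustrative number:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("union-closure-cost"))
>
> val t0 = System.nanoTime()
> val children = (1 to 200).map { i =>
>   // Each mapPartitions call sends its closure through ClosureCleaner.clean on
>   // the driver, so total cleaning time grows with the number of unioned RDDs.
>   sc.parallelize(Seq(i)).mapPartitions(iter => iter.map(_ * 2))
> }
> sc.union(children).count()
> println(s"elapsed: ${(System.nanoTime() - t0) / 1e9} s")
> sc.stop()
> {code}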



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-11710) Document new memory management model

2015-11-12 Thread Andrew Or (JIRA)
Andrew Or created SPARK-11710:
-

 Summary: Document new memory management model
 Key: SPARK-11710
 URL: https://issues.apache.org/jira/browse/SPARK-11710
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, Spark Core
Affects Versions: 1.6.0
Reporter: Andrew Or
Assignee: Andrew Or


e.g. tuning guide still references old deprecated configs
https://spark.apache.org/docs/1.5.0/tuning.html#garbage-collection-tuning



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11658) simplify documentation for PySpark combineByKey

2015-11-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-11658.
---
  Resolution: Fixed
Assignee: chris snow
   Fix Version/s: 1.7.0
Target Version/s: 1.7.0

> simplify documentation for PySpark combineByKey
> ---
>
> Key: SPARK-11658
> URL: https://issues.apache.org/jira/browse/SPARK-11658
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 1.5.1
>Reporter: chris snow
>Assignee: chris snow
>Priority: Minor
> Fix For: 1.7.0
>
>
> The current documentation for combineByKey looks like this:
> {code}
> >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
> >>> def f(x): return x
> >>> def add(a, b): return a + str(b)
> >>> sorted(x.combineByKey(str, add, add).collect())
> [('a', '11'), ('b', '1')]
> """
> {code}
> I think it could be simplified to:
> {code}
> >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
> >>> def add(a, b): return a + str(b)
> >>> x.combineByKey(str, add, add).collect()
> [('a', '11'), ('b', '1')]
> """
> {code}
> I'll shortly add a patch for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2533) Show summary of locality level of completed tasks in the each stage page of web UI

2015-11-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-2533:
-
Assignee: Jean-Baptiste Onofré

> Show summary of locality level of completed tasks in the each stage page of 
> web UI
> --
>
> Key: SPARK-2533
> URL: https://issues.apache.org/jira/browse/SPARK-2533
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Masayoshi TSUZUKI
>Assignee: Jean-Baptiste Onofré
>Priority: Minor
> Fix For: 1.6.0
>
>
> When the number of tasks is very large, it is impossible to tell from the 
> stage page of the web UI how many tasks were executed at each locality level 
> (PROCESS_LOCAL/NODE_LOCAL/RACK_LOCAL). It would be better to show a summary of 
> task locality levels in the web UI.
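> A sketch of how the underlying numbers could be gathered with a listener; the 
> web UI rendering itself is not shown, and the class name here is made up:
> {code}
> import scala.collection.mutable
> import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
>
> class LocalitySummaryListener extends SparkListener {
>   val counts = mutable.Map[String, Int]().withDefaultValue(0)
>
>   override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = synchronized {
>     // e.g. PROCESS_LOCAL -> 1200, NODE_LOCAL -> 300, RACK_LOCAL -> 12
>     counts(taskEnd.taskInfo.taskLocality.toString) += 1
>   }
> }
>
> // sc.addSparkListener(new LocalitySummaryListener())  // attach to a SparkContext
> {code}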



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11671) Example for sqlContext.createDataDrame from pandas.DataFrame has a typo

2015-11-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11671:
--
Assignee: chris snow

> Example for sqlContext.createDataDrame from pandas.DataFrame has a typo
> ---
>
> Key: SPARK-11671
> URL: https://issues.apache.org/jira/browse/SPARK-11671
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.5.1
>Reporter: chris snow
>Assignee: chris snow
>Priority: Minor
> Fix For: 1.7.0
>
>
> PySpark documentation error:
> {code}
> sqlContext.createDataFrame(pandas.DataFrame([[1, 2]]).collect()) 
> {code}
> Results in:
> {code}
> ---
> AttributeError                            Traceback (most recent call last)
> <ipython-input-...> in <module>()
> > 1 sqlContext.createDataFrame(pandas.DataFrame([[1, 2]]).collect())
> /usr/local/src/bluemix_ipythonspark_141/notebook/lib/python2.7/site-packages/pandas-0.14.0-py2.7-linux-x86_64.egg/pandas/core/generic.pyc
>  in __getattr__(self, name)
>1841 return self[name]
>1842 raise AttributeError("'%s' object has no attribute '%s'" %
> -> 1843  (type(self).__name__, name))
>1844 
>1845 def __setattr__(self, name, value):
> AttributeError: 'DataFrame' object has no attribute 'collect'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11671) Example for sqlContext.createDataDrame from pandas.DataFrame has a typo

2015-11-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11671:
--
Fix Version/s: (was: 1.7.0)
   1.6.0

> Example for sqlContext.createDataDrame from pandas.DataFrame has a typo
> ---
>
> Key: SPARK-11671
> URL: https://issues.apache.org/jira/browse/SPARK-11671
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation, PySpark
>Affects Versions: 1.5.1
>Reporter: chris snow
>Assignee: chris snow
>Priority: Minor
> Fix For: 1.6.0
>
>
> PySpark documentation error:
> {code}
> sqlContext.createDataFrame(pandas.DataFrame([[1, 2]]).collect()) 
> {code}
> Results in:
> {code}
> ---
> AttributeError                            Traceback (most recent call last)
> <ipython-input-...> in <module>()
> > 1 sqlContext.createDataFrame(pandas.DataFrame([[1, 2]]).collect())
> /usr/local/src/bluemix_ipythonspark_141/notebook/lib/python2.7/site-packages/pandas-0.14.0-py2.7-linux-x86_64.egg/pandas/core/generic.pyc
>  in __getattr__(self, name)
>1841 return self[name]
>1842 raise AttributeError("'%s' object has no attribute '%s'" %
> -> 1843  (type(self).__name__, name))
>1844 
>1845 def __setattr__(self, name, value):
> AttributeError: 'DataFrame' object has no attribute 'collect'
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11658) simplify documentation for PySpark combineByKey

2015-11-12 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-11658:
--
Fix Version/s: (was: 1.7.0)
   1.6.0

> simplify documentation for PySpark combineByKey
> ---
>
> Key: SPARK-11658
> URL: https://issues.apache.org/jira/browse/SPARK-11658
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 1.5.1
>Reporter: chris snow
>Assignee: chris snow
>Priority: Minor
> Fix For: 1.6.0
>
>
> The current documentation for combineByKey looks like this:
> {code}
> >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
> >>> def f(x): return x
> >>> def add(a, b): return a + str(b)
> >>> sorted(x.combineByKey(str, add, add).collect())
> [('a', '11'), ('b', '1')]
> """
> {code}
> I think it could be simplified to:
> {code}
> >>> x = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
> >>> def add(a, b): return a + str(b)
> >>> x.combineByKey(str, add, add).collect()
> [('a', '11'), ('b', '1')]
> """
> {code}
> I'll shortly add a patch for this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


