[jira] [Commented] (SPARK-2585) Remove special handling of Hadoop JobConf
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183722#comment-14183722 ]

Apache Spark commented on SPARK-2585:
-------------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/2935

> Remove special handling of Hadoop JobConf
> -----------------------------------------
>
>          Key: SPARK-2585
>          URL: https://issues.apache.org/jira/browse/SPARK-2585
>      Project: Spark
>   Issue Type: Improvement
>   Components: Spark Core
>     Reporter: Patrick Wendell
>     Assignee: Josh Rosen
>     Priority: Critical
>
> This is a follow up to SPARK-2521 and should close SPARK-2546 (provided the
> implementation does not use shared conf objects). We no longer need to
> specially broadcast the Hadoop configuration since we are broadcasting RDD
> data anyways.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161383#comment-14161383 ]

Josh Rosen commented on SPARK-2585:
-----------------------------------

[~pwendell] I obtained these numbers by adding this benchmark to the Spark Examples package and running it through {{./bin/run-example}}, which I think should have given it a pretty big classpath.
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161382#comment-14161382 ]

Patrick Wendell commented on SPARK-2585:
----------------------------------------

Hey [~joshrosen], what happens if you run your benchmark inside of the Spark shell? If the JobConf constructor searches the classpath, this could end up taking a lot longer in that environment. It would be good to make sure.
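(For reference, a quick way to check this from the shell is to paste a timing loop directly into {{spark-shell}}. This snippet is not from the thread; it is a sketch that assumes a working shell with the Hadoop classes on its classpath:)

{code}
// Paste into spark-shell: times JobConf creation under the shell's
// much larger classpath, to compare against the run-example numbers.
import org.apache.hadoop.mapred.JobConf

val numConfs = 1000
val start = System.currentTimeMillis()
for (_ <- 1 to numConfs) { new JobConf() }
println(s"Took ${System.currentTimeMillis() - start} ms to create $numConfs JobConfs")
{code}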
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161166#comment-14161166 ]

Apache Spark commented on SPARK-2585:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/2683
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160642#comment-14160642 ]

Andrew Ash commented on SPARK-2585:
-----------------------------------

I also vote for "correct by default"; there are various potentially dangerous knobs you can turn to squeeze out more performance at your own risk if you care to. Note that the {{new JobConf()}} constructor loads defaults out of the Hadoop config files (I think {{core-site.xml}} and {{hdfs-site.xml}}), and you can disable that with the {{new JobConf(false)}} constructor. I'm not sure whether we need the local config files per server, or whether all the config options come solely from the driver's {{JobConf}} when it is instantiated.
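(To illustrate the two constructors being discussed — this sketch is not from the thread, and assumes Hadoop is on the classpath with its bundled default resource files:)

{code}
import org.apache.hadoop.mapred.JobConf

// Default constructor: scans the classpath and loads defaults from the
// Hadoop config files (core-default.xml, core-site.xml, etc.).
val withDefaults = new JobConf()

// loadDefaults = false skips that file scan entirely, so the conf
// starts empty and only contains what you set on it explicitly.
val bare = new JobConf(false)

// A key like fs.defaultFS is populated in the first conf but not the second.
println(withDefaults.get("fs.defaultFS")) // e.g. "file:///" from core-default.xml
println(bare.get("fs.defaultFS"))         // null
{code}

The classpath scan is what makes the default constructor comparatively expensive, which is why it matters for per-task JobConf creation.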
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14160589#comment-14160589 ]

Josh Rosen commented on SPARK-2585:
-----------------------------------

I tried benchmarking the time needed to create a new JobConf() object, and it looks like each one takes ~2.3 milliseconds:

{code}
import org.apache.hadoop.mapred.JobConf

object HadoopConfBenchmark {
  def main(args: Array[String]) {
    val numConfs = 10000
    val start = System.currentTimeMillis()
    for (i <- 1 to numConfs) {
      new JobConf()
    }
    val end = System.currentTimeMillis()
    println(s"Took ${end - start} ms to create $numConfs new JobConfs")
  }
}
{code}

On my laptop, this outputs:

{code}
Took 23492 ms to create 10000 new JobConfs
{code}

Since the correlation optimizer tests ran ~7 seconds slower with this PR, this slowdown could be explained if those tests were running ~3000 tasks. That's actually plausible, since the default parallelism was pretty high in those tests (~200 partitions, if I recall) and the queries were very complicated.

For most real deployments (i.e. not running in local mode), the extra ~2 ms will probably be masked by other latencies (e.g. RPC), so I'd say we should merge this patch for now and try to regain the performance elsewhere if it turns out to be a problem. There's the option of putting this behind a configuration option, but I don't like that approach, because I feel it's important to be "correct by default" and not have options that sacrifice correctness for performance.
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099234#comment-14099234 ]

Patrick Wendell commented on SPARK-2585:
----------------------------------------

Unfortunately, after a lot of effort we still can't get the test times down on this one, and it's still unclear whether it will cause performance regressions. Since this isn't particularly critical from a user perspective (it's mostly about simplifying internals), I think it's best to punt this to 1.2. One unfortunate thing is that it means SPARK-2546 will remain broken in 1.1.
[ https://issues.apache.org/jira/browse/SPARK-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078879#comment-14078879 ]

Apache Spark commented on SPARK-2585:
-------------------------------------

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/1648