[jira] [Updated] (SPARK-5202) HiveContext doesn't support the Variables Substitution
[ https://issues.apache.org/jira/browse/SPARK-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Hao updated SPARK-5202:
-----------------------------
Description:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
This is a blocking issue for CLI users, as it impacts existing HQL scripts from Hive.

was:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
This is a blocking issue for CLI users, which will impact existing HQL scripts.

> HiveContext doesn't support the Variables Substitution
> ------------------------------------------------------
>
> Key: SPARK-5202
> URL: https://issues.apache.org/jira/browse/SPARK-5202
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Cheng Hao
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
> This is a blocking issue for CLI users, as it impacts existing HQL scripts from Hive.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
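Hive's variable substitution rewrites `${hivevar:...}` references in the query text before parsing; the HQL scripts this issue blocks rely on that rewrite. A minimal standalone sketch of the idea (a hypothetical helper, not Spark or Hive code; real Hive also supports the hiveconf:, system:, and env: scopes):

```scala
// Hypothetical helper illustrating the textual substitution Hive performs
// before parsing. HiveContext currently skips this step, so ${hivevar:...}
// reaches the parser verbatim and the query fails.
def substituteVars(sql: String, hivevars: Map[String, String]): String =
  hivevars.foldLeft(sql) { case (s, (name, value)) =>
    s.replace("${hivevar:" + name + "}", value)
  }

val rewritten = substituteVars("SELECT * FROM ${hivevar:tbl} LIMIT 10", Map("tbl" -> "src"))
// rewritten == "SELECT * FROM src LIMIT 10"
```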
[jira] [Commented] (SPARK-5202) HiveContext doesn't support the Variables Substitution
[ https://issues.apache.org/jira/browse/SPARK-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273293#comment-14273293 ]

Apache Spark commented on SPARK-5202:
-------------------------------------
User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/4003

> HiveContext doesn't support the Variables Substitution
> ------------------------------------------------------
>
> Key: SPARK-5202
> URL: https://issues.apache.org/jira/browse/SPARK-5202
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Cheng Hao
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
> This is a blocking issue for CLI users, which will impact existing HQL scripts.
[jira] [Commented] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range
[ https://issues.apache.org/jira/browse/SPARK-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273289#comment-14273289 ]

Apache Spark commented on SPARK-5201:
-------------------------------------
User 'advancedxy' has created a pull request for this issue:
https://github.com/apache/spark/pull/4002

> ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing
> with inclusive range
> -------------------------------------------------------------------------
>
> Key: SPARK-5201
> URL: https://issues.apache.org/jira/browse/SPARK-5201
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Ye Xianjin
> Labels: rdd
> Fix For: 1.2.1
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> {code}
> sc.makeRDD(1 to Int.MaxValue).count       // result = 0
> sc.makeRDD(1 to (Int.MaxValue - 1)).count // result = 2147483646 = Int.MaxValue - 1
> sc.makeRDD(1 until Int.MaxValue).count    // result = 2147483646 = Int.MaxValue - 1
> {code}
> More details in the discussion: https://github.com/apache/spark/pull/2874
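The reported counts are consistent with 32-bit overflow in the slice-boundary arithmetic. A self-contained sketch of the suspected failure mode (hypothetical helper names, not the actual ParallelCollectionRDD.slice code):

```scala
// Slicing a range of length Int.MaxValue: computing slice boundaries as
// i * length / numSlices in Int arithmetic wraps around for i >= 2.
val length = Int.MaxValue // size of the range 1 to Int.MaxValue
val numSlices = 4

// Int arithmetic: the intermediate product i * length overflows.
def sliceStartInt(i: Int): Int = i * length / numSlices

// Long arithmetic keeps the intermediate product exact before dividing.
def sliceStartLong(i: Int): Int = (i.toLong * length / numSlices).toInt

// sliceStartInt(2) wraps (2 * Int.MaxValue == -2 as an Int) and yields 0,
// collapsing the later slices; sliceStartLong(2) yields 1073741823.
```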
[jira] [Created] (SPARK-5202) HiveContext doesn't support the Variables Substitution
Cheng Hao created SPARK-5202:
-----------------------------

Summary: HiveContext doesn't support the Variables Substitution
Key: SPARK-5202
URL: https://issues.apache.org/jira/browse/SPARK-5202
Project: Spark
Issue Type: Bug
Components: SQL
Reporter: Cheng Hao

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

This is a blocking issue for CLI users, which will impact existing HQL scripts.
[jira] [Commented] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range
[ https://issues.apache.org/jira/browse/SPARK-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273277#comment-14273277 ]

Ye Xianjin commented on SPARK-5201:
-----------------------------------
I will send a PR for this.

> ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing
> with inclusive range
> -------------------------------------------------------------------------
>
> Key: SPARK-5201
> URL: https://issues.apache.org/jira/browse/SPARK-5201
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.2.0
> Reporter: Ye Xianjin
> Labels: rdd
> Fix For: 1.2.1
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> {code}
> sc.makeRDD(1 to Int.MaxValue).count       // result = 0
> sc.makeRDD(1 to (Int.MaxValue - 1)).count // result = 2147483646 = Int.MaxValue - 1
> sc.makeRDD(1 until Int.MaxValue).count    // result = 2147483646 = Int.MaxValue - 1
> {code}
> More details in the discussion: https://github.com/apache/spark/pull/2874
[jira] [Created] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range
Ye Xianjin created SPARK-5201:
------------------------------

Summary: ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range
Key: SPARK-5201
URL: https://issues.apache.org/jira/browse/SPARK-5201
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.0
Reporter: Ye Xianjin
Fix For: 1.2.1

{code}
sc.makeRDD(1 to Int.MaxValue).count       // result = 0
sc.makeRDD(1 to (Int.MaxValue - 1)).count // result = 2147483646 = Int.MaxValue - 1
sc.makeRDD(1 until Int.MaxValue).count    // result = 2147483646 = Int.MaxValue - 1
{code}

More details in the discussion: https://github.com/apache/spark/pull/2874
[jira] [Commented] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries
[ https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273271#comment-14273271 ]

Apache Spark commented on SPARK-4908:
-------------------------------------
User 'baishuo' has created a pull request for this issue:
https://github.com/apache/spark/pull/4001

> Spark SQL built for Hive 13 fails under concurrent metadata queries
> -------------------------------------------------------------------
>
> Key: SPARK-4908
> URL: https://issues.apache.org/jira/browse/SPARK-4908
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: David Ross
> Assignee: Cheng Lian
> Priority: Blocker
> Fix For: 1.3.0, 1.2.1
>
> We are on trunk ({{1.3.0-SNAPSHOT}}), as of this commit:
> https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6
> We are using Spark built for Hive 13, using this option: {{-Phive-0.13.1}}
> In single-threaded mode, normal operations look fine. However, under
> concurrency, with at least 2 concurrent connections, metadata queries fail.
> For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}}
> statement when you pass a default schema in the JDBC URL, all fail.
> {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue.
> Here is some example code:
> {code}
> object main extends App {
>   import java.sql._
>   import scala.concurrent._
>   import scala.concurrent.duration._
>   import scala.concurrent.ExecutionContext.Implicits.global
>
>   Class.forName("org.apache.hive.jdbc.HiveDriver")
>   val host = "localhost" // update this
>   val url = s"jdbc:hive2://${host}:10511/some_db" // update this
>   val future = Future.traverse(1 to 3) { i =>
>     Future {
>       println("Starting: " + i)
>       try {
>         val conn = DriverManager.getConnection(url)
>       } catch {
>         case e: Throwable =>
>           e.printStackTrace()
>           println("Failed: " + i)
>       }
>       println("Finishing: " + i)
>     }
>   }
>   Await.result(future, 2.minutes)
>   println("done!")
> }
> {code}
> Here is the output:
> {code}
> Starting: 1
> Starting: 3
> Starting: 2
> java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled
> at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
> at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
> at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451)
> at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:195)
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:270)
> at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896)
> at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
> at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Failed: 3
> Finishing: 3
> java.sql.SQLException: org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation cancelled
> at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
> at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
> at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
> at org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451)
> at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:195)
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> at java.sql.DriverManager.getConnection(DriverManager.java:270)
> at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896)
> at com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
[jira] [Commented] (SPARK-5196) Add comment field in StructField
[ https://issues.apache.org/jira/browse/SPARK-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273251#comment-14273251 ]

Apache Spark commented on SPARK-5196:
-------------------------------------
User 'OopsOutOfMemory' has created a pull request for this issue:
https://github.com/apache/spark/pull/3999

> Add comment field in StructField
> --------------------------------
>
> Key: SPARK-5196
> URL: https://issues.apache.org/jira/browse/SPARK-5196
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: shengli
> Fix For: 1.3.0
>
> StructField should contain name, type, nullable, comment, etc.
> Add support for a comment field in StructField.
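The proposed change could be sketched like this (an illustrative stand-in; the real StructField lives in org.apache.spark.sql and carries a proper DataType rather than a string):

```scala
// Illustrative stand-in for Spark SQL's StructField, not the actual class.
case class StructField(
    name: String,
    dataType: String,                 // simplified; Spark uses a DataType ADT
    nullable: Boolean = true,
    comment: Option[String] = None)   // the field this JIRA proposes to add

val f = StructField("age", "int", nullable = true, comment = Some("age in years"))
```

Defaulting `comment` to `None` keeps existing call sites source-compatible, which is one plausible reason to model it as an `Option`.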
[jira] [Commented] (SPARK-5200) Disable web UI in Hive Thriftserver tests
[ https://issues.apache.org/jira/browse/SPARK-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273247#comment-14273247 ]

Apache Spark commented on SPARK-5200:
-------------------------------------
User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3998

> Disable web UI in Hive Thriftserver tests
> -----------------------------------------
>
> Key: SPARK-5200
> URL: https://issues.apache.org/jira/browse/SPARK-5200
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Labels: flaky-test
>
> In our unit tests, we should disable the Spark web UI when starting the Hive
> Thriftserver, since port contention during this test has been a cause of
> test failures on Jenkins.
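The change is likely a one-line tweak in the test harness. A config sketch, assuming the standard `spark.ui.enabled` flag (with `spark.ui.port` as an alternative knob for binding an ephemeral port); this fragment assumes Spark on the classpath:

```scala
// Config sketch (assumes org.apache.spark.SparkConf is available):
val conf = new org.apache.spark.SparkConf()
  .set("spark.ui.enabled", "false") // don't start the web UI in tests at all
// alternative: .set("spark.ui.port", "0") binds a random free port instead
```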
[jira] [Commented] (SPARK-5124) Standardize internal RPC interface
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273246#comment-14273246 ]

Reynold Xin commented on SPARK-5124:
------------------------------------
Thanks for the response.

1. Let's not rely on the property of a local actor not passing messages through
a socket for the local-actor speedup. Conceptually, there is no reason to tie
the local actor implementation to RPC. DAGScheduler's actor used to be a simple
queue & event loop (before it was turned into an actor for no good reason). We
can restore it to that.

2. Have you thought about how the fate-sharing stuff would work with
alternative RPC implementations?

> Standardize internal RPC interface
> ----------------------------------
>
> Key: SPARK-5124
> URL: https://issues.apache.org/jira/browse/SPARK-5124
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Reynold Xin
> Assignee: Shixiong Zhu
> Attachments: Pluggable RPC - draft 1.pdf
>
> In Spark we use Akka as the RPC layer. It would be great if we could
> standardize the internal RPC interface to facilitate testing. This would also
> provide the foundation for trying other RPC implementations in the future.
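The "simple queue & event loop" alternative mentioned in point 1 can be sketched in a few lines (hypothetical code, not DAGScheduler's actual implementation):

```scala
import java.util.concurrent.LinkedBlockingQueue

// Minimal event loop: callers post events onto a queue; a single dedicated
// thread drains the queue and handles events in order. No actor machinery.
class EventLoop[E](handle: E => Unit) {
  private val queue = new LinkedBlockingQueue[E]()
  @volatile private var running = true
  private val thread = new Thread(new Runnable {
    def run(): Unit =
      while (running) {
        try handle(queue.take())
        catch { case _: InterruptedException => () } // stop() interrupts take()
      }
  })
  def start(): Unit = thread.start()
  def post(event: E): Unit = queue.put(event)
  def stop(): Unit = { running = false; thread.interrupt() }
}
```

Because one thread drains the queue, handlers run sequentially in posting order, which is the property an actor was providing here.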
[jira] [Created] (SPARK-5200) Disable web UI in Hive Thriftserver tests
Josh Rosen created SPARK-5200:
------------------------------

Summary: Disable web UI in Hive Thriftserver tests
Key: SPARK-5200
URL: https://issues.apache.org/jira/browse/SPARK-5200
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Josh Rosen
Assignee: Josh Rosen

In our unit tests, we should disable the Spark web UI when starting the Hive
Thriftserver, since port contention during this test has been a cause of test
failures on Jenkins.
[jira] [Commented] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode
[ https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273244#comment-14273244 ]

Timothy Chen commented on SPARK-5095:
-------------------------------------
[~joshdevins] [~gmaas] Indeed, capping the cores is actually meant to fix 4940,
and we can use that to address the number of executors. I'm trying not to end
up with a set of configurations that can each achieve both; otherwise it
becomes a lot harder to maintain. I'm working on the patch now and I'll add you
both on GitHub for review.

> Support launching multiple mesos executors in coarse grained mesos mode
> -----------------------------------------------------------------------
>
> Key: SPARK-5095
> URL: https://issues.apache.org/jira/browse/SPARK-5095
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Reporter: Timothy Chen
>
> Currently in coarse-grained Mesos mode, it's expected that we only launch one
> Mesos executor, whose single JVM process runs multiple Spark executors.
> However, this becomes a problem when the launched JVM process is larger than
> an ideal size (30 GB is the recommended value from Databricks), which causes
> the GC problems reported on the mailing list.
> We should support launching multiple executors when large enough resources
> are available for Spark to use, while these resources remain under the
> configured limit.
> This is also applicable when users want to specify the number of executors to
> be launched on each node.
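The core accounting for launching several executors per offer is simple; a back-of-the-envelope sketch (hypothetical helper with illustrative names, not the patch under discussion):

```scala
// Given a Mesos resource offer, how many executor-sized chunks fit under the
// remaining core budget? Integer division discards the leftover cores.
def executorsForOffer(offeredCores: Int, coresPerExecutor: Int, remainingCoreBudget: Int): Int =
  math.min(offeredCores, remainingCoreBudget) / coresPerExecutor

// e.g. a 32-core offer, 8 cores per executor, 24 cores left in the cap -> 3
```

This illustrates how the core cap from 4940 and the per-node executor count interact: the cap bounds the budget, and the chunk size determines how many executors the offer yields.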
[jira] [Resolved] (SPARK-5018) Make MultivariateGaussian public
[ https://issues.apache.org/jira/browse/SPARK-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng resolved SPARK-5018.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 3923
[https://github.com/apache/spark/pull/3923]

> Make MultivariateGaussian public
> --------------------------------
>
> Key: SPARK-5018
> URL: https://issues.apache.org/jira/browse/SPARK-5018
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.2.0
> Reporter: Joseph K. Bradley
> Assignee: Travis Galoppo
> Priority: Critical
> Fix For: 1.3.0
>
> MultivariateGaussian is currently private[ml], but it would be a useful
> public class. This JIRA will require defining a good public API for
> distributions.
> This JIRA will be needed for finalizing the GaussianMixtureModel API, which
> should expose MultivariateGaussian instances instead of the means and
> covariances.
[jira] [Commented] (SPARK-3561) Allow for pluggable execution contexts in Spark
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273225#comment-14273225 ]

Patrick Wendell commented on SPARK-3561:
----------------------------------------
So if the question is "Is Spark only an API, or is it an integrated
API/execution engine?"... we've taken a fairly clear stance over the history of
the project that it's an integrated engine. I.e. Spark is not something like
Pig, where it's intended primarily as a user API and we expect there to be
different physical execution engines plugged in underneath.

In the past we haven't found that this prevents Spark from working well in
different environments - for instance, with Mesos, on YARN, etc. For this we've
integrated at different layers, such as the storage layer and the scheduling
layer, where there were well-defined APIs and integration points in the broader
ecosystem. Compared with alternatives, Spark is far more flexible in terms of
runtime environments. The RDD API is so generic that it's very easy to
customize and integrate.

For this reason, my feeling is that decoupling execution from the rest of Spark
would tie our hands architecturally and not add much benefit. I don't see a
good reason to make this broader change in the strategy of the project. If
there are specific improvements you see for making Spark work well on YARN,
then we can definitely look at them.

> Allow for pluggable execution contexts in Spark
> -----------------------------------------------
>
> Key: SPARK-3561
> URL: https://issues.apache.org/jira/browse/SPARK-3561
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 1.1.0
> Reporter: Oleg Zhurakousky
> Labels: features
> Attachments: SPARK-3561.pdf
>
> Currently Spark provides integration with external resource managers such as
> Apache Hadoop YARN, Mesos, etc. Specifically in the context of YARN, the
> current architecture of Spark-on-YARN can be enhanced to provide
> significantly better utilization of cluster resources for large scale, batch
> and/or ETL applications when run alongside other applications (Spark and
> others) and services in YARN.
>
> Proposal:
> The proposed approach would introduce a pluggable JobExecutionContext (trait)
> - a gateway and a delegate to the Hadoop execution environment - as a
> non-public API (@Experimental) not exposed to end users of Spark.
> The trait will define 6 operations:
> * hadoopFile
> * newAPIHadoopFile
> * broadcast
> * runJob
> * persist
> * unpersist
> Each method directly maps to the corresponding method in the current version
> of SparkContext. The JobExecutionContext implementation will be accessed by
> SparkContext via a master URL such as
> "execution-context:foo.bar.MyJobExecutionContext", with the default
> implementation containing the existing code from SparkContext, thus allowing
> the current (corresponding) methods of SparkContext to delegate to such an
> implementation.
> An integrator will now have an option to provide a custom implementation of
> DefaultExecutionContext by either implementing it from scratch or extending
> from DefaultExecutionContext.
> Please see the attached design doc for more details.
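From the six operations listed in the proposal, the trait's rough shape can be reconstructed (the signatures below are placeholders, not the design doc's actual API):

```scala
// Rough shape of the proposed pluggable execution context. Parameter and
// result types are illustrative stand-ins; the real SparkContext methods
// return RDDs, Broadcast handles, etc.
trait JobExecutionContext {
  def hadoopFile(path: String): AnyRef
  def newAPIHadoopFile(path: String): AnyRef
  def broadcast[T](value: T): T
  def runJob[T, U](data: Seq[T], func: Seq[T] => U): U
  def persist(id: String): Unit
  def unpersist(id: String): Unit
}
```

The idea in the proposal is that SparkContext would delegate these calls to whichever implementation the "execution-context:..." master URL names, with a default implementation preserving today's behavior.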
[jira] [Commented] (SPARK-5186) Vector.equals and Vector.hashCode are very inefficient
[ https://issues.apache.org/jira/browse/SPARK-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273224#comment-14273224 ]

Apache Spark commented on SPARK-5186:
-------------------------------------
User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/3997

> Vector.equals and Vector.hashCode are very inefficient
> ------------------------------------------------------
>
> Key: SPARK-5186
> URL: https://issues.apache.org/jira/browse/SPARK-5186
> Project: Spark
> Issue Type: Bug
> Components: MLlib
> Affects Versions: 1.2.0
> Reporter: Derrick Burns
> Original Estimate: 0.25h
> Remaining Estimate: 0.25h
>
> The implementations of Vector.equals and Vector.hashCode are correct but slow
> for SparseVectors that are truly sparse.
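One direction for the fix is an equality check that touches only the stored entries. A standalone sketch (a stand-in for MLlib's SparseVector, which keeps a sorted index array plus a value array; it assumes both vectors have the same declared size, and the slowness being fixed comes from densely materializing both sides):

```scala
// Compare two sparse vectors by their active entries only, in O(nnz) time.
// Explicitly stored zeros are dropped first so that logically equal vectors
// with different storage compare equal.
def sparseEquals(idxA: Array[Int], valA: Array[Double],
                 idxB: Array[Int], valB: Array[Double]): Boolean = {
  val a = idxA.zip(valA).filter(_._2 != 0.0)
  val b = idxB.zip(valB).filter(_._2 != 0.0)
  a.sameElements(b) // index arrays are sorted, so pairwise comparison suffices
}
```

A matching `hashCode` would similarly fold over only the nonzero (index, value) pairs, so equal vectors hash equal without a dense pass.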
[jira] [Updated] (SPARK-4924) Factor out code to launch Spark applications into a separate library
[ https://issues.apache.org/jira/browse/SPARK-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated SPARK-4924:
------------------------------
Assignee: Marcelo Vanzin

> Factor out code to launch Spark applications into a separate library
> --------------------------------------------------------------------
>
> Key: SPARK-4924
> URL: https://issues.apache.org/jira/browse/SPARK-4924
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Marcelo Vanzin
> Assignee: Marcelo Vanzin
> Attachments: spark-launcher.txt
>
> One of the questions we run into rather commonly is "how to start a Spark
> application from my Java/Scala program?". There currently isn't a good answer
> to that:
> - Instantiating SparkContext has limitations (e.g., you can only have one
> active context at the moment, plus you lose the ability to submit apps in
> cluster mode)
> - Calling SparkSubmit directly is doable but you lose a lot of the logic
> handled by the shell scripts
> - Calling the shell script directly is doable, but sort of ugly from an API
> point of view.
> I think it would be nice to have a small library that handles that for users.
> On top of that, this library could be used by Spark itself to replace a lot
> of the code in the current shell scripts, which have a lot of duplication.
[jira] [Updated] (SPARK-5088) Use spark-class for running executors directly on mesos
[ https://issues.apache.org/jira/browse/SPARK-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jongyoul Lee updated SPARK-5088:
--------------------------------
Fix Version/s: 1.2.1
               1.3.0

> Use spark-class for running executors directly on mesos
> --------------------------------------------------------
>
> Key: SPARK-5088
> URL: https://issues.apache.org/jira/browse/SPARK-5088
> Project: Spark
> Issue Type: Improvement
> Components: Deploy, Mesos
> Affects Versions: 1.2.0
> Reporter: Jongyoul Lee
> Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
> - sbin/spark-executor is only used for running executors in a Mesos
> environment.
> - spark-executor calls spark-class internally without specific parameters.
> - PYTHONPATH handling is moved into spark-class.
> - Remove a redundant file to simplify code maintenance.
[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
[ https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jongyoul Lee updated SPARK-5197:
--------------------------------
Target Version/s: 1.3.0 (was: 1.3.0, 1.2.1)

> Support external shuffle service in fine-grained mode on mesos cluster
> ----------------------------------------------------------------------
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
> Issue Type: Improvement
> Components: Deploy, Mesos, Shuffle
> Reporter: Jongyoul Lee
> Fix For: 1.3.0
>
> I think dynamic allocation is almost satisfied in Mesos' fine-grained mode,
> which already offers resources dynamically and returns them automatically
> when a task is finished. However, it doesn't have a mechanism to support an
> external shuffle service the way YARN does, via its AuxiliaryService.
> Because Mesos doesn't support AuxiliaryService, we need to consider a
> different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports multi-tenant environments
> --- Almost the same approach as YARN's
> -- Cons
> --- Must control a long-running 'background' job - the service - while Mesos runs
> --- Must ensure every slave - or host - has one shuffle service at all times
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to consider whether a shuffle service exists or not
> -- Cons
> --- Multiple shuffle services exist in a multi-tenant environment
> --- Must control the shuffle service port dynamically in a multi-user environment
> In my opinion, the first one is the better idea for supporting an external
> shuffle service. Please leave comments.
[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
[ https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jongyoul Lee updated SPARK-5197:
--------------------------------
Fix Version/s: 1.3.0

> Support external shuffle service in fine-grained mode on mesos cluster
> ----------------------------------------------------------------------
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
> Issue Type: Improvement
> Components: Deploy, Mesos, Shuffle
> Reporter: Jongyoul Lee
> Fix For: 1.3.0
>
> I think dynamic allocation is almost satisfied in Mesos' fine-grained mode,
> which already offers resources dynamically and returns them automatically
> when a task is finished. However, it doesn't have a mechanism to support an
> external shuffle service the way YARN does, via its AuxiliaryService.
> Because Mesos doesn't support AuxiliaryService, we need to consider a
> different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports multi-tenant environments
> --- Almost the same approach as YARN's
> -- Cons
> --- Must control a long-running 'background' job - the service - while Mesos runs
> --- Must ensure every slave - or host - has one shuffle service at all times
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to consider whether a shuffle service exists or not
> -- Cons
> --- Multiple shuffle services exist in a multi-tenant environment
> --- Must control the shuffle service port dynamically in a multi-user environment
> In my opinion, the first one is the better idea for supporting an external
> shuffle service. Please leave comments.
[jira] [Updated] (SPARK-5166) Stabilize Spark SQL APIs
[ https://issues.apache.org/jira/browse/SPARK-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-5166:
-----------------------------------
Priority: Blocker (was: Critical)

> Stabilize Spark SQL APIs
> ------------------------
>
> Key: SPARK-5166
> URL: https://issues.apache.org/jira/browse/SPARK-5166
> Project: Spark
> Issue Type: Task
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Reynold Xin
> Priority: Blocker
>
> Before we take Spark SQL out of alpha, we need to audit the APIs and
> stabilize them.
> As a general rule, everything under org.apache.spark.sql.catalyst should not
> be exposed.
[jira] [Updated] (SPARK-3340) Deprecate ADD_JARS and ADD_FILES
[ https://issues.apache.org/jira/browse/SPARK-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-3340:
-----------------------------------
Labels: starter (was: )

> Deprecate ADD_JARS and ADD_FILES
> --------------------------------
>
> Key: SPARK-3340
> URL: https://issues.apache.org/jira/browse/SPARK-3340
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, Spark Core
> Affects Versions: 1.1.0
> Reporter: Andrew Or
> Labels: starter
>
> These were introduced before spark-submit even existed. Now that there are
> many better ways of setting jars and Python files through spark-submit, we
> should deprecate these environment variables.
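The deprecation path is straightforward; a sketch of the check (a hypothetical helper, though `ADD_JARS` and `ADD_FILES` are the variables this issue targets):

```scala
// Return a warning if a deprecated environment variable is set; the caller
// would print it to stderr at startup.
def deprecationWarning(name: String, env: Map[String, String]): Option[String] =
  env.get(name).map(_ =>
    s"Warning: $name is deprecated; use the equivalent spark-submit flag instead.")

// e.g. deprecationWarning("ADD_JARS", sys.env) at shell startup
```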
[jira] [Resolved] (SPARK-3450) Enable specifying the --jars CLI option multiple times
[ https://issues.apache.org/jira/browse/SPARK-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-3450.
------------------------------------
Resolution: Won't Fix

I'd prefer not to do this one; it complicates our parsing substantially. It's
possible to just write a bash loop that creates a single long list of jars.

> Enable specifying the --jars CLI option multiple times
> ------------------------------------------------------
>
> Key: SPARK-3450
> URL: https://issues.apache.org/jira/browse/SPARK-3450
> Project: Spark
> Issue Type: Improvement
> Components: Deploy
> Affects Versions: 1.0.2
> Reporter: wolfgang hoschek
>
> spark-submit should support specifying the --jars option multiple times, e.g.
> --jars foo.jar,bar.jar --jars baz.jar,oops.jar should be equivalent to
> --jars foo.jar,bar.jar,baz.jar,oops.jar
> This would allow using wrapper scripts that simplify usage for enterprise
> customers along the following lines:
> {code}
> my-spark-submit.sh:
> jars=
> for i in /opt/myapp/*.jar; do
>   if [ -n "$jars" ]; then
>     jars="$jars,"
>   fi
>   jars="$jars$i"
> done
> spark-submit --jars "$jars" "$@"
> {code}
> Example usage:
> {code}
> my-spark-submit.sh --jars myUserDefinedFunction.jar
> {code}
> The relevant enhancement code might go into SparkSubmitArguments.
[jira] [Resolved] (SPARK-4399) Support multiple cloud providers
[ https://issues.apache.org/jira/browse/SPARK-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell resolved SPARK-4399.
------------------------------------
Resolution: Won't Fix

We'll let the community take this one on.

> Support multiple cloud providers
> --------------------------------
>
> Key: SPARK-4399
> URL: https://issues.apache.org/jira/browse/SPARK-4399
> Project: Spark
> Issue Type: New Feature
> Components: EC2
> Affects Versions: 1.2.0
> Reporter: Andrew Ash
>
> We currently have Spark startup scripts for Amazon EC2 but not for various
> other cloud providers. This ticket is an umbrella to support multiple cloud
> providers in the bundled scripts, not just Amazon.
[jira] [Commented] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine
[ https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273197#comment-14273197 ] Nicholas Chammas commented on SPARK-1422: - [~pwendell] - I would consider doing this as well for the parent task, [SPARK-4399]. > Add scripts for launching Spark on Google Compute Engine > > > Key: SPARK-1422 > URL: https://issues.apache.org/jira/browse/SPARK-1422 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Matei Zaharia > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273196#comment-14273196 ] Yin Huai commented on SPARK-4296: - I was wondering whether this issue also shows up in other places. Maybe we can resolve it thoroughly. > Throw "Expression not in GROUP BY" when using same expression in group by > clause and select clause > --- > > Key: SPARK-4296 > URL: https://issues.apache.org/jira/browse/SPARK-4296 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Shixiong Zhu >Assignee: Cheng Lian >Priority: Blocker > > When the input data has a complex structure, using same expression in group > by clause and select clause will throw "Expression not in GROUP BY". > {code:java} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Birthday(date: String) > case class Person(name: String, birthday: Birthday) > val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), > Person("Jim", Birthday("1980-02-28")))) > people.registerTempTable("people") > val year = sqlContext.sql("select count(*), upper(birthday.date) from people > group by upper(birthday.date)") > year.collect > {code} > Here is the plan of year: > {code:java} > SchemaRDD[3] at RDD at SchemaRDD.scala:105 > == Query Plan == > == Physical Plan == > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression > not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree: > Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date > AS date#9) AS c1#3] > Subquery people > LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at > ExistingRDD.scala:36 > {code} > The bug is the equality test for `Upper(birthday#1.date)` and > `Upper(birthday#1.date AS date#9)`. > Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias > expression. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
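The alias-equality problem described in this issue can be shown with a small, self-contained toy model. The dataclasses below are hypothetical stand-ins for Catalyst expression nodes, and `strip_alias` is a sketch of the alias-insensitive comparison the issue suggests Spark SQL needs; none of this is actual Spark code:

```python
from dataclasses import dataclass


# Toy stand-ins for Catalyst expression nodes (illustrative only).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Upper:
    child: object


@dataclass(frozen=True)
class Alias:
    child: object
    name: str


def strip_alias(expr):
    """Normalize an expression tree by removing Alias wrappers so that
    semantically identical expressions compare equal."""
    while isinstance(expr, Alias):
        expr = expr.child
    if isinstance(expr, Upper):
        return Upper(strip_alias(expr.child))
    return expr


group_by = Upper(Column("birthday.date"))                 # Upper(birthday#1.date)
select = Upper(Alias(Column("birthday.date"), "date"))    # Upper(birthday#1.date AS date#9)

assert group_by != select                              # naive equality fails: "not in GROUP BY"
assert strip_alias(group_by) == strip_alias(select)    # alias-insensitive comparison succeeds
```

The failing naive comparison mirrors why `Upper(birthday#1.date AS date#9)` is rejected even though the same expression appears in the GROUP BY clause.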
[jira] [Updated] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-4296: Target Version/s: 1.3.0, 1.2.1 (was: 1.2.0) Affects Version/s: 1.1.1 1.2.0 Fix Version/s: (was: 1.2.0) > Throw "Expression not in GROUP BY" when using same expression in group by > clause and select clause > --- > > Key: SPARK-4296 > URL: https://issues.apache.org/jira/browse/SPARK-4296 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0, 1.1.1, 1.2.0 >Reporter: Shixiong Zhu >Assignee: Cheng Lian >Priority: Blocker > > When the input data has a complex structure, using same expression in group > by clause and select clause will throw "Expression not in GROUP BY". > {code:java} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Birthday(date: String) > case class Person(name: String, birthday: Birthday) > val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), > Person("Jim", Birthday("1980-02-28")))) > people.registerTempTable("people") > val year = sqlContext.sql("select count(*), upper(birthday.date) from people > group by upper(birthday.date)") > year.collect > {code} > Here is the plan of year: > {code:java} > SchemaRDD[3] at RDD at SchemaRDD.scala:105 > == Query Plan == > == Physical Plan == > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression > not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree: > Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date > AS date#9) AS c1#3] > Subquery people > LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at > ExistingRDD.scala:36 > {code} > The bug is the equality test for `Upper(birthday#1.date)` and > `Upper(birthday#1.date AS date#9)`. > Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias > expression. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause
[ https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273194#comment-14273194 ] Yin Huai commented on SPARK-4296: - [~lian cheng] Seems this issue is similar to [this one|https://issues.apache.org/jira/browse/SPARK-2063?focusedCommentId=14055193&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14055193]. The main problem is that we use the last part of a struct field reference as the alias. Is it possible to fix that one as well? > Throw "Expression not in GROUP BY" when using same expression in group by > clause and select clause > --- > > Key: SPARK-4296 > URL: https://issues.apache.org/jira/browse/SPARK-4296 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Shixiong Zhu >Assignee: Cheng Lian >Priority: Blocker > Fix For: 1.2.0 > > > When the input data has a complex structure, using same expression in group > by clause and select clause will throw "Expression not in GROUP BY". 
> {code:java} > val sqlContext = new org.apache.spark.sql.SQLContext(sc) > import sqlContext.createSchemaRDD > case class Birthday(date: String) > case class Person(name: String, birthday: Birthday) > val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), > Person("Jim", Birthday("1980-02-28")))) > people.registerTempTable("people") > val year = sqlContext.sql("select count(*), upper(birthday.date) from people > group by upper(birthday.date)") > year.collect > {code} > Here is the plan of year: > {code:java} > SchemaRDD[3] at RDD at SchemaRDD.scala:105 > == Query Plan == > == Physical Plan == > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression > not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree: > Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date > AS date#9) AS c1#3] > Subquery people > LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at > ExistingRDD.scala:36 > {code} > The bug is the equality test for `Upper(birthday#1.date)` and > `Upper(birthday#1.date AS date#9)`. > Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias > expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2621) Update task InputMetrics incrementally
[ https://issues.apache.org/jira/browse/SPARK-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273192#comment-14273192 ] Sandy Ryza commented on SPARK-2621: --- Definitely - just filed SPARK-5199 for this. > Update task InputMetrics incrementally > -- > > Key: SPARK-2621 > URL: https://issues.apache.org/jira/browse/SPARK-2621 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5199) Input metrics should show up for InputFormats that return CombineFileSplits
Sandy Ryza created SPARK-5199: - Summary: Input metrics should show up for InputFormats that return CombineFileSplits Key: SPARK-5199 URL: https://issues.apache.org/jira/browse/SPARK-5199 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Sandy Ryza Assignee: Sandy Ryza -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273187#comment-14273187 ] Nicholas Chammas commented on SPARK-3821: - Updated launch stats: * Launching cluster with 50 slaves in {{us-east-1}}. * Stats for best of 3 runs. {{branch-1.3}} @ [{{3a95101}}|https://github.com/mesos/spark-ec2/tree/3a95101c70e6892a8a48cc54094adaed1458487a]: {code} Cluster is now in 'ssh-ready' state. Waited 460 seconds. [timing] rsync /root/spark-ec2: 00h 00m 07s [timing] setup-slave: 00h 00m 28s [timing] scala init: 00h 00m 11s [timing] spark init: 00h 00m 07s [timing] ephemeral-hdfs init: 00h 12m 40s [timing] persistent-hdfs init: 00h 12m 35s [timing] spark-standalone init: 00h 00m 00s [timing] tachyon init: 00h 00m 08s [timing] ganglia init: 00h 00m 53s [timing] scala setup: 00h 03m 11s [timing] spark setup: 00h 21m 20s [timing] ephemeral-hdfs setup: 00h 00m 48s [timing] persistent-hdfs setup: 00h 00m 43s [timing] spark-standalone setup: 00h 01m 19s [timing] tachyon setup: 00h 03m 06s [timing] ganglia setup: 00h 00m 32s {code} {{packer}} @ [{{273c8c5}}|https://github.com/nchammas/spark-ec2/tree/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b]: {code} Cluster is now in 'ssh-ready' state. Waited 292 seconds. 
[timing] rsync /root/spark-ec2: 00h 00m 20s [timing] setup-slave: 00h 00m 19s [timing] scala init: 00h 00m 12s [timing] spark init: 00h 00m 08s [timing] ephemeral-hdfs init: 00h 12m 58s [timing] persistent-hdfs init: 00h 12m 55s [timing] spark-standalone init: 00h 00m 00s [timing] tachyon init: 00h 00m 10s [timing] ganglia init: 00h 00m 15s [timing] scala setup: 00h 03m 19s [timing] spark setup: 00h 20m 32s [timing] ephemeral-hdfs setup: 00h 00m 34s [timing] persistent-hdfs setup: 00h 00m 27s [timing] spark-standalone setup: 00h 00m 47s [timing] tachyon setup: 00h 03m 15s [timing] ganglia setup: 00h 00m 23s {code} As you can see, with the exception of time-to-SSH-availability, things are mostly the same across the current and Packer-generated AMIs. I've proposed improvements to cut down the launch times of large clusters in [a separate issue|SPARK-5189]. [~shivaram] - At this point I think it's safe to say that the approach proposed here is straightforward and worth pursuing. All we need now is a review of [the scripts that install various stuff|https://github.com/nchammas/spark-ec2/blob/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b/packer/spark-packer.json#L63-L66] (e.g. Ganglia, Python 2.7, etc.) on the AMI to make sure it all makes sense. > Develop an automated way of creating Spark images (AMI, Docker, and others) > --- > > Key: SPARK-3821 > URL: https://issues.apache.org/jira/browse/SPARK-3821 > Project: Spark > Issue Type: Improvement > Components: Build, EC2 >Reporter: Nicholas Chammas >Assignee: Nicholas Chammas > Attachments: packer-proposal.html > > > Right now the creation of Spark AMIs or Docker containers is done manually. > With tools like [Packer|http://www.packer.io/], we should be able to automate > this work, and do so in such a way that multiple types of machine images can > be created from a single template. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine
[ https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273178#comment-14273178 ] Patrick Wendell commented on SPARK-1422: Good call Nick - yeah, let's close this as out of scope since it's being maintained elsewhere. > Add scripts for launching Spark on Google Compute Engine > > > Key: SPARK-1422 > URL: https://issues.apache.org/jira/browse/SPARK-1422 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Matei Zaharia > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine
[ https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1422. Resolution: Won't Fix > Add scripts for launching Spark on Google Compute Engine > > > Key: SPARK-1422 > URL: https://issues.apache.org/jira/browse/SPARK-1422 > Project: Spark > Issue Type: Improvement > Components: EC2 >Reporter: Matei Zaharia > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273169#comment-14273169 ] Jongyoul Lee edited comment on SPARK-5198 at 1/12/15 2:38 AM: -- Uploaded example screenshots was (Author: jongyoul): Example screenshots > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot > 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. > !Screen Shot 2015-01-12 at 11.14.39 AM.png! > !Screen Shot 2015-01-12 at 11.34.30 AM.png! > !Screen Shot 2015-01-12 at 11.34.41 AM.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Description: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. !Screen Shot 2015-01-12 at 11.14.39 AM.png! !Screen Shot 2015-01-12 at 11.34.30 AM.png! !Screen Shot 2015-01-12 at 11.34.41 AM.png! was: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. !Screen Shot 2015-01-12 at 11.14.39 AM.png! ! > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot > 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. > !Screen Shot 2015-01-12 at 11.14.39 AM.png! > !Screen Shot 2015-01-12 at 11.34.30 AM.png! > !Screen Shot 2015-01-12 at 11.34.41 AM.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Description: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. !Screen Shot 2015-01-12 at 11.14.39 AM.png! ! was: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. [ > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot > 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. > !Screen Shot 2015-01-12 at 11.14.39 AM.png! > ! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Description: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. [ was:In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot > 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. > [ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Attachment: Screen Shot 2015-01-12 at 11.34.41 AM.png Screen Shot 2015-01-12 at 11.34.30 AM.png Screen Shot 2015-01-12 at 11.14.39 AM.png Example screenshots > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot > 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273165#comment-14273165 ] Apache Spark commented on SPARK-5198: - User 'jongyoul' has created a pull request for this issue: https://github.com/apache/spark/pull/3994 > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Description: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file. This is a same value while launching job on coarse-grained mode. (was: In fine-grained mode, SchedulerBackend set executor name as same as slave id with any task id. It's not good to track aspecific job because of logging a different in a same log file.) > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. This is a same value while launching job on > coarse-grained mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-4689) Unioning 2 SchemaRDDs should return a SchemaRDD in Python, Scala, and Java
[ https://issues.apache.org/jira/browse/SPARK-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268720#comment-14268720 ] Bibudh Lahiri edited comment on SPARK-4689 at 1/12/15 2:13 AM: --- I'd like to work on this issue, but would need some details. I looked into ./sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala where the unionAll method is defined as def unionAll(otherPlan: SchemaRDD) = new SchemaRDD(sqlContext, Union(logicalPlan, otherPlan.logicalPlan)) There is no implementation of union() in SchemaRDD itself, and the API says it is inherited from RDD. I took two different SchemaRDD objects and applied union on them (it is in my fork at https://github.com/bibudhlahiri/spark/blob/master/dev/audit-release/sbt_app_schema_rdd/src/main/scala/SchemaRDDApp.scala ), and the resulting object is of class UnionRDD. I am thinking of overriding union() in SchemaRDD to return a SchemaRDD; please let me know what you think. was (Author: bibudh): I'd like to work on this issue, but would need some details. I looked into ./sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala where the unionAll method is defined as def unionAll(otherPlan: SchemaRDD) = new SchemaRDD(sqlContext, Union(logicalPlan, otherPlan.logicalPlan)) Are we looking for an implementation of union here (keeping duplicates only once), in addition to unionAll (keeping duplicates both the times)? > Unioning 2 SchemaRDDs should return a SchemaRDD in Python, Scala, and Java > -- > > Key: SPARK-4689 > URL: https://issues.apache.org/jira/browse/SPARK-4689 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.1.0 >Reporter: Chris Fregly >Priority: Minor > Labels: starter > > Currently, you need to use unionAll() in Scala. > Python does not expose this functionality at the moment. 
> The current workaround is to use the UNION ALL HiveQL functionality detailed > here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
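The union vs. unionAll distinction raised in the edited comment above comes down to SQL duplicate semantics. A toy illustration with plain Python lists (not SchemaRDDs):

```python
# SQL UNION ALL keeps duplicates; SQL UNION collapses them.
a = [1, 2, 2]
b = [2, 3]

union_all = a + b              # duplicates preserved, like unionAll()
union = sorted(set(a + b))     # duplicates removed, like a SQL UNION

assert union_all == [1, 2, 2, 2, 3]
assert union == [1, 2, 3]
```

The JIRA itself is about the return type (a SchemaRDD rather than a plain UnionRDD), but this is the semantic difference behind the question of whether a separate union() is wanted in addition to unionAll().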
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Component/s: Mesos > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
[ https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5198: Fix Version/s: 1.2.1 1.3.0 > Change executorId more unique on mesos fine-grained mode > > > Key: SPARK-5198 > URL: https://issues.apache.org/jira/browse/SPARK-5198 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Jongyoul Lee > Fix For: 1.3.0, 1.2.1 > > > In fine-grained mode, SchedulerBackend set executor name as same as slave id > with any task id. It's not good to track aspecific job because of logging a > different in a same log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-5198) Change executorId more unique on mesos fine-grained mode
Jongyoul Lee created SPARK-5198: --- Summary: Change executorId more unique on mesos fine-grained mode Key: SPARK-5198 URL: https://issues.apache.org/jira/browse/SPARK-5198 Project: Spark Issue Type: Improvement Reporter: Jongyoul Lee In fine-grained mode, SchedulerBackend sets the executor name to the same value as the slave id, whatever the task id is. That makes it hard to track a specific job, because different tasks log into the same log file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
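The direction described in this issue, making the executor id distinguish tasks instead of reusing the bare slave id, can be sketched in a few lines. The id format below is a hypothetical illustration only; the actual scheme is whatever the linked pull request settles on:

```python
def executor_id(slave_id, task_id):
    """Compose a per-task executor id so that logs from different tasks on
    the same Mesos slave can be told apart (format is illustrative only)."""
    return "%s/%s" % (slave_id, task_id)


# Two tasks on the same slave now get distinguishable ids.
assert executor_id("slave-1", "task-7") != executor_id("slave-1", "task-8")
```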
[jira] [Commented] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
[ https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273137#comment-14273137 ] Jongyoul Lee commented on SPARK-5197: - Please assign it to me. [~andrewor14] [~adav] Please review my description > Support external shuffle service in fine-grained mode on mesos cluster > -- > > Key: SPARK-5197 > URL: https://issues.apache.org/jira/browse/SPARK-5197 > Project: Spark > Issue Type: Improvement > Components: Deploy, Mesos, Shuffle >Reporter: Jongyoul Lee > > I think dynamic allocation is almost satisfied on mesos' fine-grained mode, > which already offers resources dynamically, and returns automatically when a > task is finished. It, however, doesn't have a mechanism to support an external > shuffle service the way yarn does, which is AuxiliaryService. Because mesos > doesn't support AuxiliaryService, we need a different way to do this. > - Launching a shuffle service like a spark job on same cluster > -- Pros > --- Support multi-tenant environment > --- Almost same way like yarn > -- Cons > --- Control long running 'background' job - service - when mesos runs > --- Satisfy all slave - or host - to have one shuffle service all the time > - Launching jobs within shuffle service > -- Pros > --- Easy to implement > --- Don't consider whether shuffle service exists or not. > -- Cons > --- exists multiple shuffle services under multi-tenant environment > --- Control shuffle service port dynamically on multi-user environment > In my opinion, the first one is the better idea to support an external shuffle > service. Please leave comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
[ https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jongyoul Lee updated SPARK-5197:
Description:
I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, which already offers resources dynamically and returns them automatically when a task finishes. It does not, however, have a mechanism to support an external shuffle service the way YARN does with its AuxiliaryService. Because Mesos doesn't support an AuxiliaryService, we need a different approach:
- Launching the shuffle service as a Spark-like job on the same cluster
-- Pros
--- Supports a multi-tenant environment
--- Almost the same approach as YARN's
-- Cons
--- Must control a long-running 'background' job (the service) while Mesos runs
--- Must ensure every slave (host) runs one shuffle service at all times
- Launching jobs within the shuffle service
-- Pros
--- Easy to implement
--- No need to check whether a shuffle service exists
-- Cons
--- Multiple shuffle services exist in a multi-tenant environment
--- Must control the shuffle service port dynamically in a multi-user environment
In my opinion, the first is the better way to support an external shuffle service. Please leave comments.

was:
I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, which already offers resources dynamically and returns them automatically when a task finishes. We do not, however, have a mechanism to support an external shuffle service the way YARN does with its AuxiliaryService. Because Mesos doesn't support an AuxiliaryService, we need a different approach:
- Launching the shuffle service as a Spark-like job on the same cluster
-- Pros
--- Supports a multi-tenant environment
--- Almost the same approach as YARN's
-- Cons
--- Must control a long-running 'background' job (the service) while Mesos runs
--- Must ensure every slave (host) runs one shuffle service at all times
- Launching jobs within the shuffle service
-- Pros
--- Easy to implement
--- No need to check whether a shuffle service exists
-- Cons
--- Multiple shuffle services exist in a multi-tenant environment
--- Must control the shuffle service port dynamically in a multi-user environment
In my opinion, the first is the better way to support an external shuffle service. Please leave comments.

> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
> Issue Type: Improvement
> Components: Deploy, Mesos, Shuffle
> Reporter: Jongyoul Lee
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, which already offers resources dynamically and returns them automatically when a task finishes. It does not, however, have a mechanism to support an external shuffle service the way YARN does with its AuxiliaryService. Because Mesos doesn't support an AuxiliaryService, we need a different approach:
> - Launching the shuffle service as a Spark-like job on the same cluster
> -- Pros
> --- Supports a multi-tenant environment
> --- Almost the same approach as YARN's
> -- Cons
> --- Must control a long-running 'background' job (the service) while Mesos runs
> --- Must ensure every slave (host) runs one shuffle service at all times
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to check whether a shuffle service exists
> -- Cons
> --- Multiple shuffle services exist in a multi-tenant environment
> --- Must control the shuffle service port dynamically in a multi-user environment
> In my opinion, the first is the better way to support an external shuffle service. Please leave comments.
[jira] [Created] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster
Jongyoul Lee created SPARK-5197:
---
Summary: Support external shuffle service in fine-grained mode on mesos cluster
Key: SPARK-5197
URL: https://issues.apache.org/jira/browse/SPARK-5197
Project: Spark
Issue Type: Improvement
Components: Deploy, Mesos, Shuffle
Reporter: Jongyoul Lee

I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, which already offers resources dynamically and returns them automatically when a task finishes. We do not, however, have a mechanism to support an external shuffle service the way YARN does with its AuxiliaryService. Because Mesos doesn't support an AuxiliaryService, we need a different approach:
- Launching the shuffle service as a Spark-like job on the same cluster
-- Pros
--- Supports a multi-tenant environment
--- Almost the same approach as YARN's
-- Cons
--- Must control a long-running 'background' job (the service) while Mesos runs
--- Must ensure every slave (host) runs one shuffle service at all times
- Launching jobs within the shuffle service
-- Pros
--- Easy to implement
--- No need to check whether a shuffle service exists
-- Cons
--- Multiple shuffle services exist in a multi-tenant environment
--- Must control the shuffle service port dynamically in a multi-user environment
In my opinion, the first is the better way to support an external shuffle service. Please leave comments.
[jira] [Closed] (SPARK-4033) Integer overflow when SparkPi is called with more than 25000 slices
[ https://issues.apache.org/jira/browse/SPARK-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4033.
Resolution: Fixed
Fix Version/s: 1.3.0
Assignee: SaintBacchus
Target Version/s: 1.3.0
> Integer overflow when SparkPi is called with more than 25000 slices
> ---
>
> Key: SPARK-4033
> URL: https://issues.apache.org/jira/browse/SPARK-4033
> Project: Spark
> Issue Type: Bug
> Components: Examples
> Affects Versions: 1.2.0
> Reporter: SaintBacchus
> Assignee: SaintBacchus
> Fix For: 1.3.0
>
> If the slices argument passed to SparkPi is larger than 25000, the integer 'n' computed inside the code overflows and may become negative.
> That makes the (0 until n) Seq empty, so the subsequent 'reduce' action throws UnsupportedOperationException("empty collection").
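The overflow described above is easy to reproduce outside Spark. A minimal Java sketch (the 100000-iterations-per-slice constant mirrors the SparkPi example; widening to long is one possible remedy, not necessarily the fix that was merged):

```java
public class SparkPiOverflow {
    public static void main(String[] args) {
        int slices = 25001;

        // 100000 * 25001 = 2,500,100,000 exceeds Integer.MAX_VALUE (2,147,483,647),
        // so the 32-bit multiplication wraps around to a negative number.
        int n = 100000 * slices;
        System.out.println(n); // -1794867296

        // Widening to long before multiplying (and capping) avoids the wrap-around,
        // so (0 until n) is no longer an empty range.
        long safe = Math.min(100000L * slices, Integer.MAX_VALUE);
        System.out.println(safe); // 2147483647
    }
}
```

With a negative n, Scala's `(0 until n)` is empty, which is exactly why `reduce` throws `UnsupportedOperationException("empty collection")`.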
[jira] [Closed] (SPARK-4951) A busy executor may be killed when dynamicAllocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-4951. Resolution: Fixed Fix Version/s: 1.2.1 1.3.0 Target Version/s: 1.3.0, 1.2.1 > A busy executor may be killed when dynamicAllocation is enabled > --- > > Key: SPARK-4951 > URL: https://issues.apache.org/jira/browse/SPARK-4951 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 1.3.0, 1.2.1 > > > If a task runs more than `spark.dynamicAllocation.executorIdleTimeout`, the > executor which runs this task will be killed. > The following steps (yarn-client mode) can reproduce this bug: > 1. Start `spark-shell` using > {code} > ./bin/spark-shell --conf "spark.shuffle.service.enabled=true" \ > --conf "spark.dynamicAllocation.minExecutors=1" \ > --conf "spark.dynamicAllocation.maxExecutors=4" \ > --conf "spark.dynamicAllocation.enabled=true" \ > --conf "spark.dynamicAllocation.executorIdleTimeout=30" \ > --master yarn-client \ > --driver-memory 512m \ > --executor-memory 512m \ > --executor-cores 1 > {code} > 2. Wait more than 30 seconds until there is only one executor. > 3. Run the following code (a task needs at least 50 seconds to finish) > {code} > val r = sc.parallelize(1 to 1000, 20).map{t => Thread.sleep(1000); > t}.groupBy(_ % 2).collect() > {code} > 4. Executors will be killed and allocated all the time, which makes the Job > fail. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-5073) "spark.storage.memoryMapThreshold" has two default values
[ https://issues.apache.org/jira/browse/SPARK-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Davidson resolved SPARK-5073. --- Resolution: Fixed > "spark.storage.memoryMapThreshold" has two default values > - > > Key: SPARK-5073 > URL: https://issues.apache.org/jira/browse/SPARK-5073 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Jianhui Yuan >Priority: Minor > > In org.apache.spark.storage.DiskStore: > val minMemoryMapBytes = > blockManager.conf.getLong("spark.storage.memoryMapThreshold", 2 * 4096L) > In org.apache.spark.network.util.TransportConf: > public int memoryMapBytes() { > return conf.getInt("spark.storage.memoryMapThreshold", 2 * 1024 * > 1024); > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
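The danger of the two snippets quoted above is that an unset key silently yields very different thresholds depending on which code path reads it. A hedged, self-contained Java sketch of the failure mode (getLong is a hypothetical stand-in for the two real config accessors; the constants are the ones quoted in the report):

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultMismatch {
    // Stand-in for the config lookups in DiskStore and TransportConf:
    // return the caller-supplied default when the key is unset.
    static long getLong(Map<String, String> conf, String key, long dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Long.parseLong(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // user never set the key

        // DiskStore's fallback: 2 * 4096 = 8 KB
        long diskStore = getLong(conf, "spark.storage.memoryMapThreshold", 2 * 4096L);
        // TransportConf's fallback: 2 * 1024 * 1024 = 2 MB
        long transport = getLong(conf, "spark.storage.memoryMapThreshold", 2 * 1024 * 1024L);

        // Same key, two very different effective thresholds.
        System.out.println(diskStore + " vs " + transport); // 8192 vs 2097152
    }
}
```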
[jira] [Commented] (SPARK-4159) Maven build doesn't run JUnit test suites
[ https://issues.apache.org/jira/browse/SPARK-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273062#comment-14273062 ] Apache Spark commented on SPARK-4159: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/3993
> Maven build doesn't run JUnit test suites
> -
>
> Key: SPARK-4159
> URL: https://issues.apache.org/jira/browse/SPARK-4159
> Project: Spark
> Issue Type: Bug
> Components: Build
> Reporter: Patrick Wendell
> Assignee: Sean Owen
> Priority: Critical
> Labels: backport-needed
> Fix For: 1.3.0
>
> It turns out our Maven build isn't running any Java test suites, and likely never has.
> After some fishing, I believe the following is the issue. We use scalatest [1] in our Maven build, which by default can't automatically detect JUnit tests. Scalatest will let you enumerate a list of suites via "JUnitClasses", but I can't find a way for it to auto-detect all JUnit tests. This works in SBT because of our use of the junit-interface [2], which does the detection for you.
> An okay fix might be to simply enable the normal (surefire) Maven tests in addition to our scalatest in the Maven build. The only thing to watch out for is that they don't overlap in some way. We'd also have to copy over environment variables, memory settings, etc. to that plugin.
> [1] http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin
> [2] https://github.com/sbt/junit-interface
[jira] [Commented] (SPARK-5172) spark-examples-***.jar shades a wrong Hadoop distribution
[ https://issues.apache.org/jira/browse/SPARK-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273041#comment-14273041 ] Apache Spark commented on SPARK-5172: - User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/3992 > spark-examples-***.jar shades a wrong Hadoop distribution > - > > Key: SPARK-5172 > URL: https://issues.apache.org/jira/browse/SPARK-5172 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Shixiong Zhu >Priority: Minor > > Steps to check it: > 1. Download "spark-1.2.0-bin-hadoop2.4.tgz" from > http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz > 2. unzip `spark-examples-1.2.0-hadoop2.4.0.jar`. > 3. There is a file called `org/apache/hadoop/package-info.class` in the jar. > It doesn't exist in hadoop 2.4. > 4. Run "javap -classpath . -private -c -v org.apache.hadoop.package-info" > {code} > Compiled from "package-info.java" > interface org.apache.hadoop.package-info > SourceFile: "package-info.java" > RuntimeVisibleAnnotations: length = 0x24 >00 01 00 06 00 06 00 07 73 00 08 00 09 73 00 0A >00 0B 73 00 0C 00 0D 73 00 0E 00 0F 73 00 10 00 >11 73 00 12 > minor version: 0 > major version: 50 > Constant pool: > const #1 = Asciz org/apache/hadoop/package-info; > const #2 = class #1; // "org/apache/hadoop/package-info" > const #3 = Asciz java/lang/Object; > const #4 = class #3; // java/lang/Object > const #5 = Asciz package-info.java; > const #6 = Asciz Lorg/apache/hadoop/HadoopVersionAnnotation;; > const #7 = Asciz version; > const #8 = Asciz 1.2.1; > const #9 = Asciz revision; > const #10 = Asciz 1503152; > const #11 = Asciz user; > const #12 = Asciz mattf; > const #13 = Asciz date; > const #14 = Asciz Wed Jul 24 13:39:35 PDT 2013; > const #15 = Asciz url; > const #16 = Asciz > https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2; > const #17 = Asciz srcChecksum; > const #18 = Asciz 
6923c86528809c4e7e6f493b6b413a9a; > const #19 = Asciz SourceFile; > const #20 = Asciz RuntimeVisibleAnnotations; > { > } > {code} > The version is {{1.2.1}} > It comes because a wrong hbase version settings in examples project. Here is > a part of the dependencly tree when runnning "mvn -Pyarn -Phadoop-2.4 > -Dhadoop.version=2.4.0 -pl examples dependency:tree" > {noformat} > [INFO] +- org.apache.hbase:hbase-testing-util:jar:0.98.7-hadoop1:compile > [INFO] | +- > org.apache.hbase:hbase-common:test-jar:tests:0.98.7-hadoop1:compile > [INFO] | +- > org.apache.hbase:hbase-server:test-jar:tests:0.98.7-hadoop1:compile > [INFO] | | +- com.sun.jersey:jersey-core:jar:1.8:compile > [INFO] | | +- com.sun.jersey:jersey-json:jar:1.8:compile > [INFO] | | | +- org.codehaus.jettison:jettison:jar:1.1:compile > [INFO] | | | +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile > [INFO] | | | \- org.codehaus.jackson:jackson-xc:jar:1.7.1:compile > [INFO] | | \- com.sun.jersey:jersey-server:jar:1.8:compile > [INFO] | | \- asm:asm:jar:3.3.1:test > [INFO] | +- org.apache.hbase:hbase-hadoop1-compat:jar:0.98.7-hadoop1:compile > [INFO] | +- > org.apache.hbase:hbase-hadoop1-compat:test-jar:tests:0.98.7-hadoop1:compile > [INFO] | +- org.apache.hadoop:hadoop-core:jar:1.2.1:compile > [INFO] | | +- xmlenc:xmlenc:jar:0.52:compile > [INFO] | | +- commons-configuration:commons-configuration:jar:1.6:compile > [INFO] | | | +- commons-digester:commons-digester:jar:1.8:compile > [INFO] | | | | \- commons-beanutils:commons-beanutils:jar:1.7.0:compile > [INFO] | | | \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile > [INFO] | | \- commons-el:commons-el:jar:1.0:compile > [INFO] | +- org.apache.hadoop:hadoop-test:jar:1.2.1:compile > [INFO] | | +- org.apache.ftpserver:ftplet-api:jar:1.0.0:compile > [INFO] | | +- org.apache.mina:mina-core:jar:2.0.0-M5:compile > [INFO] | | +- org.apache.ftpserver:ftpserver-core:jar:1.0.0:compile > [INFO] | | \- 
org.apache.ftpserver:ftpserver-deprecated:jar:1.0.0-M2:compile > [INFO] | +- > com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1:compile > [INFO] | \- junit:junit:jar:4.10:test > [INFO] | \- org.hamcrest:hamcrest-core:jar:1.1:test > {noformat} > If I ran `mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -pl examples -am > dependency:tree -Dhbase.profile=hadoop2`, the dependency tree is right. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5008) Persistent HDFS does not recognize EBS Volumes
[ https://issues.apache.org/jira/browse/SPARK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273007#comment-14273007 ] Nicholas Chammas commented on SPARK-5008: - Use [{{copy-dir}}|https://github.com/mesos/spark-ec2/blob/v4/copy-dir.sh], which is installed by default, from the master.
> Persistent HDFS does not recognize EBS Volumes
> --
>
> Key: SPARK-5008
> URL: https://issues.apache.org/jira/browse/SPARK-5008
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 1.2.0
> Environment: 8 Node Cluster Generated from 1.2.0 spark-ec2 script.
> -m c3.2xlarge -t c3.8xlarge --ebs-vol-size 300 --ebs-vol-type gp2 --ebs-vol-num 1
> Reporter: Brad Willard
>
> The cluster is built with correctly sized EBS volumes. It creates the volume at /dev/xvds, and it is mounted at /vol0. However, when you start persistent HDFS with the start-all script, it starts but is not correctly configured to use the EBS volume.
> I'm assuming some symlinks or expected mounts are not correctly configured. This has worked flawlessly on all previous versions of Spark.
> I have a crude workaround: install pssh and remount the volume to /vol, which worked; however, it does not survive restarts.
[jira] [Commented] (SPARK-5008) Persistent HDFS does not recognize EBS Volumes
[ https://issues.apache.org/jira/browse/SPARK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272991#comment-14272991 ] Brad Willard commented on SPARK-5008: - [~nchammas] I can try that once I get back into the office, probably by Wednesday. Once I update the core-site.xml, what's the correct way to sync it to all the slaves?
> Persistent HDFS does not recognize EBS Volumes
> --
>
> Key: SPARK-5008
> URL: https://issues.apache.org/jira/browse/SPARK-5008
> Project: Spark
> Issue Type: Bug
> Components: EC2
> Affects Versions: 1.2.0
> Environment: 8 Node Cluster Generated from 1.2.0 spark-ec2 script.
> -m c3.2xlarge -t c3.8xlarge --ebs-vol-size 300 --ebs-vol-type gp2 --ebs-vol-num 1
> Reporter: Brad Willard
>
> The cluster is built with correctly sized EBS volumes. It creates the volume at /dev/xvds, and it is mounted at /vol0. However, when you start persistent HDFS with the start-all script, it starts but is not correctly configured to use the EBS volume.
> I'm assuming some symlinks or expected mounts are not correctly configured. This has worked flawlessly on all previous versions of Spark.
> I have a crude workaround: install pssh and remount the volume to /vol, which worked; however, it does not survive restarts.
[jira] [Commented] (SPARK-5162) Python yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272943#comment-14272943 ] Lianhui Wang commented on SPARK-5162: - [~dklassen] I have submitted a PR for this issue: https://github.com/apache/spark/pull/3976, so I think you can try it. If there are any questions or suggestions, please tell me.
> Python yarn-cluster mode
> 
>
> Key: SPARK-5162
> URL: https://issues.apache.org/jira/browse/SPARK-5162
> Project: Spark
> Issue Type: New Feature
> Components: PySpark, YARN
> Reporter: Dana Klassen
> Labels: cluster, python, yarn
>
> Running pyspark in yarn is currently limited to 'yarn-client' mode. It would be great to be able to submit python applications to the cluster and (just like java classes) have the resource manager set up an AM on any node in the cluster. Does anyone know the issues blocking this feature? I was snooping around with enabling python apps:
> Removing the logic stopping python and yarn-cluster from sparkSubmit.scala
> ...
> // The following modes are not supported or applicable
> (clusterManager, deployMode) match {
> ...
> case (_, CLUSTER) if args.isPython =>
> printErrorAndExit("Cluster deploy mode is currently not supported for
> python applications.")
> ...
> } > … > and submitting application via: > HADOOP_CONF_DIR={{insert conf dir}} ./bin/spark-submit --master yarn-cluster > --num-executors 2 —-py-files {{insert location of egg here}} > --executor-cores 1 ../tools/canary.py > Everything looks to run alright, pythonRunner is picked up as main class, > resources get setup, yarn client gets launched but falls flat on its face: > 2015-01-08 18:48:03,444 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: > DEBUG: FAILED { > {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py, > 1420742868009, FILE, null }, Resource > {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py changed > on src filesystem (expected 1420742868009, was 1420742869284 > and > 2015-01-08 18:48:03,446 INFO > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: > Resource > {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py(->/data/4/yarn/nm/usercache/klassen/filecache/11/canary.py) > transitioned from DOWNLOADING to FAILED > Tracked this down to the apache hadoop code(FSDownload.java line 249) related > to container localization of files upon downloading. At this point thought it > would be best to raise the issue here and get input. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-5196) Add comment field in StructField
[ https://issues.apache.org/jira/browse/SPARK-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272937#comment-14272937 ] Apache Spark commented on SPARK-5196: - User 'OopsOutOfMemory' has created a pull request for this issue: https://github.com/apache/spark/pull/3991
> Add comment field in StructField
> 
>
> Key: SPARK-5196
> URL: https://issues.apache.org/jira/browse/SPARK-5196
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.3.0
> Reporter: shengli
> Fix For: 1.3.0
>
> StructField should contain name, type, nullable, comment, etc.
> Add support for a comment field in StructField.
[jira] [Created] (SPARK-5196) Add comment field in StructField
shengli created SPARK-5196:
--
Summary: Add comment field in StructField
Key: SPARK-5196
URL: https://issues.apache.org/jira/browse/SPARK-5196
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.3.0
Reporter: shengli
Fix For: 1.3.0

StructField should contain name, type, nullable, comment, etc. Add support for a comment field in StructField.
[jira] [Commented] (SPARK-5195) when hive table is query with alias the cache data lose effectiveness.
[ https://issues.apache.org/jira/browse/SPARK-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272934#comment-14272934 ] Apache Spark commented on SPARK-5195: - User 'seayi' has created a pull request for this issue: https://github.com/apache/spark/pull/3898
> when hive table is query with alias the cache data lose effectiveness.
> 
>
> Key: SPARK-5195
> URL: https://issues.apache.org/jira/browse/SPARK-5195
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Reporter: yixiaohua
>
> Override MetastoreRelation's sameResult method to compare only the database name and table name. Previously:
> cache table t1;
> select count() from t1;
> would read data from memory, but the query below would not; instead it read from HDFS:
> select count() from t1 t;
> Cached data is keyed by the logical plan and compared with sameResult, so when the table is given an alias its logical plan is no longer the same as the plan without the alias. Hence the sameResult method is modified to compare only the database name and table name.
[jira] [Created] (SPARK-5195) when hive table is query with alias the cache data lose effectiveness.
yixiaohua created SPARK-5195:
Summary: when hive table is query with alias the cache data lose effectiveness.
Key: SPARK-5195
URL: https://issues.apache.org/jira/browse/SPARK-5195
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.2.0
Reporter: yixiaohua

Override MetastoreRelation's sameResult method to compare only the database name and table name. Previously:
cache table t1;
select count() from t1;
would read data from memory, but the query below would not; instead it read from HDFS:
select count() from t1 t;
Cached data is keyed by the logical plan and compared with sameResult, so when the table is given an alias its logical plan is no longer the same as the plan without the alias. Hence the sameResult method is modified to compare only the database name and table name.
[jira] [Updated] (SPARK-5192) Parquet fails to parse schema contains '\r'
[ https://issues.apache.org/jira/browse/SPARK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated SPARK-5192:
- Summary: Parquet fails to parse schema contains '\r' (was: Parquet fails to parse schemas contains '\r')
> Parquet fails to parse schema contains '\r'
> ---
>
> Key: SPARK-5192
> URL: https://issues.apache.org/jira/browse/SPARK-5192
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0
> Environment: windows7 + Intellj idea 13.0.2
> Reporter: cen yuhai
> Priority: Critical
> Fix For: 1.3.0
>
> I think this is actually a bug in Parquet: when I debugged 'ParquetTestData', I found the exception below. So I downloaded the source of MessageTypeParser; the function 'isWhitespace' does not check for '\r':
> private boolean isWhitespace(String t) {
> return t.equals(" ") || t.equals("\t") || t.equals("\n");
> }
> So I replaced all occurrences of '\r' to work around this issue.
> val subTestSchema =
> """
> message myrecord {
> optional boolean myboolean;
> optional int64 mylong;
> }
> """.replaceAll("\r","")
> at line 0: message myrecord {
> at parquet.schema.MessageTypeParser.asRepetition(MessageTypeParser.java:203)
> at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:101)
> at parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
> at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
> at parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
> at org.apache.spark.sql.parquet.ParquetTestData$.writeFile(ParquetTestData.scala:221)
> at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:92)
> at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
> at org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:85)
> at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
> at org.apache.spark.sql.parquet.ParquetQuerySuite.run(ParquetQuerySuite.scala:85)
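The missing '\r' check is easy to demonstrate in isolation. Below is a minimal Java sketch that reimplements the isWhitespace helper quoted in the report (hypothetically, outside of Parquet) together with the replaceAll workaround:

```java
public class CarriageReturnDemo {
    // Copy of the isWhitespace helper quoted from parquet's MessageTypeParser:
    // it accepts space, tab, and '\n', but a '\r' token falls through and is
    // later treated as part of the schema, causing the parse failure above.
    static boolean isWhitespace(String t) {
        return t.equals(" ") || t.equals("\t") || t.equals("\n");
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace("\n")); // true
        System.out.println(isWhitespace("\r")); // false -> breaks CRLF input

        // The workaround from the report: strip '\r' before handing the
        // schema string to the parser.
        String schema = "message myrecord {\r\n  optional boolean myboolean;\r\n}\r\n";
        String cleaned = schema.replaceAll("\r", "");
        System.out.println(cleaned.contains("\r")); // false
    }
}
```

This is why the bug only bites on Windows-edited sources, where triple-quoted strings carry CRLF line endings.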
[jira] [Created] (SPARK-5194) ADD JAR doesn't update classpath until reconnect
Oleg Danilov created SPARK-5194: --- Summary: ADD JAR doesn't update classpath until reconnect Key: SPARK-5194 URL: https://issues.apache.org/jira/browse/SPARK-5194 Project: Spark Issue Type: Bug Affects Versions: 1.2.0 Reporter: Oleg Danilov Steps to reproduce: beeline> !connect jdbc:hive2://vmhost-vm0:1 0: jdbc:hive2://vmhost-vm0:1> add jar ./target/nexr-hive-udf-0.2-SNAPSHOT.jar 0: jdbc:hive2://vmhost-vm0:1> CREATE TEMPORARY FUNCTION nvl AS 'com.nexr.platform.hive.udf.GenericUDFNVL'; 0: jdbc:hive2://vmhost-vm0:1> select nvl(imsi,'test') from ps_cei_index_1_week limit 1; Error: java.lang.ClassNotFoundException: com.nexr.platform.hive.udf.GenericUDFNVL (state=,code=0) 0: jdbc:hive2://vmhost-vm0:1> !reconnect Reconnecting to "jdbc:hive2://vmhost-vm0:1"... Closing: org.apache.hive.jdbc.HiveConnection@3f18dc75: {1} Connected to: Spark SQL (version 1.2.0) Driver: null (version null) Transaction isolation: TRANSACTION_REPEATABLE_READ 0: jdbc:hive2://vmhost-vm0:1> select nvl(imsi,'test') from ps_cei_index_1_week limit 1; +--+ | _c0 | +--+ | -1 | +--+ 1 row selected (1.605 seconds) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4861) Refactory command in spark sql
[ https://issues.apache.org/jira/browse/SPARK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272862#comment-14272862 ] wangfei commented on SPARK-4861: - [~yhuai] Of course, if possible, but I have not found a way to remove it, since in HiveCommandStrategy we need to distinguish Hive metastore tables from temporary tables; so for now HiveCommandStrategy stays there. Any ideas?
> Refactory command in spark sql
> --
>
> Key: SPARK-4861
> URL: https://issues.apache.org/jira/browse/SPARK-4861
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.1.1
> Reporter: wangfei
> Fix For: 1.3.0
>
> Fix a todo in Spark SQL: remove ```Command``` and use ```RunnableCommand``` instead.
[jira] [Updated] (SPARK-5166) Stabilize Spark SQL APIs
[ https://issues.apache.org/jira/browse/SPARK-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-5166: --- Assignee: Reynold Xin > Stabilize Spark SQL APIs > > > Key: SPARK-5166 > URL: https://issues.apache.org/jira/browse/SPARK-5166 > Project: Spark > Issue Type: Task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > Before we take Spark SQL out of alpha, we need to audit the APIs and > stabilize them. > As a general rule, everything under org.apache.spark.sql.catalyst should not > be exposed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5166) Stabilize Spark SQL APIs
[ https://issues.apache.org/jira/browse/SPARK-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-5166: --- Priority: Critical (was: Major) > Stabilize Spark SQL APIs > > > Key: SPARK-5166 > URL: https://issues.apache.org/jira/browse/SPARK-5166 > Project: Spark > Issue Type: Task > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin >Priority: Critical > > Before we take Spark SQL out of alpha, we need to audit the APIs and > stabilize them. > As a general rule, everything under org.apache.spark.sql.catalyst should not > be exposed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-5193) Make Spark SQL API usable in Java and remove the Java-specific API
[ https://issues.apache.org/jira/browse/SPARK-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-5193:
-------------------------------
    Description:
The Java version of the SchemaRDD API causes a high maintenance burden for Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it usable for Java, and then remove the Java-specific version.

Things to remove include (the Java version of):
- data type
- Row
- SQLContext
- HiveContext

Things to consider:
- Scala and Java have different collection libraries.
- Scala and Java (8) have different closure interfaces.
- Scala and Java can have duplicate definitions of common classes, such as BigDecimal.

  was:
The Java version of the SchemaRDD API causes a high maintenance burden for Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it usable for Java, and then remove the Java-specific version.

Things to remove include (the Java version of):
- data type
- Row
- SQLContext
- HiveContext

Things to consider:
- Scala and Java have different collection libraries.
- Scala and Java (8) have different closure interfaces.

> Make Spark SQL API usable in Java and remove the Java-specific API
> -------------------------------------------------------------------
>
>                 Key: SPARK-5193
>                 URL: https://issues.apache.org/jira/browse/SPARK-5193
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> The Java version of the SchemaRDD API causes a high maintenance burden for
> Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API
> needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala
> API, make it usable for Java, and then remove the Java-specific version.
> Things to remove include (the Java version of):
> - data type
> - Row
> - SQLContext
> - HiveContext
> Things to consider:
> - Scala and Java have different collection libraries.
> - Scala and Java (8) have different closure interfaces.
> - Scala and Java can have duplicate definitions of common classes, such as
> BigDecimal.
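The closure-interface point above can be made concrete: plain Java code cannot implement scala.Function1 with a lambda, so a Java-usable API typically accepts the java.util.function interfaces instead. A minimal sketch of the pattern — the `map` helper here is hypothetical, not Spark's actual API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ClosureSketch {
    // A hypothetical Java-friendly transformation: it takes
    // java.util.function.Function, which Java 8 lambdas and method
    // references implement directly. A Scala-only API would instead expect
    // scala.Function1, which plain Java code cannot supply as a lambda.
    static <A, B> List<B> map(List<A> in, Function<A, B> f) {
        List<B> out = new ArrayList<>();
        for (A a : in) {
            out.add(f.apply(a));
        }
        return out;
    }

    public static void main(String[] args) {
        // A method reference works because the parameter is a Java functional interface.
        List<Integer> lengths = map(List.of("a", "bb"), String::length);
        System.out.println(lengths);
    }
}
```

A single Scala API audited to accept Java functional interfaces (or to be callable with both) is what lets the duplicated Java classes be dropped.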
[jira] [Commented] (SPARK-5193) Make Spark SQL API usable in Java and remove the Java-specific API
[ https://issues.apache.org/jira/browse/SPARK-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272860#comment-14272860 ]

Reynold Xin commented on SPARK-5193:
------------------------------------
cc [~marmbrus]

> Make Spark SQL API usable in Java and remove the Java-specific API
> -------------------------------------------------------------------
>
>                 Key: SPARK-5193
>                 URL: https://issues.apache.org/jira/browse/SPARK-5193
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> The Java version of the SchemaRDD API causes a high maintenance burden for
> Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API
> needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala
> API, make it usable for Java, and then remove the Java-specific version.
> Things to remove include (the Java version of):
> - data type
> - Row
> - SQLContext
> - HiveContext
> Things to consider:
> - Scala and Java have different collection libraries.
> - Scala and Java (8) have different closure interfaces.
> - Scala and Java can have duplicate definitions of common classes, such as
> BigDecimal.
[jira] [Created] (SPARK-5193) Make Spark SQL API usable in Java and remove the Java-specific API
Reynold Xin created SPARK-5193:
----------------------------------
             Summary: Make Spark SQL API usable in Java and remove the Java-specific API
                 Key: SPARK-5193
                 URL: https://issues.apache.org/jira/browse/SPARK-5193
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin
            Assignee: Reynold Xin

The Java version of the SchemaRDD API causes a high maintenance burden for Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it usable for Java, and then remove the Java-specific version.

Things to remove include (the Java version of):
- data type
- Row
- SQLContext
- HiveContext

Things to consider:
- Scala and Java have different collection libraries.
- Scala and Java (8) have different closure interfaces.
[jira] [Updated] (SPARK-3299) [SQL] Public API in SQLContext to list tables
[ https://issues.apache.org/jira/browse/SPARK-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-3299:
-------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-5166

> [SQL] Public API in SQLContext to list tables
> ----------------------------------------------
>
>                 Key: SPARK-3299
>                 URL: https://issues.apache.org/jira/browse/SPARK-3299
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 1.0.2
>            Reporter: Evan Chan
>            Assignee: Bill Bejeck
>            Priority: Minor
>              Labels: newbie
>
> There is no public API in SQLContext to list the current tables. This would
> be pretty helpful.
[jira] [Updated] (SPARK-5167) Move Row into sql package and make it usable for Java
[ https://issues.apache.org/jira/browse/SPARK-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-5167:
-------------------------------
    Assignee: Reynold Xin

> Move Row into sql package and make it usable for Java
> -----------------------------------------------------
>
>                 Key: SPARK-5167
>                 URL: https://issues.apache.org/jira/browse/SPARK-5167
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> This will help us eliminate the duplicated Java code.
[jira] [Updated] (SPARK-2096) Correctly parse dot notations for accessing an array of structs
[ https://issues.apache.org/jira/browse/SPARK-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-2096:
-------------------------------
    Target Version/s: 1.3.0  (was: 1.2.0)

> Correctly parse dot notations for accessing an array of structs
> ----------------------------------------------------------------
>
>                 Key: SPARK-2096
>                 URL: https://issues.apache.org/jira/browse/SPARK-2096
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0
>            Reporter: Yin Huai
>            Priority: Minor
>              Labels: starter
>             Fix For: 1.2.0
>
> For example, "arrayOfStruct" is an array of structs and every element of
> this array has a field called "field1". "arrayOfStruct[0].field1" means to
> access the value of "field1" for the first element of "arrayOfStruct", but
> the SQL parser (in sql-core) treats "field1" as an alias. Also,
> "arrayOfStruct.field1" means to access all values of "field1" in this array
> of structs and then return those values as an array. But the SQL parser
> cannot resolve it.
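The two access semantics described in the report can be illustrated in plain Java (the class and field names mirror the example above; none of this is Spark API):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DotNotationSemantics {
    static class Struct {
        final int field1;
        Struct(int field1) { this.field1 = field1; }
    }

    // "arrayOfStruct[0].field1": field1 of the first element only.
    static int firstField1(List<Struct> arrayOfStruct) {
        return arrayOfStruct.get(0).field1;
    }

    // "arrayOfStruct.field1": every field1 value, collected as an array.
    static List<Integer> allField1(List<Struct> arrayOfStruct) {
        return arrayOfStruct.stream()
                .map(s -> s.field1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Struct> arrayOfStruct = List.of(new Struct(10), new Struct(20));
        System.out.println(firstField1(arrayOfStruct)); // a single value
        System.out.println(allField1(arrayOfStruct));   // all values, as an array
    }
}
```

The bug is that the SQL parser picks neither interpretation correctly: it treats the trailing "field1" as an alias in the indexed form, and fails to resolve the whole-array form.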
[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date
[ https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-4508:
-------------------------------
    Assignee: Adrian Wang

> Native Date type for SQL92 Date
> -------------------------------
>
>                 Key: SPARK-4508
>                 URL: https://issues.apache.org/jira/browse/SPARK-4508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Adrian Wang
>            Assignee: Adrian Wang
>
> Store daysSinceEpoch as an Int (4 bytes) instead of a java.sql.Date
> (8 bytes, as a Long) in the catalyst row.
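The encoding proposed in the description can be sketched in a few lines. This only illustrates the days-since-epoch idea, not Spark's actual internal representation:

```java
import java.time.LocalDate;

public class DateAsInt {
    // Encode a date as days since 1970-01-01 in a 4-byte int, instead of
    // keeping a java.sql.Date (which wraps an 8-byte millisecond long).
    static int toDays(LocalDate d) {
        // toEpochDay returns a long, but any date in a practical range
        // (an int holds roughly +/- 5.8 million years of days) fits in an int.
        return (int) d.toEpochDay();
    }

    // Decode back to a date; the round trip is lossless.
    static LocalDate fromDays(int days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        int days = toDays(LocalDate.of(2015, 1, 12));
        System.out.println(days + " -> " + fromDays(days));
    }
}
```

Halving the per-value footprint matters because a date column stores one such value per row.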
[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date
[ https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-4508:
-------------------------------
    Summary: Native Date type for SQL92 Date  (was: build native date type to conform behavior to Hive)

> Native Date type for SQL92 Date
> -------------------------------
>
>                 Key: SPARK-4508
>                 URL: https://issues.apache.org/jira/browse/SPARK-4508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Adrian Wang
>
> Store daysSinceEpoch as an Int (4 bytes) instead of a java.sql.Date
> (8 bytes, as a Long) in the catalyst row.
[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date
[ https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-4508:
-------------------------------
    Target Version/s: 1.3.0

> Native Date type for SQL92 Date
> -------------------------------
>
>                 Key: SPARK-4508
>                 URL: https://issues.apache.org/jira/browse/SPARK-4508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Adrian Wang
>            Assignee: Adrian Wang
>
> Store daysSinceEpoch as an Int (4 bytes) instead of a java.sql.Date
> (8 bytes, as a Long) in the catalyst row.
[jira] [Updated] (SPARK-4508) build native date type to conform behavior to Hive
[ https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-4508:
-------------------------------
    Issue Type: Sub-task  (was: Improvement)
        Parent: SPARK-5166

> build native date type to conform behavior to Hive
> ---------------------------------------------------
>
>                 Key: SPARK-4508
>                 URL: https://issues.apache.org/jira/browse/SPARK-4508
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Adrian Wang
>
> Store daysSinceEpoch as an Int (4 bytes) instead of a java.sql.Date
> (8 bytes, as a Long) in the catalyst row.