[jira] [Updated] (SPARK-5202) HiveContext doesn't support the Variables Substitution

2015-01-11 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao updated SPARK-5202:
-
Description: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

This is a blocking issue for CLI users; it impacts existing HQL scripts from Hive.

  was:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

This is a blocking issue for CLI users, which will impact existing HQL scripts.


> HiveContext doesn't support the Variables Substitution
> --
>
> Key: SPARK-5202
> URL: https://issues.apache.org/jira/browse/SPARK-5202
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Cheng Hao
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
> This is a blocking issue for CLI users; it impacts existing HQL scripts from Hive.
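
For illustration, a minimal sketch (in REPL style, matching the other snippets in this thread) of the kind of Hive variable substitution these scripts rely on; the table name, variable name, and the use of HiveContext.hql are assumptions made only to show where the ${hivevar:...} reference would need to be expanded:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext("local", "substitution-example")
val hiveContext = new HiveContext(sc)

// Hive scripts typically set a substitution variable and reference it later:
hiveContext.hql("SET hivevar:target_dt=2015-01-11")
// Without variable substitution, ${hivevar:target_dt} reaches the parser
// verbatim instead of being expanded to 2015-01-11:
hiveContext.hql("SELECT * FROM logs WHERE dt = '${hivevar:target_dt}'")
{code}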






[jira] [Commented] (SPARK-5202) HiveContext doesn't support the Variables Substitution

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273293#comment-14273293
 ] 

Apache Spark commented on SPARK-5202:
-

User 'chenghao-intel' has created a pull request for this issue:
https://github.com/apache/spark/pull/4003

> HiveContext doesn't support the Variables Substitution
> --
>
> Key: SPARK-5202
> URL: https://issues.apache.org/jira/browse/SPARK-5202
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Cheng Hao
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution
> This is a blocking issue for CLI users, which will impact existing HQL scripts.






[jira] [Commented] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273289#comment-14273289
 ] 

Apache Spark commented on SPARK-5201:
-

User 'advancedxy' has created a pull request for this issue:
https://github.com/apache/spark/pull/4002

> ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing 
> with inclusive range
> --
>
> Key: SPARK-5201
> URL: https://issues.apache.org/jira/browse/SPARK-5201
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Ye Xianjin
>  Labels: rdd
> Fix For: 1.2.1
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> {code}
>  sc.makeRDD(1 to (Int.MaxValue)).count        // result = 0
>  sc.makeRDD(1 to (Int.MaxValue - 1)).count    // result = 2147483646 = Int.MaxValue - 1
>  sc.makeRDD(1 until (Int.MaxValue)).count     // result = 2147483646 = Int.MaxValue - 1
> {code}
> More details in the discussion at https://github.com/apache/spark/pull/2874
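
A minimal, standalone illustration of the suspected overflow; this is not Spark's slicing code, and it only assumes that the inclusive range gets converted to an exclusive one by adding 1 to its end:

{code}
object InclusiveRangeOverflowDemo extends App {
  val inclusive = 1 to Int.MaxValue
  // Adding 1 to the inclusive end wraps around Int to Int.MinValue...
  val exclusiveEnd = inclusive.end + 1
  // ...so the converted exclusive range is empty, which would explain count = 0.
  val converted = Range(inclusive.start, exclusiveEnd, inclusive.step)
  println(converted.isEmpty)   // prints: true
}
{code}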






[jira] [Created] (SPARK-5202) HiveContext doesn't support the Variables Substitution

2015-01-11 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-5202:


 Summary: HiveContext doesn't support the Variables Substitution
 Key: SPARK-5202
 URL: https://issues.apache.org/jira/browse/SPARK-5202
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Cheng Hao


https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VariableSubstitution

This is a blocking issue for CLI users, which will impact existing HQL scripts.






[jira] [Commented] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range

2015-01-11 Thread Ye Xianjin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273277#comment-14273277
 ] 

Ye Xianjin commented on SPARK-5201:
---

I will send a pr for this.

> ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing 
> with inclusive range
> --
>
> Key: SPARK-5201
> URL: https://issues.apache.org/jira/browse/SPARK-5201
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Ye Xianjin
>  Labels: rdd
> Fix For: 1.2.1
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> {code}
>  sc.makeRDD(1 to (Int.MaxValue)).count        // result = 0
>  sc.makeRDD(1 to (Int.MaxValue - 1)).count    // result = 2147483646 = Int.MaxValue - 1
>  sc.makeRDD(1 until (Int.MaxValue)).count     // result = 2147483646 = Int.MaxValue - 1
> {code}
> More details in the discussion at https://github.com/apache/spark/pull/2874






[jira] [Created] (SPARK-5201) ParallelCollectionRDD.slice(seq, numSlices) has int overflow when dealing with inclusive range

2015-01-11 Thread Ye Xianjin (JIRA)
Ye Xianjin created SPARK-5201:
-

 Summary: ParallelCollectionRDD.slice(seq, numSlices) has int 
overflow when dealing with inclusive range
 Key: SPARK-5201
 URL: https://issues.apache.org/jira/browse/SPARK-5201
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.2.0
Reporter: Ye Xianjin
 Fix For: 1.2.1


{code}
 sc.makeRDD(1 to (Int.MaxValue)).count        // result = 0
 sc.makeRDD(1 to (Int.MaxValue - 1)).count    // result = 2147483646 = Int.MaxValue - 1
 sc.makeRDD(1 until (Int.MaxValue)).count     // result = 2147483646 = Int.MaxValue - 1
{code}
More details in the discussion at https://github.com/apache/spark/pull/2874






[jira] [Commented] (SPARK-4908) Spark SQL built for Hive 13 fails under concurrent metadata queries

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273271#comment-14273271
 ] 

Apache Spark commented on SPARK-4908:
-

User 'baishuo' has created a pull request for this issue:
https://github.com/apache/spark/pull/4001

> Spark SQL built for Hive 13 fails under concurrent metadata queries
> ---
>
> Key: SPARK-4908
> URL: https://issues.apache.org/jira/browse/SPARK-4908
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: David Ross
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.3.0, 1.2.1
>
>
> We are on trunk: {{1.3.0-SNAPSHOT}}, as of this commit: 
> https://github.com/apache/spark/commit/3d0c37b8118f6057a663f959321a79b8061132b6
> We are using Spark built for Hive 13, using this option:
> {{-Phive-0.13.1}}
> In single-threaded mode, normal operations look fine. However, under 
> concurrency, with at least 2 concurrent connections, metadata queries fail.
> For example, {{USE some_db}}, {{SHOW TABLES}}, and the implicit {{USE}} 
> statement when you pass a default schema in the JDBC URL, all fail.
> {{SELECT}} queries like {{SELECT * FROM some_table}} do not have this issue.
> Here is some example code:
> {code}
> object main extends App {
>   import java.sql._
>   import scala.concurrent._
>   import scala.concurrent.duration._
>   import scala.concurrent.ExecutionContext.Implicits.global
>   Class.forName("org.apache.hive.jdbc.HiveDriver")
>   val host = "localhost" // update this
>   val url = s"jdbc:hive2://${host}:10511/some_db" // update this
>   val future = Future.traverse(1 to 3) { i =>
> Future {
>   println("Starting: " + i)
>   try {
> val conn = DriverManager.getConnection(url)
>   } catch {
> case e: Throwable => e.printStackTrace()
> println("Failed: " + i)
>   }
>   println("Finishing: " + i)
> }
>   }
>   Await.result(future, 2.minutes)
>   println("done!")
> }
> {code}
> Here is the output:
> {code}
> Starting: 1
> Starting: 3
> Starting: 2
> java.sql.SQLException: 
> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation 
> cancelled
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
>   at 
> org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451)
>   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:195)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:270)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>   at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>   at 
> scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>   at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>   at 
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Failed: 3
> Finishing: 3
> java.sql.SQLException: 
> org.apache.spark.sql.execution.QueryExecutionException: FAILED: Operation 
> cancelled
>   at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:121)
>   at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:109)
>   at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:231)
>   at 
> org.apache.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:451)
>   at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:195)
>   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
>   at java.sql.DriverManager.getConnection(DriverManager.java:664)
>   at java.sql.DriverManager.getConnection(DriverManager.java:270)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply$mcV$sp(ConnectionManager.scala:896)
>   at 
> com.atscale.engine.connection.pool.main$$anonfun$30$$anonfun$apply$2.apply(ConnectionManager.scala:893)

[jira] [Commented] (SPARK-5196) Add comment field in StructField

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273251#comment-14273251
 ] 

Apache Spark commented on SPARK-5196:
-

User 'OopsOutOfMemory' has created a pull request for this issue:
https://github.com/apache/spark/pull/3999

> Add comment field in StructField
> 
>
> Key: SPARK-5196
> URL: https://issues.apache.org/jira/browse/SPARK-5196
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: shengli
> Fix For: 1.3.0
>
>
> StructField should contain name, type, nullable, comment, etc.
> Add support for a comment field in StructField.
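
A hedged, standalone sketch of what such a field could look like; this is not Spark's actual definition (the real DataType hierarchy lives in Spark SQL), just an illustration of adding an optional comment:

{code}
// Toy stand-ins for the real Spark SQL types, used only to keep the sketch self-contained.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean = true,
    comment: Option[String] = None)   // the proposed addition

object StructFieldSketch extends App {
  val field = StructField("age", IntegerType, comment = Some("age in years"))
  println(field.comment.getOrElse("<no comment>"))
}
{code}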






[jira] [Commented] (SPARK-5200) Disable web UI in Hive Thriftserver tests

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273247#comment-14273247
 ] 

Apache Spark commented on SPARK-5200:
-

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3998

> Disable web UI in Hive Thriftserver tests
> -
>
> Key: SPARK-5200
> URL: https://issues.apache.org/jira/browse/SPARK-5200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>  Labels: flaky-test
>
> In our unit tests, we should disable the Spark Web UI when starting the Hive 
> Thriftserver, since port contention during this test has been a cause of test 
> failures on Jenkins.






[jira] [Commented] (SPARK-5124) Standardize internal RPC interface

2015-01-11 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273246#comment-14273246
 ] 

Reynold Xin commented on SPARK-5124:


Thanks for the response.

1. Let's not rely on the fact that a local actor doesn't pass messages through a 
socket to get a local speedup. Conceptually, there is no reason to tie the local 
actor implementation to RPC. DAGScheduler's actor used to be a simple queue and 
event loop (before it was turned into an actor for no good reason), and we can 
restore it to that (see the sketch below).

2. Have you thought about how the fate sharing stuff would work with 
alternative RPC implementations? 
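
A hedged sketch of the queue-and-event-loop pattern referred to in point 1; the names and event types are illustrative, not Spark's:

{code}
import java.util.concurrent.LinkedBlockingQueue

sealed trait SchedulerEvent
case class JobSubmitted(jobId: Int) extends SchedulerEvent
case object Stop extends SchedulerEvent

// A plain background thread draining a queue: no actor system, no sockets.
class EventLoop(handle: SchedulerEvent => Unit) {
  private val queue = new LinkedBlockingQueue[SchedulerEvent]()
  private val worker = new Thread("event-loop") {
    override def run(): Unit = {
      var running = true
      while (running) {
        queue.take() match {
          case Stop  => running = false
          case event => handle(event)
        }
      }
    }
  }
  def start(): Unit = worker.start()
  def post(event: SchedulerEvent): Unit = queue.put(event)
}

object EventLoopSketch extends App {
  val loop = new EventLoop(e => println(s"handling $e"))
  loop.start()
  loop.post(JobSubmitted(1))
  loop.post(Stop)
}
{code}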

> Standardize internal RPC interface
> --
>
> Key: SPARK-5124
> URL: https://issues.apache.org/jira/browse/SPARK-5124
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Shixiong Zhu
> Attachments: Pluggable RPC - draft 1.pdf
>
>
> In Spark we use Akka as the RPC layer. It would be great if we can 
> standardize the internal RPC interface to facilitate testing. This will also 
> provide the foundation to try other RPC implementations in the future.






[jira] [Created] (SPARK-5200) Disable web UI in Hive Thriftserver tests

2015-01-11 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-5200:
-

 Summary: Disable web UI in Hive Thriftserver tests
 Key: SPARK-5200
 URL: https://issues.apache.org/jira/browse/SPARK-5200
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Josh Rosen
Assignee: Josh Rosen


In our unit tests, we should disable the Spark Web UI when starting the Hive 
Thriftserver, since port contention during this test has been a cause of test 
failures on Jenkins.
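
A minimal sketch of the relevant setting; how the Thriftserver test suite actually wires it in is decided in the pull request, so this only shows the configuration flag itself:

{code}
import org.apache.spark.{SparkConf, SparkContext}

// spark.ui.enabled=false skips starting the web UI, so the test never binds
// a UI port and cannot hit port contention.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("thriftserver-test-sketch")
  .set("spark.ui.enabled", "false")
val sc = new SparkContext(conf)
{code}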






[jira] [Commented] (SPARK-5095) Support launching multiple mesos executors in coarse grained mesos mode

2015-01-11 Thread Timothy Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273244#comment-14273244
 ] 

Timothy Chen commented on SPARK-5095:
-

[~joshdevins] [~gmaas] Indeed, capping the cores is actually meant to fix 4940, and we 
can use that to address the number of executors.

I'm trying not to end up with just a set of configurations that can achieve both; 
otherwise it becomes a lot harder to maintain.

I'm working on the patch now and I'll add you both on GitHub for review.

> Support launching multiple mesos executors in coarse grained mesos mode
> ---
>
> Key: SPARK-5095
> URL: https://issues.apache.org/jira/browse/SPARK-5095
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Timothy Chen
>
> Currently in coarse-grained Mesos mode, it's expected that we only launch one 
> Mesos executor that launches one JVM process to launch multiple Spark 
> executors.
> However, this becomes a problem when the JVM process launched is larger than 
> an ideal size (30 GB is the recommended value from Databricks), which causes the GC 
> problems reported on the mailing list.
> We should support launching multiple executors when large enough resources 
> are available for Spark to use, and these resources are still under the 
> configured limit.
> This is also applicable when users want to specify the number of executors to be 
> launched on each node.






[jira] [Resolved] (SPARK-5018) Make MultivariateGaussian public

2015-01-11 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-5018.
--
   Resolution: Fixed
Fix Version/s: 1.3.0

Issue resolved by pull request 3923
[https://github.com/apache/spark/pull/3923]

> Make MultivariateGaussian public
> 
>
> Key: SPARK-5018
> URL: https://issues.apache.org/jira/browse/SPARK-5018
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.2.0
>Reporter: Joseph K. Bradley
>Assignee: Travis Galoppo
>Priority: Critical
> Fix For: 1.3.0
>
>
> MultivariateGaussian is currently private[ml], but it would be a useful 
> public class.  This JIRA will require defining a good public API for 
> distributions.
> This JIRA will be needed for finalizing the GaussianMixtureModel API, which 
> should expose MultivariateGaussian instances instead of the means and 
> covariances.
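
For context, a hedged, self-contained sketch of the density such a class models, restricted to a diagonal covariance so plain arrays suffice; this is not MLlib's API, and the final public surface is exactly what this JIRA is meant to define:

{code}
// pdf of N(mu, diag(sigma2)) evaluated at x; sigma2 holds the per-dimension variances.
def diagonalGaussianPdf(x: Array[Double], mu: Array[Double], sigma2: Array[Double]): Double = {
  require(x.length == mu.length && mu.length == sigma2.length)
  val k = x.length
  val logNorm = -0.5 * (k * math.log(2 * math.Pi) + sigma2.map(math.log).sum)
  val quad = x.indices.map(i => math.pow(x(i) - mu(i), 2) / sigma2(i)).sum
  math.exp(logNorm - 0.5 * quad)
}
{code}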






[jira] [Commented] (SPARK-3561) Allow for pluggable execution contexts in Spark

2015-01-11 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273225#comment-14273225
 ] 

Patrick Wendell commented on SPARK-3561:


So if the question is "Is Spark only an API, or is it an integrated API/execution 
engine?"... we've taken a fairly clear stance over the history of the project 
that it's an integrated engine. That is, Spark is not something like Pig, where it's 
intended primarily as a user API and we expect different physical execution 
engines to be plugged in underneath.

In the past we haven't found this prevents Spark from working well in different 
environments. For instance, with Mesos, on YARN, etc. And for this we've 
integrated at different layers such as the storage layer and the scheduling 
layer, where there were well defined API's and integration points in the 
broader ecosystem. Compared with alternatives Spark is far more flexible in 
terms of runtime environments. The RDD API is so generic that it's very easy to 
customize and integrate.

For this reason, my feeling with decoupling execution from the rest of Spark is 
that it would tie our hands architecturally and not add much benefit. I don't 
see a good reason to make this broader change in the strategy of the project.

If there are specific improvements you see for making Spark work well on YARN, 
then we can definitely look at them.

> Allow for pluggable execution contexts in Spark
> ---
>
> Key: SPARK-3561
> URL: https://issues.apache.org/jira/browse/SPARK-3561
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Oleg Zhurakousky
>  Labels: features
> Attachments: SPARK-3561.pdf
>
>
> Currently Spark provides integration with external resource-managers such as 
> Apache Hadoop YARN, Mesos etc. Specifically in the context of YARN, the 
> current architecture of Spark-on-YARN can be enhanced to provide 
> significantly better utilization of cluster resources for large scale, batch 
> and/or ETL applications when run alongside other applications (Spark and 
> others) and services in YARN. 
> Proposal: 
> The proposed approach would introduce a pluggable JobExecutionContext (trait) 
> - a gateway and a delegate to Hadoop execution environment - as a non-public 
> api (@Experimental) not exposed to end users of Spark. 
> The trait will define 6 operations: 
> * hadoopFile 
> * newAPIHadoopFile 
> * broadcast 
> * runJob 
> * persist
> * unpersist
> Each method directly maps to the corresponding methods in current version of 
> SparkContext. JobExecutionContext implementation will be accessed by 
> SparkContext via master URL as 
> "execution-context:foo.bar.MyJobExecutionContext" with default implementation 
> containing the existing code from SparkContext, thus allowing current 
> (corresponding) methods of SparkContext to delegate to such implementation. 
> An integrator will now have an option to provide a custom implementation of 
> DefaultExecutionContext by either implementing it from scratch or extending 
> from DefaultExecutionContext. 
> Please see the attached design doc for more details. 
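
A hedged sketch of the shape of the trait described above; the signatures are simplified placeholders, not the actual proposal from the attached design doc:

{code}
// Illustrative only: the real signatures would mirror the corresponding SparkContext methods.
trait JobExecutionContext {
  def hadoopFile(path: String): Unit
  def newAPIHadoopFile(path: String): Unit
  def broadcast[T](value: T): Unit
  def runJob[T, U](data: Seq[T], func: Iterator[T] => U): Seq[U]
  def persist(rddId: Int): Unit
  def unpersist(rddId: Int): Unit
}
{code}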






[jira] [Commented] (SPARK-5186) Vector.equals and Vector.hashCode are very inefficient

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273224#comment-14273224
 ] 

Apache Spark commented on SPARK-5186:
-

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/3997

> Vector.equals  and Vector.hashCode are very inefficient
> ---
>
> Key: SPARK-5186
> URL: https://issues.apache.org/jira/browse/SPARK-5186
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 1.2.0
>Reporter: Derrick Burns
>   Original Estimate: 0.25h
>  Remaining Estimate: 0.25h
>
> The implementations of Vector.equals and Vector.hashCode are correct but slow 
> for SparseVectors that are truly sparse.
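
A hedged sketch of the kind of sparse-aware comparison that avoids densifying; it assumes both vectors store only sorted, non-zero entries (explicit zeros would need extra handling), and it is not MLlib's implementation:

{code}
def sparseEquals(
    size1: Int, indices1: Array[Int], values1: Array[Double],
    size2: Int, indices2: Array[Int], values2: Array[Double]): Boolean = {
  // Compare only the stored entries instead of materializing dense arrays.
  size1 == size2 &&
    java.util.Arrays.equals(indices1, indices2) &&
    java.util.Arrays.equals(values1, values2)
}
{code}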






[jira] [Updated] (SPARK-4924) Factor out code to launch Spark applications into a separate library

2015-01-11 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated SPARK-4924:
--
Assignee: Marcelo Vanzin

> Factor out code to launch Spark applications into a separate library
> 
>
> Key: SPARK-4924
> URL: https://issues.apache.org/jira/browse/SPARK-4924
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Attachments: spark-launcher.txt
>
>
> One of the questions we run into rather commonly is "how to start a Spark 
> application from my Java/Scala program?". There currently isn't a good answer 
> to that:
> - Instantiating SparkContext has limitations (e.g., you can only have one 
> active context at the moment, plus you lose the ability to submit apps in 
> cluster mode)
> - Calling SparkSubmit directly is doable but you lose a lot of the logic 
> handled by the shell scripts
> - Calling the shell script directly is doable,  but sort of ugly from an API 
> point of view.
> I think it would be nice to have a small library that handles that for users. 
> On top of that, this library could be used by Spark itself to replace a lot 
> of the code in the current shell scripts, which have a lot of duplication.
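
A purely hypothetical sketch of the kind of small API such a library could expose; none of these names exist in Spark at the time of this issue, and the example simply shells out to spark-submit to reuse the existing script logic:

{code}
class SparkAppLauncher {
  private var master = "local[*]"
  private var appResource = ""
  private var mainClass = ""
  private val appArgs = scala.collection.mutable.Buffer[String]()

  def setMaster(m: String): this.type = { master = m; this }
  def setAppResource(jar: String): this.type = { appResource = jar; this }
  def setMainClass(cls: String): this.type = { mainClass = cls; this }
  def addAppArgs(args: String*): this.type = { appArgs ++= args; this }

  // Delegates to spark-submit instead of duplicating the shell scripts' logic.
  def launch(): Process = {
    val cmd = Seq("./bin/spark-submit", "--master", master, "--class", mainClass,
      appResource) ++ appArgs
    new ProcessBuilder(cmd: _*).inheritIO().start()
  }
}
{code}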






[jira] [Updated] (SPARK-5088) Use spark-class for running executors directly on mesos

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5088:

Fix Version/s: 1.2.1
   1.3.0

> Use spark-class for running executors directly on mesos
> ---
>
> Key: SPARK-5088
> URL: https://issues.apache.org/jira/browse/SPARK-5088
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos
>Affects Versions: 1.2.0
>Reporter: Jongyoul Lee
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> - sbin/spark-executor is only used for running executors in a Mesos environment.
> - spark-executor internally calls spark-class without any specific parameters.
> - PYTHONPATH handling is moved into spark-class.
> - Remove the redundant file to simplify code maintenance.






[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5197:

Target Version/s: 1.3.0  (was: 1.3.0, 1.2.1)

> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos, Shuffle
>Reporter: Jongyoul Lee
> Fix For: 1.3.0
>
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
> which already offers resources dynamically and returns them automatically when a 
> task is finished. It, however, doesn't have a mechanism to support an external 
> shuffle service the way YARN does via AuxiliaryService. Because Mesos 
> doesn't support AuxiliaryService, we need to think of a different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports multi-tenant environments
> --- Almost the same approach as YARN
> -- Cons
> --- Need to control a long-running 'background' job - the service - while Mesos runs
> --- Every slave - or host - must have one shuffle service running all the time
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to consider whether the shuffle service exists or not.
> -- Cons
> --- Multiple shuffle services exist under a multi-tenant environment
> --- Need to control the shuffle service port dynamically in a multi-user environment
> In my opinion, the first one is the better idea for supporting an external shuffle 
> service. Please leave comments.






[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5197:

Fix Version/s: 1.3.0

> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos, Shuffle
>Reporter: Jongyoul Lee
> Fix For: 1.3.0
>
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
> which already offers resources dynamically and returns them automatically when a 
> task is finished. It, however, doesn't have a mechanism to support an external 
> shuffle service the way YARN does via AuxiliaryService. Because Mesos 
> doesn't support AuxiliaryService, we need to think of a different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports multi-tenant environments
> --- Almost the same approach as YARN
> -- Cons
> --- Need to control a long-running 'background' job - the service - while Mesos runs
> --- Every slave - or host - must have one shuffle service running all the time
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to consider whether the shuffle service exists or not.
> -- Cons
> --- Multiple shuffle services exist under a multi-tenant environment
> --- Need to control the shuffle service port dynamically in a multi-user environment
> In my opinion, the first one is the better idea for supporting an external shuffle 
> service. Please leave comments.






[jira] [Updated] (SPARK-5166) Stabilize Spark SQL APIs

2015-01-11 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5166:
---
Priority: Blocker  (was: Critical)

> Stabilize Spark SQL APIs
> 
>
> Key: SPARK-5166
> URL: https://issues.apache.org/jira/browse/SPARK-5166
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Blocker
>
> Before we take Spark SQL out of alpha, we need to audit the APIs and 
> stabilize them. 
> As a general rule, everything under org.apache.spark.sql.catalyst should not 
> be exposed.






[jira] [Updated] (SPARK-3340) Deprecate ADD_JARS and ADD_FILES

2015-01-11 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3340:
---
Labels: starter  (was: )

> Deprecate ADD_JARS and ADD_FILES
> 
>
> Key: SPARK-3340
> URL: https://issues.apache.org/jira/browse/SPARK-3340
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Or
>  Labels: starter
>
> These were introduced before Spark submit even existed. Now that there are 
> many better ways of setting jars and python files through Spark submit, we 
> should deprecate these environment variables.






[jira] [Resolved] (SPARK-3450) Enable specifying the --jars CLI option multiple times

2015-01-11 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3450.

Resolution: Won't Fix

I'd prefer not to do this one; it complicates our parsing substantially. It's 
possible to just write a bash loop that creates a single long list of jars.

> Enable specifying the --jars CLI option multiple times
> ---
>
> Key: SPARK-3450
> URL: https://issues.apache.org/jira/browse/SPARK-3450
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 1.0.2
>Reporter: wolfgang hoschek
>
> spark-submit should support specifying the --jars option multiple times, e.g. 
> --jars foo.jar,bar.jar --jars baz.jar,oops.jar should be equivalent to --jars 
> foo.jar,bar.jar,baz.jar,oops.jar
> This would allow using wrapper scripts that simplify usage for enterprise 
> customers along the following lines:
> {code}
> # my-spark-submit.sh
> jars=
> for i in /opt/myapp/*.jar; do
>   # prepend a comma before every jar after the first one
>   if [ -n "$jars" ]; then
>     jars="$jars,"
>   fi
>   jars="$jars$i"
> done
> spark-submit --jars "$jars" "$@"
> {code}
> Example usage:
> {code}
> my-spark-submit.sh --jars myUserDefinedFunction.jar 
> {code}
> The relevant enhancement code might go into SparkSubmitArguments.






[jira] [Resolved] (SPARK-4399) Support multiple cloud providers

2015-01-11 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4399.

Resolution: Won't Fix

We'll let the community take this one on.

> Support multiple cloud providers
> 
>
> Key: SPARK-4399
> URL: https://issues.apache.org/jira/browse/SPARK-4399
> Project: Spark
>  Issue Type: New Feature
>  Components: EC2
>Affects Versions: 1.2.0
>Reporter: Andrew Ash
>
> We currently have Spark startup scripts for Amazon EC2 but not for various 
> other cloud providers.  This ticket is an umbrella to support multiple cloud 
> providers in the bundled scripts, not just Amazon.






[jira] [Commented] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine

2015-01-11 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273197#comment-14273197
 ] 

Nicholas Chammas commented on SPARK-1422:
-

[~pwendell] - I would consider doing this as well for the parent task, 
[SPARK-4399].

> Add scripts for launching Spark on Google Compute Engine
> 
>
> Key: SPARK-1422
> URL: https://issues.apache.org/jira/browse/SPARK-1422
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Matei Zaharia
>







[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause

2015-01-11 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273196#comment-14273196
 ] 

Yin Huai commented on SPARK-4296:
-

I was wondering if this issue also shows up in other places. Maybe we can 
resolve it thoroughly.

> Throw "Expression not in GROUP BY" when using same expression in group by 
> clause and  select clause
> ---
>
> Key: SPARK-4296
> URL: https://issues.apache.org/jira/browse/SPARK-4296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
>Reporter: Shixiong Zhu
>Assignee: Cheng Lian
>Priority: Blocker
>
> When the input data has a complex structure, using the same expression in the group 
> by clause and the select clause will throw "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), 
> Person("Jim", Birthday("1980-02-28"))))
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people 
> group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression 
> not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date 
> AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at 
> ExistingRDD.scala:36
> {code}
> The bug is the equality test for `Upper(birthday#1.date)` and 
> `Upper(birthday#1.date AS date#9)`.
> Maybe Spark SQL needs a mechanism to compare Alias expressions with non-Alias 
> expressions.
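
A hedged sketch of the kind of alias-insensitive comparison this suggests: strip Alias nodes before testing equality. The types below are toy stand-ins, not Catalyst's expression classes:

{code}
sealed trait Expr
case class Attr(name: String) extends Expr
case class Upper(child: Expr) extends Expr
case class Alias(child: Expr, name: String) extends Expr

// Remove Alias wrappers anywhere in the tree before comparing.
def stripAlias(e: Expr): Expr = e match {
  case Alias(child, _) => stripAlias(child)
  case Upper(child)    => Upper(stripAlias(child))
  case other           => other
}

val groupingExpr = Upper(Attr("birthday.date"))
val selectExpr   = Upper(Alias(Attr("birthday.date"), "date"))
println(groupingExpr == selectExpr)                           // false: plain equality sees the Alias
println(stripAlias(groupingExpr) == stripAlias(selectExpr))   // true: aliases ignored
{code}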






[jira] [Updated] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause

2015-01-11 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-4296:

 Target Version/s: 1.3.0, 1.2.1  (was: 1.2.0)
Affects Version/s: 1.1.1
   1.2.0
Fix Version/s: (was: 1.2.0)

> Throw "Expression not in GROUP BY" when using same expression in group by 
> clause and  select clause
> ---
>
> Key: SPARK-4296
> URL: https://issues.apache.org/jira/browse/SPARK-4296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
>Reporter: Shixiong Zhu
>Assignee: Cheng Lian
>Priority: Blocker
>
> When the input data has a complex structure, using the same expression in the group 
> by clause and the select clause will throw "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), 
> Person("Jim", Birthday("1980-02-28"))))
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people 
> group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression 
> not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date 
> AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at 
> ExistingRDD.scala:36
> {code}
> The bug is the equality test for `Upper(birthday#1.date)` and 
> `Upper(birthday#1.date AS date#9)`.
> Maybe Spark SQL needs a mechanism to compare Alias expressions with non-Alias 
> expressions.






[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause

2015-01-11 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273194#comment-14273194
 ] 

Yin Huai commented on SPARK-4296:
-

[~lian cheng] This issue seems similar to [this 
one|https://issues.apache.org/jira/browse/SPARK-2063?focusedCommentId=14055193&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14055193].
 The main problem is that we use the last part of a reference to a field in a 
struct as the alias. Is it possible that we can fix that one as well?

> Throw "Expression not in GROUP BY" when using same expression in group by 
> clause and  select clause
> ---
>
> Key: SPARK-4296
> URL: https://issues.apache.org/jira/browse/SPARK-4296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Shixiong Zhu
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.2.0
>
>
> When the input data has a complex structure, using the same expression in the group 
> by clause and the select clause will throw "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), 
> Person("Jim", Birthday("1980-02-28"))))
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people 
> group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression 
> not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date 
> AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at 
> ExistingRDD.scala:36
> {code}
> The bug is the equality test for `Upper(birthday#1.date)` and 
> `Upper(birthday#1.date AS date#9)`.
> Maybe Spark SQL needs a mechanism to compare Alias expressions with non-Alias 
> expressions.






[jira] [Commented] (SPARK-2621) Update task InputMetrics incrementally

2015-01-11 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273192#comment-14273192
 ] 

Sandy Ryza commented on SPARK-2621:
---

Definitely - just filed SPARK-5199 for this.

> Update task InputMetrics incrementally
> --
>
> Key: SPARK-2621
> URL: https://issues.apache.org/jira/browse/SPARK-2621
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.2.0
>
>







[jira] [Created] (SPARK-5199) Input metrics should show up for InputFormats that return CombineFileSplits

2015-01-11 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-5199:
-

 Summary: Input metrics should show up for InputFormats that return 
CombineFileSplits
 Key: SPARK-5199
 URL: https://issues.apache.org/jira/browse/SPARK-5199
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Sandy Ryza
Assignee: Sandy Ryza









[jira] [Commented] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-01-11 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273187#comment-14273187
 ] 

Nicholas Chammas commented on SPARK-3821:
-

Updated launch stats:
* Launching cluster with 50 slaves in {{us-east-1}}.
* Stats for best of 3 runs.

{{branch-1.3}} @ 
[{{3a95101}}|https://github.com/mesos/spark-ec2/tree/3a95101c70e6892a8a48cc54094adaed1458487a]:
{code}
Cluster is now in 'ssh-ready' state. Waited 460 seconds.
[timing] rsync /root/spark-ec2:  00h 00m 07s
[timing] setup-slave:  00h 00m 28s
[timing] scala init:  00h 00m 11s
[timing] spark init:  00h 00m 07s
[timing] ephemeral-hdfs init:  00h 12m 40s
[timing] persistent-hdfs init:  00h 12m 35s
[timing] spark-standalone init:  00h 00m 00s
[timing] tachyon init:  00h 00m 08s
[timing] ganglia init:  00h 00m 53s
[timing] scala setup:  00h 03m 11s
[timing] spark setup:  00h 21m 20s
[timing] ephemeral-hdfs setup:  00h 00m 48s
[timing] persistent-hdfs setup:  00h 00m 43s
[timing] spark-standalone setup:  00h 01m 19s
[timing] tachyon setup:  00h 03m 06s
[timing] ganglia setup:  00h 00m 32s
{code}


{{packer}} @ 
[{{273c8c5}}|https://github.com/nchammas/spark-ec2/tree/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b]:

{code}
Cluster is now in 'ssh-ready' state. Waited 292 seconds.
[timing] rsync /root/spark-ec2:  00h 00m 20s
[timing] setup-slave:  00h 00m 19s
[timing] scala init:  00h 00m 12s
[timing] spark init:  00h 00m 08s
[timing] ephemeral-hdfs init:  00h 12m 58s
[timing] persistent-hdfs init:  00h 12m 55s
[timing] spark-standalone init:  00h 00m 00s
[timing] tachyon init:  00h 00m 10s
[timing] ganglia init:  00h 00m 15s
[timing] scala setup:  00h 03m 19s
[timing] spark setup:  00h 20m 32s
[timing] ephemeral-hdfs setup:  00h 00m 34s
[timing] persistent-hdfs setup:  00h 00m 27s
[timing] spark-standalone setup:  00h 00m 47s
[timing] tachyon setup:  00h 03m 15s
[timing] ganglia setup:  00h 00m 23s
{code}

As you can see, with the exception of time-to-SSH-availability, things are 
mostly the same across the current and Packer-generated AMIs. I've proposed 
improvements to cut down the launch times of large clusters in [a separate 
issue|SPARK-5189].

[~shivaram] - At this point I think it's safe to say that the approach proposed 
here is straightforward and worth pursuing. All we need now is a review of [the 
scripts that install various 
stuff|https://github.com/nchammas/spark-ec2/blob/273c8c518fbc6e86e0fb4410efbe77a4d4e4ff5b/packer/spark-packer.json#L63-L66]
 (e.g. Ganglia, Python 2.7, etc.) on the AMI to make sure it all makes sense.

> Develop an automated way of creating Spark images (AMI, Docker, and others)
> ---
>
> Key: SPARK-3821
> URL: https://issues.apache.org/jira/browse/SPARK-3821
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, EC2
>Reporter: Nicholas Chammas
>Assignee: Nicholas Chammas
> Attachments: packer-proposal.html
>
>
> Right now the creation of Spark AMIs or Docker containers is done manually. 
> With tools like [Packer|http://www.packer.io/], we should be able to automate 
> this work, and do so in such a way that multiple types of machine images can 
> be created from a single template.






[jira] [Commented] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine

2015-01-11 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273178#comment-14273178
 ] 

Patrick Wendell commented on SPARK-1422:


Good call, Nick - yeah, let's close this as out of scope since it's being 
maintained elsewhere.

> Add scripts for launching Spark on Google Compute Engine
> 
>
> Key: SPARK-1422
> URL: https://issues.apache.org/jira/browse/SPARK-1422
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Matei Zaharia
>







[jira] [Resolved] (SPARK-1422) Add scripts for launching Spark on Google Compute Engine

2015-01-11 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1422.

Resolution: Won't Fix

> Add scripts for launching Spark on Google Compute Engine
> 
>
> Key: SPARK-1422
> URL: https://issues.apache.org/jira/browse/SPARK-1422
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Reporter: Matei Zaharia
>







[jira] [Comment Edited] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273169#comment-14273169
 ] 

Jongyoul Lee edited comment on SPARK-5198 at 1/12/15 2:38 AM:
--

Uploaded example screenshots


was (Author: jongyoul):
Example screenshots

> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
> Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot 
> 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png
>
>
> In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
> as the slave ID for every task. This makes it hard to track a specific job, 
> because different jobs are logged into the same log file. The executor ID has 
> the same value when launching jobs in coarse-grained mode.
> !Screen Shot 2015-01-12 at 11.14.39 AM.png!
> !Screen Shot 2015-01-12 at 11.34.30 AM.png!
> !Screen Shot 2015-01-12 at 11.34.41 AM.png!






[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Description: 
In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
as the slave ID for every task. This makes it hard to track a specific job, 
because different jobs are logged into the same log file. The executor ID has 
the same value when launching jobs in coarse-grained mode.

!Screen Shot 2015-01-12 at 11.14.39 AM.png!
!Screen Shot 2015-01-12 at 11.34.30 AM.png!
!Screen Shot 2015-01-12 at 11.34.41 AM.png!

  was:
In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
as the slave ID for every task. This makes it hard to track a specific job, 
because different jobs are logged into the same log file. The executor ID has 
the same value when launching jobs in coarse-grained mode.

!Screen Shot 2015-01-12 at 11.14.39 AM.png!
!


> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
> Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot 
> 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png
>
>
> In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
> as the slave ID for every task. This makes it hard to track a specific job, 
> because different jobs are logged into the same log file. The executor ID has 
> the same value when launching jobs in coarse-grained mode.
> !Screen Shot 2015-01-12 at 11.14.39 AM.png!
> !Screen Shot 2015-01-12 at 11.34.30 AM.png!
> !Screen Shot 2015-01-12 at 11.34.41 AM.png!






[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Description: 
In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
as the slave ID for every task. This makes it hard to track a specific job, 
because different jobs are logged into the same log file. The executor ID has 
the same value when launching jobs in coarse-grained mode.

!Screen Shot 2015-01-12 at 11.14.39 AM.png!
!

  was:
In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
as the slave ID for every task. This makes it hard to track a specific job, 
because different jobs are logged into the same log file. The executor ID has 
the same value when launching jobs in coarse-grained mode.

[


> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
> Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot 
> 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png
>
>
> In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
> as the slave ID for every task. This makes it hard to track a specific job, 
> because different jobs are logged into the same log file. The executor ID has 
> the same value when launching jobs in coarse-grained mode.
> !Screen Shot 2015-01-12 at 11.14.39 AM.png!
> !






[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Description: 
In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
as the slave ID for every task. This makes it hard to track a specific job, 
because different jobs are logged into the same log file. The executor ID has 
the same value when launching jobs in coarse-grained mode.

[

  was:In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
as the slave ID for every task. This makes it hard to track a specific job, 
because different jobs are logged into the same log file. The executor ID has 
the same value when launching jobs in coarse-grained mode.


> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
> Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot 
> 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png
>
>
> In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
> as the slave ID for every task. This makes it hard to track a specific job, 
> because different jobs are logged into the same log file. The executor ID has 
> the same value when launching jobs in coarse-grained mode.
> [






[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Attachment: Screen Shot 2015-01-12 at 11.34.41 AM.png
Screen Shot 2015-01-12 at 11.34.30 AM.png
Screen Shot 2015-01-12 at 11.14.39 AM.png

Example screenshots

> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
> Attachments: Screen Shot 2015-01-12 at 11.14.39 AM.png, Screen Shot 
> 2015-01-12 at 11.34.30 AM.png, Screen Shot 2015-01-12 at 11.34.41 AM.png
>
>
> In fine-grained mode, SchedulerBackend sets the executor ID to the same value 
> as the slave ID for every task. This makes it hard to track a specific job, 
> because different jobs are logged into the same log file. The executor ID has 
> the same value when launching jobs in coarse-grained mode.






[jira] [Commented] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273165#comment-14273165
 ] 

Apache Spark commented on SPARK-5198:
-

User 'jongyoul' has created a pull request for this issue:
https://github.com/apache/spark/pull/3994

> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
>
> In fine-grained mode, SchedulerBackend sets the executor name to the slave id, 
> regardless of the task id. That makes it hard to trace a specific job, because 
> logs from different jobs end up mixed in the same log file. The same value is 
> also used when launching a job in coarse-grained mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Description: In fine-grained mode, SchedulerBackend sets the executor name to 
the slave id, regardless of the task id. That makes it hard to trace a specific 
job, because logs from different jobs end up mixed in the same log file. The 
same value is also used when launching a job in coarse-grained mode.  (was: In 
fine-grained mode, SchedulerBackend sets the executor name to the slave id, 
regardless of the task id. That makes it hard to trace a specific job, because 
logs from different jobs end up mixed in the same log file.)

> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
>
> In fine-grained mode, SchedulerBackend sets the executor name to the slave id, 
> regardless of the task id. That makes it hard to trace a specific job, because 
> logs from different jobs end up mixed in the same log file. The same value is 
> also used when launching a job in coarse-grained mode.
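
As a rough illustration of the idea (the names and id format below are assumptions 
for this sketch, not the actual change in the linked pull request):

{code}
// Toy sketch only; identifiers and format are assumptions, not Spark's implementation.
// Today (fine-grained mode): the executor id is just the slave id, so every launch
// on that slave logs under the same name.
def currentExecutorId(slaveId: String): String = slaveId

// One possible "more unique" form: combine the slave id with the task id so a
// specific launch can be traced in the log file.
def proposedExecutorId(slaveId: String, taskId: Long): String = s"$slaveId-$taskId"
{code}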



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4689) Unioning 2 SchemaRDDs should return a SchemaRDD in Python, Scala, and Java

2015-01-11 Thread Bibudh Lahiri (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14268720#comment-14268720
 ] 

Bibudh Lahiri edited comment on SPARK-4689 at 1/12/15 2:13 AM:
---

I'd like to work on this issue, but would need some details. I looked into 
./sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala where the 
unionAll method is defined as 

def unionAll(otherPlan: SchemaRDD) =
  new SchemaRDD(sqlContext, Union(logicalPlan, otherPlan.logicalPlan))

There is no implementation of union() in SchemaRDD itself, and the API says it 
is inherited from RDD. I took two different SchemaRDD objects and applied 
union on them (it is in my fork at 
https://github.com/bibudhlahiri/spark/blob/master/dev/audit-release/sbt_app_schema_rdd/src/main/scala/SchemaRDDApp.scala
 ) , and the resultant object is of class UnionRDD. I am thinking of overriding 
union() in SchemaRDD to return a SchemaRDD, please let me know what you think. 


was (Author: bibudh):
I'd like to work on this issue, but would need some details. I looked into 
./sql/core/src/main/scala/org/apache/spark/sql/SchemaRDD.scala where the 
unionAll method is defined as 

def unionAll(otherPlan: SchemaRDD) =
  new SchemaRDD(sqlContext, Union(logicalPlan, otherPlan.logicalPlan))

Are we looking for an implementation of union here (keeping duplicates only 
once), in addition to unionAll (keeping duplicates both times)?
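
For reference, a minimal spark-shell sketch (assuming the 1.2-era SchemaRDD API) 
of the asymmetry being discussed: unionAll preserves the SchemaRDD type, while 
the inherited union() degrades to a plain RDD.

{code}
// Illustrative sketch only (spark-shell, Spark 1.2-era API assumed; sc is the shell's SparkContext).
import org.apache.spark.sql.{SQLContext, SchemaRDD}

val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD        // implicit RDD[Product] -> SchemaRDD conversion

case class Person(name: String, age: Int)
val a: SchemaRDD = sc.parallelize(Seq(Person("alice", 30)))
val b: SchemaRDD = sc.parallelize(Seq(Person("bob", 25)))

a.unionAll(b)   // SchemaRDD: schema and logical plan are preserved
a.union(b)      // inherited from RDD: returns a plain RDD[Row], the schema is lost
{code}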

> Unioning 2 SchemaRDDs should return a SchemaRDD in Python, Scala, and Java
> --
>
> Key: SPARK-4689
> URL: https://issues.apache.org/jira/browse/SPARK-4689
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Chris Fregly
>Priority: Minor
>  Labels: starter
>
> Currently, you need to use unionAll() in Scala.  
> Python does not expose this functionality at the moment.
> The current work around is to use the UNION ALL HiveQL functionality detailed 
> here:  https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Component/s: Mesos

> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
>
> In fine-grained mode, SchedulerBackend sets the executor name to the slave id, 
> regardless of the task id. That makes it hard to trace a specific job, because 
> logs from different jobs end up mixed in the same log file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5198:

Fix Version/s: 1.2.1
   1.3.0

> Change executorId more unique on mesos fine-grained mode
> 
>
> Key: SPARK-5198
> URL: https://issues.apache.org/jira/browse/SPARK-5198
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Jongyoul Lee
> Fix For: 1.3.0, 1.2.1
>
>
> In fine-grained mode, SchedulerBackend sets the executor name to the slave id, 
> regardless of the task id. That makes it hard to trace a specific job, because 
> logs from different jobs end up mixed in the same log file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5198) Change executorId more unique on mesos fine-grained mode

2015-01-11 Thread Jongyoul Lee (JIRA)
Jongyoul Lee created SPARK-5198:
---

 Summary: Change executorId more unique on mesos fine-grained mode
 Key: SPARK-5198
 URL: https://issues.apache.org/jira/browse/SPARK-5198
 Project: Spark
  Issue Type: Improvement
Reporter: Jongyoul Lee


In fine-grained mode, SchedulerBackend sets the executor name to the slave id, 
regardless of the task id. That makes it hard to trace a specific job, because 
logs from different jobs end up mixed in the same log file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2015-01-11 Thread Jongyoul Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273137#comment-14273137
 ] 

Jongyoul Lee commented on SPARK-5197:
-

Please assign it to me.

[~andrewor14] [~adav] Please review my description

> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos, Shuffle
>Reporter: Jongyoul Lee
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
> which already offers resources dynamically and returns them automatically when 
> a task finishes. It does not, however, have a mechanism to support an external 
> shuffle service the way YARN does with its AuxiliaryService. Because Mesos 
> doesn't support an AuxiliaryService, we need a different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports a multi-tenant environment
> --- Works almost the same way as YARN
> -- Cons
> --- Must control a long-running 'background' job (the service) while Mesos runs
> --- Every slave (host) must have exactly one shuffle service running at all times
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to check whether a shuffle service already exists
> -- Cons
> --- Multiple shuffle services can exist in a multi-tenant environment
> --- The shuffle service port must be controlled dynamically in a multi-user environment
> In my opinion, the first option is the better way to support an external 
> shuffle service. Please leave comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2015-01-11 Thread Jongyoul Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jongyoul Lee updated SPARK-5197:

Description: 
I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
which already offers resources dynamically and returns them automatically when a 
task finishes. It does not, however, have a mechanism to support an external 
shuffle service the way YARN does with its AuxiliaryService. Because Mesos 
doesn't support an AuxiliaryService, we need a different way to do this.

- Launching a shuffle service like a Spark job on the same cluster
-- Pros
--- Supports a multi-tenant environment
--- Works almost the same way as YARN
-- Cons
--- Must control a long-running 'background' job (the service) while Mesos runs
--- Every slave (host) must have exactly one shuffle service running at all times
- Launching jobs within the shuffle service
-- Pros
--- Easy to implement
--- No need to check whether a shuffle service already exists
-- Cons
--- Multiple shuffle services can exist in a multi-tenant environment
--- The shuffle service port must be controlled dynamically in a multi-user environment

In my opinion, the first option is the better way to support an external shuffle 
service. Please leave comments.

  was:
I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
which already offers resources dynamically and returns them automatically when a 
task finishes. We do not, however, have a mechanism to support an external 
shuffle service the way YARN does with its AuxiliaryService. Because Mesos 
doesn't support an AuxiliaryService, we need a different way to do this.

- Launching a shuffle service like a Spark job on the same cluster
-- Pros
--- Supports a multi-tenant environment
--- Works almost the same way as YARN
-- Cons
--- Must control a long-running 'background' job (the service) while Mesos runs
--- Every slave (host) must have exactly one shuffle service running at all times
- Launching jobs within the shuffle service
-- Pros
--- Easy to implement
--- No need to check whether a shuffle service already exists
-- Cons
--- Multiple shuffle services can exist in a multi-tenant environment
--- The shuffle service port must be controlled dynamically in a multi-user environment

In my opinion, the first option is the better way to support an external shuffle 
service. Please leave comments.


> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos, Shuffle
>Reporter: Jongyoul Lee
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
> which already offers resources dynamically and returns them automatically when 
> a task finishes. It does not, however, have a mechanism to support an external 
> shuffle service the way YARN does with its AuxiliaryService. Because Mesos 
> doesn't support an AuxiliaryService, we need a different way to do this.
> - Launching a shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports a multi-tenant environment
> --- Works almost the same way as YARN
> -- Cons
> --- Must control a long-running 'background' job (the service) while Mesos runs
> --- Every slave (host) must have exactly one shuffle service running at all times
> - Launching jobs within the shuffle service
> -- Pros
> --- Easy to implement
> --- No need to check whether a shuffle service already exists
> -- Cons
> --- Multiple shuffle services can exist in a multi-tenant environment
> --- The shuffle service port must be controlled dynamically in a multi-user environment
> In my opinion, the first option is the better way to support an external 
> shuffle service. Please leave comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2015-01-11 Thread Jongyoul Lee (JIRA)
Jongyoul Lee created SPARK-5197:
---

 Summary: Support external shuffle service in fine-grained mode on 
mesos cluster
 Key: SPARK-5197
 URL: https://issues.apache.org/jira/browse/SPARK-5197
 Project: Spark
  Issue Type: Improvement
  Components: Deploy, Mesos, Shuffle
Reporter: Jongyoul Lee


I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
which already offers resources dynamically and returns them automatically when a 
task finishes. We do not, however, have a mechanism to support an external 
shuffle service the way YARN does with its AuxiliaryService. Because Mesos 
doesn't support an AuxiliaryService, we need a different way to do this.

- Launching a shuffle service like a Spark job on the same cluster
-- Pros
--- Supports a multi-tenant environment
--- Works almost the same way as YARN
-- Cons
--- Must control a long-running 'background' job (the service) while Mesos runs
--- Every slave (host) must have exactly one shuffle service running at all times
- Launching jobs within the shuffle service
-- Pros
--- Easy to implement
--- No need to check whether a shuffle service already exists
-- Cons
--- Multiple shuffle services can exist in a multi-tenant environment
--- The shuffle service port must be controlled dynamically in a multi-user environment

In my opinion, the first option is the better way to support an external shuffle 
service. Please leave comments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4033) Integer overflow when SparkPi is called with more than 25000 slices

2015-01-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4033.

  Resolution: Fixed
   Fix Version/s: 1.3.0
Assignee: SaintBacchus
Target Version/s: 1.3.0

> Integer overflow when SparkPi is called with more than 25000 slices
> ---
>
> Key: SPARK-4033
> URL: https://issues.apache.org/jira/browse/SPARK-4033
> Project: Spark
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 1.2.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.3.0
>
>
> If the SparkPi slices argument is larger than 25000, the integer 'n' inside 
> the code overflows and may become negative. That makes the (0 until n) Seq 
> empty, so the subsequent 'reduce' action throws 
> UnsupportedOperationException("empty collection").
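
A quick sketch of the arithmetic, assuming SparkPi computes n = 100000 * slices 
as in the 1.2-era example:

{code}
// Minimal sketch of the overflow described above (n = 100000 * slices is an assumption).
val slices = 25000
val n = 100000 * slices   // 2,500,000,000 does not fit in an Int and wraps to a negative value
// (0 until n) is then empty, so a subsequent .reduce(_ + _) throws
// java.lang.UnsupportedOperationException: empty.reduce
{code}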



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-4951) A busy executor may be killed when dynamicAllocation is enabled

2015-01-11 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or closed SPARK-4951.

  Resolution: Fixed
   Fix Version/s: 1.2.1
  1.3.0
Target Version/s: 1.3.0, 1.2.1

> A busy executor may be killed when dynamicAllocation is enabled
> ---
>
> Key: SPARK-4951
> URL: https://issues.apache.org/jira/browse/SPARK-4951
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 1.3.0, 1.2.1
>
>
> If a task runs longer than `spark.dynamicAllocation.executorIdleTimeout`, the 
> executor running this task will be killed.
> The following steps (yarn-client mode) can reproduce this bug:
> 1. Start `spark-shell` using
> {code}
> ./bin/spark-shell --conf "spark.shuffle.service.enabled=true" \
> --conf "spark.dynamicAllocation.minExecutors=1" \
> --conf "spark.dynamicAllocation.maxExecutors=4" \
> --conf "spark.dynamicAllocation.enabled=true" \
> --conf "spark.dynamicAllocation.executorIdleTimeout=30" \
> --master yarn-client \
> --driver-memory 512m \
> --executor-memory 512m \
> --executor-cores 1
> {code}
> 2. Wait more than 30 seconds until there is only one executor.
> 3. Run the following code (a task needs at least 50 seconds to finish)
> {code}
> val r = sc.parallelize(1 to 1000, 20).map{t => Thread.sleep(1000); 
> t}.groupBy(_ % 2).collect()
> {code}
> 4. Executors will be killed and allocated all the time, which makes the Job 
> fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5073) "spark.storage.memoryMapThreshold" has two default values

2015-01-11 Thread Aaron Davidson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Davidson resolved SPARK-5073.
---
Resolution: Fixed

> "spark.storage.memoryMapThreshold" has two default values
> -
>
> Key: SPARK-5073
> URL: https://issues.apache.org/jira/browse/SPARK-5073
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Jianhui Yuan
>Priority: Minor
>
> In org.apache.spark.storage.DiskStore:
>  val minMemoryMapBytes = 
> blockManager.conf.getLong("spark.storage.memoryMapThreshold", 2 * 4096L)
> In org.apache.spark.network.util.TransportConf:
>  public int memoryMapBytes() {
>  return conf.getInt("spark.storage.memoryMapThreshold", 2 * 1024 * 
> 1024);
>  }
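
For reference, the two defaults quoted above differ by a factor of 256; a small 
sketch of the arithmetic (values taken from the report):

{code}
// The two defaults quoted above, side by side.
val diskStoreDefault = 2 * 4096L         // 8192 bytes   (8 KB) in DiskStore
val transportDefault = 2 * 1024 * 1024   // 2097152 bytes (2 MB) in TransportConf
// 2097152 / 8192 == 256
{code}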



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4159) Maven build doesn't run JUnit test suites

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273062#comment-14273062
 ] 

Apache Spark commented on SPARK-4159:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3993

> Maven build doesn't run JUnit test suites
> -
>
> Key: SPARK-4159
> URL: https://issues.apache.org/jira/browse/SPARK-4159
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Patrick Wendell
>Assignee: Sean Owen
>Priority: Critical
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> It turns out our Maven build isn't running any Java test suites, and likely 
> hasn't ever.
> After some fishing I believe the following is the issue. We use scalatest [1] 
> in our maven build which, by default can't automatically detect JUnit tests. 
> Scalatest will allow you to enumerate a list of suites via "JUnitClasses", 
> but I can't find a way for it to auto-detect all JUnit tests. It turns out 
> this works in SBT because of our use of the junit-interface[2] which does 
> this for you. 
> An okay fix for this might be to simply enable the normal (surefire) maven 
> tests in addition to our scalatest in the maven build. The only thing to 
> watch out for is that they don't overlap in some way. We'd also have to copy 
> over environment variables, memory settings, etc to that plugin.
> [1] http://www.scalatest.org/user_guide/using_the_scalatest_maven_plugin
> [2] https://github.com/sbt/junit-interface



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5172) spark-examples-***.jar shades a wrong Hadoop distribution

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273041#comment-14273041
 ] 

Apache Spark commented on SPARK-5172:
-

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/3992

> spark-examples-***.jar shades a wrong Hadoop distribution
> -
>
> Key: SPARK-5172
> URL: https://issues.apache.org/jira/browse/SPARK-5172
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Shixiong Zhu
>Priority: Minor
>
> Steps to check it:
> 1. Download  "spark-1.2.0-bin-hadoop2.4.tgz" from 
> http://www.apache.org/dyn/closer.cgi/spark/spark-1.2.0/spark-1.2.0-bin-hadoop2.4.tgz
> 2. unzip `spark-examples-1.2.0-hadoop2.4.0.jar`.
> 3. There is a file called `org/apache/hadoop/package-info.class` in the jar. 
> It doesn't exist in hadoop 2.4. 
> 4. Run "javap -classpath . -private -c -v  org.apache.hadoop.package-info"
> {code}
> Compiled from "package-info.java"
> interface org.apache.hadoop.package-info
>   SourceFile: "package-info.java"
>   RuntimeVisibleAnnotations: length = 0x24
>00 01 00 06 00 06 00 07 73 00 08 00 09 73 00 0A
>00 0B 73 00 0C 00 0D 73 00 0E 00 0F 73 00 10 00
>11 73 00 12 
>   minor version: 0
>   major version: 50
>   Constant pool:
> const #1 = Asciz  org/apache/hadoop/package-info;
> const #2 = class  #1; //  "org/apache/hadoop/package-info"
> const #3 = Asciz  java/lang/Object;
> const #4 = class  #3; //  java/lang/Object
> const #5 = Asciz  package-info.java;
> const #6 = Asciz  Lorg/apache/hadoop/HadoopVersionAnnotation;;
> const #7 = Asciz  version;
> const #8 = Asciz  1.2.1;
> const #9 = Asciz  revision;
> const #10 = Asciz 1503152;
> const #11 = Asciz user;
> const #12 = Asciz mattf;
> const #13 = Asciz date;
> const #14 = Asciz Wed Jul 24 13:39:35 PDT 2013;
> const #15 = Asciz url;
> const #16 = Asciz 
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2;
> const #17 = Asciz srcChecksum;
> const #18 = Asciz 6923c86528809c4e7e6f493b6b413a9a;
> const #19 = Asciz SourceFile;
> const #20 = Asciz RuntimeVisibleAnnotations;
> {
> }
> {code}
> The version is {{1.2.1}}
> This comes from a wrong HBase version setting in the examples project. Here is 
> a part of the dependency tree when running "mvn -Pyarn -Phadoop-2.4 
> -Dhadoop.version=2.4.0 -pl examples dependency:tree"
> {noformat}
> [INFO] +- org.apache.hbase:hbase-testing-util:jar:0.98.7-hadoop1:compile
> [INFO] |  +- 
> org.apache.hbase:hbase-common:test-jar:tests:0.98.7-hadoop1:compile
> [INFO] |  +- 
> org.apache.hbase:hbase-server:test-jar:tests:0.98.7-hadoop1:compile
> [INFO] |  |  +- com.sun.jersey:jersey-core:jar:1.8:compile
> [INFO] |  |  +- com.sun.jersey:jersey-json:jar:1.8:compile
> [INFO] |  |  |  +- org.codehaus.jettison:jettison:jar:1.1:compile
> [INFO] |  |  |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:compile
> [INFO] |  |  |  \- org.codehaus.jackson:jackson-xc:jar:1.7.1:compile
> [INFO] |  |  \- com.sun.jersey:jersey-server:jar:1.8:compile
> [INFO] |  | \- asm:asm:jar:3.3.1:test
> [INFO] |  +- org.apache.hbase:hbase-hadoop1-compat:jar:0.98.7-hadoop1:compile
> [INFO] |  +- 
> org.apache.hbase:hbase-hadoop1-compat:test-jar:tests:0.98.7-hadoop1:compile
> [INFO] |  +- org.apache.hadoop:hadoop-core:jar:1.2.1:compile
> [INFO] |  |  +- xmlenc:xmlenc:jar:0.52:compile
> [INFO] |  |  +- commons-configuration:commons-configuration:jar:1.6:compile
> [INFO] |  |  |  +- commons-digester:commons-digester:jar:1.8:compile
> [INFO] |  |  |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:compile
> [INFO] |  |  |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:compile
> [INFO] |  |  \- commons-el:commons-el:jar:1.0:compile
> [INFO] |  +- org.apache.hadoop:hadoop-test:jar:1.2.1:compile
> [INFO] |  |  +- org.apache.ftpserver:ftplet-api:jar:1.0.0:compile
> [INFO] |  |  +- org.apache.mina:mina-core:jar:2.0.0-M5:compile
> [INFO] |  |  +- org.apache.ftpserver:ftpserver-core:jar:1.0.0:compile
> [INFO] |  |  \- org.apache.ftpserver:ftpserver-deprecated:jar:1.0.0-M2:compile
> [INFO] |  +- 
> com.github.stephenc.findbugs:findbugs-annotations:jar:1.3.9-1:compile
> [INFO] |  \- junit:junit:jar:4.10:test
> [INFO] | \- org.hamcrest:hamcrest-core:jar:1.1:test
> {noformat}
> If I run `mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -pl examples -am 
> dependency:tree -Dhbase.profile=hadoop2`, the dependency tree is correct.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5008) Persistent HDFS does not recognize EBS Volumes

2015-01-11 Thread Nicholas Chammas (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273007#comment-14273007
 ] 

Nicholas Chammas commented on SPARK-5008:
-

Use [{{copy-dir}}|https://github.com/mesos/spark-ec2/blob/v4/copy-dir.sh], 
which is installed by default, from the master.

> Persistent HDFS does not recognize EBS Volumes
> --
>
> Key: SPARK-5008
> URL: https://issues.apache.org/jira/browse/SPARK-5008
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.2.0
> Environment: 8 Node Cluster Generated from 1.2.0 spark-ec2 script.
> -m c3.2xlarge -t c3.8xlarge --ebs-vol-size 300 --ebs-vol-type gp2 
> --ebs-vol-num 1
>Reporter: Brad Willard
>
> The cluster is built with correctly sized EBS volumes. It creates the volume at 
> /dev/xvds and it is mounted to /vol0. However, when you start persistent HDFS 
> with the start-all script, it starts but isn't correctly configured to use the 
> EBS volume.
> I'm assuming some symlinks or expected mounts are not correctly configured.
> This has worked flawlessly on all previous versions of Spark.
> I have a stupid workaround: installing pssh and mucking with it by mounting the 
> volume to /vol, which worked; however, it doesn't survive restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5008) Persistent HDFS does not recognize EBS Volumes

2015-01-11 Thread Brad Willard (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272991#comment-14272991
 ] 

Brad Willard commented on SPARK-5008:
-

[~nchammas] I can try that once I get back into the office. Probably by 
Wednesday. Once I update the core-site.xml, what's the correct way to sync it 
to all the slaves?

> Persistent HDFS does not recognize EBS Volumes
> --
>
> Key: SPARK-5008
> URL: https://issues.apache.org/jira/browse/SPARK-5008
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 1.2.0
> Environment: 8 Node Cluster Generated from 1.2.0 spark-ec2 script.
> -m c3.2xlarge -t c3.8xlarge --ebs-vol-size 300 --ebs-vol-type gp2 
> --ebs-vol-num 1
>Reporter: Brad Willard
>
> The cluster is built with correctly sized EBS volumes. It creates the volume at 
> /dev/xvds and it is mounted to /vol0. However, when you start persistent HDFS 
> with the start-all script, it starts but isn't correctly configured to use the 
> EBS volume.
> I'm assuming some symlinks or expected mounts are not correctly configured.
> This has worked flawlessly on all previous versions of Spark.
> I have a stupid workaround: installing pssh and mucking with it by mounting the 
> volume to /vol, which worked; however, it doesn't survive restarts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5162) Python yarn-cluster mode

2015-01-11 Thread Lianhui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272943#comment-14272943
 ] 

Lianhui Wang commented on SPARK-5162:
-

[~dklassen] I submitted a PR for this issue: 
https://github.com/apache/spark/pull/3976
I think you can try it. If there are any questions or suggestions, please tell me.

> Python yarn-cluster mode
> 
>
> Key: SPARK-5162
> URL: https://issues.apache.org/jira/browse/SPARK-5162
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, YARN
>Reporter: Dana Klassen
>  Labels: cluster, python, yarn
>
> Running pyspark in yarn is currently limited to ‘yarn-client’ mode. It would 
> be great to be able to submit python applications to the cluster and (just 
> like java classes) have the resource manager setup an AM on any node in the 
> cluster. Does anyone know the issues blocking this feature? I was snooping 
> around with enabling python apps:
> Removing the logic stopping python and yarn-cluster from sparkSubmit.scala
> ...
> // The following modes are not supported or applicable
> (clusterManager, deployMode) match {
>   ...
>   case (_, CLUSTER) if args.isPython =>
> printErrorAndExit("Cluster deploy mode is currently not supported for 
> python applications.")
>   ...
> }
> …
> and submitting application via:
> HADOOP_CONF_DIR={{insert conf dir}} ./bin/spark-submit --master yarn-cluster 
> --num-executors 2  —-py-files {{insert location of egg here}} 
> --executor-cores 1  ../tools/canary.py
> Everything looks to run alright, pythonRunner is picked up as main class, 
> resources get setup, yarn client gets launched but falls flat on its face:
> 2015-01-08 18:48:03,444 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
>  DEBUG: FAILED { 
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py, 
> 1420742868009, FILE, null }, Resource 
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py changed 
> on src filesystem (expected 1420742868009, was 1420742869284
> and
> 2015-01-08 18:48:03,446 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
>  Resource 
> {{redacted}}/.sparkStaging/application_1420594669313_4687/canary.py(->/data/4/yarn/nm/usercache/klassen/filecache/11/canary.py)
>  transitioned from DOWNLOADING to FAILED
> Tracked this down to the apache hadoop code(FSDownload.java line 249) related 
> to container localization of files upon downloading. At this point thought it 
> would be best to raise the issue here and get input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5196) Add comment field in StructField

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272937#comment-14272937
 ] 

Apache Spark commented on SPARK-5196:
-

User 'OopsOutOfMemory' has created a pull request for this issue:
https://github.com/apache/spark/pull/3991

> Add comment field in StructField
> 
>
> Key: SPARK-5196
> URL: https://issues.apache.org/jira/browse/SPARK-5196
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: shengli
> Fix For: 1.3.0
>
>
> StructField should contain name, type, nullable, comment, etc.
> Add support for a comment field in StructField.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5196) Add comment field in StructField

2015-01-11 Thread shengli (JIRA)
shengli created SPARK-5196:
--

 Summary: Add comment field in StructField
 Key: SPARK-5196
 URL: https://issues.apache.org/jira/browse/SPARK-5196
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: shengli
 Fix For: 1.3.0


StructField should contain name, type, nullable, comment, etc.

Add support for a comment field in StructField.
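
As a purely hypothetical sketch (the field names, defaults, and the String 
stand-in for DataType are assumptions, not the actual Spark change), the extended 
field definition could look something like this:

{code}
// Hypothetical, self-contained sketch only; not Spark's actual StructField.
case class StructField(
    name: String,
    dataType: String,              // stand-in for Spark's DataType, to keep the sketch self-contained
    nullable: Boolean = true,
    comment: Option[String] = None)

val f = StructField("age", "IntegerType", nullable = true, comment = Some("age in years"))
{code}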



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5195) when hive table is query with alias the cache data lose effectiveness.

2015-01-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272934#comment-14272934
 ] 

Apache Spark commented on SPARK-5195:
-

User 'seayi' has created a pull request for this issue:
https://github.com/apache/spark/pull/3898

> when hive table is query with alias  the cache data  lose effectiveness.
> 
>
> Key: SPARK-5195
> URL: https://issues.apache.org/jira/browse/SPARK-5195
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: yixiaohua
>
> Override MetastoreRelation's sameResult method so that it only compares the 
> database name and the table name.
> Previously:
> cache table t1;
> select count() from t1;
> reads data from memory, but the query below does not; instead it reads from 
> hdfs:
> select count() from t1 t;
> Cached data is keyed by the logical plan and looked up via sameResult, so when 
> a table is referenced through an alias its logical plan is not the same as the 
> plan without the alias. Therefore the sameResult method is modified to compare 
> only the database name and the table name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5195) when hive table is query with alias the cache data lose effectiveness.

2015-01-11 Thread yixiaohua (JIRA)
yixiaohua created SPARK-5195:


 Summary: when hive table is query with alias  the cache data  lose 
effectiveness.
 Key: SPARK-5195
 URL: https://issues.apache.org/jira/browse/SPARK-5195
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: yixiaohua


Override MetastoreRelation's sameResult method so that it only compares the 
database name and the table name.

Previously:
cache table t1;
select count() from t1;
reads data from memory, but the query below does not; instead it reads from hdfs:
select count() from t1 t;

Cached data is keyed by the logical plan and looked up via sameResult, so when a 
table is referenced through an alias its logical plan is not the same as the plan 
without the alias. Therefore the sameResult method is modified to compare only 
the database name and the table name.
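
A toy, self-contained model of the comparison rule being proposed (the class and 
method names below are illustrative, not Spark's actual catalyst types):

{code}
// Toy model only: two relations are considered the "same result" when they point
// at the same Hive table, regardless of the alias used in the query.
case class TableRef(database: String, table: String, alias: Option[String])

def sameResult(a: TableRef, b: TableRef): Boolean =
  a.database == b.database && a.table == b.table   // the alias is deliberately ignored

// sameResult(TableRef("default", "t1", None), TableRef("default", "t1", Some("t"))) == true,
// so the cached plan for "t1" would be reused even when the query says "from t1 t".
{code}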



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5192) Parquet fails to parse schema contains '\r'

2015-01-11 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated SPARK-5192:
-
Summary: Parquet fails to parse schema contains '\r'  (was: Parquet fails 
to parse schemas contains '\r')

> Parquet fails to parse schema contains '\r'
> ---
>
> Key: SPARK-5192
> URL: https://issues.apache.org/jira/browse/SPARK-5192
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
> Environment: windows7 + Intellj idea 13.0.2 
>Reporter: cen yuhai
>Priority: Critical
> Fix For: 1.3.0
>
>
> I think this is actually a bug in Parquet. When I debugged 'ParquetTestData', 
> I found the exception below. So I downloaded the source of MessageTypeParser; 
> the function 'isWhitespace' does not check for '\r':
> private boolean isWhitespace(String t) {
>   return t.equals(" ") || t.equals("\t") || t.equals("\n");
> }
> So I replaced all '\r' characters to work around this issue:
>   val subTestSchema =
> """
>   message myrecord {
>   optional boolean myboolean;
>   optional int64 mylong;
>   }
> """.replaceAll("\r","")
> at line 0: message myrecord {
>   at 
> parquet.schema.MessageTypeParser.asRepetition(MessageTypeParser.java:203)
>   at parquet.schema.MessageTypeParser.addType(MessageTypeParser.java:101)
>   at 
> parquet.schema.MessageTypeParser.addGroupTypeFields(MessageTypeParser.java:96)
>   at parquet.schema.MessageTypeParser.parse(MessageTypeParser.java:89)
>   at 
> parquet.schema.MessageTypeParser.parseMessageType(MessageTypeParser.java:79)
>   at 
> org.apache.spark.sql.parquet.ParquetTestData$.writeFile(ParquetTestData.scala:221)
>   at 
> org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:92)
>   at 
> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>   at 
> org.apache.spark.sql.parquet.ParquetQuerySuite.beforeAll(ParquetQuerySuite.scala:85)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
>   at 
> org.apache.spark.sql.parquet.ParquetQuerySuite.run(ParquetQuerySuite.scala:85)
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5194) ADD JAR doesn't update classpath until reconnect

2015-01-11 Thread Oleg Danilov (JIRA)
Oleg Danilov created SPARK-5194:
---

 Summary: ADD JAR doesn't update classpath until reconnect
 Key: SPARK-5194
 URL: https://issues.apache.org/jira/browse/SPARK-5194
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Oleg Danilov


Steps to reproduce:

beeline>  !connect jdbc:hive2://vmhost-vm0:1
   
0: jdbc:hive2://vmhost-vm0:1> add jar 
./target/nexr-hive-udf-0.2-SNAPSHOT.jar
0: jdbc:hive2://vmhost-vm0:1> CREATE TEMPORARY FUNCTION nvl AS 
'com.nexr.platform.hive.udf.GenericUDFNVL';
0: jdbc:hive2://vmhost-vm0:1> select nvl(imsi,'test') from 
ps_cei_index_1_week limit 1;
Error: java.lang.ClassNotFoundException: 
com.nexr.platform.hive.udf.GenericUDFNVL (state=,code=0)
0: jdbc:hive2://vmhost-vm0:1> !reconnect
Reconnecting to "jdbc:hive2://vmhost-vm0:1"...
Closing: org.apache.hive.jdbc.HiveConnection@3f18dc75: {1}
Connected to: Spark SQL (version 1.2.0)
Driver: null (version null)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://vmhost-vm0:1> select nvl(imsi,'test') from 
ps_cei_index_1_week limit 1;
+--+
| _c0  |
+--+
| -1   |
+--+
1 row selected (1.605 seconds)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4861) Refactory command in spark sql

2015-01-11 Thread wangfei (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272862#comment-14272862
 ] 

wangfei commented on SPARK-4861:


[~yhuai] Of course, if possible, but I have not found a way to remove it, since 
in HiveCommandStrategy we need to distinguish Hive metastore tables from 
temporary tables, so for now HiveCommandStrategy is still kept there. Any ideas?

> Refactory command in spark sql
> --
>
> Key: SPARK-4861
> URL: https://issues.apache.org/jira/browse/SPARK-4861
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.1.1
>Reporter: wangfei
> Fix For: 1.3.0
>
>
> Fix a todo in spark sql:  remove ```Command``` and use ```RunnableCommand``` 
> instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5166) Stabilize Spark SQL APIs

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-5166:
---
Assignee: Reynold Xin

> Stabilize Spark SQL APIs
> 
>
> Key: SPARK-5166
> URL: https://issues.apache.org/jira/browse/SPARK-5166
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Before we take Spark SQL out of alpha, we need to audit the APIs and 
> stabilize them. 
> As a general rule, everything under org.apache.spark.sql.catalyst should not 
> be exposed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5166) Stabilize Spark SQL APIs

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-5166:
---
Priority: Critical  (was: Major)

> Stabilize Spark SQL APIs
> 
>
> Key: SPARK-5166
> URL: https://issues.apache.org/jira/browse/SPARK-5166
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Critical
>
> Before we take Spark SQL out of alpha, we need to audit the APIs and 
> stabilize them. 
> As a general rule, everything under org.apache.spark.sql.catalyst should not 
> be exposed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5193) Make Spark SQL API usable in Java and remove the Java-specific API

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-5193:
---
Description: 
The Java version of the SchemaRDD API causes a high maintenance burden for Spark 
SQL itself and for downstream libraries (e.g. the MLlib pipeline API needs to 
support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it 
usable from Java, and then remove the Java-specific version. 

Things to remove include (Java version of):
- data type
- Row
- SQLContext
- HiveContext

Things to consider:
- Scala and Java have a different collection library.
- Scala and Java (8) have different closure interface.
- Scala and Java can have duplicate definitions of common classes, such as 
BigDecimal.


  was:
The Java version of the SchemaRDD API causes a high maintenance burden for Spark 
SQL itself and for downstream libraries (e.g. the MLlib pipeline API needs to 
support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it 
usable from Java, and then remove the Java-specific version. 

Things to remove include (Java version of):
- data type
- Row
- SQLContext
- HiveContext

Things to consider:
- Scala and Java have a different collection library.
- Scala and Java (8) have different closure interface.




> Make Spark SQL API usable in Java and remove the Java-specific API
> --
>
> Key: SPARK-5193
> URL: https://issues.apache.org/jira/browse/SPARK-5193
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> The Java version of the SchemaRDD API causes a high maintenance burden for 
> Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API 
> needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, 
> make it usable from Java, and then remove the Java-specific version. 
> Things to remove include (Java version of):
> - data type
> - Row
> - SQLContext
> - HiveContext
> Things to consider:
> - Scala and Java have a different collection library.
> - Scala and Java (8) have different closure interface.
> - Scala and Java can have duplicate definitions of common classes, such as 
> BigDecimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5193) Make Spark SQL API usable in Java and remove the Java-specific API

2015-01-11 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14272860#comment-14272860
 ] 

Reynold Xin commented on SPARK-5193:


cc [~marmbrus]

> Make Spark SQL API usable in Java and remove the Java-specific API
> --
>
> Key: SPARK-5193
> URL: https://issues.apache.org/jira/browse/SPARK-5193
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> The Java version of the SchemaRDD API causes a high maintenance burden for 
> Spark SQL itself and for downstream libraries (e.g. the MLlib pipeline API 
> needs to support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, 
> make it usable from Java, and then remove the Java-specific version. 
> Things to remove include (Java version of):
> - data type
> - Row
> - SQLContext
> - HiveContext
> Things to consider:
> - Scala and Java have a different collection library.
> - Scala and Java (8) have different closure interface.
> - Scala and Java can have duplicate definitions of common classes, such as 
> BigDecimal.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5193) Make Spark SQL API usable in Java and remove the Java-specific API

2015-01-11 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-5193:
--

 Summary: Make Spark SQL API usable in Java and remove the 
Java-specific API
 Key: SPARK-5193
 URL: https://issues.apache.org/jira/browse/SPARK-5193
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


The Java version of the SchemaRDD API causes a high maintenance burden for Spark 
SQL itself and for downstream libraries (e.g. the MLlib pipeline API needs to 
support both JavaSchemaRDD and SchemaRDD). We can audit the Scala API, make it 
usable from Java, and then remove the Java-specific version. 

Things to remove include (Java version of):
- data type
- Row
- SQLContext
- HiveContext

Things to consider:
- Scala and Java have a different collection library.
- Scala and Java (8) have different closure interface.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3299) [SQL] Public API in SQLContext to list tables

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3299:
---
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-5166

> [SQL] Public API in SQLContext to list tables
> -
>
> Key: SPARK-3299
> URL: https://issues.apache.org/jira/browse/SPARK-3299
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 1.0.2
>Reporter: Evan Chan
>Assignee: Bill Bejeck
>Priority: Minor
>  Labels: newbie
>
> There is no public API in SQLContext to list the current tables.  This would 
> be pretty helpful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5167) Move Row into sql package and make it usable for Java

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-5167:
---
Assignee: Reynold Xin

> Move Row into sql package and make it usable for Java
> -
>
> Key: SPARK-5167
> URL: https://issues.apache.org/jira/browse/SPARK-5167
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> This will help us eliminate the duplicated Java code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2096) Correctly parse dot notations for accessing an array of structs

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2096:
---
Target Version/s: 1.3.0  (was: 1.2.0)

> Correctly parse dot notations for accessing an array of structs
> ---
>
> Key: SPARK-2096
> URL: https://issues.apache.org/jira/browse/SPARK-2096
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.0.0
>Reporter: Yin Huai
>Priority: Minor
>  Labels: starter
> Fix For: 1.2.0
>
>
> For example, "arrayOfStruct" is an array of structs and every element of this 
> array has a field called "field1". "arrayOfStruct[0].field1" means to access 
> the value of "field1" for the first element of "arrayOfStruct", but the SQL 
> parser (in sql-core) treats "field1" as an alias. Also, 
> "arrayOfStruct.field1" means to access all values of "field1" in this array 
> of structs and the returns those values as an array. But, the SQL parser 
> cannot resolve it.
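
For illustration only (spark-shell style; the table and column names are 
assumptions taken from the description above), the two access patterns are:

{code}
// Illustrative only; assumes a table "nestedTable" with an arrayOfStruct column is registered.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// value of "field1" for the first element of the array
sqlContext.sql("SELECT arrayOfStruct[0].field1 FROM nestedTable")

// all "field1" values, returned as an array
sqlContext.sql("SELECT arrayOfStruct.field1 FROM nestedTable")
{code}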



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-4508:
---
Assignee: Adrian Wang

> Native Date type for SQL92 Date
> ---
>
> Key: SPARK-4508
> URL: https://issues.apache.org/jira/browse/SPARK-4508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>
> Store daysSinceEpoch as an Int(4 bytes), instead of using java.sql.Date(8 
> bytes as Long) in catalyst row.
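
As a rough sketch of the space trade-off (timezone handling is deliberately 
ignored here and would need care in the real change):

{code}
import java.sql.Date
import java.util.concurrent.TimeUnit

// Sketch only: represent a date as days since the epoch in an Int (4 bytes)
// instead of carrying a java.sql.Date backed by a Long (8 bytes).
def toDays(d: Date): Int = TimeUnit.MILLISECONDS.toDays(d.getTime).toInt
def fromDays(days: Int): Date = new Date(TimeUnit.DAYS.toMillis(days))

val today = new Date(System.currentTimeMillis())
println(toDays(today))            // a small Int, e.g. ~16400 for early 2015
println(fromDays(toDays(today)))  // midnight UTC of the same day; timezone handling ignored here
{code}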



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-4508:
---
Summary: Native Date type for SQL92 Date  (was: build native date type to 
conform behavior to Hive)

> Native Date type for SQL92 Date
> ---
>
> Key: SPARK-4508
> URL: https://issues.apache.org/jira/browse/SPARK-4508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Adrian Wang
>
> Store daysSinceEpoch as an Int(4 bytes), instead of using java.sql.Date(8 
> bytes as Long) in catalyst row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4508) Native Date type for SQL92 Date

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-4508:
---
Target Version/s: 1.3.0

> Native Date type for SQL92 Date
> ---
>
> Key: SPARK-4508
> URL: https://issues.apache.org/jira/browse/SPARK-4508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Adrian Wang
>Assignee: Adrian Wang
>
> Store daysSinceEpoch as an Int(4 bytes), instead of using java.sql.Date(8 
> bytes as Long) in catalyst row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4508) build native date type to conform behavior to Hive

2015-01-11 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-4508:
---
Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-5166

> build native date type to conform behavior to Hive
> --
>
> Key: SPARK-4508
> URL: https://issues.apache.org/jira/browse/SPARK-4508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Adrian Wang
>
> Store daysSinceEpoch as an Int(4 bytes), instead of using java.sql.Date(8 
> bytes as Long) in catalyst row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org