[jira] [Created] (SPARK-3000) drop old blocks to disk in parallel when memory is not large enough for caching new blocks

2014-08-13 Thread Zhang, Liye (JIRA)
Zhang, Liye created SPARK-3000:
--

 Summary: drop old blocks to disk in parallel when memory is not 
large enough for caching new blocks
 Key: SPARK-3000
 URL: https://issues.apache.org/jira/browse/SPARK-3000
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Zhang, Liye


In Spark, an RDD can be cached in memory for later use, and the memory available for caching is *spark.executor.memory * spark.storage.memoryFraction* for Spark versions before 1.1.0, and *spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction* after 
[SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 

For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks, old blocks may be dropped to disk to free up memory for the new blocks. This is handled by _ensureFreeSpace_ in _MemoryStore.scala_, and the caller always holds the *accountingLock*, so only one thread drops blocks at a time. This cannot make full use of the disk throughput when there are multiple disks on the worker node. When testing our workload, we found this to be a real bottleneck when the volume of old blocks to be dropped is large. So it is necessary to drop blocks in parallel.
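
A minimal sketch of the direction being proposed (hypothetical helper names, not the actual MemoryStore code): pick the victim blocks while holding the accounting lock, but write them to disk outside the lock so several disks can be used concurrently.

{code}
// Hypothetical sketch only; the names below are illustrative stand-ins, not Spark's API.
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelDropSketch {
  case class Block(id: String, size: Long)

  private val accountingLock = new Object
  private var cached: List[Block] = List(Block("rdd_0_0", 64L), Block("rdd_0_1", 64L))
  implicit val ec: ExecutionContext =
    ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

  private def writeToDisk(b: Block): Unit = println(s"dropping ${b.id} to disk")

  def ensureFreeSpace(bytesNeeded: Long): Unit = {
    // 1. Decide what to evict under the lock (cheap bookkeeping only).
    val victims = accountingLock.synchronized {
      var freed = 0L
      val drop = cached.takeWhile { b => val need = freed < bytesNeeded; freed += b.size; need }
      cached = cached.drop(drop.size)
      drop
    }
    // 2. Do the expensive disk writes outside the lock, in parallel.
    Await.result(Future.traverse(victims)(b => Future(writeToDisk(b))), Duration.Inf)
  }
}
{code}

The point is only the lock scope: block selection stays serialized, while the disk writes fan out across threads (and hence disks).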






[jira] [Updated] (SPARK-3000) drop old blocks to disk in parallel when memory is not large enough for caching new blocks

2014-08-13 Thread Zhang, Liye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhang, Liye updated SPARK-3000:
---

   Description: 
In Spark, an RDD can be cached in memory for later use, and the memory available for caching is *spark.executor.memory * spark.storage.memoryFraction* for Spark versions before 1.1.0, and *spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction* after 
[SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 

For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks, old blocks may be dropped to disk to free up memory for the new blocks. This is handled by _ensureFreeSpace_ in _MemoryStore.scala_, and the caller always holds the *accountingLock*, so only one thread drops blocks at a time. This cannot make full use of the disk throughput when there are multiple disks on the worker node. When testing our workload, we found this to be a real bottleneck when the volume of old blocks to be dropped is large. 

We have tested the parallel method on Spark 1.0, and the speedup is significant. So it is necessary to drop blocks in parallel.

  was:
In Spark, an RDD can be cached in memory for later use, and the memory available for caching is *spark.executor.memory * spark.storage.memoryFraction* for Spark versions before 1.1.0, and *spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction* after 
[SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 

For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks, old blocks may be dropped to disk to free up memory for the new blocks. This is handled by _ensureFreeSpace_ in _MemoryStore.scala_, and the caller always holds the *accountingLock*, so only one thread drops blocks at a time. This cannot make full use of the disk throughput when there are multiple disks on the worker node. When testing our workload, we found this to be a real bottleneck when the volume of old blocks to be dropped is large. So it is necessary to drop blocks in parallel.

Remaining Estimate: 168h  (was: 252h)
 Original Estimate: 168h  (was: 252h)

 drop old blocks to disk in parallel when memory is not large enough for 
 caching new blocks
 --

 Key: SPARK-3000
 URL: https://issues.apache.org/jira/browse/SPARK-3000
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Zhang, Liye
   Original Estimate: 168h
  Remaining Estimate: 168h

 In Spark, an RDD can be cached in memory for later use, and the memory available for caching is *spark.executor.memory * spark.storage.memoryFraction* for Spark versions before 1.1.0, and *spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction* after [SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 
 For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks, old blocks may be dropped to disk to free up memory for the new blocks. This is handled by _ensureFreeSpace_ in _MemoryStore.scala_, and the caller always holds the *accountingLock*, so only one thread drops blocks at a time. This cannot make full use of the disk throughput when there are multiple disks on the worker node. When testing our workload, we found this to be a real bottleneck when the volume of old blocks to be dropped is large. 
 We have tested the parallel method on Spark 1.0, and the speedup is significant. So it is necessary to drop blocks in parallel.






[jira] [Comment Edited] (SPARK-2372) Grouped Optimization/Learning

2014-08-13 Thread Kyle Ellrott (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14094668#comment-14094668
 ] 

Kyle Ellrott edited comment on SPARK-2372 at 8/13/14 6:06 AM:
--

GroupedBinaryClassificationMetrics has been added to the pull request connected to this issue.
GroupedBinaryClassificationMetrics is a rewrite of the BinaryClassificationMetrics methods, but it works on an RDD[KEY,(Double,Double)] structure (rather than the RDD[(Double,Double)] that BinaryClassificationMetrics takes), where KEY is a generic type identifying the data set.

A unit test is included to validate that these functions work the same way as the BinaryClassificationMetrics implementations.

https://github.com/kellrott/spark/commit/dcabb2f6a39c0940afc39e809a50601f46e50162
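
For readers skimming the thread, a rough sketch of the shape being described (illustrative only; the real GroupedBinaryClassificationMetrics lives in the linked pull request). The stand-in metric below is plain accuracy at a threshold, not the actual ROC/PR computations; the point is the keyed input and the Map[K, Double] result.

{code}
import scala.reflect.ClassTag
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// K identifies the data set; every metric is computed per key.
class GroupedMetricsSketch[K: ClassTag](scoreAndLabelsByKey: RDD[(K, (Double, Double))]) {
  def accuracyByKey(threshold: Double = 0.5): Map[K, Double] =
    scoreAndLabelsByKey
      .mapValues { case (score, label) =>
        (if ((score >= threshold) == (label > 0.5)) 1L else 0L, 1L)
      }
      .reduceByKey { case ((c1, n1), (c2, n2)) => (c1 + c2, n1 + n2) }
      .mapValues { case (correct, n) => correct.toDouble / n }
      .collectAsMap()
      .toMap
}
{code}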


was (Author: kellrott):
GroupedBinaryClassificationMetrics has been added to the pull request connected to this issue.
GroupedBinaryClassificationMetrics is a rewrite of the BinaryClassificationMetrics methods, but it works on an RDD[KEY,(Double,Double)] structure (rather than the RDD[(Double,Double)] that BinaryClassificationMetrics takes), where KEY is a generic type identifying the data set. Methods now return Map[KEY,Double], with a separate score for each data set, rather than a single Double.

A unit test is included to validate that these functions work the same way as the BinaryClassificationMetrics implementations.

https://github.com/kellrott/spark/commit/dcabb2f6a39c0940afc39e809a50601f46e50162

 Grouped Optimization/Learning
 -

 Key: SPARK-2372
 URL: https://issues.apache.org/jira/browse/SPARK-2372
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.1, 1.1.0, 1.0.2
Reporter: Kyle Ellrott

 The purpose of this patch is to enable MLlib to better handle scenarios where the user wants to do learning on multiple feature/label sets at the same time. Rather than scheduling each learning task separately, this patch lets the user create a single RDD with an Int key representing the 'group' that sets of entries belong to.
 This patch establishes the GroupedOptimizer trait, for which GroupedGradientDescent has been implemented. This system differs from the original Optimizer trait in that the original optimize method accepted RDD[(Int, Vector)], while the new GroupedOptimizer accepts RDD[(Int, (Double, Vector))].
 The difference is that the GroupedOptimizer uses a 'group' ID key in the RDD to multiplex multiple optimization operations in the same RDD.
 This patch also establishes the GroupedGeneralizedLinearAlgorithm trait, for which the 'run' method has had its RDD[LabeledPoint] input replaced with RDD[(Int,LabeledPoint)].
 This patch also provides a unit test and a utility to take the results of MLUtils.kFold and turn them into a single grouped RDD, ready for simultaneous learning.
 https://github.com/apache/spark/pull/1292
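
A hedged sketch of the interfaces the description refers to (names and return types are paraphrased from the text above, not copied from the pull request):

{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// One optimization problem per Int group key, multiplexed through a single RDD.
trait GroupedOptimizerSketch extends Serializable {
  def optimize(data: RDD[(Int, (Double, Vector))], initialWeights: Vector): Map[Int, Vector]
}

// Train one model per group from a single keyed RDD of labeled points.
trait GroupedGLMSketch[M] {
  def run(input: RDD[(Int, LabeledPoint)]): Map[Int, M]
}
{code}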






[jira] [Updated] (SPARK-3000) drop old blocks to disk in parallel when memory is not large enough for caching new blocks

2014-08-13 Thread Zhang, Liye (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhang, Liye updated SPARK-3000:
---

Remaining Estimate: (was: 168h)
 Original Estimate: (was: 168h)

 drop old blocks to disk in parallel when memory is not large enough for 
 caching new blocks
 --

 Key: SPARK-3000
 URL: https://issues.apache.org/jira/browse/SPARK-3000
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Zhang, Liye

 In Spark, an RDD can be cached in memory for later use, and the memory available for caching is *spark.executor.memory * spark.storage.memoryFraction* for Spark versions before 1.1.0, and *spark.executor.memory * spark.storage.memoryFraction * spark.storage.safetyFraction* after [SPARK-1777|https://issues.apache.org/jira/browse/SPARK-1777]. 
 For storage level *MEMORY_AND_DISK*, when free memory is not enough to cache new blocks, old blocks may be dropped to disk to free up memory for the new blocks. This is handled by _ensureFreeSpace_ in _MemoryStore.scala_, and the caller always holds the *accountingLock*, so only one thread drops blocks at a time. This cannot make full use of the disk throughput when there are multiple disks on the worker node. When testing our workload, we found this to be a real bottleneck when the volume of old blocks to be dropped is large. 
 We have tested the parallel method on Spark 1.0, and the speedup is significant. So it is necessary to drop blocks in parallel.






[jira] [Updated] (SPARK-2998) scala.collection.mutable.HashSet cannot be cast to scala.collection.mutable.BitSet

2014-08-13 Thread pengyanhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

pengyanhong updated SPARK-2998:
---

Description: 
Running a HiveQL query via yarn-cluster, I got the error below:
{quote}
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Serialized task 8.0:2 
as 20849 bytes in 0 ms
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Finished TID 812 in 
24 ms on A01-R06-I149-32.jd.local (progress: 2/200)
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Completed 
ResultTask(8, 1)
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Failed to run reduce 
at joins.scala:336
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): 
finishApplicationMaster with FAILED
Exception in thread Thread-2 java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:199)
Caused by: org.apache.spark.SparkDriverExecutionException: Execution error
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:849)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1231)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassCastException: scala.collection.mutable.HashSet 
cannot be cast to scala.collection.mutable.BitSet
at 
org.apache.spark.sql.execution.BroadcastNestedLoopJoin$$anonfun$7.apply(joins.scala:336)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:813)
at org.apache.spark.rdd.RDD$$anonfun$19.apply(RDD.scala:810)
at 
org.apache.spark.scheduler.JobWaiter.taskSucceeded(JobWaiter.scala:56)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:845)
... 10 more
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Invoking sc stop from 
shutdown hook
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): AppMaster received a 
signal.
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Starting task 8.0:3 
as TID 814 on executor 1: A01-R06-I149-32.jd.local (PROCESS_LOCAL)
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Serialized task 8.0:3 
as 20849 bytes in 0 ms
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Finished TID 813 in 
25 ms on A01-R06-I149-32.jd.local (progress: 3/200)
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Completed 
ResultTask(8, 2)
..
{quote}

It runs successfully if the Kryo configuration is removed.
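
(For reference, the Kryo configuration in question is of the following kind; the key name and serializer class are the standard Spark ones, and any registrator is application-specific.)

{code}
import org.apache.spark.SparkConf

// Removing this setting (falling back to Java serialization) avoids the ClassCastException.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
{code}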

  was:
Running a HiveQL query via yarn-cluster, I got the error below:
{quote}
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Serialized task 8.0:2 
as 20849 bytes in 0 ms
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Finished TID 812 in 
24 ms on A01-R06-I149-32.jd.local (progress: 2/200)
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Completed 
ResultTask(8, 1)
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): Failed to run reduce 
at joins.scala:336
14/08/13 11:10:01 INFO 
org.apache.spark.Logging$class.logInfo(Logging.scala:58): 
finishApplicationMaster with FAILED
Exception in thread Thread-2 java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 

[jira] [Created] (SPARK-3001) Improve Spearman's correlation

2014-08-13 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-3001:


 Summary: Improve Spearman's correlation
 Key: SPARK-3001
 URL: https://issues.apache.org/jira/browse/SPARK-3001
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng


The current implementation sorts each column individually; this could instead be done with a single global sort.
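
To make the cost concrete, here is a minimal sketch (not the MLlib code) of the per-column ranking step that Spearman's correlation needs; each column currently pays for one such sort, which is what a single global sort would avoid.

{code}
import org.apache.spark.rdd.RDD

// Rank the values of one column, keyed by row id (ties are not averaged in this sketch).
def ranksByRow(column: RDD[(Long, Double)]): RDD[(Long, Double)] =
  column
    .sortBy(_._2)                      // one shuffle/sort per column
    .zipWithIndex()
    .map { case ((rowId, _), rank) => (rowId, rank.toDouble + 1) }
{code}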






[jira] [Commented] (SPARK-3001) Improve Spearman's correlation

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095207#comment-14095207
 ] 

Apache Spark commented on SPARK-3001:
-

User 'mengxr' has created a pull request for this issue:
https://github.com/apache/spark/pull/1917

 Improve Spearman's correlation
 --

 Key: SPARK-3001
 URL: https://issues.apache.org/jira/browse/SPARK-3001
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Xiangrui Meng
Assignee: Xiangrui Meng

 The current implementation sorts each column individually; this could instead be done with a single global sort.






[jira] [Created] (SPARK-3002) Reuse Netty clients

2014-08-13 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-3002:
--

 Summary: Reuse Netty clients
 Key: SPARK-3002
 URL: https://issues.apache.org/jira/browse/SPARK-3002
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin


To create a client manager that reuses clients (and connections).






[jira] [Updated] (SPARK-3002) Reuse Netty clients

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3002:
---

Description: 
To create a client manager that reuses clients (and connections).

Can also use IdleStateHandler to clean up idle connections.

http://netty.io/4.0/api/io/netty/handler/timeout/IdleStateHandler.html
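
A hedged sketch of the usual way IdleStateHandler (linked above) is wired in to close idle connections; this is generic Netty 4 usage, not Spark's actual network module, and the 120-second timeout is an arbitrary example.

{code}
import io.netty.channel.{ChannelDuplexHandler, ChannelHandlerContext, ChannelInitializer}
import io.netty.channel.socket.SocketChannel
import io.netty.handler.timeout.{IdleStateEvent, IdleStateHandler}

class CloseIdleConnections extends ChannelInitializer[SocketChannel] {
  override def initChannel(ch: SocketChannel): Unit = {
    // Fire an IdleStateEvent if the channel sees no reads or writes for 120 seconds.
    ch.pipeline().addLast(new IdleStateHandler(0, 0, 120))
    ch.pipeline().addLast(new ChannelDuplexHandler {
      override def userEventTriggered(ctx: ChannelHandlerContext, evt: AnyRef): Unit =
        evt match {
          case _: IdleStateEvent => ctx.close()   // evict the idle client connection
          case _                 => super.userEventTriggered(ctx, evt)
        }
    })
  }
}
{code}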

  was:To create a client manager that reuses clients (and connections).


 Reuse Netty clients
 ---

 Key: SPARK-3002
 URL: https://issues.apache.org/jira/browse/SPARK-3002
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Reynold Xin

 To create a client manager that reuses clients (and connections).
 Can also use IdleStateHandler to clean up idle connections.
 http://netty.io/4.0/api/io/netty/handler/timeout/IdleStateHandler.html






[jira] [Resolved] (SPARK-2993) colStats in Statistics as wrapper around MultivariateStatisticalSummary in Scala and Python

2014-08-13 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-2993.
--

  Resolution: Implemented
   Fix Version/s: 1.1.0
Target Version/s: 1.1.0

 colStats in Statistics as wrapper around MultivariateStatisticalSummary in 
 Scala and Python
 ---

 Key: SPARK-2993
 URL: https://issues.apache.org/jira/browse/SPARK-2993
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Doris Xin
Assignee: Doris Xin
 Fix For: 1.1.0









[jira] [Created] (SPARK-3003) FailedStage could not be cancelled by DAGScheduler when cancelJob or cancelStage

2014-08-13 Thread YanTang Zhai (JIRA)
YanTang Zhai created SPARK-3003:
---

 Summary: FailedStage could not be cancelled by DAGScheduler when 
cancelJob or cancelStage
 Key: SPARK-3003
 URL: https://issues.apache.org/jira/browse/SPARK-3003
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor


When a stage changes from running to failed, DAGScheduler cannot cancel it via cancelJob or cancelStage, because failJobAndIndependentStages only cancels running stages and posts SparkListenerStageCompleted for them.






[jira] [Commented] (SPARK-2890) Spark SQL should allow SELECT with duplicated columns

2014-08-13 Thread Jianshi Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095224#comment-14095224
 ] 

Jianshi Huang commented on SPARK-2890:
--

I think the fault is on my side. I should have projected the duplicated columns into different names.

So the current behavior makes sense. I'll close this issue.

Jianshi
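
For anyone hitting the same error, the workaround being described is just aliasing, sketched below with placeholder names (my_table and the column names are hypothetical):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("dedup-columns"))
val sqlContext = new SQLContext(sc)

// Give duplicated columns distinct aliases so the resulting schema has unique field names.
val deduped = sqlContext.sql(
  "SELECT t.key AS key_a, t.key AS key_b, t.value FROM my_table t")
{code}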

 Spark SQL should allow SELECT with duplicated columns
 -

 Key: SPARK-2890
 URL: https://issues.apache.org/jira/browse/SPARK-2890
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Jianshi Huang

 Spark reported error java.lang.IllegalArgumentException with messages:
 java.lang.IllegalArgumentException: requirement failed: Found fields with the 
 same name.
 at scala.Predef$.require(Predef.scala:233)
 at 
 org.apache.spark.sql.catalyst.types.StructType.<init>(dataTypes.scala:317)
 at 
 org.apache.spark.sql.catalyst.types.StructType$.fromAttributes(dataTypes.scala:310)
 at 
 org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:306)
 at 
 org.apache.spark.sql.parquet.ParquetTableScan.execute(ParquetTableOperations.scala:83)
 at 
 org.apache.spark.sql.execution.Filter.execute(basicOperators.scala:57)
 at 
 org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:85)
 at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:433)
 After trial and error, it seems it's caused by duplicated columns in my 
 select clause.
 I made the duplication on purpose so that my code would parse correctly. I think we should allow users to specify duplicated columns as return values.
 Jianshi






[jira] [Closed] (SPARK-2890) Spark SQL should allow SELECT with duplicated columns

2014-08-13 Thread Jianshi Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianshi Huang closed SPARK-2890.


Resolution: Invalid

 Spark SQL should allow SELECT with duplicated columns
 -

 Key: SPARK-2890
 URL: https://issues.apache.org/jira/browse/SPARK-2890
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Jianshi Huang

 Spark reported error java.lang.IllegalArgumentException with messages:
 java.lang.IllegalArgumentException: requirement failed: Found fields with the 
 same name.
 at scala.Predef$.require(Predef.scala:233)
 at 
 org.apache.spark.sql.catalyst.types.StructType.<init>(dataTypes.scala:317)
 at 
 org.apache.spark.sql.catalyst.types.StructType$.fromAttributes(dataTypes.scala:310)
 at 
 org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:306)
 at 
 org.apache.spark.sql.parquet.ParquetTableScan.execute(ParquetTableOperations.scala:83)
 at 
 org.apache.spark.sql.execution.Filter.execute(basicOperators.scala:57)
 at 
 org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:85)
 at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:433)
 After trial and error, it seems it's caused by duplicated columns in my 
 select clause.
 I made the duplication on purpose so that my code would parse correctly. I think we should allow users to specify duplicated columns as return values.
 Jianshi






[jira] [Created] (SPARK-3004) HiveThriftServer2 throws exception when the result set contains NULL

2014-08-13 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-3004:
-

 Summary: HiveThriftServer2 throws exception when the result set 
contains NULL
 Key: SPARK-3004
 URL: https://issues.apache.org/jira/browse/SPARK-3004
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Priority: Blocker


To reproduce this issue with beeline:

{code}
$ cd $SPARK_HOME
$ ./bin/beeline -u jdbc:hive2://localhost:1 -n lian
...
0: jdbc:hive2://localhost:1 create table src1 (key int, value string);
...
0: jdbc:hive2://localhost:1 load data local inpath 
'./sql/hive/src/test/resources/data/files/kv3.txt' into table src1;
...
0: jdbc:hive2://localhost:1 select * from src1 where key is null;
Error:  (state=,code=0)
{code}

Exception thrown from HiveThriftServer2:

{code}
java.lang.RuntimeException: Failed to check null bit for primitive int value.
at scala.sys.package$.error(package.scala:27)
at 
org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(Row.scala:145)
at 
org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.getNextRowSet(SparkSQLOperationManager.scala:80)
at 
org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
at 
org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
at 
org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
at 
org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at 
org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
at 
org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
at 
org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}

The cause is that we didn't check {{isNullAt}} in {{SparkSQLOperationManager.getNextRowSet}}.
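
A minimal sketch of the kind of guard that is missing (illustrative; the actual fix belongs inside SparkSQLOperationManager.getNextRowSet):

{code}
import org.apache.spark.sql.catalyst.expressions.Row

// Consult isNullAt before calling a primitive getter such as getInt.
def intOrNull(row: Row, ordinal: Int): java.lang.Integer =
  if (row.isNullAt(ordinal)) null else Int.box(row.getInt(ordinal))
{code}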






[jira] [Commented] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095232#comment-14095232
 ] 

Debasish Das commented on SPARK-2426:
-

Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week.

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all QuadraticMinimization. 

Right now we support 5 features:

1. Least squares
2. Least squares with positivity
3. Least squares with bounds: a generalization of positivity
4. Least squares with equality and positivity/bounds for LDA/PLSA
5. Least squares + L1 constraint for sparse NMF

There are many regularizers in Proximal.scala which can be reused in the MLlib updater... L1Updater is an example of a proximal algorithm.

I feel we should move NNLS into QuadraticMinimizer as well and clean up ALS.scala, as you have suggested before...

QuadraticMinimizer is optimized for direct solves right now (Cholesky/LU, depending on the problem we are solving).

The CG core from NNLS should be used for iterative solves when the ranks are high... I need a different variant of CG for formulation 4, so the NNLS CG is not sufficient for all the formulations.

Right now I am experimenting with ADMM rho and lambda values so that the NNLS iterations are on par with least squares with positivity. 

I will publish results from the comparisons.

I will also publish comparisons with PDCO, ECOS (IPM) and MOSEK against the ADMM variants used in this branch...

For the recommendation use case, I expect to produce Jellylish L1-ball projection results on the netflix/movielens datasets using formulation 5.

Thanks.
Deb
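
A rough formalization of the five problems listed above, for readers who want the math (x is the per-block factor being solved for in each ALS subproblem; A, b, C, d, l, u and tau are assumed notation, not names from the branch):

{code}
\begin{align*}
&\text{1. Least squares:}                    && \min_x \; \|Ax - b\|_2^2 \\
&\text{2. With positivity:}                  && \min_x \; \|Ax - b\|_2^2 \quad \text{s.t. } x \ge 0 \\
&\text{3. With bounds:}                      && \min_x \; \|Ax - b\|_2^2 \quad \text{s.t. } l \le x \le u \\
&\text{4. Equality + positivity (LDA/PLSA):} && \min_x \; \|Ax - b\|_2^2 \quad \text{s.t. } Cx = d,\; x \ge 0 \\
&\text{5. L1 constraint (sparse NMF):}       && \min_x \; \|Ax - b\|_2^2 \quad \text{s.t. } \|x\|_1 \le \tau
\end{align*}
{code}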

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add ADMM and IPM based QuadraticMinimization solvers to 
 breeze.optimize.quadratic package.
 2. Add a QpSolver object in spark mllib optimization which calls breeze
 3. Add the QpSolver object in spark mllib ALS






[jira] [Updated] (SPARK-2973) Add a way to show tables without executing a job

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2973:


Target Version/s: 1.2.0

 Add a way to show tables without executing a job
 

 Key: SPARK-2973
 URL: https://issues.apache.org/jira/browse/SPARK-2973
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Aaron Davidson

 Right now, sql("show tables").collect() will start a Spark job, which shows up in the UI. There should be a way to get these results without this step.






[jira] [Commented] (SPARK-2973) Add a way to show tables without executing a job

2014-08-13 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095241#comment-14095241
 ] 

Michael Armbrust commented on SPARK-2973:
-

We can just override executeCollect() in Commands.
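
Sketched very roughly (illustrative names only, not the actual Spark SQL classes), the idea is that a command node computes its rows locally instead of going through execute(), which is what schedules a job:

{code}
trait PlanNodeSketch {
  def executeCollect(): Array[String]   // stand-in for Array[Row]
}

case class ShowTablesSketch(catalogTables: () => Seq[String]) extends PlanNodeSketch {
  // No RDD, no job: just read the table names from the (assumed) catalog callback.
  override def executeCollect(): Array[String] = catalogTables().toArray
}
{code}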

 Add a way to show tables without executing a job
 

 Key: SPARK-2973
 URL: https://issues.apache.org/jira/browse/SPARK-2973
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Aaron Davidson

 Right now, sql("show tables").collect() will start a Spark job, which shows up in the UI. There should be a way to get these results without this step.






[jira] [Commented] (SPARK-2089) With YARN, preferredNodeLocalityData isn't honored

2014-08-13 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095247#comment-14095247
 ] 

Mridul Muralidharan commented on SPARK-2089:


Since I am not maintaining the code anymore, I don't have a strong preference either way.
I am not sure what the format means, btw - I see multiple nodes and racks mentioned in the same group ...

In general, though, I am not convinced it is a good direction to take.
1) It is a workaround for a design issue and has non-trivial performance implications (serializing into this form only to immediately deserialize it is expensive for large inputs; not to mention, it gets shipped to executors for no reason).
2) It locks us into a format which provides inadequate information - the number of blocks per node, size per block, etc. is lost (or maybe I just did not understand what the format is!).
3) We are currently investigating evolving in the opposite direction - adding more information so that we can be more specific about where to allocate executors.
For example, I can see a fairly near-term need to associate executors with accelerator cards (and break the implicit OFF_HEAP - Tachyon assumption).
A string representation makes that fragile to evolve.

As I mentioned before, the current YARN allocation model in Spark is a very naive implementation - which I did not expect to survive this long: it came directly from our prototype.
We really should be modifying it to consider the cost of data transfer and prioritize allocation that way (number of blocks on a node/rack, size of blocks, number of replicas available, etc.).
For small datasets on small enough clusters this is not relevant, but it has implications as we grow along both axes.

 With YARN, preferredNodeLocalityData isn't honored 
 ---

 Key: SPARK-2089
 URL: https://issues.apache.org/jira/browse/SPARK-2089
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical

 When running in YARN cluster mode, apps can pass preferred locality data when constructing a Spark context that will dictate where to request executor containers.
 This is currently broken because of a race condition. The Spark-YARN code runs the user class and waits for it to start up a SparkContext. During its initialization, the SparkContext will create a YarnClusterScheduler, which notifies a monitor in the Spark-YARN code that the SparkContext is ready. The Spark-YARN code then immediately fetches the preferredNodeLocationData from the SparkContext and uses it to start requesting containers.
 But in the SparkContext constructor that takes the preferredNodeLocationData, setting preferredNodeLocationData comes after the rest of the initialization, so, if the Spark-YARN code comes around quickly enough after being notified, the data that's fetched is the empty, unset version. This occurred during all of my runs.
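
A minimal illustration of the ordering problem being described (a generic sketch, not the actual SparkContext/YARN classes):

{code}
// The monitor is notified before the field is assigned, so a fast reader that wakes
// up on the notification can still observe the empty default value.
class ContextSketch(localityData: Map[String, String], monitor: Object) {
  @volatile var preferredNodeLocationData: Map[String, String] = Map.empty

  // ... rest of initialization, which is what notifies the waiting YARN code ...
  monitor.synchronized { monitor.notifyAll() }

  // The assignment only happens afterwards, hence the race.
  preferredNodeLocationData = localityData
}
{code}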






[jira] [Updated] (SPARK-2969) Make ScalaReflection be able to handle ArrayType.containsNull and MapType.valueContainsNull.

2014-08-13 Thread Takuya Ueshin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-2969:
-

Summary: Make ScalaReflection be able to handle ArrayType.containsNull and 
MapType.valueContainsNull.  (was: Make ScalaReflection be able to handle 
MapType.containsNull and MapType.valueContainsNull.)

 Make ScalaReflection be able to handle ArrayType.containsNull and 
 MapType.valueContainsNull.
 

 Key: SPARK-2969
 URL: https://issues.apache.org/jira/browse/SPARK-2969
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin

 Make {{ScalaReflection}} be able to handle the following:
 - Seq\[Int] as ArrayType(IntegerType, containsNull = false)
 - Seq\[java.lang.Integer] as ArrayType(IntegerType, containsNull = true)
 - Map\[Int, Long] as MapType(IntegerType, LongType, valueContainsNull = false)
 - Map\[Int, java.lang.Long] as MapType(IntegerType, LongType, 
 valueContainsNull = true)






[jira] [Created] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)
Xu Zhongxing created SPARK-3005:
---

 Summary: Spark with Mesos fine-grained mode throws 
UnsupportedOperationException in MesosSchedulerBackend.killTask()
 Key: SPARK-3005
 URL: https://issues.apache.org/jira/browse/SPARK-3005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
Reporter: Xu Zhongxing


I am using Spark, Mesos, and spark-cassandra-connector to do some work on a Cassandra cluster.

While the job was running, I killed the Cassandra daemon to simulate some failure cases. This results in task failures.

If I run the job in Mesos coarse-grained mode, the Spark driver program throws an exception and shuts down cleanly.

But when I run the job in Mesos fine-grained mode, the Spark driver program hangs.

The spark log is: 

 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
Logging.scala (line 58) Cancelling stage 1
 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
Logging.scala (line 79) Could not cancel tasks for stage 1
java.lang.UnsupportedOperationException
at 
org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
at 
org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
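
The stack trace points at the default implementation in the SchedulerBackend trait. Roughly (a paraphrase with an assumed signature, not the actual Spark source), the pattern is:

{code}
// A backend that never overrides killTask surfaces UnsupportedOperationException
// as soon as the DAGScheduler tries to cancel its tasks.
trait SchedulerBackendSketch {
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    throw new UnsupportedOperationException("killTask is not supported by this backend")
}

class FineGrainedBackendSketch extends SchedulerBackendSketch  // no override => throws on cancel
{code}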






[jira] [Created] (SPARK-3006) Failed to execute spark-shell in Windows OS

2014-08-13 Thread Masayoshi TSUZUKI (JIRA)
Masayoshi TSUZUKI created SPARK-3006:


 Summary: Failed to execute spark-shell in Windows OS
 Key: SPARK-3006
 URL: https://issues.apache.org/jira/browse/SPARK-3006
 Project: Spark
  Issue Type: Bug
  Components: Windows
 Environment: Windows 8.1
Reporter: Masayoshi TSUZUKI
Priority: Minor


When executing {{bin\spark-shell.cmd}} on Windows, I got errors like the following:
{noformat}
Error: Cannot load main class from JAR: spark-shell
Run with --help for usage help or --verbose for debug output
{noformat}






[jira] [Commented] (SPARK-3006) Failed to execute spark-shell in Windows OS

2014-08-13 Thread Masayoshi TSUZUKI (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095284#comment-14095284
 ] 

Masayoshi TSUZUKI commented on SPARK-3006:
--

This is because the option {{--class org.apache.spark.repl.Main}} follows the argument {{spark-shell}}. Arguments should follow options. The bash version of spark-shell handles this correctly.

 Failed to execute spark-shell in Windows OS
 ---

 Key: SPARK-3006
 URL: https://issues.apache.org/jira/browse/SPARK-3006
 Project: Spark
  Issue Type: Bug
  Components: Windows
 Environment: Windows 8.1
Reporter: Masayoshi TSUZUKI
Priority: Minor

 When executing {{bin\spark-shell.cmd}} on Windows, I got errors like the following:
 {noformat}
 Error: Cannot load main class from JAR: spark-shell
 Run with --help for usage help or --verbose for debug output
 {noformat}






[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095285#comment-14095285
 ] 

Xu Zhongxing commented on SPARK-3005:
-

A related question: why do fine-grained mode and coarse-grained mode behave differently? Neither of them implements the killTask() method.

 Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
 MesosSchedulerBackend.killTask()
 ---

 Key: SPARK-3005
 URL: https://issues.apache.org/jira/browse/SPARK-3005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
Reporter: Xu Zhongxing

 I am using Spark, Mesos, and spark-cassandra-connector to do some work on a Cassandra cluster.
 While the job was running, I killed the Cassandra daemon to simulate some failure cases. This results in task failures.
 If I run the job in Mesos coarse-grained mode, the Spark driver program throws an exception and shuts down cleanly.
 But when I run the job in Mesos fine-grained mode, the Spark driver program hangs.
 The spark log is: 
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
 Logging.scala (line 58) Cancelling stage 1
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
 Logging.scala (line 79) Could not cancel tasks for stage 1
 java.lang.UnsupportedOperationException
   at 
 org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
   at 
 org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 

[jira] [Created] (SPARK-3007) Add Dynamic Partition support to Spark Sql hive

2014-08-13 Thread baishuo (JIRA)
baishuo created SPARK-3007:
--

 Summary: Add Dynamic Partition support  to  Spark Sql hive
 Key: SPARK-3007
 URL: https://issues.apache.org/jira/browse/SPARK-3007
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: baishuo









[jira] [Commented] (SPARK-3006) Failed to execute spark-shell in Windows OS

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095286#comment-14095286
 ] 

Apache Spark commented on SPARK-3006:
-

User 'tsudukim' has created a pull request for this issue:
https://github.com/apache/spark/pull/1918

 Failed to execute spark-shell in Windows OS
 ---

 Key: SPARK-3006
 URL: https://issues.apache.org/jira/browse/SPARK-3006
 Project: Spark
  Issue Type: Bug
  Components: Windows
 Environment: Windows 8.1
Reporter: Masayoshi TSUZUKI
Priority: Minor

 When executing {{bin\spark-shell.cmd}} on Windows, I got errors like the following:
 {noformat}
 Error: Cannot load main class from JAR: spark-shell
 Run with --help for usage help or --verbose for debug output
 {noformat}






[jira] [Commented] (SPARK-3007) Add Dynamic Partition support to Spark Sql hive

2014-08-13 Thread baishuo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095300#comment-14095300
 ] 

baishuo commented on SPARK-3007:


After modifying the code, I can run HiveQL with dynamic partitions via SparkSqlCLIDriver:
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date,province) 
select member_id2,name2,stat_date2,province2
from partition_test_input_spark2;
spark-sql> Time taken: 10.351 seconds
spark-sql> select * from partition_test_spark;

1   11  date1   pr1
2   22  date1   pr1
3   33  date1   pr2
4   44  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2
spark-sql> Time taken: 0.287 seconds
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date='date1',province) 
select member_id2,name2,province2
from partition_test_input_spark2 
where stat_date2='date2';
spark-sql> select * from partition_test_spark;
5   55  date1   pr1
6   66  date1   pr1
7   77  date1   pr2
8   88  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2

and we can also check that all the data is located in the expected directories
--
the script to create partition_test_input_spark2 and partition_test_spark:
create table partition_test_input_spark2
(member_id2 string,
name2 string,
stat_date2 string,
province2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/Desktop/testpartition.txt' OVERWRITE INTO TABLE 
partition_test_input_spark2;
()
create table partition_test_spark
(member_id string,
name string
)
partitioned by (
stat_date string,
province string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';





 Add Dynamic Partition support  to  Spark Sql hive
 ---

 Key: SPARK-3007
 URL: https://issues.apache.org/jira/browse/SPARK-3007
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: baishuo








[jira] [Comment Edited] (SPARK-3007) Add Dynamic Partition support to Spark Sql hive

2014-08-13 Thread baishuo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095300#comment-14095300
 ] 

baishuo edited comment on SPARK-3007 at 8/13/14 9:08 AM:
-

After modifying the code, I can run HiveQL with dynamic partitions via SparkSqlCLIDriver:
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date,province) 
select member_id2,name2,stat_date2,province2
from partition_test_input_spark2;
spark-sql> Time taken: 10.351 seconds
spark-sql> select * from partition_test_spark;

1   11  date1   pr1
2   22  date1   pr1
3   33  date1   pr2
4   44  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2
spark-sql> Time taken: 0.287 seconds
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date='date1',province) 
select member_id2,name2,province2
from partition_test_input_spark2 
where stat_date2='date2';
spark-sql> select * from partition_test_spark;
5   55  date1   pr1
6   66  date1   pr1
7   77  date1   pr2
8   88  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2

and we can also check that all the data is located in the expected directories
--
the script to create partition_test_input_spark2 and partition_test_spark:
create table partition_test_input_spark2
(member_id2 string,
name2 string,
stat_date2 string,
province2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/Desktop/testpartition.txt' OVERWRITE INTO TABLE 
partition_test_input_spark2;
(the content of testpartition.txt
is:
1,11,date1,pr1
2,22,date1,pr1
3,33,date1,pr2
4,44,date1,pr2
5,55,date2,pr1
6,66,date2,pr1
7,77,date2,pr2
8,88,date2,pr2)
create table partition_test_spark
(member_id string,
name string
)
partitioned by (
stat_date string,
province string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';






was (Author: baishuo):
After modifying the code, I can run HiveQL with dynamic partitions via SparkSqlCLIDriver:
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date,province) 
select member_id2,name2,stat_date2,province2
from partition_test_input_spark2;
spark-sql> Time taken: 10.351 seconds
spark-sql> select * from partition_test_spark;

1   11  date1   pr1
2   22  date1   pr1
3   33  date1   pr2
4   44  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2
spark-sql> Time taken: 0.287 seconds
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date='date1',province) 
select member_id2,name2,province2
from partition_test_input_spark2 
where stat_date2='date2';
spark-sql> select * from partition_test_spark;
5   55  date1   pr1
6   66  date1   pr1
7   77  date1   pr2
8   88  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2

and we can also check that all the data is located in the expected directories
--
the script to create partition_test_input_spark2 and partition_test_spark:
create table partition_test_input_spark2
(member_id2 string,
name2 string,
stat_date2 string,
province2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/Desktop/testpartition.txt' OVERWRITE INTO TABLE 
partition_test_input_spark2;
()
create table partition_test_spark
(member_id string,
name string
)
partitioned by (
stat_date string,
province string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';





 Add Dynamic Partition support  to  Spark Sql hive
 ---

 Key: SPARK-3007
 URL: https://issues.apache.org/jira/browse/SPARK-3007
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: baishuo








[jira] [Comment Edited] (SPARK-3007) Add Dynamic Partition support to Spark Sql hive

2014-08-13 Thread baishuo (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095300#comment-14095300
 ] 

baishuo edited comment on SPARK-3007 at 8/13/14 9:10 AM:
-

After modifying the code, I can run HiveQL with dynamic partitions via SparkSqlCLIDriver:
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date,province) 
select member_id2,name2,stat_date2,province2
from partition_test_input_spark2;
spark-sql> Time taken: 10.351 seconds
spark-sql> select * from partition_test_spark;

1   11  date1   pr1
2   22  date1   pr1
3   33  date1   pr2
4   44  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2
spark-sql> Time taken: 0.287 seconds
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date='date1',province) 
select member_id2,name2,province2
from partition_test_input_spark2 
where stat_date2='date2';
spark-sql> select * from partition_test_spark;
5   55  date1   pr1
6   66  date1   pr1
7   77  date1   pr2
8   88  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2

and we can also check that all the data is located in the expected directories
--
the scripts to create partition_test_input_spark2 and partition_test_spark:
create table partition_test_input_spark2
(member_id2 string,
name2 string,
stat_date2 string,
province2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/Desktop/testpartition.txt' OVERWRITE INTO TABLE 
partition_test_input_spark2;

create table partition_test_spark
(member_id string,
name string
)
partitioned by (
stat_date string,
province string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

(the content of testpartition.txt
is:
1,11,date1,pr1
2,22,date1,pr1
3,33,date1,pr2
4,44,date1,pr2
5,55,date2,pr1
6,66,date2,pr1
7,77,date2,pr2
8,88,date2,pr2)




was (Author: baishuo):
after modifying the code, I can run the HiveQL with dynamic partitions via 
SparkSqlCLIDriver:
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date,province) 
select member_id2,name2,stat_date2,province2
from partition_test_input_spark2;
spark-sql> Time taken: 10.351 seconds
spark-sql> select * from partition_test_spark;

1   11  date1   pr1
2   22  date1   pr1
3   33  date1   pr2
4   44  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2
spark-sql> Time taken: 0.287 seconds
spark-sql> insert overwrite table partition_test_spark 
partition(stat_date='date1',province) 
select member_id2,name2,province2
from partition_test_input_spark2 
where stat_date2='date2';
spark-sql> select * from partition_test_spark;
5   55  date1   pr1
6   66  date1   pr1
7   77  date1   pr2
8   88  date1   pr2
5   55  date2   pr1
6   66  date2   pr1
7   77  date2   pr2
8   88  date2   pr2

and we can also check that all the data is located in the expected directories
--
the scripts to create partition_test_input_spark2 and partition_test_spark:
create table partition_test_input_spark2
(member_id2 string,
name2 string,
stat_date2 string,
province2 string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA LOCAL INPATH '/root/Desktop/testpartition.txt' OVERWRITE INTO TABLE 
partition_test_input_spark2;
(the content of testpartition.txt
is:
1,11,date1,pr1
2,22,date1,pr1
3,33,date1,pr2
4,44,date1,pr2
5,55,date2,pr1
6,66,date2,pr1
7,77,date2,pr2
8,88,date2,pr2)
create table partition_test_spark
(member_id string,
name string
)
partitioned by (
stat_date string,
province string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';





 Add Dynamic Partition support  to  Spark Sql hive
 ---

 Key: SPARK-3007
 URL: https://issues.apache.org/jira/browse/SPARK-3007
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: baishuo





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3004) HiveThriftServer2 throws exception when the result set contains NULL

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095308#comment-14095308
 ] 

Apache Spark commented on SPARK-3004:
-

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/1920

 HiveThriftServer2 throws exception when the result set contains NULL
 

 Key: SPARK-3004
 URL: https://issues.apache.org/jira/browse/SPARK-3004
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Priority: Blocker

 To reproduce this issue with beeline:
 {code}
 $ cd $SPARK_HOME
 $ ./bin/beeline -u jdbc:hive2://localhost:1 -n lian
 ...
 0: jdbc:hive2://localhost:1> create table src1 (key int, value string);
 ...
 0: jdbc:hive2://localhost:1> load data local inpath 
 './sql/hive/src/test/resources/data/files/kv3.txt' into table src1;
 ...
 0: jdbc:hive2://localhost:1> select * from src1 where key is null;
 Error:  (state=,code=0)
 {code}
 Exception thrown from HiveThriftServer2:
 {code}
 java.lang.RuntimeException: Failed to check null bit for primitive int value.
 at scala.sys.package$.error(package.scala:27)
 at 
 org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(Row.scala:145)
 at 
 org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.getNextRowSet(SparkSQLOperationManager.scala:80)
 at 
 org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
 at 
 org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 The cause is that we didn't check {{isNullAt}} in 
 {{SparkSQLOperationManager.getNextRowSet}}
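 A minimal sketch of the kind of null guard described above, using a stand-in 
 RowLike trait instead of the actual Spark SQL Row and operation-manager classes; 
 the names here are illustrative assumptions, not the real patch:
 {code}
 // Guard primitive getters with isNullAt before copying a row into a
 // Thrift-style result set, instead of calling getInt on a null slot
 // (which is what triggered the RuntimeException above).
 trait RowLike {
   def isNullAt(i: Int): Boolean
   def getInt(i: Int): Int
 }

 object NullSafeCopy {
   // Returns the boxed value, or null when the column is NULL.
   def intOrNull(row: RowLike, i: Int): java.lang.Integer =
     if (row.isNullAt(i)) null else Int.box(row.getInt(i))
 }
 {code}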



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-2204) Scheduler for Mesos in fine-grained mode launches tasks on wrong executors

2014-08-13 Thread Xu Zhongxing (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Zhongxing updated SPARK-2204:


Comment: was deleted

(was: I encountered this issue again when I use Spark 1.0.2, Mesos 0.18.1, 
spark-cassandra-connector master branch.

Maybe this is not fixed on some failure/exception paths.

I run spark in coarse-grained mode. There are some exceptions thrown at the 
executors. But the spark driver is waiting and printing repeatedly:

TRACE [spark-akka.actor.default-dispatcher-17] 2014-08-11 10:57:32,998 
Logging.scala (line 66) Checking for hosts with\
 no recent heart beats in BlockManagerMaster.

The mesos master WARNING log:
W0811 10:32:58.172175 1646 master.cpp:2103] Ignoring unknown exited executor 
20140808-113811-858302656-5050-1645-2 on slave 20140808-113811-858302656-505\
0-1645-2 (ndb9)
W0811 10:32:58.181217 1649 master.cpp:2103] Ignoring unknown exited executor 
20140808-113811-858302656-5050-1645-5 on slave 20140808-113811-858302656-505\
0-1645-5 (ndb5)
W0811 10:32:58.277014 1650 master.cpp:2103] Ignoring unknown exited executor 
20140808-113811-858302656-5050-1645-3 on slave 20140808-113811-858302656-505\
0-1645-3 (ndb6)
W0811 10:32:58.344130 1648 master.cpp:2103] Ignoring unknown exited executor 
20140808-113811-858302656-5050-1645-0 on slave 20140808-113811-858302656-505\
0-1645-0 (ndb0)
W0811 10:32:58.354117 1651 master.cpp:2103] Ignoring unknown exited executor 
20140804-095254-505981120-5050-20258-11 on slave 20140804-095254-505981120-5\
050-20258-11 (ndb2)
W0811 10:32:58.550233 1647 master.cpp:2103] Ignoring unknown exited executor 
20140804-172212-505981120-5050-26571-2 on slave 20140804-172212-505981120-50\
50-26571-2 (ndb3)
W0811 10:32:58.793258 1653 master.cpp:2103] Ignoring unknown exited executor 
20140804-095254-505981120-5050-20258-19 on slave 20140804-095254-505981120-5\
050-20258-19 (ndb1)
W0811 10:32:58.904842 1652 master.cpp:2103] Ignoring unknown exited executor 
20140804-172212-505981120-5050-26571-0 on slave 20140804-172212-505981120-50\
50-26571-0 (ndb4)

Some other logs are at: 
https://github.com/datastax/spark-cassandra-connector/issues/134
)

 Scheduler for Mesos in fine-grained mode launches tasks on wrong executors
 --

 Key: SPARK-2204
 URL: https://issues.apache.org/jira/browse/SPARK-2204
 Project: Spark
  Issue Type: Bug
  Components: Mesos
Affects Versions: 1.0.0
Reporter: Sebastien Rainville
Assignee: Sebastien Rainville
Priority: Blocker
 Fix For: 1.0.1, 1.1.0


 MesosSchedulerBackend.resourceOffers(SchedulerDriver, List[Offer]) is 
 assuming that TaskSchedulerImpl.resourceOffers(Seq[WorkerOffer]) is returning 
 task lists in the same order as the offers it was passed, but in the current 
 implementation TaskSchedulerImpl.resourceOffers shuffles the offers to avoid 
 assigning the tasks always to the same executors. The result is that the 
 tasks are launched on the wrong executors.
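 A minimal sketch of an ordering-independent pairing of offers and task lists, with 
 hypothetical Offer/TaskDesc types standing in for the Mesos and Spark classes; it 
 only illustrates the idea of keying by slave id rather than relying on list 
 position, not the actual fix:
 {code}
 case class Offer(slaveId: String, cores: Int)
 case class TaskDesc(taskId: Long)

 object OfferMatching {
   // taskLists is keyed by slaveId, so a shuffled offer order cannot
   // launch tasks on the wrong executors.
   def assign(offers: Seq[Offer],
              taskLists: Map[String, Seq[TaskDesc]]): Map[String, Seq[TaskDesc]] =
     offers.map(o => o.slaveId -> taskLists.getOrElse(o.slaveId, Seq.empty[TaskDesc])).toMap
 }
 {code}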



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3008) PySpark fails due to zipimport not able to load the assembly jar (/usr/bin/python: No module named pyspark)

2014-08-13 Thread Jai Kumar Singh (JIRA)
Jai Kumar Singh created SPARK-3008:
--

 Summary: PySpark fails due to  zipimport not able to load the 
assembly jar (/usr/bin/python: No module named pyspark)
 Key: SPARK-3008
 URL: https://issues.apache.org/jira/browse/SPARK-3008
 Project: Spark
  Issue Type: Bug
  Components: PySpark
 Environment: Assembly Jar 
target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.2.0.jar

jar -tf 
assembly/target/scala-2.10/spark-assembly-1.1.0-SNAPSHOT-hadoop2.2.0.jar | wc -l
70441

git sha commit ba28a8fcbc3ba432e7ea4d6f0b535450a6ec96c6

Reporter: Jai Kumar Singh


PySpark is not working. It fails because zipimport is not able to import the 
assembly jar, since it contains more than 65536 files.


Email chains in this regard are below

http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccamjob8kcgk0pqiogju6uokceyswcusw3xwd5wrs8ikpmgd2...@mail.gmail.com%3E

https://mail.python.org/pipermail/python-list/2014-May/671353.html


Is there any workaround to bypass the issue?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3009) ApplicationInfo doesn't get initialised after deserialisation during recovery

2014-08-13 Thread Jacek Lewandowski (JIRA)
Jacek Lewandowski created SPARK-3009:


 Summary: ApplicationInfo doesn't get initialised after 
deserialisation during recovery
 Key: SPARK-3009
 URL: https://issues.apache.org/jira/browse/SPARK-3009
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.1
Reporter: Jacek Lewandowski


The {{readObject}} method has been removed from {{ApplicationInfo}}, so it no 
longer initialises its transient fields properly after deserialisation. This leads 
to an NPE during recovery of an application in {{MetricSystem.registerSource}}. As 
[~andrewor14] said, he removed the {{readObject}} method by accident. 
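A minimal sketch of the readObject pattern being discussed, using a made-up Info 
class rather than the real ApplicationInfo; it shows how a custom readObject 
re-initialises transient fields after deserialisation so that later access does 
not throw an NPE:
{code}
import java.io.ObjectInputStream

class Info(val id: String) extends Serializable {
  @transient var metricNames: Seq[String] = _
  init()

  private def init(): Unit = { metricNames = Seq.empty }

  // Without this hook, metricNames stays null after deserialisation and a
  // later access (e.g. during recovery) throws a NullPointerException.
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    init()
  }
}
{code}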




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3003) FailedStage could not be cancelled by DAGScheduler when cancelJob or cancelStage

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095474#comment-14095474
 ] 

Apache Spark commented on SPARK-3003:
-

User 'YanTangZhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/1921

 FailedStage could not be cancelled by DAGScheduler when cancelJob or 
 cancelStage
 

 Key: SPARK-3003
 URL: https://issues.apache.org/jira/browse/SPARK-3003
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: YanTang Zhai
Priority: Minor

 When a stage changes from running to failed, DAGScheduler cannot cancel it via 
 cancelJob or cancelStage, because failJobAndIndependentStages only cancels the 
 runningStage and posts SparkListenerStageCompleted for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3009) ApplicationInfo doesn't get initialised after deserialisation during recovery

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095479#comment-14095479
 ] 

Apache Spark commented on SPARK-3009:
-

User 'jacek-lewandowski' has created a pull request for this issue:
https://github.com/apache/spark/pull/1922

 ApplicationInfo doesn't get initialised after deserialisation during recovery
 -

 Key: SPARK-3009
 URL: https://issues.apache.org/jira/browse/SPARK-3009
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.1
Reporter: Jacek Lewandowski

 The {{readObject}} method has been removed from {{ApplicationInfo}}, so it no 
 longer initialises its transient fields properly after deserialisation. This 
 leads to an NPE during recovery of an application in 
 {{MetricSystem.registerSource}}. As [~andrewor14] said, he removed the 
 {{readObject}} method by accident. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3009) ApplicationInfo doesn't get initialised after deserialisation during recovery

2014-08-13 Thread Jacek Lewandowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095483#comment-14095483
 ] 

Jacek Lewandowski commented on SPARK-3009:
--

[~andrewor14] could you review it please?

 ApplicationInfo doesn't get initialised after deserialisation during recovery
 -

 Key: SPARK-3009
 URL: https://issues.apache.org/jira/browse/SPARK-3009
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.1
Reporter: Jacek Lewandowski

 The {{readObject}} method has been removed from {{ApplicationInfo}}, so it no 
 longer initialises its transient fields properly after deserialisation. This 
 leads to an NPE during recovery of an application in 
 {{MetricSystem.registerSource}}. As [~andrewor14] said, he removed the 
 {{readObject}} method by accident. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Debasish Das updated SPARK-2426:


Description: 
Current ALS supports least squares and nonnegative least squares.

I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
the following ALS problems:

1. ALS with bounds
2. ALS with L1 regularization
3. ALS with Equality constraint and bounds

Initial runtime comparisons are presented at Spark Summit. 

http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark

Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic 
Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will 
keep updating the runtime comparison results.

For integration the detailed plan is as follows:

1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
2. Integrate QuadraticMinimizer in mllib ALS


  was:
Current ALS supports least squares and nonnegative least squares.

I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
the following ALS problems:

1. ALS with bounds
2. ALS with L1 regularization
3. ALS with Equality constraint and bounds

Initial runtime comparisons are presented at Spark Summit. 

http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark

Based on Xiangrui's feedback I am currently comparing the ADMM based Quadratic 
Minimization solvers with IPM based QpSolvers and the default ALS/NNLS. I will 
keep updating the runtime comparison results.

For integration the detailed plan is as follows:

1. Add ADMM and IPM based QuadraticMinimization solvers to 
breeze.optimize.quadratic package.
2. Add a QpSolver object in spark mllib optimization which calls breeze
3. Add the QpSolver object in spark mllib ALS



 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares and nonnegative least squares.
 I presented ADMM and IPM based Quadratic Minimization solvers to be used for 
 the following ALS problems:
 1. ALS with bounds
 2. ALS with L1 regularization
 3. ALS with Equality constraint and bounds
 Initial runtime comparisons are presented at Spark Summit. 
 http://spark-summit.org/2014/talk/quadratic-programing-solver-for-non-negative-matrix-factorization-with-spark
 Based on Xiangrui's feedback I am currently comparing the ADMM based 
 Quadratic Minimization solvers with IPM based QpSolvers and the default 
 ALS/NNLS. I will keep updating the runtime comparison results.
 For integration the detailed plan is as follows:
 1. Add QuadraticMinimizer and Proximal algorithms in mllib.optimization
 2. Integrate QuadraticMinimizer in mllib ALS



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3010) fix redundant conditional

2014-08-13 Thread wangfei (JIRA)
wangfei created SPARK-3010:
--

 Summary: fix redundant conditional
 Key: SPARK-3010
 URL: https://issues.apache.org/jira/browse/SPARK-3010
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.2
Reporter: wangfei
 Fix For: 1.1.0


there are some redundant conditionals in Spark, such as 

1.
private[spark] def codegenEnabled: Boolean =
  if (getConf(CODEGEN_ENABLED, false) == true) true else false
2.
x = if (x == 2) true else false

... etc
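
A small sketch of the simplification being proposed, using a hypothetical 
configuration key: the Boolean expression itself already carries the value, so the 
if/else wrapper is redundant.

{code}
object RedundantConditional {
  // Hypothetical conf lookup; the point is that the Boolean expression is
  // the result, so "if (... == true) true else false" adds nothing.
  def codegenEnabled(conf: Map[String, String]): Boolean =
    conf.getOrElse("example.codegen.enabled", "false").toBoolean

  def isTwo(x: Int): Boolean = x == 2   // instead of: if (x == 2) true else false
}
{code}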



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-2426) Quadratic Minimization for MLlib ALS

2014-08-13 Thread Debasish Das (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095232#comment-14095232
 ] 

Debasish Das edited comment on SPARK-2426 at 8/13/14 3:31 PM:
--

Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week. 

I need some advice on whether we should bring the additional ALS features first 
or integrate NNLS with QuadraticMinimizer so that we can handle large ranks as 
well. 

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all 
QuadraticMinimization. 

Right now we support 5 features:

1. Least square
2. Least square with positivity
3. Least square with bounds : generalization of positivity
4. Least square with equality and positivity/bounds for LDA/PLSA
5. Least square + L1 constraint for sparse NMF

There are many regularizers in Proximal.scala which can be re-used in the mllib 
updater; L1Updater in mllib is an example of a proximal algorithm.
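
As an illustration of what a proximal operator looks like (this is not code from 
the branch), here is the classic soft-thresholding operator for the L1 norm in a 
few lines of Scala:

{code}
object ProxL1 {
  // prox_{t*lambda*||.||_1}(v) applied elementwise: shrink each coordinate
  // of v toward zero by t*lambda, clamping at zero.
  def apply(v: Array[Double], lambda: Double, t: Double): Array[Double] = {
    val k = lambda * t
    v.map(x => math.signum(x) * math.max(math.abs(x) - k, 0.0))
  }
}
{code}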

QuadraticMinimizer is optimized for direct solves right now (Cholesky/LU, depending 
on the problem we are solving).

The CG core from NNLS should be used for iterative solves when ranks are high. I 
need a different variant of CG for Formulation 4, so the NNLS CG is not sufficient 
for all the formulations this branch supports.

Right now I am experimenting with ADMM rho and lambda values so that the NNLS 
iterations are on par with least squares with positivity. The ideas for rho and 
lambda tuning are the following:

1. Derive an optimal value of lambda for quadratic problems, similar to the idea of 
Nesterov's acceleration used in algorithms like FISTA and accelerated ADMM from UCLA.
2. Equilibrate/scale the Gram matrix so that rho can always be set to 1.0.

For Matlab-based experiments with PDCO, ECOS (IPM), MOSEK and the ADMM variants, 
ADMM is faster while producing result quality within 1e-4 of MOSEK. I will publish 
the numbers and the Matlab script through the ECOS jnilib open source project (GPL 
licensed). I did not add any ECOS code here so that everything stays Apache-licensed.

For the recommendation use case, I expect to produce Jellylish L1-ball projection 
results on the Netflix/MovieLens datasets using Formulation 5.

Example runs:

Least square with equality and positivity for topic modeling, all topics sum to 
1.0:

bin/spark-submit --class org.apache.spark.examples.mllib.MovieLensALS \
  |  examples/target/scala-*/spark-examples-*.jar \
  |  --rank 25 --numIterations 10 --lambda 1.0 --kryo --qpProblem 4\
  |  data/mllib/sample_movielens_data.txt

Least square with L1 regularization:

bin/spark-submit --class org.apache.spark.examples.mllib.MovieLensALS \
  |  examples/target/scala-*/spark-examples-*.jar \
  |  --rank 25 --numIterations 10 --lambda 1.0 --lambdaL1 1e-2 --kryo 
--qpProblem 5\
  |  data/mllib/sample_movielens_data.txt

Thanks.
Deb


was (Author: debasish83):
Hi Xiangrui,

The branch is ready for an initial review. I will do a lot of clean-up this week.

https://github.com/debasish83/spark/commits/qp-als

optimization/QuadraticMinimizer.scala is the placeholder for all 
QuadraticMinimization. 

Right now we support 5 features:

1. Least square
2. Least square with positivity
3. Least square with bounds : generalization of positivity
4. Least square with equality and positivity/bounds for LDA/PLSA
5. Least square + L1 constraint for sparse NMF

There are many regularizers in Proximal.scala which can be re-used in the mllib 
updater; L1Updater is an example of a proximal algorithm.

I feel we should move NNLS into QuadraticMinimizer as well and clean ALS.scala 
as you have suggested before...

QuadraticMinimizer is optimized for direct solves right now (Cholesky/LU, depending 
on the problem we are solving).

The CG core from NNLS should be used for iterative solves when ranks are high. I 
need a different variant of CG for Formulation 4, so the NNLS CG is not sufficient 
for all the formulations.

Right now I am experimenting with ADMM rho and lambda values so that the NNLS 
iterations are on par with least squares with positivity. 

I will publish results from the comparisons.

I will also publish comparisons with PDCO, ECOS (IPM) and MOSEK with ADMM 
variants used in this branch...

For the recommendation use case, I expect to produce Jellylish L1-ball projection 
results on the Netflix/MovieLens datasets using Formulation 5.

Thanks.
Deb

 Quadratic Minimization for MLlib ALS
 

 Key: SPARK-2426
 URL: https://issues.apache.org/jira/browse/SPARK-2426
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 1.0.0
Reporter: Debasish Das
Assignee: Debasish Das
   Original Estimate: 504h
  Remaining Estimate: 504h

 Current ALS supports least squares 

[jira] [Commented] (SPARK-3010) fix redundant conditional

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095587#comment-14095587
 ] 

Apache Spark commented on SPARK-3010:
-

User 'scwf' has created a pull request for this issue:
https://github.com/apache/spark/pull/1923

 fix redundant conditional
 -

 Key: SPARK-3010
 URL: https://issues.apache.org/jira/browse/SPARK-3010
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.0.2
Reporter: wangfei
 Fix For: 1.1.0


 there are some redundant conditionals in Spark, such as 
 1.
 private[spark] def codegenEnabled: Boolean =
   if (getConf(CODEGEN_ENABLED, false) == true) true else false
 2.
 x = if (x == 2) true else false
 ... etc



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3011) _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-13 Thread Joseph Su (JIRA)
Joseph Su created SPARK-3011:


 Summary: _temporary directory should be filtered out by 
sqlContext.parquetFile
 Key: SPARK-3011
 URL: https://issues.apache.org/jira/browse/SPARK-3011
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Joseph Su


Sometimes the _temporary directory is not removed after the file is committed on 
S3. sqlContext.parquetFile will then raise an exception because it tries to read 
the metadata in _temporary. sqlContext.parquetFile should just ignore this directory.
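
A minimal sketch of the kind of filtering suggested here, assuming a Hadoop 
PathFilter is applied when listing the Parquet directory; this is illustrative 
only and not the actual pull request:

{code}
import org.apache.hadoop.fs.{Path, PathFilter}

// Skip Hadoop's _temporary directory (and other "_"/"."-prefixed entries)
// when listing Parquet part files and metadata.
object ParquetPathFilter extends PathFilter {
  override def accept(path: Path): Boolean = {
    val name = path.getName
    name != "_temporary" && !name.startsWith(".") && !name.startsWith("_temporary")
  }
}
{code}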



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3011) _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-13 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095636#comment-14095636
 ] 

Sean Owen commented on SPARK-3011:
--

Duplicate, or very closely related: 
https://issues.apache.org/jira/browse/SPARK-2700

 _temporary directory should be filtered out by sqlContext.parquetFile
 -

 Key: SPARK-3011
 URL: https://issues.apache.org/jira/browse/SPARK-3011
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Joseph Su

 Sometimes the _temporary directory is not removed after the file is committed on 
 S3. sqlContext.parquetFile will then raise an exception because it tries to read 
 the metadata in _temporary. sqlContext.parquetFile should just ignore this directory.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3011) _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095641#comment-14095641
 ] 

Apache Spark commented on SPARK-3011:
-

User 'joesu' has created a pull request for this issue:
https://github.com/apache/spark/pull/1924

 _temporary directory should be filtered out by sqlContext.parquetFile
 -

 Key: SPARK-3011
 URL: https://issues.apache.org/jira/browse/SPARK-3011
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Joseph Su

 Sometimes the _temporary directory is not removed after the file is committed on 
 S3. sqlContext.parquetFile will then raise an exception because it tries to read 
 the metadata in _temporary. sqlContext.parquetFile should just ignore this directory.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3011) _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-13 Thread Joseph Su (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095640#comment-14095640
 ] 

Joseph Su commented on SPARK-3011:
--

Pull request is here: https://github.com/apache/spark/pull/1924

SPARK-2700 did not filter out temp dir. 

 _temporary directory should be filtered out by sqlContext.parquetFile
 -

 Key: SPARK-3011
 URL: https://issues.apache.org/jira/browse/SPARK-3011
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Joseph Su

 Sometimes the _temporary directory is not removed after the file is committed on 
 S3. sqlContext.parquetFile will then raise an exception because it tries to read 
 the metadata in _temporary. sqlContext.parquetFile should just ignore this directory.  



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3012) Standardized Distance Functions between two Vectors for MLlib

2014-08-13 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-3012:
--

 Summary: Standardized Distance Functions between two Vectors for 
MLlib
 Key: SPARK-3012
 URL: https://issues.apache.org/jira/browse/SPARK-3012
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Reporter: Yu Ishikawa
Priority: Minor


Most of the clustering algorithms need distance functions between two Vectors.

We should include a standardized distance function library in MLlib.
I think that standardized distance functions would help us implement more 
machine learning algorithms efficiently.

h3. For example

- Chebyshev Distance
- Cosine Distance
- Euclidean Distance
- Mahalanobis Distance
- Manhattan Distance
- Minkowski Distance
- SquaredEuclidean Distance
- Tanimoto Distance
- Weighted Distance
- WeightedEuclidean Distance
- WeightedManhattan Distance
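
A minimal sketch of what such a standardized interface could look like, using plain 
arrays instead of MLlib's Vector type to keep the example self-contained; the names 
and signatures are assumptions, not a proposed API:

{code}
trait DistanceMeasure {
  def apply(v1: Array[Double], v2: Array[Double]): Double
}

object EuclideanDistance extends DistanceMeasure {
  def apply(v1: Array[Double], v2: Array[Double]): Double = {
    require(v1.length == v2.length, "vectors must have the same dimension")
    math.sqrt(v1.zip(v2).map { case (a, b) => val d = a - b; d * d }.sum)
  }
}

object ManhattanDistance extends DistanceMeasure {
  def apply(v1: Array[Double], v2: Array[Double]): Double = {
    require(v1.length == v2.length, "vectors must have the same dimension")
    v1.zip(v2).map { case (a, b) => math.abs(a - b) }.sum
  }
}
{code}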



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3013) Doctest of inferSchema in Spark SQL Python API fails

2014-08-13 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-3013:
-

 Summary: Doctest of inferSchema in Spark SQL Python API fails
 Key: SPARK-3013
 URL: https://issues.apache.org/jira/browse/SPARK-3013
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Priority: Blocker


Doctest of `inferSchema` in `sql.py` keeps failing and makes Jenkins crazy:

{code}
File /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, 
line 1021, in pyspark.sql.SQLContext.inferSchema
Failed example:
srdd.collect()
Exception raised:
Traceback (most recent call last):
  File /usr/lib64/python2.6/doctest.py, line 1253, in __run
compileflags, 1) in test.globs
  File doctest pyspark.sql.SQLContext.inferSchema[6], line 1, in 
module
srdd.collect()
  File 
/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, line 
1613, in collect
rows = RDD.collect(self)
  File 
/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py, line 
724, in collect
bytesInJava = self._jrdd.collect().iterator()
  File 
/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
 line 538, in __call__
self.target_id, self.name)
  File 
/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py,
 line 300, in get_return_value
format(target_id, '.', name), value)
Py4JJavaError: An error occurred while calling o399.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 
in stage 35.0 failed 1 times, most recent failure: Lost task 1.0 in stage 35.0 
(TID 72, localhost): java.lang.ClassCastException: java.lang.String cannot be 
cast to java.util.ArrayList

net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:33)
net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)

org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:722)

org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:721)
scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)

scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
scala.collection.AbstractIterator.to(Iterator.scala:1157)

scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)

scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)

org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)

org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executSLF4J: Failed to load class 
org.slf4j.impl.StaticLoggerBinder.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
or.Executor$TaskRunner.run(Executor.scala:199)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

[jira] [Commented] (SPARK-2140) yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM

2014-08-13 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095721#comment-14095721
 ] 

Thomas Graves commented on SPARK-2140:
--

Ah, it seems things have changed. It's now actually the opposite problem, where 
yarn-alpha is getting less than it should. 

Line 87 is what it's actually asking from YARN; calculateAMMemory is what it's 
using for the heap. So it appears yarn-stable is correct right now, and yarn-alpha's 
calculateAMMemory subtracts out the memory overhead when it shouldn't.
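
A small sketch of the memory accounting being discussed, with an assumed overhead 
constant and hypothetical helper names: the container request sent to YARN should 
include the overhead, while the JVM heap should be the configured AM memory with 
nothing subtracted.

{code}
object AmMemory {
  // Assumed default overhead in MB (illustrative; not read from SparkConf here).
  val memoryOverheadMb = 384

  // What is requested from YARN for the AM container: heap plus overhead.
  def containerRequestMb(amMemoryMb: Int): Int = amMemoryMb + memoryOverheadMb

  // What goes into -Xmx: the configured AM memory itself; subtracting the
  // overhead here again is the yarn-alpha mistake described above.
  def jvmHeapMb(amMemoryMb: Int): Int = amMemoryMb
}
{code}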

 yarn stable client doesn't properly handle MEMORY_OVERHEAD for AM
 -

 Key: SPARK-2140
 URL: https://issues.apache.org/jira/browse/SPARK-2140
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Thomas Graves
 Fix For: 1.0.1, 1.1.0


 The yarn stable client doesn't properly remove the MEMORY_OVERHEAD amount 
 from the java heap size; the code to handle that is commented out (see the 
 calculateAMMemory function). We should fix this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3013) Doctest of inferSchema in Spark SQL Python API fails

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-3013:


Assignee: Davies Liu

 Doctest of inferSchema in Spark SQL Python API fails
 

 Key: SPARK-3013
 URL: https://issues.apache.org/jira/browse/SPARK-3013
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Assignee: Davies Liu
Priority: Blocker

 Doctest of `inferSchema` in `sql.py` keeps failing and makes Jenkins crazy:
 {code}
 File /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, 
 line 1021, in pyspark.sql.SQLContext.inferSchema
 Failed example:
 srdd.collect()
 Exception raised:
 Traceback (most recent call last):
   File /usr/lib64/python2.6/doctest.py, line 1253, in __run
 compileflags, 1) in test.globs
   File doctest pyspark.sql.SQLContext.inferSchema[6], line 1, in 
 module
 srdd.collect()
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, line 
 1613, in collect
 rows = RDD.collect(self)
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py, line 
 724, in collect
 bytesInJava = self._jrdd.collect().iterator()
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
  line 538, in __call__
 self.target_id, self.name)
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py,
  line 300, in get_return_value
 format(target_id, '.', name), value)
 Py4JJavaError: An error occurred while calling o399.collect.
 : org.apache.spark.SparkException: Job aborted due to stage failure: Task 
 1 in stage 35.0 failed 1 times, most recent failure: Lost task 1.0 in stage 
 35.0 (TID 72, localhost): java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
 
 net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:33)
 net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
 net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
 net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
 net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)
 
 org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:722)
 
 org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:721)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
 
 scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 
 scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 scala.collection.AbstractIterator.to(Iterator.scala:1157)
 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
 org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executSLF4J: Failed to load class 
 org.slf4j.impl.StaticLoggerBinder.
 SLF4J: Defaulting to no-operation (NOP) logger implementation
 SLF4J: See 

[jira] [Commented] (SPARK-1442) Add Window function support

2014-08-13 Thread Adam Nowak (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095861#comment-14095861
 ] 

Adam Nowak commented on SPARK-1442:
---

Does the Spark SQLContext support windowing functions, given the support that was 
added to Hive?

 Add Window function support
 ---

 Key: SPARK-1442
 URL: https://issues.apache.org/jira/browse/SPARK-1442
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Chengxiang Li
 Fix For: 1.1.0


 Similar to Hive, add window function support for Catalyst.
 https://issues.apache.org/jira/browse/HIVE-4197
 https://issues.apache.org/jira/browse/HIVE-896



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2846) Spark SQL hive implementation bypass StorageHandler which breaks any customized StorageHandler

2014-08-13 Thread Alex Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095879#comment-14095879
 ] 

Alex Liu commented on SPARK-2846:
-

pull @ https://github.com/apache/spark/pull/1927

 Spark SQL hive implementation bypass StorageHandler which breaks any 
 customized StorageHandler
 --

 Key: SPARK-2846
 URL: https://issues.apache.org/jira/browse/SPARK-2846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Alex Liu
 Attachments: 2846.txt


 The existing implementation bypasses StorageHandler and the other Hive integration 
 APIs. I tested CassandraStorageHandler on the latest Spark SQL; it fails because 
 some job property configuration in the StorageHandler API is bypassed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2969) Make ScalaReflection be able to handle ArrayType.containsNull and MapType.valueContainsNull.

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-2969:


Priority: Critical  (was: Major)

 Make ScalaReflection be able to handle ArrayType.containsNull and 
 MapType.valueContainsNull.
 

 Key: SPARK-2969
 URL: https://issues.apache.org/jira/browse/SPARK-2969
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Takuya Ueshin
Assignee: Takuya Ueshin
Priority: Critical

 Make {{ScalaReflection}} able to handle types like:
 - Seq\[Int] as ArrayType(IntegerType, containsNull = false)
 - Seq\[java.lang.Integer] as ArrayType(IntegerType, containsNull = true)
 - Map\[Int, Long] as MapType(IntegerType, LongType, valueContainsNull = false)
 - Map\[Int, java.lang.Long] as MapType(IntegerType, LongType, 
 valueContainsNull = true)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2846) Spark SQL hive implementation bypass StorageHandler which breaks any customized StorageHandler

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095925#comment-14095925
 ] 

Apache Spark commented on SPARK-2846:
-

User 'alexliu68' has created a pull request for this issue:
https://github.com/apache/spark/pull/1927

 Spark SQL hive implementation bypass StorageHandler which breaks any 
 customized StorageHandler
 --

 Key: SPARK-2846
 URL: https://issues.apache.org/jira/browse/SPARK-2846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Alex Liu
 Attachments: 2846.txt


 The existing implementation bypasses StorageHandler and the other Hive integration 
 APIs. I tested CassandraStorageHandler on the latest Spark SQL; it fails because 
 some job property configuration in the StorageHandler API is bypassed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3013) Doctest of inferSchema in Spark SQL Python API fails

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14095942#comment-14095942
 ] 

Apache Spark commented on SPARK-3013:
-

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/1928

 Doctest of inferSchema in Spark SQL Python API fails
 

 Key: SPARK-3013
 URL: https://issues.apache.org/jira/browse/SPARK-3013
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Assignee: Davies Liu
Priority: Blocker

 Doctest of `inferSchema` in `sql.py` keeps failing and makes Jenkins crazy:
 {code}
 File /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, 
 line 1021, in pyspark.sql.SQLContext.inferSchema
 Failed example:
 srdd.collect()
 Exception raised:
 Traceback (most recent call last):
   File /usr/lib64/python2.6/doctest.py, line 1253, in __run
 compileflags, 1) in test.globs
   File doctest pyspark.sql.SQLContext.inferSchema[6], line 1, in 
 module
 srdd.collect()
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, line 
 1613, in collect
 rows = RDD.collect(self)
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py, line 
 724, in collect
 bytesInJava = self._jrdd.collect().iterator()
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
  line 538, in __call__
 self.target_id, self.name)
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py,
  line 300, in get_return_value
 format(target_id, '.', name), value)
 Py4JJavaError: An error occurred while calling o399.collect.
 : org.apache.spark.SparkException: Job aborted due to stage failure: Task 
 1 in stage 35.0 failed 1 times, most recent failure: Lost task 1.0 in stage 
 35.0 (TID 72, localhost): java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
 
 net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:33)
 net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
 net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
 net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
 net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)
 
 org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:722)
 
 org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:721)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
 
 scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 
 scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 scala.collection.AbstractIterator.to(Iterator.scala:1157)
 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
 org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executSLF4J: Failed to load class 
 

[jira] [Updated] (SPARK-2846) Add configureInputJobPropertiesForStorageHandler to initialization of job conf

2014-08-13 Thread Alex Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Liu updated SPARK-2846:


Summary: Add configureInputJobPropertiesForStorageHandler to initialization 
of job conf  (was: Spark SQL hive implementation bypass StorageHandler which 
breaks any customized StorageHandler)

 Add configureInputJobPropertiesForStorageHandler to initialization of job conf
 --

 Key: SPARK-2846
 URL: https://issues.apache.org/jira/browse/SPARK-2846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Alex Liu
 Attachments: 2846.txt


 The existing implementation bypasses StorageHandler and the other Hive integration 
 APIs. I tested CassandraStorageHandler on the latest Spark SQL; it fails because 
 some job property configuration in the StorageHandler API is bypassed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1391) BlockManager cannot transfer blocks larger than 2G in size

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-1391:
---

Issue Type: Improvement  (was: Bug)

 BlockManager cannot transfer blocks larger than 2G in size
 --

 Key: SPARK-1391
 URL: https://issues.apache.org/jira/browse/SPARK-1391
 Project: Spark
  Issue Type: Improvement
  Components: Block Manager, Shuffle
Affects Versions: 1.0.0
Reporter: Shivaram Venkataraman
 Attachments: SPARK-1391.diff


 If a task tries to remotely access a cached RDD block, I get an exception 
 when the block size is > 2G. The exception is pasted below.
 Memory capacities are huge these days (> 60G), and many workflows depend on 
 having large blocks in memory, so it would be good to fix this bug.
 I don't know if the same thing happens on shuffles if one transfer (from 
 mapper to reducer) is > 2G.
 {noformat}
 14/04/02 02:33:10 ERROR storage.BlockManagerWorker: Exception handling buffer 
 message
 java.lang.ArrayIndexOutOfBoundsException
 at 
 it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:96)
 at 
 it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:134)
 at 
 it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:164)
 at 
 java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
 at 
 java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
 at 
 org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:38)
 at 
 org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:93)
 at 
 org.apache.spark.serializer.JavaSerializationStream.writeAll(JavaSerializer.scala:26)
 at 
 org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:913)
 at 
 org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:922)
 at 
 org.apache.spark.storage.MemoryStore.getBytes(MemoryStore.scala:102)
 at 
 org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:348)
 at 
 org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:323)
 at 
 org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:90)
 at 
 org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:69)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 at 
 org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
 at 
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at 
 org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
 at 
 org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:44)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
 at 
 org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:661)
 at 
 org.apache.spark.network.ConnectionManager$$anon$9.run(ConnectionManager.scala:503)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}
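 A small, illustrative sketch (not Spark code) of the usual workaround for the 2G 
 limit: since JVM arrays and ByteBuffers are Int-indexed, a block larger than ~2G 
 has to be moved as multiple chunks, and the chunk boundaries can be computed from 
 the Long-sized block length.
 {code}
 object BlockChunker {
   // Returns (offset, length) pairs covering a block of totalSize bytes,
   // where totalSize may exceed Int.MaxValue but each chunk fits in an Int.
   def chunkRanges(totalSize: Long, chunkSize: Int = 64 * 1024 * 1024): Seq[(Long, Int)] = {
     require(chunkSize > 0, "chunkSize must be positive")
     (0L until totalSize by chunkSize.toLong).map { offset =>
       val len = math.min(chunkSize.toLong, totalSize - offset).toInt
       (offset, len)
     }
   }
 }
 {code}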



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1391) BlockManager cannot transfer blocks larger than 2G in size

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-1391:
---

Assignee: (was: Min Zhou)

 BlockManager cannot transfer blocks larger than 2G in size
 --

 Key: SPARK-1391
 URL: https://issues.apache.org/jira/browse/SPARK-1391
 Project: Spark
  Issue Type: Bug
  Components: Block Manager, Shuffle
Affects Versions: 1.0.0
Reporter: Shivaram Venkataraman
 Attachments: SPARK-1391.diff


 If a task tries to remotely access a cached RDD block, I get an exception 
 when the block size is > 2G. The exception is pasted below.
 Memory capacities are huge these days (> 60G), and many workflows depend on 
 having large blocks in memory, so it would be good to fix this bug.
 I don't know if the same thing happens on shuffles if one transfer (from 
 mapper to reducer) is > 2G.
 {noformat}
 14/04/02 02:33:10 ERROR storage.BlockManagerWorker: Exception handling buffer 
 message
 java.lang.ArrayIndexOutOfBoundsException
 at 
 it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:96)
 at 
 it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:134)
 at 
 it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:164)
 at 
 java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
 at 
 java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
 at 
 org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:38)
 at 
 org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:93)
 at 
 org.apache.spark.serializer.JavaSerializationStream.writeAll(JavaSerializer.scala:26)
 at 
 org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:913)
 at 
 org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:922)
 at 
 org.apache.spark.storage.MemoryStore.getBytes(MemoryStore.scala:102)
 at 
 org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:348)
 at 
 org.apache.spark.storage.BlockManager.getLocalBytes(BlockManager.scala:323)
 at 
 org.apache.spark.storage.BlockManagerWorker.getBlock(BlockManagerWorker.scala:90)
 at 
 org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:69)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:44)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 at 
 org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
 at 
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at 
 org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
 at 
 org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:44)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
 at 
 org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:34)
 at 
 org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:661)
 at 
 org.apache.spark.network.ConnectionManager$$anon$9.run(ConnectionManager.scala:503)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 {noformat}
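
A minimal illustration of why a 2G boundary appears (this is an assumption about 
the mechanism, not a quote of the actual BlockManager code): JVM byte arrays and 
java.nio.ByteBuffer are Int-indexed, so any path that materializes a block as a 
single buffer is capped at Int.MaxValue bytes.

{code}
// Illustrative sketch only: a block held as one Array[Byte] or ByteBuffer
// cannot exceed Int.MaxValue (~2 GB), because array and buffer offsets are Int.
object TwoGigLimit {
  val MaxSingleBufferBytes: Long = Int.MaxValue.toLong // 2147483647

  def fitsInSingleBuffer(blockSize: Long): Boolean =
    blockSize <= MaxSingleBufferBytes

  def main(args: Array[String]): Unit = {
    val threeGB = 3L * 1024 * 1024 * 1024
    println(s"3 GB block fits in one buffer: ${fitsInSingleBuffer(threeGB)}") // false
  }
}
{code}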



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1297) Upgrade HBase dependency to 0.98.0

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-1297:
---

Assignee: Ted Yu

 Upgrade HBase dependency to 0.98.0
 --

 Key: SPARK-1297
 URL: https://issues.apache.org/jira/browse/SPARK-1297
 Project: Spark
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
Priority: Minor
 Attachments: spark-1297-v2.txt, spark-1297-v4.txt


 HBase 0.94.6 was released 11 months ago.
 Upgrade HBase dependency to 0.98.0



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3014) Log a more informative message when yarn-cluster app fails because SparkContext wasn't initialized

2014-08-13 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3014:
-

 Summary: Log a more informative message when yarn-cluster app 
fails because SparkContext wasn't initialized
 Key: SPARK-3014
 URL: https://issues.apache.org/jira/browse/SPARK-3014
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.0.2
Reporter: Sandy Ryza
Priority: Minor


This is what shows up currently:
{code}
Exception in thread Thread-4 java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:187)
Exception in thread main java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:223)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:112)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:470)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:469)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
{code}
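
A sketch of the kind of message this issue asks for (names and structure are 
hypothetical, not the actual ApplicationMaster code): fail with an explanation 
when the user class never creates a SparkContext, instead of a bare assertion 
error.

{code}
// Hypothetical sketch; identifiers are illustrative, not Spark's real code.
def waitForSparkContext(lookup: () => Option[AnyRef], maxWaitMs: Long): AnyRef = {
  val deadline = System.currentTimeMillis() + maxWaitMs
  var sc = lookup()
  while (sc.isEmpty && System.currentTimeMillis() < deadline) {
    Thread.sleep(100)
    sc = lookup()
  }
  sc.getOrElse(throw new IllegalStateException(
    s"SparkContext was not initialized after $maxWaitMs ms. In yarn-cluster " +
      "mode the user class must create a SparkContext; check the container " +
      "logs for errors thrown before initialization."))
}
{code}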



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3014) Log a more informative messages in a couple failure scenarios

2014-08-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated SPARK-3014:
--

Summary: Log a more informative messages in a couple failure scenarios  
(was: Log a more informative message when yarn-cluster app fails because 
SparkContext wasn't initialized)

 Log a more informative messages in a couple failure scenarios
 -

 Key: SPARK-3014
 URL: https://issues.apache.org/jira/browse/SPARK-3014
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.0.2
Reporter: Sandy Ryza
Priority: Minor

 This is what shows up currently:
 {code}
 Exception in thread Thread-4 java.lang.NullPointerException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:187)
 Exception in thread main java.lang.AssertionError: assertion failed
   at scala.Predef$.assert(Predef.scala:165)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:223)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:112)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:470)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
   at 
 org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:469)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3014) Log a more informative messages in a couple failure scenarios

2014-08-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated SPARK-3014:
--

Description: 
This is what shows up currently when the user code fails to initialize a 
SparkContext when running in yarn-cluster mode:
{code}
Exception in thread Thread-4 java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:187)
Exception in thread main java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:223)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:112)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:470)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:469)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
{code}

This is what shows up when the main method isn't static:
{code}
Exception in thread main java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
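
For the second scenario, a hedged sketch of a clearer check before reflectively 
invoking the user's entry point (method and message are illustrative, not 
SparkSubmit's actual code):

{code}
// Hypothetical sketch; not the actual SparkSubmit code.
import java.lang.reflect.Modifier

def invokeUserMain(mainClass: Class[_], args: Array[String]): Unit = {
  val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
  if (!Modifier.isStatic(mainMethod.getModifiers)) {
    throw new IllegalArgumentException(
      s"${mainClass.getName}.main(Array[String]) is not static. The application " +
        "entry point should be an object with a main method, not a class.")
  }
  mainMethod.invoke(null, args) // static invocation: receiver is null
}
{code}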

  was:
This is what shows up currently:
{code}
Exception in thread Thread-4 java.lang.NullPointerException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:187)
Exception in thread main java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:223)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:112)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:470)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
at 
org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
at 
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:469)
at 
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
{code}


 Log a more informative messages in a couple failure scenarios
 -

 Key: SPARK-3014
 URL: https://issues.apache.org/jira/browse/SPARK-3014
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.0.2
Reporter: Sandy Ryza
Priority: Minor

 This is what shows up currently when the user code fails to initialize a 
 SparkContext when running in yarn-cluster mode:
 {code}
 Exception in thread Thread-4 java.lang.NullPointerException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 

[jira] [Created] (SPARK-3015) Removing broadcast in quick successions causes Akka timeout

2014-08-13 Thread Andrew Or (JIRA)
Andrew Or created SPARK-3015:


 Summary: Removing broadcast in quick successions causes Akka 
timeout
 Key: SPARK-3015
 URL: https://issues.apache.org/jira/browse/SPARK-3015
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Standalone EC2 Spark shell
Reporter: Andrew Or
Priority: Blocker
 Fix For: 1.1.0


This issue was originally reported in SPARK-2916 in the context of MLlib, but we 
were able to reproduce it using a simple Spark shell command:

{code}
(1 to 1).foreach { i => sc.parallelize(1 to 1000, 48).sum }
{code}

We still do not have a full understanding of the issue, but we have gleaned the 
following information so far. When the driver runs a GC, it attempts to clean 
up all the broadcast blocks that go out of scope at once. This causes the 
driver to send out many blocking RemoveBroadcast messages to the executors, 
which in turn send out blocking UpdateBlockInfo messages back to the driver. 
Both of these calls block until they receive the expected responses. We suspect 
that the high frequency at which we send these blocking messages is the cause 
of either dropped messages or internal deadlock somewhere.

Unfortunately, it is highly difficult to reproduce depending on the 
environment. We have been able to reproduce it on a 6-node cluster in 
us-west-2, but not in us-west-1, for instance.
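
For context, an illustration of the messaging pattern involved (not the change 
proposed in the linked pull request): in Akka, a blocking ask waits for a reply 
and is subject to a timeout, while a tell is fire-and-forget. Issuing thousands 
of blocking round trips in a tight loop during a GC-triggered cleanup is exactly 
the pattern that can trip timeouts.

{code}
// Illustrative Akka sketch; RemoveBroadcast is a placeholder message type.
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await

case class RemoveBroadcast(id: Long)

def removeBlocking(target: ActorRef, id: Long)(implicit t: Timeout): Any =
  Await.result(target ? RemoveBroadcast(id), t.duration) // fails on timeout

def removeFireAndForget(target: ActorRef, id: Long): Unit =
  target ! RemoveBroadcast(id) // no reply awaited, no timeout to hit
{code}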



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2983) improve performance of sortByKey()

2014-08-13 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia resolved SPARK-2983.
--

   Resolution: Fixed
Fix Version/s: 1.1.0

 improve performance of sortByKey()
 --

 Key: SPARK-2983
 URL: https://issues.apache.org/jira/browse/SPARK-2983
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 0.9.0, 1.1.0, 1.0.2
Reporter: Davies Liu
Assignee: Davies Liu
 Fix For: 1.1.0


 For large datasets with many partitions (N), sortByKey() will be very slow, 
 because it takes O(N) time in rangePartitioner.
 This could be improved by using binary search, which would reduce the time to 
 O(log N).
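
A sketch of the idea (simplified; the real RangePartitioner works on arbitrary 
ordered keys, not just Int): binary search over the sorted upper bounds finds 
the partition in O(log N).

{code}
// Simplified sketch of partition lookup via binary search.
import java.util.Arrays

def partitionFor(key: Int, upperBounds: Array[Int]): Int = {
  val pos = Arrays.binarySearch(upperBounds, key)
  if (pos >= 0) pos else -pos - 1 // insertion point when key falls between bounds
}

// Bounds (10, 20, 30) split keys into 4 partitions:
assert(partitionFor(5, Array(10, 20, 30)) == 0)
assert(partitionFor(25, Array(10, 20, 30)) == 2)
{code}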



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3016) Client should be able to put blocks in addition to fetch blocks

2014-08-13 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-3016:
--

 Summary: Client should be able to put blocks in addition to fetch 
blocks
 Key: SPARK-3016
 URL: https://issues.apache.org/jira/browse/SPARK-3016
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin


If we ever want the Netty module to replace the existing ConnectionManager, 
we'd need to implement the ability for the client to put blocks to servers. 
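
A hypothetical shape for such a client API (all names are illustrative, not the 
actual Netty module interface):

{code}
// Hypothetical interface sketch; identifiers are illustrative.
import java.nio.ByteBuffer
import scala.concurrent.Future

trait BlockTransferClient {
  def fetchBlock(blockId: String): Future[ByteBuffer]
  // The addition this sub-task calls for: pushing a block to a remote server.
  def putBlock(blockId: String, data: ByteBuffer): Future[Unit]
}
{code}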



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3017) Implement unit/integration tests for connection failures

2014-08-13 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-3017:
--

 Summary: Implement unit/integration tests for connection failures
 Key: SPARK-3017
 URL: https://issues.apache.org/jira/browse/SPARK-3017
 Project: Spark
  Issue Type: Sub-task
Reporter: Reynold Xin






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3015) Removing broadcast in quick successions causes Akka timeout

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096251#comment-14096251
 ] 

Apache Spark commented on SPARK-3015:
-

User 'andrewor14' has created a pull request for this issue:
https://github.com/apache/spark/pull/1931

 Removing broadcast in quick successions causes Akka timeout
 ---

 Key: SPARK-3015
 URL: https://issues.apache.org/jira/browse/SPARK-3015
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Standalone EC2 Spark shell
Reporter: Andrew Or
Priority: Blocker
 Fix For: 1.1.0


 This issue was originally reported in SPARK-2916 in the context of MLlib, but 
 we were able to reproduce it using a simple Spark shell command:
 {code}
 (1 to 1).foreach { i => sc.parallelize(1 to 1000, 48).sum }
 {code}
 We still do not have a full understanding of the issue, but we have gleaned 
 the following information so far. When the driver runs a GC, it attempts to 
 clean up all the broadcast blocks that go out of scope at once. This causes 
 the driver to send out many blocking RemoveBroadcast messages to the 
 executors, which in turn send out blocking UpdateBlockInfo messages back to 
 the driver. Both of these calls block until they receive the expected 
 responses. We suspect that the high frequency at which we send these blocking 
 messages is the cause of either dropped messages or internal deadlock 
 somewhere.
 Unfortunately, it is highly difficult to reproduce depending on the 
 environment. We have been able to reproduce it on a 6-node cluster in 
 us-west-2, but not in us-west-1, for instance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3018) Release all BlockFetcherIterator upon task completion/failure

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3018:
---

Description: BlockFetcherIterator retains ReferenceCountedBuffers returned 
by client.fetchBlocks. Those buffers are released when the iterators are 
traversed fully. In the case of task failures or completion without exhausting 
the iterator, this could lead to memory leak.  (was: BlockFetcherIterator 
retains ReferenceCountedBuffers returned by client.fetchBlocks. Those buffers 
are released when the iterators are traversed fully. In the case of task 
failures or completion without depleting the iterator, this could lead to 
memory leak.)

 Release all BlockFetcherIterator upon task completion/failure
 -

 Key: SPARK-3018
 URL: https://issues.apache.org/jira/browse/SPARK-3018
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle, Spark Core
Reporter: Reynold Xin

 BlockFetcherIterator retains ReferenceCountedBuffers returned by 
 client.fetchBlocks. Those buffers are released when the iterators are 
 traversed fully. In the case of task failures or completion without 
 exhausting the iterator, this could lead to a memory leak.
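
One possible shape of the cleanup, as a hedged sketch (identifiers are 
illustrative): drain and release whatever the task never consumed, from a 
completion/failure callback.

{code}
// Hypothetical sketch; RefCounted stands in for ReferenceCountedBuffer.
trait RefCounted { def release(): Unit }

final class SafeFetchIterator[A <: RefCounted](underlying: Iterator[A])
    extends Iterator[A] {
  override def hasNext: Boolean = underlying.hasNext
  override def next(): A = underlying.next()

  // Intended to be wired into a task completion/failure callback so that
  // buffers never reached by the consumer are still released.
  def releaseRemaining(): Unit =
    while (underlying.hasNext) underlying.next().release()
}
{code}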



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2907) Use mutable.HashMap to represent Model in Word2Vec

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096301#comment-14096301
 ] 

Apache Spark commented on SPARK-2907:
-

User 'Ishiihara' has created a pull request for this issue:
https://github.com/apache/spark/pull/1932

 Use mutable.HashMap to represent Model in Word2Vec
 --

 Key: SPARK-2907
 URL: https://issues.apache.org/jira/browse/SPARK-2907
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.1.0
Reporter: Liquan Pei
Assignee: Liquan Pei

 Use mutable.HashMap to represent Word2Vec to reduce memory footprint and 
 shuffle size. 
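
A minimal sketch of the representation (the exact fields of the Word2Vec model 
are not shown in this ticket, so the value type here is an assumption):

{code}
// Minimal sketch: word -> vector map backed by mutable.HashMap with primitive
// float arrays, keeping memory and shuffle size down.
import scala.collection.mutable

val wordVectors = mutable.HashMap.empty[String, Array[Float]]
wordVectors += ("spark"  -> Array(0.1f, 0.2f, 0.3f))
wordVectors += ("hadoop" -> Array(0.0f, 0.4f, 0.1f))

val vec: Option[Array[Float]] = wordVectors.get("spark") // O(1) lookup
{code}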



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3020) Print completed indices rather than tasks in web UI

2014-08-13 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-3020:
--

 Summary: Print completed indices rather than tasks in web UI
 Key: SPARK-3020
 URL: https://issues.apache.org/jira/browse/SPARK-3020
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker


When speculation is used, it's confusing to print the number of completed 
tasks, since it can exceed the number of total tasks. Instead we should just 
report the number of unique indices that are completed.
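
A sketch of the counting change (simplified, not the actual UI listener code):

{code}
// With speculation, multiple attempts can finish for the same partition index,
// so track distinct indices rather than counting successful attempts.
import scala.collection.mutable

val completedIndices = mutable.HashSet.empty[Int]

def onTaskSuccess(partitionIndex: Int): Unit = completedIndices += partitionIndex

def completedForDisplay: Int = completedIndices.size // never exceeds total tasks
{code}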



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2817) add show create table support

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2817.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 add  show create table support
 

 Key: SPARK-2817
 URL: https://issues.apache.org/jira/browse/SPARK-2817
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.0
Reporter: Yi Tian
Priority: Minor
 Fix For: 1.1.0


 In the Spark SQL component, the show create table syntax had been disabled.
 We think it is a useful function for describing a Hive table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3004) HiveThriftServer2 throws exception when the result set contains NULL

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3004.
-

   Resolution: Fixed
Fix Version/s: 1.1.0
 Assignee: Cheng Lian

 HiveThriftServer2 throws exception when the result set contains NULL
 

 Key: SPARK-3004
 URL: https://issues.apache.org/jira/browse/SPARK-3004
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker
 Fix For: 1.1.0


 To reproduce this issue with beeline:
 {code}
 $ cd $SPARK_HOME
 $ ./bin/beeline -u jdbc:hive2://localhost:1 -n lian
 ...
 0: jdbc:hive2://localhost:1 create table src1 (key int, value string);
 ...
 0: jdbc:hive2://localhost:1 load data local inpath 
 './sql/hive/src/test/resources/data/files/kv3.txt' into table src1;
 ...
 0: jdbc:hive2://localhost:1 select * from src1 where key is null;
 Error:  (state=,code=0)
 {code}
 Exception thrown from HiveThriftServer2:
 {code}
 java.lang.RuntimeException: Failed to check null bit for primitive int value.
 at scala.sys.package$.error(package.scala:27)
 at 
 org.apache.spark.sql.catalyst.expressions.GenericRow.getInt(Row.scala:145)
 at 
 org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.getNextRowSet(SparkSQLOperationManager.scala:80)
 at 
 org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:170)
 at 
 org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:417)
 at 
 org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:306)
 at 
 org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:386)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1373)
 at 
 org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1358)
 at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
 at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
 at 
 org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)
 {code}
 The cause is that we didn't check {{isNullAt}} in 
 {{SparkSQLOperationManager.getNextRowSet}}
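
A hedged sketch of the missing guard (Row here is a stand-in for Spark SQL's row 
type; the real getNextRowSet handles every column type, not just int):

{code}
// Simplified sketch of the null check.
trait Row {
  def isNullAt(i: Int): Boolean
  def getInt(i: Int): Int
}

def intOrNull(row: Row, ordinal: Int): Any =
  if (row.isNullAt(ordinal)) null   // propagate SQL NULL
  else row.getInt(ordinal)          // safe: the null bit is not set
{code}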



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2963) The description about building to use HiveServer and CLI is incomplete

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2963.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 The description about building to use HiveServer and CLI is incomplete
 --

 Key: SPARK-2963
 URL: https://issues.apache.org/jira/browse/SPARK-2963
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
Reporter: Kousuke Saruta
 Fix For: 1.1.0


 Currently, if we'd like to use HiveServer or the CLI for Spark SQL, we need to 
 build with the -Phive-thriftserver option, but its description is incomplete.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3013) Doctest of inferSchema in Spark SQL Python API fails

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-3013.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 Doctest of inferSchema in Spark SQL Python API fails
 

 Key: SPARK-3013
 URL: https://issues.apache.org/jira/browse/SPARK-3013
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Cheng Lian
Assignee: Davies Liu
Priority: Blocker
 Fix For: 1.1.0


 Doctest of `inferSchema` in `sql.py` keeps failing and makes Jenkins crazy:
 {code}
 File /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, 
 line 1021, in pyspark.sql.SQLContext.inferSchema
 Failed example:
 srdd.collect()
 Exception raised:
 Traceback (most recent call last):
   File /usr/lib64/python2.6/doctest.py, line 1253, in __run
 compileflags, 1) in test.globs
   File doctest pyspark.sql.SQLContext.inferSchema[6], line 1, in 
 module
 srdd.collect()
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py, line 
 1613, in collect
 rows = RDD.collect(self)
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py, line 
 724, in collect
 bytesInJava = self._jrdd.collect().iterator()
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py,
  line 538, in __call__
 self.target_id, self.name)
   File 
 /home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py,
  line 300, in get_return_value
 format(target_id, '.', name), value)
 Py4JJavaError: An error occurred while calling o399.collect.
 : org.apache.spark.SparkException: Job aborted due to stage failure: Task 
 1 in stage 35.0 failed 1 times, most recent failure: Lost task 1.0 in stage 
 35.0 (TID 72, localhost): java.lang.ClassCastException: java.lang.String 
 cannot be cast to java.util.ArrayList
 
 net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:33)
 net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
 net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
 net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
 net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)
 
 org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:722)
 
 org.apache.spark.api.python.PythonRDD$$anonfun$pythonToJavaArray$1$$anonfun$apply$4.apply(PythonRDD.scala:721)
 scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:966)
 
 scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
 scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 scala.collection.Iterator$class.foreach(Iterator.scala:727)
 scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 
 scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
 
 scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
 
 scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
 scala.collection.AbstractIterator.to(Iterator.scala:1157)
 
 scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
 scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
 
 scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
 scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
 org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
 org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 
 org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executSLF4J: Failed to load class 
 org.slf4j.impl.StaticLoggerBinder.
 SLF4J: Defaulting to no-operation 

[jira] [Created] (SPARK-3021) Job remains in Active Stages after failing

2014-08-13 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-3021:
---

 Summary: Job remains in Active Stages after failing
 Key: SPARK-3021
 URL: https://issues.apache.org/jira/browse/SPARK-3021
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.1.0
Reporter: Michael Armbrust


It died with the following exception, but it is still hanging out in the UI.

{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in 
stage 8.1 failed 4 times, most recent failure: Lost task 20.3 in stage 8.1 (TID 
710, ip-10-0-166-165.us-west-2.compute.internal): ExecutorLostFailure (executor 
lost)
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1153)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1142)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1141)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2994) Support for Hive UDFs that take arrays of structs as arguments

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2994.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 Support for Hive UDFs that take arrays of structs as arguments
 --

 Key: SPARK-2994
 URL: https://issues.apache.org/jira/browse/SPARK-2994
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Critical
 Fix For: 1.1.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2935) Failure with push down of conjunctive parquet predicates

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2935.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 Failure with push down of conjunctive parquet predicates
 

 Key: SPARK-2935
 URL: https://issues.apache.org/jira/browse/SPARK-2935
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.0.2
Reporter: Michael Armbrust
Assignee: Michael Armbrust
Priority: Blocker
 Fix For: 1.1.0






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2970) spark-sql script ends with IOException when EventLogging is enabled

2014-08-13 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-2970.
-

   Resolution: Fixed
Fix Version/s: 1.1.0

 spark-sql script ends with IOException when EventLogging is enabled
 ---

 Key: SPARK-2970
 URL: https://issues.apache.org/jira/browse/SPARK-2970
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.1.0
 Environment: CDH5.1.0 (Hadoop 2.3.0)
Reporter: Kousuke Saruta
 Fix For: 1.1.0


 When the spark-sql script runs with spark.eventLog.enabled set to true, it 
 ends with an IOException because FileLogger cannot create the 
 APPLICATION_COMPLETE file in HDFS.
 This is because the shutdown hook of SparkSQLCLIDriver is executed after the 
 shutdown hook of org.apache.hadoop.fs.FileSystem is executed.
 When spark.eventLog.enabled is true, the SparkSQLCLIDriver hook finally tries 
 to create a file to mark the application as finished, but the FileSystem hook 
 closes the FileSystem before that happens.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3020) Print completed indices rather than tasks in web UI

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-3020.


   Resolution: Fixed
Fix Version/s: 1.1.0

 Print completed indices rather than tasks in web UI
 ---

 Key: SPARK-3020
 URL: https://issues.apache.org/jira/browse/SPARK-3020
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Reporter: Patrick Wendell
Assignee: Patrick Wendell
Priority: Blocker
 Fix For: 1.1.0


 When speculation is used, it's confusing to print the number of completed 
 tasks, since it can exceed the number of total tasks. Instead we should just 
 report the number of unique indices that are completed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-2625) Fix ShuffleReadMetrics for NettyBlockFetcherIterator

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin reassigned SPARK-2625:
--

Assignee: Reynold Xin

 Fix ShuffleReadMetrics for NettyBlockFetcherIterator
 

 Key: SPARK-2625
 URL: https://issues.apache.org/jira/browse/SPARK-2625
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 1.0.0
Reporter: Sandy Ryza
Assignee: Reynold Xin
Priority: Minor

 NettyBlockFetcherIterator doesn't report fetchWaitTime and has some race 
 conditions where multiple threads can be incrementing bytes read at the same 
 time.
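
A sketch of one way to make the byte counter race-free (illustrative; the actual 
metrics plumbing in Spark is more involved):

{code}
// AtomicLong removes the read-modify-write race when several fetch threads
// update the same metric concurrently.
import java.util.concurrent.atomic.AtomicLong

final class ShuffleReadCounters {
  private val remoteBytesRead = new AtomicLong(0L)
  private val fetchWaitTimeMs = new AtomicLong(0L)

  def addBytesRead(n: Long): Unit = remoteBytesRead.addAndGet(n)
  def addFetchWaitMs(ms: Long): Unit = fetchWaitTimeMs.addAndGet(ms)

  def bytesRead: Long = remoteBytesRead.get()
  def fetchWaitMs: Long = fetchWaitTimeMs.get()
}
{code}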



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2625) Fix ShuffleReadMetrics for NettyBlockFetcherIterator

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2625:
---

Component/s: Spark Core

 Fix ShuffleReadMetrics for NettyBlockFetcherIterator
 

 Key: SPARK-2625
 URL: https://issues.apache.org/jira/browse/SPARK-2625
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Affects Versions: 1.0.0
Reporter: Sandy Ryza
Assignee: Reynold Xin
Priority: Minor

 NettyBlockFetcherIterator doesn't report fetchWaitTime and has some race 
 conditions where multiple threads can be incrementing bytes read at the same 
 time.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3022) FindBinsForLevel in decision tree should call findBin only once for each feature

2014-08-13 Thread Qiping Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qiping Li updated SPARK-3022:
-

Description: `findbinsForLevel` is applied to every `LabeledPoint` to find 
bins for all nodes at a given level. Given a specific `LabeledPoint` and a 
specific feature, the bin to put this labeled point should always be same.But 
in current implementation, `findBin` on a (labeledpoint, feature) pair is 
called for every node at a given level, which is a waste of computation. I 
proposed to call `findBin` only once and if a `LabeledPoint` is valid on a 
node, this result can be reused.  (was: `findbinsForLevel` is applied to every 
`LabeledPoint` to find bins for all nodes at a given level. Given a specific 
`LabeledPoint` and a specific feature,
the bin to put this labeled point should always be same.But in current 
implementation, `findBin` on a (labeledpoint, feature) pair is called for every 
node at a given level, which is a waste of computation. I proposed to call 
`findBin` only once and if a `LabeledPoint` is valid on a node, this result can 
be reused.)

 FindBinsForLevel in decision tree should call findBin only once for each 
 feature
 

 Key: SPARK-3022
 URL: https://issues.apache.org/jira/browse/SPARK-3022
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.0.2
Reporter: Qiping Li
   Original Estimate: 4h
  Remaining Estimate: 4h

 `findbinsForLevel` is applied to every `LabeledPoint` to find bins for all 
 nodes at a given level. Given a specific `LabeledPoint` and a specific 
 feature, the bin to put this labeled point in should always be the same. But 
 in the current implementation, `findBin` on a (labeledpoint, feature) pair is 
 called for every node at a given level, which is a waste of computation. I 
 propose to call `findBin` only once; if a `LabeledPoint` is valid on a node, 
 this result can be reused.
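
A sketch of the proposed caching (identifiers are illustrative): compute the bin 
for each (point, feature) pair once, then reuse it for every node at the level.

{code}
// Hypothetical sketch: precompute bins once per point, reuse across nodes.
def binsForPoint(numFeatures: Int, findBin: Int => Int): Array[Int] =
  Array.tabulate(numFeatures)(findBin) // one findBin call per feature

// For every node at the current level, read precomputedBins(featureIndex)
// (guarded by whether the point is valid on that node) instead of calling
// findBin again.
{code}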



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3022) FindBinsForLevel in decision tree should call findBin only once for each feature

2014-08-13 Thread Qiping Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096526#comment-14096526
 ] 

Qiping Li commented on SPARK-3022:
--

What's more, there's no need to store the `feature2bins` array (which bin to put 
the labeled point in for each feature) for every node; all nodes can reuse it 
whenever the labeled point is valid on that node. This `feature2bins` array can 
be precomputed before level-wise training, and each level can use it.

 FindBinsForLevel in decision tree should call findBin only once for each 
 feature
 

 Key: SPARK-3022
 URL: https://issues.apache.org/jira/browse/SPARK-3022
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.0.2
Reporter: Qiping Li
   Original Estimate: 4h
  Remaining Estimate: 4h

 `findbinsForLevel` is applied to every `LabeledPoint` to find bins for all 
 nodes at a given level. Given a specific `LabeledPoint` and a specific 
 feature, the bin to put this labeled point should always be same.But in 
 current implementation, `findBin` on a (labeledpoint, feature) pair is called 
 for every node at a given level, which is a waste of computation. I proposed 
 to call `findBin` only once and if a `LabeledPoint` is valid on a node, this 
 result can be reused.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread OuyangJin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

OuyangJin updated SPARK-3005:
-

Attachment: SPARK-3005_1.diff

A quick fix for fine-grained killTask.


 Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
 MesosSchedulerBackend.killTask()
 ---

 Key: SPARK-3005
 URL: https://issues.apache.org/jira/browse/SPARK-3005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
Reporter: Xu Zhongxing
 Attachments: SPARK-3005_1.diff


 I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
 Cassandra cluster.
 While the job was running, I killed the Cassandra daemon to simulate some 
 failure cases. This results in task failures.
 If I run the job in Mesos coarse-grained mode, the Spark driver program 
 throws an exception and shuts down cleanly.
 But when I run the job in Mesos fine-grained mode, the Spark driver program 
 hangs.
 The spark log is: 
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
 Logging.scala (line 58) Cancelling stage 1
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
 Logging.scala (line 79) Could not cancel tasks for stage 1
 java.lang.UnsupportedOperationException
   at 
 org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
   at 
 org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 

[jira] [Created] (SPARK-3024) CLI interface to Driver

2014-08-13 Thread Jeff Hammerbacher (JIRA)
Jeff Hammerbacher created SPARK-3024:


 Summary: CLI interface to Driver
 Key: SPARK-3024
 URL: https://issues.apache.org/jira/browse/SPARK-3024
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Jeff Hammerbacher






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3023) SIGINT to driver with yarn-client should release containers on the cluster

2014-08-13 Thread Jeff Hammerbacher (JIRA)
Jeff Hammerbacher created SPARK-3023:


 Summary: SIGINT to driver with yarn-client should release 
containers on the cluster
 Key: SPARK-3023
 URL: https://issues.apache.org/jira/browse/SPARK-3023
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.0.0
Reporter: Jeff Hammerbacher






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3023) SIGINT to driver with yarn-client should release containers on the cluster

2014-08-13 Thread Jeff Hammerbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Hammerbacher updated SPARK-3023:
-

Issue Type: Improvement  (was: Bug)

 SIGINT to driver with yarn-client should release containers on the cluster
 --

 Key: SPARK-3023
 URL: https://issues.apache.org/jira/browse/SPARK-3023
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.0.0
Reporter: Jeff Hammerbacher





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3024) CLI interface to Driver

2014-08-13 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096541#comment-14096541
 ] 

Patrick Wendell commented on SPARK-3024:


Hey Jeff - mind giving a bit more color on what you mean here?

 CLI interface to Driver
 ---

 Key: SPARK-3024
 URL: https://issues.apache.org/jira/browse/SPARK-3024
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Jeff Hammerbacher





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing commented on SPARK-3005:
-

Could adding an empty killTask method to MesosSchedulerBackend fix this 
problem?

override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}




 Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
 MesosSchedulerBackend.killTask()
 ---

 Key: SPARK-3005
 URL: https://issues.apache.org/jira/browse/SPARK-3005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
Reporter: Xu Zhongxing
 Attachments: SPARK-3005_1.diff


 I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
 Cassandra cluster.
 While the job was running, I killed the Cassandra daemon to simulate some 
 failure cases. This results in task failures.
 If I run the job in Mesos coarse-grained mode, the Spark driver program 
 throws an exception and shuts down cleanly.
 But when I run the job in Mesos fine-grained mode, the Spark driver program 
 hangs.
 The spark log is: 
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
 Logging.scala (line 58) Cancelling stage 1
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
 Logging.scala (line 79) Could not cancel tasks for stage 1
 java.lang.UnsupportedOperationException
   at 
 org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
   at 
 org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 

[jira] [Commented] (SPARK-3024) CLI interface to Driver

2014-08-13 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14096547#comment-14096547
 ] 

Jeff Hammerbacher commented on SPARK-3024:
--

It would be nice to be able to list the contents of the executors tab, for 
example, from the command line. After seeing 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala#L53,
 I thought I might be able to just set the Content-Type header and curl the 
contents down, but that doesn't seem to work.

I can, of course, parse the content out of the HTML for now. Moving forward, 
however, it would be nice to have a service interface that returned JSON, and 
perhaps even a bundled utility for manipulating the results.
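
Until such a service exists, here is a minimal sketch of the parse-the-HTML 
workaround mentioned above, written in Scala rather than curl (a hypothetical 
standalone program; the host, port 4040, and the /executors/ path are the usual 
defaults, and the HTML layout is not a stable interface):

{code}
import scala.io.Source

object ExecutorsTabScraper {
  def main(args: Array[String]): Unit = {
    // Default driver UI address; pass a different base URL as the first argument.
    val uiBase = if (args.nonEmpty) args(0) else "http://localhost:4040"

    // Fetch the rendered executors page. Today this is HTML, not JSON.
    val html = Source.fromURL(s"$uiBase/executors/").mkString

    // Crude extraction of single-line table cells; a real tool would prefer a
    // JSON endpoint over regexes on markup.
    val cell = "<td[^>]*>(.*?)</td>".r
    cell.findAllMatchIn(html).map(_.group(1).trim).filter(_.nonEmpty).foreach(println)
  }
}
{code}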

 CLI interface to Driver
 ---

 Key: SPARK-3024
 URL: https://issues.apache.org/jira/browse/SPARK-3024
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Reporter: Jeff Hammerbacher





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3025) Allow JDBC clients to set a fair scheduler pool

2014-08-13 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-3025:
--

 Summary: Allow JDBC clients to set a fair scheduler pool
 Key: SPARK-3025
 URL: https://issues.apache.org/jira/browse/SPARK-3025
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Patrick Wendell
Assignee: Patrick Wendell






--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3026) Provide a good error message if JDBC server is used but Spark is not compiled with -Pthriftserver

2014-08-13 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3026:
---

Priority: Critical  (was: Major)

 Provide a good error message if JDBC server is used but Spark is not compiled 
 with -Pthriftserver
 -

 Key: SPARK-3026
 URL: https://issues.apache.org/jira/browse/SPARK-3026
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Patrick Wendell
Assignee: Cheng Lian
Priority: Critical

 Instead of giving a ClassNotFoundException, we should detect this case and 
 just tell the user to build with -Phiveserver.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-3026) Provide a good error message if JDBC server is used but Spark is not compiled with -Pthriftserver

2014-08-13 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-3026:
--

 Summary: Provide a good error message if JDBC server is used but 
Spark is not compiled with -Pthriftserver
 Key: SPARK-3026
 URL: https://issues.apache.org/jira/browse/SPARK-3026
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Patrick Wendell
Assignee: Cheng Lian


Instead of giving a ClassNotFoundException, we should detect this case and just 
tell the user to build with -Phiveserver.
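
A minimal sketch of the kind of guard this asks for, assuming the check is done 
by probing for the thrift server's entry-point class via reflection (the class 
name, object name, and message below are illustrative, not necessarily what the 
eventual patch does):

{code}
object ThriftServerGuard {
  // Entry point of the JDBC/thrift server; only on the classpath when Spark
  // was built with the thrift server profile enabled.
  private val ThriftServerClass =
    "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"

  def main(args: Array[String]): Unit = {
    val mainClass =
      try {
        Class.forName(ThriftServerClass)
      } catch {
        case _: ClassNotFoundException =>
          // Friendly message instead of a raw ClassNotFoundException.
          System.err.println(
            "The JDBC/thrift server is not included in this build of Spark. " +
            "Rebuild with the thrift server profile enabled (e.g. -Pthriftserver).")
          sys.exit(1)
      }
    // Hand off to the real server; main(String[]) receives the array as one argument.
    mainClass.getMethod("main", classOf[Array[String]]).invoke(null, args.asInstanceOf[AnyRef])
  }
}
{code}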



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3019:
---

Attachment: PluggableBlockTransferServiceProposalforSpark.pdf

Design Doc - draft 1

 Pluggable block transfer (data plane communication) interface
 -

 Key: SPARK-3019
 URL: https://issues.apache.org/jira/browse/SPARK-3019
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
 Attachments: PluggableBlockTransferServiceProposalforSpark.pdf


 This is a ticket to track progress to have an internal interface for block 
 transfers (used in shuffles, broadcasts, as well as remote block reads for 
 tasks).
 More details coming later.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3019:
---

Description: 
The attached design doc proposes a standard interface for block transferring, 
which will make future engineering of this functionality easier, allowing the 
Spark community to provide alternative implementations.

Block transferring is a critical function in Spark. All of the following depend 
on it:
* shuffle
* torrent broadcast
* block replication in BlockManager
* remote block reads for tasks scheduled without locality


  was:
This is a ticket to track progress to have an internal interface for block 
transfers (used in shuffles, broadcasts, as well as remote block reads for 
tasks).

More details coming later.


 Pluggable block transfer (data plane communication) interface
 -

 Key: SPARK-3019
 URL: https://issues.apache.org/jira/browse/SPARK-3019
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
 Attachments: PluggableBlockTransferServiceProposalforSpark.pdf


 The attached design doc proposes a standard interface for block transferring, 
 which will make future engineering of this functionality easier, allowing the 
 Spark community to provide alternative implementations.
 Block transferring is a critical function in Spark. All of the following 
 depend on it:
 * shuffle
 * torrent broadcast
 * block replication in BlockManager
 * remote block reads for tasks scheduled without locality
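
For readers skimming the archive, a rough sketch of the shape such a pluggable 
interface could take (names and signatures here are hypothetical; the actual 
proposal is in the attached design doc):

{code}
import java.nio.ByteBuffer
import scala.concurrent.Future

// Hypothetical illustration only: a narrow, transport-agnostic API that
// different implementations (Netty, NIO, an external shuffle service, ...)
// could plug in behind.
trait BlockTransferService {
  /** Start the service, binding to the given host name. */
  def init(hostName: String): Unit

  /** Asynchronously fetch the named blocks from a remote host. */
  def fetchBlocks(host: String, port: Int, blockIds: Seq[String]): Future[Map[String, ByteBuffer]]

  /** Upload a single block to a remote host, e.g. for replication. */
  def uploadBlock(host: String, port: Int, blockId: String, data: ByteBuffer): Future[Unit]

  /** Release transport resources. */
  def close(): Unit
}
{code}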



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3019) Pluggable block transfer (data plane communication) interface

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3019:
---

Attachment: (was: PluggableBlockTransferServiceProposalforSpark.pdf)

 Pluggable block transfer (data plane communication) interface
 -

 Key: SPARK-3019
 URL: https://issues.apache.org/jira/browse/SPARK-3019
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin
 Attachments: PluggableBlockTransferServiceProposalforSpark - draft 
 1.pdf


 The attached design doc proposes a standard interface for block transferring, 
 which will make future engineering of this functionality easier, allowing the 
 Spark community to provide alternative implementations.
 Block transferring is a critical function in Spark. All of the following 
 depend on it:
 * shuffle
 * torrent broadcast
 * block replication in BlockManager
 * remote block reads for tasks scheduled without locality



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-3005:
---

Description: 
I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
Cassandra cluster.

While the job was running, I killed the Cassandra daemon to simulate some 
failure cases. This results in task failures.

If I run the job in Mesos coarse-grained mode, the Spark driver program throws 
an exception and shuts down cleanly.

But when I run the job in Mesos fine-grained mode, the Spark driver program 
hangs.

The Spark log is: 

{code}
 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
Logging.scala (line 58) Cancelling stage 1
 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
Logging.scala (line 79) Could not cancel tasks for stage 1
java.lang.UnsupportedOperationException
at 
org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
at 
org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
at 
org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}

  was:
I am using Spark, Mesos, spark-cassandra-connector to do some work on a 
cassandra cluster.

During the job running, I killed the Cassandra daemon to simulate some failure 
cases. This results in task failures.

If I run the job in Mesos coarse-grained mode, the spark driver program throws 
an exception and shutdown cleanly.

But when I run the job in Mesos fine-grained mode, the spark driver program 
hangs.

The spark log is: 

 INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
Logging.scala (line 58) Cancelling stage 1
 INFO [spark-akka.actor.default-dispatcher-4] 

[jira] [Updated] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2014-08-13 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-2356:
---

Description: 
I'm trying to run some transformations on Spark. It works fine on a cluster 
(YARN, Linux machines). However, when I try to run it on a local machine 
(Windows 7) under a unit test, I get errors (I don't use Hadoop; I read files 
from the local filesystem):

{code}
14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
at org.apache.hadoop.security.Groups.init(Groups.java:77)
at 
org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
at 
org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
at 
org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
at 
org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
at 
org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
at org.apache.spark.SparkContext.init(SparkContext.scala:228)
at org.apache.spark.SparkContext.init(SparkContext.scala:97)
{code}

This happens because the Hadoop config is initialized each time a Spark context 
is created, regardless of whether Hadoop is required.

I propose adding a flag to indicate whether the Hadoop config is required (or 
allowing this configuration to be started manually).

  was:
I'm trying to run some transformation on Spark, it works fine on cluster (YARN, 
linux machines). However, when I'm trying to run it on local machine (Windows 
7) under unit test, I got errors (I don't use Hadoop, I'm read file from local 
filesystem):
14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
at org.apache.hadoop.security.Groups.init(Groups.java:77)
at 
org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
at 
org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
at 
org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
at 
org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
at 
org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
at 
org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
at org.apache.spark.SparkContext.init(SparkContext.scala:228)
at org.apache.spark.SparkContext.init(SparkContext.scala:97)

It's happend because Hadoop config is initialised each time when spark context 
is created regardless is hadoop required or not.

I propose to add some special flag to indicate if hadoop config is required (or 
start this configuration manually)


 Exception: Could not locate executable null\bin\winutils.exe in the Hadoop 
 ---

 Key: SPARK-2356
 URL: https://issues.apache.org/jira/browse/SPARK-2356
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Kostiantyn Kudriavtsev
Priority: Critical

 I'm trying to run some transformations on Spark. It works fine on a cluster 
 (YARN, Linux machines). However, when I try to run it on a local machine 
 (Windows 7) under a unit test, I get errors (I don't use Hadoop; I read files 
 from the local filesystem):
 {code}
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
 for your platform... using builtin-java 

[jira] [Commented] (SPARK-3025) Allow JDBC clients to set a fair scheduler pool

2014-08-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096580#comment-14096580
 ] 

Apache Spark commented on SPARK-3025:
-

User 'pwendell' has created a pull request for this issue:
https://github.com/apache/spark/pull/1937

 Allow JDBC clients to set a fair scheduler pool
 ---

 Key: SPARK-3025
 URL: https://issues.apache.org/jira/browse/SPARK-3025
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Patrick Wendell
Assignee: Patrick Wendell





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2356) Exception: Could not locate executable null\bin\winutils.exe in the Hadoop

2014-08-13 Thread Guoqiang Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096601#comment-14096601
 ] 

Guoqiang Li commented on SPARK-2356:


This is likely a problem caused by HADOOP_HOME or hadoop.home.dir not being set.
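
For anyone hitting this on Windows before such a flag exists, a common 
workaround is to point hadoop.home.dir at a directory that contains 
bin\winutils.exe before creating the SparkContext. A minimal sketch (the 
C:\hadoop path and the input file are placeholders):

{code}
import org.apache.spark.{SparkConf, SparkContext}

object LocalWindowsApp {
  def main(args: Array[String]): Unit = {
    // Hadoop's Shell class looks at HADOOP_HOME or the hadoop.home.dir system
    // property; the directory must contain bin\winutils.exe (path is an example).
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val conf = new SparkConf().setAppName("local-test").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Read from the local filesystem, as in the original report.
      println("Line count: " + sc.textFile("data/input.txt").count())
    } finally {
      sc.stop()
    }
  }
}
{code}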

 Exception: Could not locate executable null\bin\winutils.exe in the Hadoop 
 ---

 Key: SPARK-2356
 URL: https://issues.apache.org/jira/browse/SPARK-2356
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Kostiantyn Kudriavtsev
Priority: Critical

 I'm trying to run some transformations on Spark. It works fine on a cluster 
 (YARN, Linux machines). However, when I try to run it on a local machine 
 (Windows 7) under a unit test, I get errors (I don't use Hadoop; I read files 
 from the local filesystem):
 {code}
 14/07/02 19:59:31 WARN NativeCodeLoader: Unable to load native-hadoop library 
 for your platform... using builtin-java classes where applicable
 14/07/02 19:59:31 ERROR Shell: Failed to locate the winutils binary in the 
 hadoop binary path
 java.io.IOException: Could not locate executable null\bin\winutils.exe in the 
 Hadoop binaries.
   at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:318)
   at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:333)
   at org.apache.hadoop.util.Shell.clinit(Shell.java:326)
   at org.apache.hadoop.util.StringUtils.clinit(StringUtils.java:76)
   at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:93)
   at org.apache.hadoop.security.Groups.init(Groups.java:77)
   at 
 org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:240)
   at 
 org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:255)
   at 
 org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:283)
   at 
 org.apache.spark.deploy.SparkHadoopUtil.init(SparkHadoopUtil.scala:36)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$.init(SparkHadoopUtil.scala:109)
   at 
 org.apache.spark.deploy.SparkHadoopUtil$.clinit(SparkHadoopUtil.scala)
   at org.apache.spark.SparkContext.init(SparkContext.scala:228)
   at org.apache.spark.SparkContext.init(SparkContext.scala:97)
 {code}
 This happens because the Hadoop config is initialized each time a Spark 
 context is created, regardless of whether Hadoop is required.
 I propose adding a flag to indicate whether the Hadoop config is required 
 (or allowing this configuration to be started manually).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

2014-08-13 Thread Saisai Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-2926:
---

Attachment: Spark Shuffle Test Report.pdf

 Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
 --

 Key: SPARK-2926
 URL: https://issues.apache.org/jira/browse/SPARK-2926
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 1.1.0
Reporter: Saisai Shao
 Attachments: SortBasedShuffleRead.pdf, Spark Shuffle Test Report.pdf


 Spark has already integrated sort-based shuffle write, which greatly improves 
 IO performance and reduces memory consumption when the number of reducers is 
 very large. But the reducer side still uses the hash-based shuffle reader, 
 which ignores the ordering of the map output data in some situations.
 Here we propose an MR-style, sort-merge-like shuffle reader for sort-based 
 shuffle to further improve its performance.
 Work-in-progress code and a performance test report will be posted later, once 
 some unit test bugs are fixed.
 Any comments would be greatly appreciated. 
 Thanks a lot.
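
To make the idea concrete for readers of this archive, here is a toy sketch of 
the k-way merge that a sort-merge reader is built around (this is not the 
proposed SortShuffleReader, only an illustration): each map output stream 
arrives already sorted by key, and the reducer merges the sorted streams with a 
priority queue instead of re-aggregating through a hash map.

{code}
import scala.collection.mutable

object KWayMergeSketch {
  /** Merge already-sorted iterators into one sorted iterator (toy version). */
  def merge[K, V](sorted: Seq[Iterator[(K, V)]])(implicit ord: Ordering[K]): Iterator[(K, V)] = {
    // Buffer each stream so its current head key can be inspected.
    val heads = sorted.map(_.buffered).filter(_.hasNext)
    // Priority queue ordered so the stream with the smallest head key is on top.
    val queue = mutable.PriorityQueue.empty[BufferedIterator[(K, V)]](
      Ordering.by[BufferedIterator[(K, V)], K](_.head._1).reverse)
    heads.foreach(queue.enqueue(_))

    new Iterator[(K, V)] {
      def hasNext: Boolean = queue.nonEmpty
      def next(): (K, V) = {
        val stream = queue.dequeue()
        val record = stream.next()
        if (stream.hasNext) queue.enqueue(stream) // re-insert with its new head key
        record
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val a = Iterator(1 -> "a1", 4 -> "a4", 9 -> "a9")
    val b = Iterator(2 -> "b2", 4 -> "b4", 7 -> "b7")
    merge(Seq(a, b)).foreach(println) // keys come out in order: 1, 2, 4, 4, 7, 9
  }
}
{code}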



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-3005) Spark with Mesos fine-grained mode throws UnsupportedOperationException in MesosSchedulerBackend.killTask()

2014-08-13 Thread Xu Zhongxing (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14096539#comment-14096539
 ] 

Xu Zhongxing edited comment on SPARK-3005 at 8/14/14 5:57 AM:
--

Could adding an empty killTask method to MesosSchedulerBackend fix this 
problem?

{code}
override def killTask(taskId: Long, executorId: String, interruptThread: Boolean) {}
{code}

This works for my tests.


was (Author: xuzhongxing):
Could adding an empty killTask method to  MesosSchedulerBackend fix this 
problem?

override def killTask(taskId: Long, executorId: String, interruptThread: 
Boolean) {}
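
For context, a sketch of why the no-op override above silences the error (the 
names here are simplified stand-ins for SchedulerBackend and 
MesosSchedulerBackend): the default killTask throws, so overriding it with an 
empty body lets cancelTasks proceed, at the cost of not actually killing the 
task on Mesos.

{code}
// Simplified sketch, not the real Spark classes.
trait SchedulerBackendSketch {
  // Mirrors the default behaviour seen in the stack trace: killTask is unsupported.
  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    throw new UnsupportedOperationException
}

class MesosBackendSketch extends SchedulerBackendSketch {
  // The suggested stop-gap: swallow the kill request so the DAGScheduler's
  // cancelTasks path no longer aborts with UnsupportedOperationException.
  override def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit = ()
}
{code}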




 Spark with Mesos fine-grained mode throws UnsupportedOperationException in 
 MesosSchedulerBackend.killTask()
 ---

 Key: SPARK-3005
 URL: https://issues.apache.org/jira/browse/SPARK-3005
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2
 Environment: Spark 1.0.2, Mesos 0.18.1, spark-cassandra-connector
Reporter: Xu Zhongxing
 Attachments: SPARK-3005_1.diff


 I am using Spark, Mesos, and spark-cassandra-connector to do some work on a 
 Cassandra cluster.
 While the job was running, I killed the Cassandra daemon to simulate some 
 failure cases. This results in task failures.
 If I run the job in Mesos coarse-grained mode, the Spark driver program 
 throws an exception and shuts down cleanly.
 But when I run the job in Mesos fine-grained mode, the Spark driver program 
 hangs.
 The Spark log is: 
 {code}
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,794 
 Logging.scala (line 58) Cancelling stage 1
  INFO [spark-akka.actor.default-dispatcher-4] 2014-08-13 15:58:15,797 
 Logging.scala (line 79) Could not cancel tasks for stage 1
 java.lang.UnsupportedOperationException
   at 
 org.apache.spark.scheduler.SchedulerBackend$class.killTask(SchedulerBackend.scala:32)
   at 
 org.apache.spark.scheduler.cluster.mesos.MesosSchedulerBackend.killTask(MesosSchedulerBackend.scala:41)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply$mcVJ$sp(TaskSchedulerImpl.scala:185)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3$$anonfun$apply$1.apply(TaskSchedulerImpl.scala:183)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:183)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl$$anonfun$cancelTasks$3.apply(TaskSchedulerImpl.scala:176)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.TaskSchedulerImpl.cancelTasks(TaskSchedulerImpl.scala:176)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply$mcVI$sp(DAGScheduler.scala:1075)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages$1.apply(DAGScheduler.scala:1061)
   at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
   at 
 org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1061)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1033)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1031)
   at 
 scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
   at 
 org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1031)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:635)
   at scala.Option.foreach(Option.scala:236)
   at 
 org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:635)
   at 
 org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1234)
   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
   at