[jira] [Commented] (SPARK-12883) 1.6 Dynamic allocation document for removing executors with cached data differs in different sections

2016-01-19 Thread Manoj Samel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107057#comment-15107057
 ] 

Manoj Samel commented on SPARK-12883:
-

Updated the JIRA subject to more accurately reflect the issue.

> 1.6 Dynamic allocation document for removing executors with cached data 
> differs in different sections
> -
>
> Key: SPARK-12883
> URL: https://issues.apache.org/jira/browse/SPARK-12883
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Manoj Samel
>Priority: Trivial
>
> Spark 1.6 dynamic allocation documentation still refers to 1.2. 
> See text "There is currently not yet a solution for this in Spark 1.2. In 
> future releases, the cached data may be preserved through an off-heap storage 
> similar in spirit to how shuffle files are preserved through the external 
> shuffle service"
> It appears 1.6 has a parameter to address executors with cached data, 
> spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of 
> infinity.
> Please update the 1.6 documentation to refer to the latest release and features.






[jira] [Updated] (SPARK-12883) 1.6 Dynamic allocation document for removing executors with cached data differs in different sections

2016-01-19 Thread Manoj Samel (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Samel updated SPARK-12883:

Summary: 1.6 Dynamic allocation document for removing executors with cached 
data differs in different sections  (was: 1.6 Dynamic allocation doc still 
refers to 1.2)

> 1.6 Dynamic allocation document for removing executors with cached data 
> differs in different sections
> -
>
> Key: SPARK-12883
> URL: https://issues.apache.org/jira/browse/SPARK-12883
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Manoj Samel
>Priority: Trivial
>
> Spark 1.6 dynamic allocation documentation still refers to 1.2. 
> See text "There is currently not yet a solution for this in Spark 1.2. In 
> future releases, the cached data may be preserved through an off-heap storage 
> similar in spirit to how shuffle files are preserved through the external 
> shuffle service"
> It appears 1.6 has a parameter to address executors with cached data, 
> spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of 
> infinity.
> Please update the 1.6 documentation to refer to the latest release and features.






[jira] [Commented] (SPARK-12883) 1.6 Dynamic allocation doc still refers to 1.2

2016-01-19 Thread Manoj Samel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107042#comment-15107042
 ] 

Manoj Samel commented on SPARK-12883:
-

Information about removing executors with cached data appears inconsistent 
across the two documentation pages, hence this JIRA.

Reading 
http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
 - "When an executor is removed, however, all cached data will no longer be 
accessible. There is currently not yet a solution for this in Spark 1.2. In 
future releases, the cached data may be preserved through an off-heap storage 
similar in spirit to how shuffle files are preserved through the external 
shuffle service." 

However, looking at 
http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation - 
"spark.dynamicAllocation.cachedExecutorIdleTimeout (default: infinity) - If 
dynamic allocation is enabled and an executor which has cached data blocks has 
been idle for more than this duration, the executor will be removed."

So if you only read the first link, it seems 1.6 does remove executors with 
cached data. However, reading the second link about configuration gives the 
impression that such an executor will never be removed by default. Maybe I am 
missing something?
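
For reference, a minimal sketch (assuming the property names shown on the 1.6 
configuration page; the 600s value is only an example) of enabling dynamic 
allocation and lowering the cached-executor idle timeout so that executors 
holding cached data can eventually be reclaimed:

import org.apache.spark.SparkConf

// Sketch only: enable dynamic allocation and allow executors with cached data
// to be reclaimed after 10 minutes instead of the documented default (infinity).
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")   // external shuffle service is required
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")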


> 1.6 Dynamic allocation doc still refers to 1.2
> --
>
> Key: SPARK-12883
> URL: https://issues.apache.org/jira/browse/SPARK-12883
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 1.6.0
>Reporter: Manoj Samel
>Priority: Trivial
>
> Spark 1.6 dynamic allocation documentation still refers to 1.2. 
> See text "There is currently not yet a solution for this in Spark 1.2. In 
> future releases, the cached data may be preserved through an off-heap storage 
> similar in spirit to how shuffle files are preserved through the external 
> shuffle service"
> It appears 1.6 has a parameter to address executors with cached data, 
> spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of 
> infinity.
> Please update the 1.6 documentation to refer to the latest release and features.






[jira] [Created] (SPARK-12883) 1.6 Dynamic allocation doc still refers to 1.2

2016-01-18 Thread Manoj Samel (JIRA)
Manoj Samel created SPARK-12883:
---

 Summary: 1.6 Dynamic allocation doc still refers to 1.2
 Key: SPARK-12883
 URL: https://issues.apache.org/jira/browse/SPARK-12883
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.6.0
Reporter: Manoj Samel
Priority: Minor


Spark 1.6 dynamic allocation documentation still refers to 1.2. 

See text "There is currently not yet a solution for this in Spark 1.2. In 
future releases, the cached data may be preserved through an off-heap storage 
similar in spirit to how shuffle files are preserved through the external 
shuffle service"

It appears 1.6 has a parameter to address executors with cached data, 
spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of 
infinity.

Please update the 1.6 documentation to refer to the latest release and features.






[jira] [Created] (SPARK-6653) New configuration property to specify port for sparkYarnAM actor system

2015-04-01 Thread Manoj Samel (JIRA)
Manoj Samel created SPARK-6653:
--

 Summary: New configuration property to specify port for 
sparkYarnAM actor system
 Key: SPARK-6653
 URL: https://issues.apache.org/jira/browse/SPARK-6653
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.3.0
 Environment: Spark On Yarn
Reporter: Manoj Samel


In the 1.3.0 code line, the sparkYarnAM actor system is started on a random port. 
See ApplicationMaster.scala:282 in org.apache.spark.deploy.yarn:

actorSystem = AkkaUtils.createActorSystem("sparkYarnAM", Utils.localHostName, 
0, conf = sparkConf, securityManager = securityMgr)._1

This may be an issue when ports between the Spark client and the YARN cluster are 
restricted by a firewall and not all ports are open between the client and the cluster.

The proposal is to introduce a new property, spark.am.actor.port, and change the code to:

val port = sparkConf.getInt("spark.am.actor.port", 0)
actorSystem = AkkaUtils.createActorSystem("sparkYarnAM", 
Utils.localHostName, port,
  conf = sparkConf, securityManager = securityMgr)._1
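
If the proposal were adopted, a user might pin the AM port like this (illustrative 
only; spark.am.actor.port is the name proposed here and does not exist in any 
released Spark version):

// Hypothetical usage of the proposed property; the port value is just an
// example of one the firewall allows.
val sparkConf = new org.apache.spark.SparkConf()
  .set("spark.am.actor.port", "32010")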











[jira] [Created] (SPARK-5903) Add support for Decimal type in SQL Table caching

2015-02-18 Thread Manoj Samel (JIRA)
Manoj Samel created SPARK-5903:
--

 Summary: Add support for Decimal type in SQL Table caching
 Key: SPARK-5903
 URL: https://issues.apache.org/jira/browse/SPARK-5903
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Manoj Samel


Spark SQL table caching does not support the Decimal type (as of 1.2). 
This JIRA proposes adding support for caching Decimal columns in the 
org.apache.spark.sql.columnar package, along with any related changes.

This thread originally started on the Spark user group, where Michael Armbrust and 
Cheng Lian provided initial guidance. Moving to JIRA for further work.
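
A minimal sketch of the scenario (1.2-era API; "sc" is an existing SparkContext 
and the Txn schema is made up for illustration):

import org.apache.spark.sql.SQLContext

// Illustrative only: cache a table that contains a Decimal column.
// Scala BigDecimal maps to Spark SQL's DecimalType.
case class Txn(id: Int, amount: BigDecimal)
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD
val txns = sc.parallelize(Seq(Txn(1, BigDecimal("10.50")), Txn(2, BigDecimal("3.25"))))
txns.registerTempTable("txns")
sqlContext.cacheTable("txns")   // in-memory columnar caching, where Decimal support is missing as of 1.2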






[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM

2015-01-20 Thread Manoj Samel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284335#comment-14284335
 ] 

Manoj Samel commented on SPARK-2243:


Is there a target release for this?

> Support multiple SparkContexts in the same JVM
> --
>
> Key: SPARK-2243
> URL: https://issues.apache.org/jira/browse/SPARK-2243
> Project: Spark
>  Issue Type: New Feature
>  Components: Block Manager, Spark Core
>Affects Versions: 0.7.0, 1.0.0, 1.1.0
>Reporter: Miguel Angel Fernandez Diaz
>
> We're developing a platform where we create several Spark contexts for 
> carrying out different calculations. Is there any restriction when using 
> several Spark contexts? We have two contexts, one for Spark calculations and 
> another one for Spark Streaming jobs. The next error arises when we first 
> execute a Spark calculation and, once the execution is finished, a Spark 
> Streaming job is launched:
> {code}
> 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
>   at 
> org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
>   at 
> org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>   at 
> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>   at 
> org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>   at 
> java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to 
> java.io.FileNotFoundException
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
>   at 
> org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
>   at 
> org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrd

[jira] [Commented] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle

2015-01-20 Thread Manoj Samel (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284300#comment-14284300
 ] 

Manoj Samel commented on SPARK-2926:


Which release will have this change available?

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> --
>
> Key: SPARK-2926
> URL: https://issues.apache.org/jira/browse/SPARK-2926
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 1.1.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
> Attachments: SortBasedShuffleRead.pdf, Spark Shuffle Test 
> Report(contd).pdf, Spark Shuffle Test Report.pdf
>
>
> Currently Spark has already integrated sort-based shuffle write, which 
> greatly improves IO performance and reduces memory consumption when the 
> number of reducers is very large. But the reducer side still adopts the 
> hash-based shuffle reader implementation, which neglects the ordering 
> attributes of map output data in some situations.
> Here we propose an MR-style, sort-merge-like shuffle reader for sort-based 
> shuffle to further improve its performance.
> Work-in-progress code and a performance test report will be posted later 
> once some unit test bugs are fixed.
> Any comments would be greatly appreciated. 
> Thanks a lot.
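
For context, a sketch of selecting the existing sort-based shuffle write path 
(spark.shuffle.manager is a real setting, and "sort" has been its default since 
Spark 1.2; the merge-sort reader proposed here is an internal change with no 
user-facing switch in the description above):

// Sketch only: choose the sort-based shuffle write implementation.
val conf = new org.apache.spark.SparkConf()
  .set("spark.shuffle.manager", "sort")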






[jira] [Created] (SPARK-5290) Executing functions in sparkSQL registered in sqlcontext gives scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection

2015-01-16 Thread Manoj Samel (JIRA)
Manoj Samel created SPARK-5290:
--

 Summary: Executing functions in sparkSQL registered in sqlcontext 
gives scala.reflect.internal.MissingRequirementError: class 
org.apache.spark.sql.catalyst.ScalaReflection
 Key: SPARK-5290
 URL: https://issues.apache.org/jira/browse/SPARK-5290
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
 Environment: Spark 1.2 on centos or Mac
Reporter: Manoj Samel


Register a function using sqlContext.registerFunction and then use that 
function in Spark SQL (a minimal repro sketch is included below). The execution 
gives the stack trace shown after the sketch in Spark 1.2; this works in Spark 1.1.1.
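
A minimal repro sketch (1.2-era API; "sc" is an existing SparkContext, and the 
UDF name, its logic, and people.json are only illustrative):

import org.apache.spark.sql.SQLContext

// Illustrative repro: register a Scala function and call it from SQL.
val sqlContext = new SQLContext(sc)
sqlContext.registerFunction("toUpper", (s: String) => s.toUpperCase)
sqlContext.jsonFile("people.json").registerTempTable("people")
sqlContext.sql("SELECT toUpper(name) FROM people").collect()   // fails with the trace below on 1.2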

at 
scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
at 
scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
at 
scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
at 
scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
at 
scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
at 
scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
at 
scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
at 
org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
at 
scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
at scala.reflect.api.Universe.typeOf(Universe.scala:59)
at 
org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
at 
org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
at 
org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
at 
org.apache.spark.sql.UDFRegistration$class.builder$2(UdfRegistration.scala:91)
at 
org.apache.spark.sql.UDFRegistration$$anonfun$registerFunction$1.apply(UdfRegistration.scala:92)
at 
org.apache.spark.sql.UDFRegistration$$anonfun$registerFunction$1.apply(UdfRegistration.scala:92)
at 
org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:53)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:220)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:218)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at 
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at 
scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1(QueryPlan.scala:71)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1$$anonfun$apply$1.apply(QueryPlan.scala:85)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at 
org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:84)
at scala.