[jira] [Commented] (SPARK-12883) 1.6 Dynamic allocation document for removing executors with cached data differs in different sections
[ https://issues.apache.org/jira/browse/SPARK-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107057#comment-15107057 ]

Manoj Samel commented on SPARK-12883:
-------------------------------------

Updated the JIRA subject to reflect the issue more accurately.

> 1.6 Dynamic allocation document for removing executors with cached data
> differs in different sections
> ------------------------------------------------------------------------
>
>                 Key: SPARK-12883
>                 URL: https://issues.apache.org/jira/browse/SPARK-12883
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 1.6.0
>            Reporter: Manoj Samel
>            Priority: Trivial
>
> The Spark 1.6 dynamic allocation documentation still refers to 1.2.
> See the text: "There is currently not yet a solution for this in Spark 1.2.
> In future releases, the cached data may be preserved through an off-heap
> storage similar in spirit to how shuffle files are preserved through the
> external shuffle service."
> Spark 1.6 has a parameter that addresses executors with cached data,
> spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of
> infinity.
> Please update the 1.6 documentation to refer to the latest release and
> features.
[jira] [Updated] (SPARK-12883) 1.6 Dynamic allocation document for removing executors with cached data differs in different sections
[ https://issues.apache.org/jira/browse/SPARK-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Samel updated SPARK-12883:
--------------------------------
    Summary: 1.6 Dynamic allocation document for removing executors with cached data differs in different sections  (was: 1.6 Dynamic allocation doc still refers to 1.2)

> 1.6 Dynamic allocation document for removing executors with cached data
> differs in different sections
> ------------------------------------------------------------------------
>
>                 Key: SPARK-12883
>                 URL: https://issues.apache.org/jira/browse/SPARK-12883
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 1.6.0
>            Reporter: Manoj Samel
>            Priority: Trivial
>
> The Spark 1.6 dynamic allocation documentation still refers to 1.2.
> See the text: "There is currently not yet a solution for this in Spark 1.2.
> In future releases, the cached data may be preserved through an off-heap
> storage similar in spirit to how shuffle files are preserved through the
> external shuffle service."
> Spark 1.6 has a parameter that addresses executors with cached data,
> spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of
> infinity.
> Please update the 1.6 documentation to refer to the latest release and
> features.
[jira] [Commented] (SPARK-12883) 1.6 Dynamic allocation doc still refers to 1.2
[ https://issues.apache.org/jira/browse/SPARK-12883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107042#comment-15107042 ]

Manoj Samel commented on SPARK-12883:
-------------------------------------

The information about removing executors with cached data appears inconsistent; hence this JIRA.

http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation says:

"When an executor is removed, however, all cached data will no longer be accessible. There is currently not yet a solution for this in Spark 1.2. In future releases, the cached data may be preserved through an off-heap storage similar in spirit to how shuffle files are preserved through the external shuffle service."

However, http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation says:

"spark.dynamicAllocation.cachedExecutorIdleTimeout (default: infinity) - If dynamic allocation is enabled and an executor which has cached data blocks has been idle for more than this duration, the executor will be removed."

So if you only read the first link, it seems 1.6 does remove executors with cached data. However, the second link about configuration gives the idea that such an executor will never be removed by default. Maybe I am missing something?

> 1.6 Dynamic allocation doc still refers to 1.2
> ----------------------------------------------
>
>                 Key: SPARK-12883
>                 URL: https://issues.apache.org/jira/browse/SPARK-12883
>             Project: Spark
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 1.6.0
>            Reporter: Manoj Samel
>            Priority: Trivial
>
> The Spark 1.6 dynamic allocation documentation still refers to 1.2.
> See the text: "There is currently not yet a solution for this in Spark 1.2.
> In future releases, the cached data may be preserved through an off-heap
> storage similar in spirit to how shuffle files are preserved through the
> external shuffle service."
> Spark 1.6 has a parameter that addresses executors with cached data,
> spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of
> infinity.
> Please update the 1.6 documentation to refer to the latest release and
> features.
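A minimal sketch of the settings under discussion (illustrative, not from the JIRA; the timeout values are arbitrary examples): with the 1.6 default of infinity, dynamic allocation never reclaims executors holding cached blocks, while an explicit cachedExecutorIdleTimeout makes them eligible for removal as well.

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // external shuffle service, required by dynamic allocation
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s") // idle executors without cached data
  // Default is infinity, i.e. executors with cached blocks are never removed;
  // an explicit value lets dynamic allocation reclaim them too.
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
{code}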
[jira] [Created] (SPARK-12883) 1.6 Dynamic allocation doc still refers to 1.2
Manoj Samel created SPARK-12883:
-----------------------------------

             Summary: 1.6 Dynamic allocation doc still refers to 1.2
                 Key: SPARK-12883
                 URL: https://issues.apache.org/jira/browse/SPARK-12883
             Project: Spark
          Issue Type: Documentation
          Components: Documentation
    Affects Versions: 1.6.0
            Reporter: Manoj Samel
            Priority: Minor

The Spark 1.6 dynamic allocation documentation still refers to 1.2.

See the text: "There is currently not yet a solution for this in Spark 1.2. In future releases, the cached data may be preserved through an off-heap storage similar in spirit to how shuffle files are preserved through the external shuffle service."

Spark 1.6 has a parameter that addresses executors with cached data, spark.dynamicAllocation.cachedExecutorIdleTimeout, with a default value of infinity.

Please update the 1.6 documentation to refer to the latest release and features.
[jira] [Created] (SPARK-6653) New configuration property to specify port for sparkYarnAM actor system
Manoj Samel created SPARK-6653:
----------------------------------

             Summary: New configuration property to specify port for sparkYarnAM actor system
                 Key: SPARK-6653
                 URL: https://issues.apache.org/jira/browse/SPARK-6653
             Project: Spark
          Issue Type: Improvement
          Components: YARN
    Affects Versions: 1.3.0
         Environment: Spark on YARN
            Reporter: Manoj Samel

In the 1.3.0 code line, the sparkYarnAM actor system is started on a random port. See org.apache.spark.deploy.yarn ApplicationMaster.scala:282

  actorSystem = AkkaUtils.createActorSystem("sparkYarnAM", Utils.localHostName, 0,
    conf = sparkConf, securityManager = securityMgr)._1

This may be an issue when the ports between the Spark client and the YARN cluster are limited by a firewall and not all ports are open between the client and the YARN cluster.

The proposal is to introduce a new property spark.am.actor.port and change the code to

  val port = sparkConf.getInt("spark.am.actor.port", 0)
  actorSystem = AkkaUtils.createActorSystem("sparkYarnAM", Utils.localHostName, port,
    conf = sparkConf, securityManager = securityMgr)._1
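If the proposal were adopted as written, a client behind a restrictive firewall could pin the AM port as sketched below. Note that spark.am.actor.port is the property proposed above, not an existing Spark configuration.

{code}
import org.apache.spark.SparkConf

// Hypothetical usage of the proposed property: fix the sparkYarnAM actor
// system port to one that is open between the client and the YARN cluster.
val sparkConf = new SparkConf()
  .set("spark.am.actor.port", "41000") // proposed in this JIRA, not in Spark 1.3.0
{code}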
[jira] [Created] (SPARK-5903) Add support for Decimal type in SQL Table caching
Manoj Samel created SPARK-5903:
----------------------------------

             Summary: Add support for Decimal type in SQL Table caching
                 Key: SPARK-5903
                 URL: https://issues.apache.org/jira/browse/SPARK-5903
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Manoj Samel

Spark SQL table caching does not support the Decimal type (as of 1.2). This JIRA proposes to add support for caching the Decimal type in the org.apache.spark.sql.columnar package, plus any related changes.

This thread originally started on the Spark user group. Michael Armbrust and Cheng Lian have provided initial guidance. Moving to JIRA for further work.
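For context, a minimal sketch of the operation this proposal would enable (an assumed repro shape, not code from the JIRA), using the 1.2-era API where a Scala BigDecimal field maps to Catalyst's DecimalType:

{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc: an existing SparkContext
import sqlContext.createSchemaRDD   // implicit RDD -> SchemaRDD conversion

case class Txn(id: Int, amount: BigDecimal) // BigDecimal becomes DecimalType
val txns = sc.parallelize(Seq(Txn(1, BigDecimal("10.50")), Txn(2, BigDecimal("3.25"))))
txns.registerTempTable("txns")

// Hits the in-memory columnar code path (org.apache.spark.sql.columnar)
// that this JIRA proposes to extend with Decimal support.
sqlContext.cacheTable("txns")
sqlContext.sql("SELECT SUM(amount) FROM txns").collect()
{code}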
[jira] [Commented] (SPARK-2243) Support multiple SparkContexts in the same JVM
[ https://issues.apache.org/jira/browse/SPARK-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284335#comment-14284335 ]

Manoj Samel commented on SPARK-2243:
------------------------------------

Is there a target release for this?

> Support multiple SparkContexts in the same JVM
> ----------------------------------------------
>
>                 Key: SPARK-2243
>                 URL: https://issues.apache.org/jira/browse/SPARK-2243
>             Project: Spark
>          Issue Type: New Feature
>          Components: Block Manager, Spark Core
>    Affects Versions: 0.7.0, 1.0.0, 1.1.0
>            Reporter: Miguel Angel Fernandez Diaz
>
> We're developing a platform where we create several Spark contexts for
> carrying out different calculations. Is there any restriction when using
> several Spark contexts? We have two contexts, one for Spark calculations and
> another one for Spark Streaming jobs. The following error arises when we first
> execute a Spark calculation and, once the execution is finished, a Spark
> Streaming job is launched:
> {code}
> 14/06/23 16:40:08 ERROR executor.Executor: Exception in task ID 0
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
> 	at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
> 	at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> 	at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
> 	at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
> 	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
> 	at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/06/23 16:40:08 WARN scheduler.TaskSetManager: Loss was due to java.io.FileNotFoundException
> java.io.FileNotFoundException: http://172.19.0.215:47530/broadcast_0
> 	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1624)
> 	at org.apache.spark.broadcast.HttpBroadcast$.read(HttpBroadcast.scala:156)
> 	at org.apache.spark.broadcast.HttpBroadcast.readObject(HttpBroadcast.scala:56)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> 	at java.io.ObjectInputStream.readOrd
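For illustration, the usage pattern at issue (a sketch, not code from the reporter): two SparkContexts created in the same JVM, one for batch calculations and one for streaming, which is the unsupported scenario that produces failures like the broadcast error quoted above.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// First context: regular Spark calculations.
val batchSc = new SparkContext(new SparkConf().setAppName("batch").setMaster("local[2]"))
val total = batchSc.parallelize(1 to 100).sum()

// Second context in the same JVM, for Spark Streaming jobs -- the
// unsupported pattern this issue asks to make work.
val streamingSc = new SparkContext(new SparkConf().setAppName("streaming").setMaster("local[2]"))
val ssc = new StreamingContext(streamingSc, Seconds(10))
{code}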
[jira] [Commented] (SPARK-2926) Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284300#comment-14284300 ]

Manoj Samel commented on SPARK-2926:
------------------------------------

Which release will this change be available in?

> Add MR-style (merge-sort) SortShuffleReader for sort-based shuffle
> ------------------------------------------------------------------
>
>                 Key: SPARK-2926
>                 URL: https://issues.apache.org/jira/browse/SPARK-2926
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>    Affects Versions: 1.1.0
>            Reporter: Saisai Shao
>            Assignee: Saisai Shao
>         Attachments: SortBasedShuffleRead.pdf, Spark Shuffle Test Report(contd).pdf, Spark Shuffle Test Report.pdf
>
> Spark has already integrated sort-based shuffle write, which greatly
> improves IO performance and reduces memory consumption when the number of
> reducers is very large. The reducer side, however, still uses the
> hash-based shuffle reader implementation, which neglects the ordering of
> map output data in some situations.
> Here we propose an MR-style, sort-merge-like shuffle reader for sort-based
> shuffle to further improve its performance.
> Work-in-progress code and a performance test report will be posted later,
> once some unit test bugs are fixed.
> Any comments would be greatly appreciated. Thanks a lot.
[jira] [Created] (SPARK-5290) Executing functions in sparkSQL registered in sqlcontext gives scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection
Manoj Samel created SPARK-5290:
----------------------------------

             Summary: Executing functions in sparkSQL registered in sqlContext gives scala.reflect.internal.MissingRequirementError: class org.apache.spark.sql.catalyst.ScalaReflection
                 Key: SPARK-5290
                 URL: https://issues.apache.org/jira/browse/SPARK-5290
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.2.0
         Environment: Spark 1.2 on CentOS or Mac
            Reporter: Manoj Samel

Register a function using sqlContext.registerFunction and then use that function in Spark SQL. The execution gives the following stack trace in Spark 1.2; this works in Spark 1.1.1.

	at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
	at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
	at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
	at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
	at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
	at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
	at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
	at org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
	at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
	at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
	at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
	at scala.reflect.api.Universe.typeOf(Universe.scala:59)
	at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
	at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
	at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
	at org.apache.spark.sql.UDFRegistration$class.builder$2(UdfRegistration.scala:91)
	at org.apache.spark.sql.UDFRegistration$$anonfun$registerFunction$1.apply(UdfRegistration.scala:92)
	at org.apache.spark.sql.UDFRegistration$$anonfun$registerFunction$1.apply(UdfRegistration.scala:92)
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:53)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:220)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$10$$anonfun$applyOrElse$2.applyOrElse(Analyzer.scala:218)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:162)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
	at scala.collection.AbstractIterator.to(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildrenDown(TreeNode.scala:191)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:147)
	at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionDown$1(QueryPlan.scala:71)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1$$anonfun$apply$1.apply(QueryPlan.scala:85)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$1.apply(QueryPlan.scala:84)
	at scala.
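A minimal repro sketch (an assumed shape; the exact code is not in the JIRA) of the sequence that triggers the error on 1.2:

{code}
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc: an existing SparkContext

// Register a Scala function and call it from SQL; on 1.2 the lookup goes
// through ScalaReflection.schemaFor and fails with MissingRequirementError.
sqlContext.registerFunction("toUpper", (s: String) => s.toUpperCase)
sqlContext.sql("SELECT toUpper('hello')").collect()
{code}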