[ https://issues.apache.org/jira/browse/SPARK-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647392#comment-15647392 ]
Thomas Sebastian commented on SPARK-17334:
------------------------------------------

This looks to be a good feature. I will take a look into this and come back.

> Provide management tools for broadcasted variables
> ---------------------------------------------------
>
>          Key: SPARK-17334
>          URL: https://issues.apache.org/jira/browse/SPARK-17334
>      Project: Spark
>   Issue Type: New Feature
>   Components: Spark Core
>     Reporter: Assaf Mendelson
>     Priority: Minor
>
> I propose to provide some management tools for broadcast variables.
> The main issue today is that a broadcast returns a reference that must be saved and reused; we have no way to tell whether it has already been unpersisted, where it occupies memory, or how much.
> Consider the following: today we create a broadcast variable, use it, and destroy it later by keeping the reference around.
> Consider the example from the documentation:
> >>> from pyspark.context import SparkContext
> >>> sc = SparkContext('local', 'test')
> >>> b = sc.broadcast([1, 2, 3, 4, 5])
> >>> b.value
> [1, 2, 3, 4, 5]
> >>> sc.parallelize([0, 0]).flatMap(lambda x: b.value).collect()
> [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
> >>> b.unpersist()
> The problem is that b needs to be saved and passed along.
> Instead I would like to see something like:
> >>> sc.broadcast("b", [1, 2, 3, 4, 5])
> >>> sc.getBroadcasted()
> ["a", "b", "c"]
> >>> sc.getBroadcastInfo("b")
> {"mem[bytes]": 10, "type": List, "materializedExecutors": [1, 2, 3, 6, 7]}
> >>> b = sc.getBroadcastRef("b")
> >>> print b.value
> [1, 2, 3, 4, 5]
> >>> sc.unpersist("b")
> Maybe also add some per-executor map to see what each executor contains.
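Until something like this exists in Spark Core, a rough sketch of how a named-broadcast registry could be layered on top of the existing PySpark API in user code is below. The BroadcastRegistry class and its method names are hypothetical illustrations, not part of any Spark API; it only relies on the existing sc.broadcast(), Broadcast.value and Broadcast.unpersist() calls, and it cannot report per-executor materialization, which is exactly what the proposed core support would add.

    import sys
    from pyspark import SparkContext

    class BroadcastRegistry(object):
        """Hypothetical driver-side helper that tracks broadcast variables by name."""

        def __init__(self, sc):
            self._sc = sc
            self._vars = {}  # name -> Broadcast handle

        def broadcast(self, name, value):
            # Create the broadcast as usual, but remember it under a name.
            self._vars[name] = self._sc.broadcast(value)
            return self._vars[name]

        def names(self):
            return sorted(self._vars.keys())

        def info(self, name):
            b = self._vars[name]
            # Rough driver-side size only; accurate memory and executor
            # accounting would need support inside Spark Core.
            return {"mem[bytes]": sys.getsizeof(b.value),
                    "type": type(b.value).__name__}

        def get(self, name):
            return self._vars[name]

        def unpersist(self, name, blocking=False):
            self._vars.pop(name).unpersist(blocking)

Usage would look close to the proposed API, e.g.:

    >>> reg = BroadcastRegistry(sc)
    >>> reg.broadcast("b", [1, 2, 3, 4, 5])
    >>> reg.names()
    ['b']
    >>> reg.get("b").value
    [1, 2, 3, 4, 5]
    >>> reg.unpersist("b")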