[ https://issues.apache.org/jira/browse/SPARK-3963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230878#comment-14230878 ]
Shivaram Venkataraman commented on SPARK-3963:
----------------------------------------------

[~pwendell] This looks pretty useful -- was this postponed from 1.2? I have a use case that needs Hadoop file names and was wondering whether there is a workaround before this is implemented.

> Support getting task-scoped properties from TaskContext
> --------------------------------------------------------
>
>                 Key: SPARK-3963
>                 URL: https://issues.apache.org/jira/browse/SPARK-3963
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>
> This is a proposal for a minor feature. Given the stabilization of the
> TaskContext API, it would be nice to have a mechanism for Spark jobs to
> access properties that are defined with task-level scope by Spark RDDs.
> I'd like to propose adding a simple properties hash map with some standard
> Spark properties that users can access. Later it would be nice to support
> users setting these properties, but for now, to keep it simple in 1.2, I'd
> prefer that users not be able to set them.
> The main use case is providing the file name from Hadoop RDDs, a very common
> request. But I'd imagine us using this for other things later on. We could
> also use it to expose some of the task metrics, e.g. the input bytes.
> {code}
> val data = sc.textFile("s3n://..2014/*/*/*.json")
> data.mapPartitions { iter =>
>   val tc = TaskContext.get
>   val fileName = tc.getProperty(TaskContext.HADOOP_FILE_NAME)
>   val parts = fileName.split("/")
>   val (year, month, day) = (parts(3), parts(4), parts(5))
>   ...
> }
> {code}
> Internally we'd have a method called setProperty, but this wouldn't be
> exposed initially. This is structured as a simple (String, String) hash map
> for ease of porting to Python.
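As an aside on the workaround question above: until something like this lands, the file name can typically be recovered by building the HadoopRDD explicitly and inspecting each partition's InputSplit. Below is a minimal sketch, not part of this proposal, assuming the data is read through the old-style mapred text input format; the bucket path is a placeholder. mapPartitionsWithInputSplit is a @DeveloperApi on HadoopRDD, and the cast to FileSplit only holds for file-based input formats.

{code}
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit, TextInputFormat}
import org.apache.spark.rdd.HadoopRDD

// Build the HadoopRDD directly (sc.textFile hides it behind a map).
// "s3n://some-bucket/2014/*/*/*.json" is a hypothetical path.
val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat](
  "s3n://some-bucket/2014/*/*/*.json")

// Each partition of a HadoopRDD corresponds to one InputSplit; for
// file-based formats it is a FileSplit, which carries the file path.
val linesWithFile = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
  .mapPartitionsWithInputSplit { (split: InputSplit, iter: Iterator[(LongWritable, Text)]) =>
    val fileName = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, line) => (fileName, line.toString) }
  }
{code}

For small files, sc.wholeTextFiles returns (fileName, contents) pairs directly, at the cost of reading each file as a single record.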