You can create a new Configuration object inside something like a mapPartitions call and use it to open a FileSystem on the worker. It will pick up the local Hadoop configuration from the node; presumably the Spark workers and HDFS data nodes are colocated in this case, so the machines already have the correct Hadoop config locally.
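A minimal sketch of that idea (assuming hadoop-client is on the worker classpath; the class name, the `readFirstLine` helper, and the key-to-path mapping `/data/<key>` are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WorkerHdfsRead {

    // Read the first line of the file at 'path' through the given FileSystem.
    static String readFirstLine(FileSystem fs, Path path) throws IOException {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
            return r.readLine();
        }
    }

    // Runs on the worker inside mapPartitions: build the Configuration
    // locally instead of capturing the (non-serializable) JavaSparkContext.
    static Iterator<String> readPartition(Iterator<String> keys) throws IOException {
        Configuration conf = new Configuration(); // picks up node-local Hadoop config
        FileSystem fs = FileSystem.get(conf);
        List<String> out = new ArrayList<>();
        while (keys.hasNext()) {
            // Hypothetical key-to-path mapping; substitute your own scheme.
            out.add(readFirstLine(fs, new Path("/data/" + keys.next())));
        }
        return out.iterator();
    }
}
```

You'd then call it as `rdd.mapPartitions(WorkerHdfsRead::readPartition)`. Note the signature depends on your Spark version: FlatMapFunction.call returns an Iterator in Spark 2.x+, but an Iterable in 1.x, so on 1.x return the List directly.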
On Tue, Oct 7, 2014 at 7:01 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
> I am porting a Hadoop job to Spark. One issue is that the workers need to
> read files from HDFS, reading a different file based on the key, or in some
> cases reading an object that is expensive to serialize.
> This is easy if the worker has access to the JavaSparkContext (I am working
> in Java), but this cannot be serialized -
> how can a worker read from a Path - assume HDFS?