You can create a new Hadoop Configuration object inside something like a
mapPartitions call and use that. It will pick up the local Hadoop
configuration from the node; presumably the Spark workers and HDFS
DataNodes are colocated in this case, so each machine already has the
correct Hadoop config locally.
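
A minimal sketch of what that can look like in the Java API. It assumes the
Spark 1.x FlatMapFunction signature (call() returning an Iterable; Spark 2.x
returns an Iterator), that fs.defaultFS on the workers points at the cluster's
HDFS, and a made-up one-file-per-key layout under hdfs:///data/ -- adjust the
key-to-path mapping to your job:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

public class ReadFromWorkers {
    public static JavaRDD<String> readByKey(JavaRDD<String> keys) {
        return keys.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
            @Override
            public Iterable<String> call(Iterator<String> keyIter) throws Exception {
                // Built on the worker itself, so nothing (and in particular no
                // JavaSparkContext) has to be serialized from the driver.
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                List<String> lines = new ArrayList<String>();
                while (keyIter.hasNext()) {
                    // Hypothetical layout: one HDFS file per key.
                    Path path = new Path("hdfs:///data/" + keyIter.next());
                    FSDataInputStream in = fs.open(path);
                    try {
                        BufferedReader reader =
                            new BufferedReader(new InputStreamReader(in, "UTF-8"));
                        String line;
                        while ((line = reader.readLine()) != null) {
                            lines.add(line);
                        }
                    } finally {
                        in.close();
                    }
                }
                return lines;
            }
        });
    }
}

Because the Configuration and FileSystem are created once per partition rather
than once per record, the per-record overhead stays small; the same pattern
works for the expensive-to-deserialize object case -- build it once at the top
of call() and reuse it while iterating over the partition.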

On Tue, Oct 7, 2014 at 7:01 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
> I am porting a Hadoop job to Spark. One issue is that the workers need to
> read files from HDFS, reading a different file based on the key, or in some
> cases reading an object that is expensive to deserialize.
> This would be easy if the worker had access to the JavaSparkContext (I am
> working in Java), but that cannot be serialized -
> how can a worker read from a Path? Assume HDFS.
