I think you're making it harder than it has to be ... First, you don't need to alias your file name, so you don't need the # and the alias after it. So your lines:

    String path = "/user/Li/model/model.txt";
    Path filePath = new Path(path);
    String uriWithLink = filePath.toUri().toString() + "#" + "model.txt";
    System.out.println(uriWithLink);
    DistributedCache.addCacheFile(new URI(uriWithLink), conf);

become:

    DistributedCache.addCacheFile(new URI(path), conf);

(path already ends in model.txt, and conf is the second argument to addCacheFile, not part of the URI.)
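For context on why the alias is optional: the part after the # in a cache URI is just a URI fragment, which the framework uses as the symlink name in the task's working directory, while the part before it is the HDFS path. A minimal sketch using plain java.net.URI (no Hadoop classes needed) shows how the two pieces split apart:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriDemo {
    public static void main(String[] args) throws URISyntaxException {
        // The string the original code builds: HDFS path plus a "#alias" fragment.
        URI uriWithLink = new URI("/user/Li/model/model.txt#model.txt");

        // getPath() is the file to cache; getFragment() is the symlink name.
        System.out.println(uriWithLink.getPath());     // /user/Li/model/model.txt
        System.out.println(uriWithLink.getFragment()); // model.txt
    }
}
```

Since the alias here is identical to the file's own name, the fragment adds nothing, which is why dropping it is safe.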
Your problem is that you're taking a string, making it a Path, turning it back into a string, and then into a new URI. Then in your mapper:

    private Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    boolean exitProcess = false;
    int i = 0;
    while (!exitProcess) {
        String fileName = localFiles[i].getName();
        if (fileName.equalsIgnoreCase("model.txt")) {
            // Build your input file reader on localFiles[i].toString()
            exitProcess = true;
        }
        i++;
    }

Note that this is SAMPLE code. I didn't trap the exit condition for the case where the file isn't there and you run beyond the end of the array localFiles[]. Also, I initialized exitProcess to false because it's easier to read the loop as "do this until the condition exitProcess is true". When you build your file reader you need the full path, not just the file name; the path will vary between runs of the job.

HTH
-Mike

-----Original Message-----
From: Weiwei Li [mailto:hadoop...@gmail.com]
Sent: Wednesday, July 27, 2011 2:03 AM
To: general@hadoop.apache.org
Subject: About the DistributedCache

Hi,

I have run into a problem with the DistributedCache. There is a file called 'model.txt' that I want every mapper to be able to read, because it contains some shared data, so I am using the DistributedCache.

1. In main():

    DistributedCache.createSymlink(conf);
    String path = "/user/Li/model/model.txt";
    Path filePath = new Path(path);
    String uriWithLink = filePath.toUri().toString() + "#" + "model.txt";
    System.out.println(uriWithLink);
    DistributedCache.addCacheFile(new URI(uriWithLink), conf);

2. In the Mapper:

    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("Now, use the distributed cache and symlink");
        try {
            FileReader reader = new FileReader("model.txt");
            BufferedReader br = new BufferedReader(reader);
            String s1 = null;
            while ((s1 = br.readLine()) != null) {
                System.out.println(s1);
            }
            br.close();
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

3. When I run it, the task logs show:
    java.io.FileNotFoundException: model.txt (Access denied.)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at java.io.FileInputStream.<init>(FileInputStream.java:79)
        at java.io.FileReader.<init>(FileReader.java:41)
        at NB.NBClusterTrain.UseDistributedCacheBySymbolicLink(NBClusterTrain.java:24)
        at NB.NBClusterTrain$NBClusterTrainMapper.setup(NBClusterTrain.java:45)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

4. When I use /bin/hadoop fs -cat /user/Li/model/model.txt, the file can be read.

What do you think I can do? Thank you!
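[Editor's note] The search loop in Mike's reply can be hardened against the missing-file case he mentions. Here is a self-contained sketch of the same idea using plain strings in place of Hadoop Path objects (the paths and the helper name findCacheFile are illustrative, not Hadoop API):

```java
public class CacheLookupDemo {
    /**
     * Search the localized cache paths for one whose file name matches
     * target; return the full path, or null if it isn't there. A plain
     * for loop bounds the search, trapping the missing-file case that
     * the sample while loop does not.
     */
    static String findCacheFile(String[] localFiles, String target) {
        for (String path : localFiles) {
            // Equivalent of Path.getName(): everything after the last '/'.
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (fileName.equalsIgnoreCase(target)) {
                return path; // full path, which the file reader needs
            }
        }
        return null; // file not found in the cache
    }

    public static void main(String[] args) {
        String[] localFiles = {
            "/tmp/mapred/local/taskTracker/archive/other.dat",
            "/tmp/mapred/local/taskTracker/archive/model.txt"
        };
        System.out.println(findCacheFile(localFiles, "model.txt"));
        System.out.println(findCacheFile(localFiles, "missing.txt"));
    }
}
```

In a real mapper the array would come from DistributedCache.getLocalCacheFiles(context.getConfiguration()), and the returned full path is what you would hand to your FileReader.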