I think you're making it harder than it has to be ... First, you don't need to alias your file name, so you don't need the # and the alias after it. So your lines:

    String path = "/user/Li/model/model.txt";
    Path filePath = new Path(path);
    String uriWithLink = filePath.toUri().toString() + "#" + "model.txt";
    System.out.println(uriWithLink);
    DistributedCache.addCacheFile(new URI(uriWithLink), conf);

become:

    DistributedCache.addCacheFile(new URI(path), conf);

(path already ends in model.txt, and conf is the second argument to addCacheFile, not part of the URI.)
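For context on why the alias is optional: the part after the # in a cache URI is just a URI fragment, which the framework uses as the symlink name in the task's working directory, while the part before it is the HDFS path. A minimal sketch using plain java.net.URI (no Hadoop classes needed) shows how the two pieces split apart:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CacheUriDemo {
    public static void main(String[] args) throws URISyntaxException {
        // The string the original code builds: HDFS path plus a "#alias" fragment.
        URI uriWithLink = new URI("/user/Li/model/model.txt#model.txt");

        // getPath() is the file to cache; getFragment() is the symlink name.
        System.out.println(uriWithLink.getPath());     // /user/Li/model/model.txt
        System.out.println(uriWithLink.getFragment()); // model.txt
    }
}
```

Since the alias here is identical to the file's own name, the fragment adds nothing, which is why dropping it is safe.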
Your problem is that you're taking a string, making it a Path, turning it back into a string, and then into a new URI. Then in your mapper:

    private Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    boolean exitProcess = false;
    int i = 0;
    while (!exitProcess) {
        String fileName = localFiles[i].getName();
        if (fileName.equalsIgnoreCase("model.txt")) {
            // Build your input file reader on localFiles[i].toString()
            exitProcess = true;
        }
        i++;
    }

Note that this is SAMPLE code. I didn't trap the exit condition for the case where the file isn't there and you run beyond the end of the array localFiles[]. Also, I initialized exitProcess to false because it's easier to read the loop as "do this until the condition exitProcess is true". When you build your file reader you need the full path, not just the file name; the path will vary between runs of the job.

HTH
-Mike

-----Original Message-----
From: Weiwei Li [mailto:hadoop...@gmail.com]
Sent: Wednesday, July 27, 2011 2:03 AM
To: general@hadoop.apache.org
Subject: About the DistributedCache

Hi,

I have run into a problem with the DistributedCache. There is a file called 'model.txt' that I want every mapper to be able to read, because it contains some shared data, so I am using the DistributedCache.

1. In main():

    DistributedCache.createSymlink(conf);
    String path = "/user/Li/model/model.txt";
    Path filePath = new Path(path);
    String uriWithLink = filePath.toUri().toString() + "#" + "model.txt";
    System.out.println(uriWithLink);
    DistributedCache.addCacheFile(new URI(uriWithLink), conf);

2. In the Mapper:

    protected void setup(Context context) throws IOException, InterruptedException {
        System.out.println("Now, use the distributed cache and symlink");
        try {
            FileReader reader = new FileReader("model.txt");
            BufferedReader br = new BufferedReader(reader);
            String s1 = null;
            while ((s1 = br.readLine()) != null) {
                System.out.println(s1);
            }
            br.close();
            reader.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

3. When I run it, the task logs show:
    java.io.FileNotFoundException: model.txt (Access denied.)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at java.io.FileInputStream.<init>(FileInputStream.java:79)
        at java.io.FileReader.<init>(FileReader.java:41)
        at NB.NBClusterTrain.UseDistributedCacheBySymbolicLink(NBClusterTrain.java:24)
        at NB.NBClusterTrain$NBClusterTrainMapper.setup(NBClusterTrain.java:45)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

4. When I use /bin/hadoop fs -cat /user/Li/model/model.txt, the file can be read.

What do you think I can do? Thank you!
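[Editor's note] The search loop in Mike's reply can be hardened against the missing-file case he mentions. Here is a self-contained sketch of the same idea using plain strings in place of Hadoop Path objects (the paths and the helper name findCacheFile are illustrative, not Hadoop API):

```java
public class CacheLookupDemo {
    /**
     * Search the localized cache paths for one whose file name matches
     * target; return the full path, or null if it isn't there. A plain
     * for loop bounds the search, trapping the missing-file case that
     * the sample while loop does not.
     */
    static String findCacheFile(String[] localFiles, String target) {
        for (String path : localFiles) {
            // Equivalent of Path.getName(): everything after the last '/'.
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (fileName.equalsIgnoreCase(target)) {
                return path; // full path, which the file reader needs
            }
        }
        return null; // file not found in the cache
    }

    public static void main(String[] args) {
        String[] localFiles = {
            "/tmp/mapred/local/taskTracker/archive/other.dat",
            "/tmp/mapred/local/taskTracker/archive/model.txt"
        };
        System.out.println(findCacheFile(localFiles, "model.txt"));
        System.out.println(findCacheFile(localFiles, "missing.txt"));
    }
}
```

In a real mapper the array would come from DistributedCache.getLocalCacheFiles(context.getConfiguration()), and the returned full path is what you would hand to your FileReader.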